Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Implementing live voice commands?
-
Thanks @cvp! I should have mentioned I did look at that thread, but it doesn't really solve my problem. The issue with the ping-ponging recordings (or in my case staggered sampling windows) is that you need to run speech recognition on each one concurrently in order to keep up. The runtime error I'm getting seems to indicate that the underlying SFSpeechRecognizer only supports one active instance, and that instead I need to register a callback to handle partial speech results (enabled via shouldReportPartialResults).
Is there anyone with objc_util experience who's played around with SFSpeechRecognizer, or would be willing to help me get started? (@omz @JonB @dgelessus @zrzka @Brun0oO @mikael @scj643 @shaun-h @filippocld)
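To make the streaming approach concrete, here is a pure-Python sketch of the pattern described above: instead of running one recognizer per staggered recording window, a single request accumulates audio and one callback receives growing partial results. All class and method names here are invented for illustration; they only mimic the shape of `SFSpeechAudioBufferRecognitionRequest` plus a result handler.

```python
# Sketch of the "one request, streaming partial results" pattern.
# Hypothetical names -- not a real API.

class FakeStreamingRequest:
    def __init__(self, result_handler, report_partial=True):
        self.result_handler = result_handler
        self.report_partial = report_partial  # like shouldReportPartialResults
        self.words = []

    def append_chunk(self, word):
        # Stands in for appendAudioPCMBuffer_: audio keeps flowing
        # into the same request object as it is captured.
        self.words.append(word)
        if self.report_partial:
            self.result_handler(' '.join(self.words), is_final=False)

    def end_audio(self):
        # Final result once the audio stream ends.
        self.result_handler(' '.join(self.words), is_final=True)

partials = []
req = FakeStreamingRequest(lambda text, is_final: partials.append((text, is_final)))
for chunk in ['turn', 'lights', 'on']:
    req.append_chunk(chunk)
req.end_audio()
print(partials[-1])  # ('turn lights on', True)
```

The point is that the callback fires repeatedly with ever-longer hypotheses, so there is never more than one active recognizer.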
-
It would be possible to set up the audio session and feed the speech recognizer the way it is meant to be used. Your link has the code, it just needs to be converted to objc_util. Sadly, I'm using an older iOS version so I wouldn't be able to try it.
-
Ok, thanks @JonB! The code I linked to is in Swift, right? Even though I selected Obj-C in the dropdown. Trying to produce the Obj-C equivalent of the example would be a lot for me... any chance you could take an untested stab at it, and then I could iterate from there? Something to start with would be super helpful. I know that's a lot to ask, and no worries if you don't have the time or inclination.
-
@daltonb I could try to translate it to Objective-C in Pythonista, but I'm not sure of the result... nor of the delay 😢
-
@cvp that would be awesome.. even if it doesn’t work out I’d love to see a partial result
-
https://github.com/yao23/iOS_Playground/blob/master/SpeechRecognitionPractice/SpeechRecognitionPractice/ViewController.m
is an Obj-C implementation. The tricky bit, obviously, is getting those blocks implemented in objc_util.
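For anyone new to the block-wrapping part: an `ObjCBlock` in objc_util is mechanically very close to a plain ctypes callback, which you can try anywhere (no Pythonista required). As a self-contained analogy, here libc's `qsort` calls back into Python, just like `installTapOnBus_bufferSize_format_block_` calls back into a tap handler:

```python
import ctypes

# On Linux/macOS, loading "None" exposes the C library's symbols.
libc = ctypes.CDLL(None)

# Declare the callback signature up front, exactly as ObjCBlock
# needs restype/argtypes.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    # Ascending order: negative if a < b, positive if a > b.
    return a[0] - b[0]

cmp_cb = CMPFUNC(py_cmp)  # keep a reference alive, like retain_global

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]

arr = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(arr, 5, ctypes.sizeof(ctypes.c_int), cmp_cb)
print(list(arr))  # [1, 2, 3, 4, 5]
```

The same two rules carry over to objc_util: declare the argument types before wrapping, and keep a Python reference to the wrapper so it isn't garbage-collected while native code still holds it.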
-
@daltonb, I am tempted to give it a try, but not this week.
-
@cvp, the man is fast! :-D
-
@mikael Heh, that is not my code 😂, I just found it there. I'm just beginning to modify it...
-
First part (enough for today)
```python
from objc_util import *

AVAudioEngine = ObjCClass('AVAudioEngine').alloc().init()
AVAudioSession = ObjCClass('AVAudioSession')
AVAudioRecorder = ObjCClass('AVAudioRecorder')
shared_session = AVAudioSession.sharedInstance()
category_set = shared_session.setCategory_mode_options_error_(
    ns('AVAudioSessionCategoryRecord'),
    ns('AVAudioSessionModeMeasurement'),
    ns('AVAudioSession.CategoryOptionsDuckOthers'),
    None)
setActiveOptions = 0  # notifyOthersOnDeactivation
shared_session.setActive_withOptions_error_(True, setActiveOptions, None)
inputNode = AVAudioEngine.inputNode()
```
-
2nd part and really enough for today
```python
from objc_util import *

AVAudioEngine = ObjCClass('AVAudioEngine').alloc().init()
AVAudioSession = ObjCClass('AVAudioSession')
AVAudioRecorder = ObjCClass('AVAudioRecorder')
shared_session = AVAudioSession.sharedInstance()
category_set = shared_session.setCategory_mode_options_error_(
    ns('AVAudioSessionCategoryRecord'),
    ns('AVAudioSessionModeMeasurement'),
    ns('AVAudioSession.CategoryOptionsDuckOthers'),
    None)
setActiveOptions = 0  # notifyOthersOnDeactivation
shared_session.setActive_withOptions_error_(True, setActiveOptions, None)
inputNode = AVAudioEngine.inputNode()

# Configure the microphone input.
recordingFormat = inputNode.outputFormatForBus_(0)

def handler(_cmd, obj1_ptr, obj2_ptr):
    # param1 = AVAudioPCMBuffer: a buffer of audio captured
    # from the output of an AVAudioNode.
    # param2 = AVAudioTime: the time the buffer was captured.
    if obj1_ptr:
        obj1 = ObjCInstance(obj1_ptr)
        #self.recognitionRequest?.append(buffer)

handler_block = ObjCBlock(handler, restype=None,
                          argtypes=[c_void_p, c_void_p, c_void_p])
inputNode.installTapOnBus_bufferSize_format_block_(0, 1024, recordingFormat,
                                                   handler_block)
AVAudioEngine.prepare()
err_ptr = c_void_p()
AVAudioEngine.startAndReturnError_(byref(err_ptr))
if err_ptr:
    err = ObjCInstance(err_ptr)
    print(err)

# Create and configure the speech recognition request.
recognitionRequest = ObjCClass('SFSpeechAudioBufferRecognitionRequest').alloc()
print(dir(recognitionRequest))
recognitionRequest.setShouldReportPartialResults_(True)
```
And I get:

```
Fatal Python error: Bus error

Thread 0x000000016fb67000 (most recent call first):
```

No error if I comment out the line

```python
AVAudioEngine.startAndReturnError_(byref(err_ptr))
```
-
You had some errors in one of your constants: the audio session options value should have been 0x2 for the duckOthers option -- this is a bitmask, not a string.
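To see why the string constant fails, a small `IntFlag` mirror of the category options makes the bitmask nature explicit. The duckOthers value (0x2) is the one named above; the other values follow Apple's `AVAudioSessionCategoryOptions` headers as I understand them, so treat them as an assumption:

```python
from enum import IntFlag

# Assumed values, mirroring AVAudioSessionCategoryOptions (NS_OPTIONS).
class CategoryOptions(IntFlag):
    MixWithOthers = 0x1
    DuckOthers = 0x2      # the value the code above needed
    AllowBluetooth = 0x4
    DefaultToSpeaker = 0x8

opts = CategoryOptions.DuckOthers
print(int(opts))  # 2 -- an integer mask, not a string like 'CategoryOptionsDuckOthers'

# Options combine with bitwise OR, as NS_OPTIONS masks do:
both = CategoryOptions.DuckOthers | CategoryOptions.MixWithOthers
print(int(both))  # 3
```

So `setCategory_withOptions_error_` wants that integer (`0x2`), not an `ns(...)` string.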
Here is a minor mod -- I verified the handler gets called, but I don't have a speech recognizer to test against:

https://gist.github.com/ad17f52c8944993092f537d963ce1963
-
@JonB Thanks, I'll try to continue today...
-
@JonB Really need help now:
- segmentation fault if no underscore before appendAudioPCMBuffer_(obj1)
- segmentation fault in last line not commented
```python
from objc_util import *

AVAudioEngine = ObjCClass('AVAudioEngine').alloc().init()
AVAudioSession = ObjCClass('AVAudioSession')
AVAudioRecorder = ObjCClass('AVAudioRecorder')
shared_session = AVAudioSession.sharedInstance()
category_set = shared_session.setCategory_withOptions_error_(
    ns('AVAudioSessionCategoryRecord'),
    0x2,  # duckOthers
    None)
shared_session.setMode_error_(ns('AVAudioSessionModeMeasurement'), None)
setActiveOptions = 0  # notifyOthersOnDeactivation
shared_session.setActive_withOptions_error_(True, setActiveOptions, None)
inputNode = AVAudioEngine.inputNode()

# Configure the microphone input.
recordingFormat = inputNode.outputFormatForBus_(0)

# Create and configure the speech recognition request.
recognitionRequest = ObjCClass('SFSpeechAudioBufferRecognitionRequest').alloc()
print(dir(recognitionRequest))
recognitionRequest.setShouldReportPartialResults_(True)
retain_global(recognitionRequest)

@on_main_thread
def handler_buffer(_cmd, obj1_ptr, obj2_ptr):
    print('handler_buffer')
    # param1 = AVAudioPCMBuffer: a buffer of audio captured
    # from the output of an AVAudioNode.
    # param2 = AVAudioTime: the time the buffer was captured.
    if obj1_ptr:
        obj1 = ObjCInstance(obj1_ptr)
        #print(str(obj1._get_objc_classname()))  # AVAudioPCMBuffer
        #print(str(obj1.frameLength()))  # 4410
        # segmentation fault in the next line if no "_" before appendAudioPCMBuffer
        recognitionRequest._appendAudioPCMBuffer_(obj1)

handler_block_buffer = ObjCBlock(handler_buffer, restype=None,
                                 argtypes=[c_void_p, c_void_p, c_void_p])
inputNode.installTapOnBus_bufferSize_format_block_(0, 1024, recordingFormat,
                                                   handler_block_buffer)
AVAudioEngine.prepare()
err_ptr = c_void_p()
AVAudioEngine.startAndReturnError_(byref(err_ptr))
if err_ptr:
    err = ObjCInstance(err_ptr)
    print(err)

@on_main_thread
def handler_recognize(_cmd, obj1_ptr, obj2_ptr):
    print('handler_recognize')
    # param1 = result: the partial or final transcriptions of the audio content.
    # param2 = error: an error object if a problem occurred,
    # nil if speech recognition was successful.
    if obj1_ptr:
        obj1 = ObjCInstance(obj1_ptr)
        #print(str(obj1))

handler_block_recognize = ObjCBlock(handler_recognize, restype=None,
                                    argtypes=[c_void_p, c_void_p, c_void_p])
SFSpeechRecognizer = ObjCClass('SFSpeechRecognizer').alloc().init()
recognitionTask = SFSpeechRecognizer.recognitionTaskWithRequest_resultHandler_(
    recognitionRequest, handler_block_recognize)
```
-
```python
recognitionRequest = ObjCClass('SFSpeechAudioBufferRecognitionRequest').alloc()
```
Missing .init()?
By the way, you will want AVAudioEngine.stop() handy.
For instance, you might want to create a ui.View with a will_close handler, so that when you are experimenting you can just close the view to kill the engine. Anyway, you will eventually need to show the recognized words.
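One portable way to keep `stop()` "handy" while experimenting is a context manager that guarantees the engine is stopped even when the script raises. Shown here with a dummy engine so it runs anywhere; in Pythonista, the real `AVAudioEngine` instance (plus `removeTapOnBus_` and ending the recognition request) would take its place:

```python
from contextlib import contextmanager

class DummyEngine:
    # Stand-in for the objc AVAudioEngine instance -- illustration only.
    def __init__(self):
        self.running = False
    def startAndReturnError_(self, err):
        self.running = True
    def stop(self):
        self.running = False

@contextmanager
def running_engine(engine):
    engine.startAndReturnError_(None)
    try:
        yield engine
    finally:
        engine.stop()  # always runs, like a will_close on a ui.View

engine = DummyEngine()
with running_engine(engine) as e:
    assert e.running  # engine is live inside the block
print(engine.running)  # False -- stopped on exit, even after an exception
```

This keeps the teardown in one place, which matters in Pythonista since a crashed script otherwise leaves the audio engine holding the microphone.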