Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Amazon Lex using AudioRecorder
-
I am trying to capture audio on my iPad and submit it to an AWS Lex bot using AVAudioRecorder. This is the code that records the audio:
settings = {
    ns('AVFormatIDKey'): ns(1633772320),
    ns('AVSampleRateKey'): ns(16000),
    ns('AVNumberOfChannelsKey'): ns(2),
    ns('AVLinearPCMBitDepthKey'): ns(16),
    ns('AVLinearPCMIsBigEndianKey'): ns(0),
    ns('AVLinearPCMIsFloatKey'): ns(0)
}
output_path = os.path.abspath(FileName)
out_url = NSURL.fileURLWithPath_(ns(output_path))
recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)
However, the Lex bot does not "understand" audio captured with the code above.
I understand that Lex needs linear PCM, but I am unsure which settings to use in the code above to get it.
Can somebody point me in the right direction?
-
Try just using
sound.Recorder('somefile.wav')
This takes care of all the objc for you and records linear PCM (i.e. a WAV file).
-
Undocumented? http://omz-software.com/pythonista/docs/ios/sound.html
-
@JonB thank you for your prompt response. What you suggested works: A wav file gets created.
Unfortunately, AWS Lex still does not seem to understand it once I submit it.
Is there a way to capture MPEG audio? That format does work: the Lex bot sends me an MPEG response, and when I submitted that response straight back to it, it worked like a charm. The bot "understands" it, and the inputTranscript text that gets returned shows that.
-
@jovica Try sound.Recorder('file.m4a'); that's MPEG-4 audio.
-
@jovica Linear PCM is basically WAV. Technically, a WAV file starts with a short header that describes the format, followed by the linear PCM data. You should be fine sending a WAV to Lex, since the header will just come through as a small glitch of data.
The important thing is that Lex needs the audio in 16-bit, 16 kHz format, and it looks like you have that in the settings above.
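To make the "short header, then linear PCM data" point concrete, here is a minimal sketch using Python's standard wave module (not code from this thread). It wraps some 16-bit, 16 kHz mono samples in a WAV container, shows how many bytes of header sit in front of the samples, and reads the raw PCM back out. The helper names make_wav and strip_wav_header are hypothetical.

```python
import io
import struct
import wave


def make_wav(pcm: bytes, rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV (RIFF) container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()


def strip_wav_header(wav: bytes) -> bytes:
    """Return only the linear PCM sample data stored inside a WAV file."""
    with wave.open(io.BytesIO(wav), "rb") as w:
        return w.readframes(w.getnframes())


# 160 mono samples (10 ms at 16 kHz) as a simple ramp, little-endian int16
pcm = struct.pack("<160h", *range(160))
wav = make_wav(pcm)

header_len = len(wav) - len(pcm)
print(header_len)                    # bytes of header in front of the samples
print(wav[header_len:] == pcm)       # everything after the header is the raw PCM
print(strip_wav_header(wav) == pcm)  # wave can hand back the PCM on its own
```

For a plain PCM WAV written by the wave module the header is 44 bytes. If you would rather not send those bytes at all, the output of strip_wav_header could be submitted instead of the whole file, assuming the samples already match the rate and channel count declared in the contentType.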
I'm pretty new to Pythonista on iOS, but I've done a fair bit with Lex (and Alexa) in Python. For some code that submits audio to Lex using Python and requests, have a look at https://github.com/Nexmo/lex-connector/blob/master/server.py#L102-L109
That's part of a larger application, but it should give you some pointers.
-
Hi,
you're using the wrong AVFormatIDKey value. The correct one for PCM is 1819304813 (yours is MPEG4AAC). Here's working code for AWS Lex & Pythonista (I installed awscli & boto3 with pip from StaSh).

from objc_util import *
import boto3
import os
import sound
import console
import uuid


def record(file_name):
    AVAudioSession = ObjCClass('AVAudioSession')
    NSURL = ObjCClass('NSURL')
    AVAudioRecorder = ObjCClass('AVAudioRecorder')

    # The audio session has to allow recording before the recorder can start
    shared_session = AVAudioSession.sharedInstance()
    category_set = shared_session.setCategory_error_(ns('AVAudioSessionCategoryPlayAndRecord'), None)

    # Linear PCM ('lpcm' = 1819304813), 16 kHz, mono, 16-bit, little-endian integers
    settings = {
        ns('AVFormatIDKey'): ns(1819304813),
        ns('AVSampleRateKey'): ns(16000.0),
        ns('AVNumberOfChannelsKey'): ns(1),
        ns('AVLinearPCMBitDepthKey'): ns(16),
        ns('AVLinearPCMIsFloatKey'): ns(False),
        ns('AVLinearPCMIsBigEndianKey'): ns(False)
    }

    output_path = os.path.abspath(file_name)
    out_url = NSURL.fileURLWithPath_(ns(output_path))
    recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)
    if recorder is None:
        console.alert('Failed to initialize recorder')
        return None

    started_recording = recorder.record()
    if started_recording:
        print('Recording started, press the "stop script" button to end recording...')
    try:
        # Keep the script alive until the stop button raises KeyboardInterrupt
        while True:
            pass
    except KeyboardInterrupt:
        print('Stopping...')
        recorder.stop()
        recorder.release()
        print('Stopped recording.')
    return output_path


def main():
    console.clear()
    path = record("{}.pcm".format(uuid.uuid4().hex))
    if path is None:
        print('Nothing recorded')
        return

    sound.play_effect(path)  # play the recording back before sending it

    session = boto3.Session(profile_name='lex')  # AWS credentials profile named 'lex'
    client = session.client('lex-runtime')
    with open(path, 'rb') as recording:
        r = client.post_content(botName='BookTrip',
                                botAlias='$LATEST',
                                userId=uuid.uuid4().hex,
                                contentType='audio/l16; rate=16000; channels=1',
                                accept='text/plain; charset=utf-8',
                                inputStream=recording)
    print(r)
    os.remove(path)


if __name__ == '__main__':
    main()
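Why those particular numbers: the AVFormatIDKey values are Core Audio four-character codes (FourCC) packed big-endian into a 32-bit integer. 'lpcm' is kAudioFormatLinearPCM, and 'aac ' (with a trailing space) is kAudioFormatMPEG4AAC, which is what 1633772320 in the original question decodes to. A small sketch for converting in both directions; fourcc_to_int and int_to_fourcc are hypothetical helper names, not part of objc_util.

```python
def fourcc_to_int(code: str) -> int:
    """Pack a 4-character Core Audio format code into a big-endian 32-bit int."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC must be exactly 4 ASCII characters")
    return int.from_bytes(raw, "big")


def int_to_fourcc(value: int) -> str:
    """Unpack a 32-bit format ID back into its 4-character code."""
    return value.to_bytes(4, "big").decode("ascii")


print(fourcc_to_int("lpcm"))            # linear PCM format ID
print(repr(int_to_fourcc(1633772320)))  # format ID from the question
```

This prints 1819304813 and 'aac ', so the question's settings were asking AVAudioRecorder for AAC rather than PCM.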
And here's the console output when I said "book a car":
Recording started, press the "stop script" button to end recording...
Stopping...
Stopped recording.
{'slots': {'PickUpDate': None, 'DriverAge': None, 'ReturnDate': None, 'PickUpCity': None, 'CarType': None}, 'intentName': 'BookCar', 'slotToElicit': 'PickUpCity', 'dialogState': 'ElicitSlot', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HTTPHeaders': {'x-amzn-requestid': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259', 'connection': 'keep-alive', 'x-amz-lex-slots': 'eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGwsIkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9', 'date': 'Wed, 19 Jul 2017 08:21:02 GMT', 'x-amz-lex-input-transcript': 'book a car', 'content-length': '0', 'x-amz-lex-message': 'In what city do you need to rent a car?', 'content-type': 'text/plain;charset=utf-8', 'x-amz-lex-intent-name': 'BookCar', 'x-amz-lex-slot-to-elicit': 'PickUpCity', 'x-amz-lex-dialog-state': 'ElicitSlot'}, 'RequestId': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259'}, 'contentType': 'text/plain;charset=utf-8', 'message': 'In what city do you need to rent a car?', 'inputTranscript': 'book a car', 'audioStream': <botocore.response.StreamingBody object at 0x108a4d4a8>}
The predefined Lex BookTrip bot is used in this case.
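In the output above, the x-amz-lex-slots HTTP header is a base64-encoded JSON object; boto3 has already decoded it into the 'slots' entry of the response. If you call the Lex runtime without boto3 (for example with requests, as in the connector linked earlier in the thread), you would have to decode it yourself. A minimal standard-library sketch; decode_lex_json_header is a hypothetical helper name.

```python
import base64
import json


def decode_lex_json_header(value):
    """Decode a base64-encoded JSON HTTP header value such as x-amz-lex-slots."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


# Header value copied from the console output above
slots_header = ("eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGws"
                "IkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9")
slots = decode_lex_json_header(slots_header)
print(slots)
# {'PickUpDate': None, 'ReturnDate': None, 'DriverAge': None, 'CarType': None, 'PickUpCity': None}
```

The decoded dict has the same keys and values as the 'slots' entry boto3 returned, just in the order the header lists them.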
HTH,
Zrzka
-
I am also having the same problem as @jovica.
I have tested .raw, .wav, and .pcm files.
The files are recordings of me saying a valid sample utterance of my bot.
The Lex console recognizes what I say every time (so I don't think the issue is my pronunciation), but judging by the response from boto3's post_content, Lex doesn't know what I said.
(The WAV file is me saying "go to the kitchen"; however, the returned 'inputTranscript' is 'a a allen'.)
Can someone tell me what I've done wrong? Thanks.
My code is the following:
import boto3

client = boto3.client('lex-runtime')
WAVE_OUTPUT_FILENAME = "File.wav"

with open(WAVE_OUTPUT_FILENAME, 'rb') as f:
    lex_response = client.post_content(
        botName='ProtoBot',
        botAlias='ProtoBotFeb',
        userId="12345678910",
        inputStream=f,
        accept='text/plain; charset=utf-8',
        contentType="audio/l16; rate=16000; channels=1"
    )
print(lex_response)