omz:forum

    Amazon Lex using AudioRecorder

    Pythonista
    • jovica
      jovica last edited by

      I am trying to capture audio on my iPad with AVAudioRecorder so that I can submit it to an AWS Lex bot. This is the code that records the audio:

      settings = {ns('AVFormatIDKey'):ns(1633772320), ns('AVSampleRateKey'):ns(16000), ns('AVNumberOfChannelsKey'):ns(2), ns('AVLinearPCMBitDepthKey'):ns(16), ns('AVLinearPCMIsBigEndianKey'):ns(0),ns('AVLinearPCMIsFloatKey'):ns(0)}

      output_path = os.path.abspath(FileName)
      out_url = NSURL.fileURLWithPath_(ns(output_path))
      recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)

      However, the Lex bot does not "understand" audio captured with the code above.

      I understand that Lex needs linear PCM, but I am unsure which settings to use in the code above to produce it.

      Can somebody point me in the right direction?

      • JonB
        JonB last edited by

        Try just using sound.Recorder('somefile.wav').
        This takes care of all the Objective-C for you and records linear PCM (i.e. a WAV file).

        • ccc
          ccc last edited by

          Undocumented? http://omz-software.com/pythonista/docs/ios/sound.html

          • jovica
            jovica @JonB last edited by

            @JonB thank you for your prompt response. What you suggested works: a WAV file gets created.

            Unfortunately, AWS Lex still does not seem to understand it once I submit it.

            Is there a way to capture MPEG audio? I know that format works: I took the MPEG response I received from the Lex bot and submitted it right back, and it worked like a charm. The bot could "understand" it, and the inputTranscript text in the response showed that.

            • cvp
              cvp @jovica last edited by

              @jovica Try sound.Recorder('file.m4a'); it records MPEG-4 audio.

              • sammachin
                sammachin last edited by

                @jovica Linear PCM is basically WAV. Technically, a WAV file has a short header that describes the format, followed by the linear PCM data. You should be fine sending a WAV file to Lex, as it will just treat the header as a small glitch in the audio.
                The important thing is that Lex needs the audio in 16-bit, 16 kHz format, and it looks like you have that in the settings above.
                I'm pretty new to Pythonista on iOS, but I've done a fair bit with Lex (and Alexa) in Python. For some code that submits audio to Lex using Python and requests, have a look at https://github.com/Nexmo/lex-connector/blob/master/server.py#L102-L109
                That's part of a larger application, but it should give you some pointers.
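
To make the WAV-versus-raw-PCM distinction concrete, here is a minimal sketch using Python's standard `wave` module. It writes a short silent clip in the 16-bit, 16 kHz mono layout mentioned above, then reads back only the sample data so the header can be measured separately. The function names and the `demo.wav` file name are illustrative assumptions and do not come from anyone's code in this thread.

```python
import os
import tempfile
import wave


def write_silent_wav(path, seconds=0.1, rate=16000):
    """Write a mono, 16-bit WAV file containing silence; return the frame count."""
    frame_count = int(seconds * rate)
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)   # samples per second
        wav.writeframes(b'\x00\x00' * frame_count)
    return frame_count


def read_pcm_frames(path):
    """Return only the raw linear PCM bytes, without the WAV header."""
    with wave.open(path, 'rb') as wav:
        return wav.readframes(wav.getnframes())


# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
frame_count = write_silent_wav(path)
pcm = read_pcm_frames(path)

# Whatever is in the file beyond the sample data is header.
header_size = os.path.getsize(path) - len(pcm)
print('PCM bytes:', len(pcm), 'header bytes:', header_size)
```

If the header does turn out to matter for a given service, the bytes returned by `read_pcm_frames` are what a raw `audio/l16` payload would contain.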

                • zrzka
                  zrzka last edited by

                  Hi,

                  You're using the wrong AVFormatIDKey value. The correct one for PCM is 1819304813 (yours is MPEG4AAC). Here's working code for AWS Lex and Pythonista (I installed awscli and boto3 with pip in StaSh).

                  from objc_util import *
                  import boto3
                  import os
                  import sound
                  import console
                  import uuid
                  
                  def record(file_name):
                  	AVAudioSession = ObjCClass('AVAudioSession')
                  	NSURL = ObjCClass('NSURL')
                  	AVAudioRecorder = ObjCClass('AVAudioRecorder')
                  	shared_session = AVAudioSession.sharedInstance()
                  	category_set = shared_session.setCategory_error_(ns('AVAudioSessionCategoryPlayAndRecord'), None)
                  	
                  	settings = {
                  		ns('AVFormatIDKey'): ns(1819304813),
                  		ns('AVSampleRateKey'):ns(16000.0),
                  		ns('AVNumberOfChannelsKey'):ns(1),
                  		ns('AVLinearPCMBitDepthKey'):ns(16),
                  		ns('AVLinearPCMIsFloatKey'):ns(False),
                  		ns('AVLinearPCMIsBigEndianKey'):ns(False)
                  	}
                  	
                  	output_path = os.path.abspath(file_name)
                  	out_url = NSURL.fileURLWithPath_(ns(output_path))
                  	recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)
                  	if recorder is None:
                  		console.alert('Failed to initialize recorder')
                  		return None
                  			
                  	# Bail out if the recorder could not start instead of waiting forever.
                  	if not recorder.record():
                  		print('Failed to start recording')
                  		return None
                  	print('Recording started, press the "stop script" button to end recording...')
                  	try:
                  		while True:
                  			pass
                  	except KeyboardInterrupt:
                  		print('Stopping...')
                  		recorder.stop()
                  		recorder.release()
                  		print('Stopped recording.')
                  	return output_path
                  
                  
                  def main():	
                  	console.clear()
                  	
                  	path = record("{}.pcm".format(uuid.uuid4().hex))
                  	
                  	if path is None:
                  		print('Nothing recorded')
                  		return
                  
                  	sound.play_effect(path)
                  					
                  	# Send the recording to Lex; the file is closed once the request returns.
                  	session = boto3.Session(profile_name='lex')
                  	client = session.client('lex-runtime')
                  	with open(path, 'rb') as recording:
                  		r = client.post_content(botName='BookTrip', botAlias='$LATEST', userId=uuid.uuid4().hex,
                  			contentType='audio/l16; rate=16000; channels=1',
                  			accept='text/plain; charset=utf-8',
                  			inputStream=recording)
                  	print(r)
                  	
                  	os.remove(path)
                  						
                  if __name__ == '__main__':
                  	main()
                  

                  And here's the console output when I said "book a car".

                  Recording started, press the "stop script" button to end recording...
                  Stopping...
                  Stopped recording.
                  {'slots': {'PickUpDate': None, 'DriverAge': None, 'ReturnDate': None, 'PickUpCity': None, 'CarType': None}, 'intentName': 'BookCar', 'slotToElicit': 'PickUpCity', 'dialogState': 'ElicitSlot', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HTTPHeaders': {'x-amzn-requestid': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259', 'connection': 'keep-alive', 'x-amz-lex-slots': 'eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGwsIkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9', 'date': 'Wed, 19 Jul 2017 08:21:02 GMT', 'x-amz-lex-input-transcript': 'book a car', 'content-length': '0', 'x-amz-lex-message': 'In what city do you need to rent a car?', 'content-type': 'text/plain;charset=utf-8', 'x-amz-lex-intent-name': 'BookCar', 'x-amz-lex-slot-to-elicit': 'PickUpCity', 'x-amz-lex-dialog-state': 'ElicitSlot'}, 'RequestId': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259'}, 'contentType': 'text/plain;charset=utf-8', 'message': 'In what city do you need to rent a car?', 'inputTranscript': 'book a car', 'audioStream': <botocore.response.StreamingBody object at 0x108a4d4a8>}
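
The `x-amz-lex-slots` header in the output above looks like base64-encoded JSON that mirrors the `slots` field. A small sketch, assuming that reading is right, decodes the value copied from the response with only the standard library:

```python
import base64
import json

# Header value copied verbatim from the console output above.
encoded_slots = (
    'eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGws'
    'IkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9'
)

# Decode base64 to bytes, then to text, then parse the JSON object.
slots = json.loads(base64.b64decode(encoded_slots).decode('utf-8'))
print(slots)
```

Every slot decodes to `None` here, which is consistent with the bot still asking for the pick-up city.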
                  

                  This example uses Lex's predefined BookTrip bot.

                  HTH,
                  Zrzka
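
The two AVFormatIDKey numbers discussed in this thread are easier to tell apart once decoded. Core Audio format identifiers are four-character codes packed into a 32-bit integer, so a short sketch with `struct` can turn them back into text. The helper names below are illustrative assumptions and are not part of objc_util or AVFoundation.

```python
import struct


def fourcc_to_int(code):
    """Pack a four-character code such as 'lpcm' into a big-endian 32-bit integer."""
    if len(code) != 4:
        raise ValueError('a four-character code needs exactly 4 characters')
    return struct.unpack('>I', code.encode('ascii'))[0]


def int_to_fourcc(value):
    """Unpack a 32-bit integer back into its four-character code."""
    return struct.pack('>I', value).decode('ascii')


print(repr(int_to_fourcc(1819304813)))  # value suggested above for linear PCM
print(repr(int_to_fourcc(1633772320)))  # value used in the original question
```

The first value decodes to `'lpcm'` and the second to `'aac '` (with a trailing space), which matches the diagnosis that the original settings requested AAC rather than linear PCM.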

                  • Max.Shih
                    Max.Shih last edited by

                    I am having the same problem as @jovica.
                    I have tested with .raw, .wav, and .pcm files.
                    Each file is a recording of me saying a valid sample utterance for my bot.
                    The Lex console recognizes what I am saying every time (so I don't think the issue is my pronunciation), but the response from boto3's post_content suggests that Lex doesn't know what I was saying.
                    (The WAV file is me saying "go to the kitchen", but the 'inputTranscript' returned is 'a a allen'.)
                    Can someone tell me what I've done wrong? Thanks.
                    My code is the following.

                    import boto3

                    client = boto3.client('lex-runtime')

                    WAVE_OUTPUT_FILENAME = "File.wav"

                    # Open the recording in binary mode; it is closed once the request returns.
                    with open(WAVE_OUTPUT_FILENAME, 'rb') as f:
                        lex_response = client.post_content(
                            botName='ProtoBot',
                            botAlias='ProtoBotFeb',
                            userId="12345678910",
                            inputStream=f,
                            accept='text/plain; charset=utf-8',
                            contentType="audio/l16; rate=16000; channels=1",
                        )
                    print(lex_response)
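
The thread never resolved this question. One thing worth ruling out is a mismatch between the file and the declared `contentType`, which promises 16-bit, 16 kHz, single-channel audio. That is only a guess at the cause. The sketch below reads the WAV header with the standard `wave` module and compares it against those three values; the function names and the stereo 44.1 kHz test file are hypothetical, chosen to show a failing case.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (channels, bits_per_sample, sample_rate) read from a WAV header."""
    with wave.open(path, 'rb') as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_l16_16k_mono(path):
    """True if the file is 1 channel, 16-bit, 16000 Hz."""
    return wav_format(path) == (1, 16, 16000)


# Hypothetical test file: stereo at 44.1 kHz, which would not match the content type.
path = os.path.join(tempfile.mkdtemp(), 'stereo_44k.wav')
with wave.open(path, 'wb') as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b'\x00\x00\x00\x00' * 100)  # 100 silent stereo frames

print(wav_format(path), matches_l16_16k_mono(path))
```

Running `wav_format` on the real `File.wav` before calling `post_content` would show whether the recording needs to be resampled or downmixed first.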
