-
daltonb
@biob probably not what you're looking for but omz posted a demo of using CoreML for image classification: https://gist.github.com/omz/a7c5f310e1c8b829a5a613cd556863d4
You could certainly play around with more CoreML libraries that way.. probably not the easiest route for learning though.
I think your best bet for now is to stick with numpy.. you'll have to do more of your own implementation that way than using something like sklearn but you'll probably end up understanding more in the end. Do you have any end goals in mind?
-
daltonb
Hey @JonB sorry for the slow response, this did help me get over a hump though. I think my frameCapacity is less than yours which is apparently the upper limit for frameLength.. setting sample size to 2048 worked well. I'm planning to post a first crack at a live speech recognition module soon.
-
daltonb
MONEY. This works great for accessing the sound data!!
Sadly, upon further testing, setting the frame length to 1024 makes the speech recognition results very poor. Not sure why.. any ideas? Do you think the speech recognizer is expecting the original frame length somehow? For instance I say "Hello" and it outputs "LOL" sometimes, so maybe the input is getting clipped.
My frame length on my phone is actually 4410 by default which is ok, but I guess this is a platform specific number.
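For reference, a buffer's duration is just its frame length divided by the sample rate. A rough sketch of that arithmetic, assuming a 44.1 kHz input (I haven't confirmed the actual hardware rate, and `buffer_duration` is only an illustrative helper, not part of any API):

```python
def buffer_duration(frame_length, sample_rate=44100.0):
    """Return the duration in seconds of a buffer holding frame_length frames.

    sample_rate defaults to 44100 Hz, which is an assumption; the real
    rate should be read from the input node's format on the device.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return frame_length / sample_rate


# Compare the frame lengths mentioned in this thread.
for frames in (1024, 2048, 4410):
    print(frames, "frames ->", round(buffer_duration(frames) * 1000, 1), "ms")
```

Under that assumption 4410 frames is 100 ms of audio and 1024 frames is roughly 23 ms, so the shorter setting hands the recognizer much smaller chunks per callback.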
-
daltonb
Ok, I'll try to figure out the right casting call in the meantime. Yeah that is annoying; however from that stackoverflow post I've verified that calling buffer.setFrameLength(1024) succeeds in speeding up the sampling rate significantly after the first long (0.375s) sample.. haven't checked yet to see if I can update that before the first sample, but shouldn't matter too much for my purposes.
-
daltonb
No worries man, appreciate the help. Any chance you could just post that casting snippet for now? That sounds tricky for me.
As to the second point.. any reason not to just add the processing code to the tap block that updates the RecognitionRequest? (instead of adding a mixer)
-
daltonb
I'm trying to implement volume level metering using AVAudioEngine as described here: https://stackoverflow.com/questions/30641439/level-metering-with-avaudioengine. However, I'm not sure how to access the standalone function vDSP_meamgv() (as opposed to a class/instance method). I assume I would first need to call load_framework('Accelerate'), but after that I'm trying to figure out the equivalent of vDSP_meamgv = ObjCFunction('vDSP_meamgv').
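In case the Accelerate call stays out of reach, here is a pure-Python sketch of the same measurement: vDSP_meamgv returns the mean of the magnitudes of a float vector, which can then be converted to decibels. This assumes the samples are floats in the range -1.0 to 1.0; `mean_magnitude`, `level_db`, and the -100 dB floor are illustrative names and values, not part of any Apple API.

```python
import math


def mean_magnitude(samples):
    """Mean of the absolute sample values (the quantity vDSP_meamgv computes)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def level_db(samples, floor_db=-100.0):
    """Convert the mean magnitude to decibels relative to full scale (1.0).

    floor_db is an arbitrary lower bound used for silence, since log10(0)
    is undefined.
    """
    avg = mean_magnitude(samples)
    if avg <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(avg))


# A half-scale square wave as a quick sanity check.
print(level_db([0.5, -0.5, 0.5, -0.5]))
```

This will be much slower than the vectorized routine on large buffers, but it should be enough to check the metering logic before wiring up the real call.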
-
daltonb
@JonB just to double-check, do I need to roll my own audio filtering/metering algorithm if I want to use SFSpeechRecognizer? (e.g. https://stackoverflow.com/questions/30641439/level-metering-with-avaudioengine). From my quick research it seems the metering algorithm is only provided with AVAudioRecorder, which I couldn't figure out how to make compatible with SFSpeechRecognizer since (according to my rudimentary understanding) they both try to install a tap on the mic input node. As a quick check I tried using a sound.Recorder() along with the SFSpeechRecognizer implementation and got this crash:
com.apple.coreaudio.avfaudio: required condition is false: IsFormatSampleRateAndChannelCountValid(format)
(which is informing my interpretation that both are trying to tap the mic input.. not that I necessarily understand what that means). Thanks for any commentary!
-
daltonb
Thanks @JonB, I do like how it turned out! (and any suggestions on the effect are welcome). Nice chance to brush up on my math a bit. I grabbed the offset and the squared exponent in the "envelope function" from this excellent Quora exchange: https://www.quora.com/What-function-is-Arctic-Monkeys-album-cover
-
Yessir, that's my next goal... for me it was easier to get this MVP working before messing with Objective-C code. It will be nicer that way for sure though.. for starters I've had a hard time getting files to close cleanly with threaded code. Just a heads up, the issue affects your ping pong example from a while back as well.. at least for me after running it, whichever .m4a file was last active keeps growing after exit.
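One pattern that might help with the file-closing problem is to give the worker thread a stop signal and join it before exiting. A minimal sketch, where a plain file write stands in for the recorder (this is not the real recorder API, and `writer` and `run_for` are hypothetical names):

```python
import os
import tempfile
import threading
import time


def writer(path, stop_event, interval=0.01):
    """Append to path until stop_event is set; the with-block closes the file."""
    with open(path, "ab") as f:
        while not stop_event.is_set():
            f.write(b"\x00" * 16)
            f.flush()
            stop_event.wait(interval)  # sleeps, but wakes early on stop


def run_for(path, seconds):
    """Run the writer thread for a while, then shut it down cleanly."""
    stop_event = threading.Event()
    worker = threading.Thread(target=writer, args=(path, stop_event))
    worker.start()
    try:
        time.sleep(seconds)
    finally:
        stop_event.set()  # ask the worker to stop, even after an exception
        worker.join()     # wait until the file has actually been closed
    return os.path.getsize(path)


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
size_at_exit = run_for(path, 0.05)
print("final size:", size_at_exit)
```

The point of the `finally` block is that the stop signal and the join happen even if the main code raises, so nothing keeps writing to the file afterwards.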
-
Oof.. I didn't know the objc interop was inherently unreliable.. that's a bit sad to hear. Do you know any details about why or specifically what scenarios? I love working in Python, but this may end up being my gateway drug to actual iOS development, ha. Thanks for the example code!
-
daltonb
@mikael Magic- I think that worked! It seemed to be running more smoothly but then I got those errors again.. though on further trial-and-error I think it was due to another script I ran which had similar issues. Maybe the other script left behind some thready detritus on close?
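One way to test that theory would be to list which threads are still alive before running the next script. A small sketch, assuming both scripts run inside the same interpreter process so that threads from the earlier one can outlive it (`lingering_threads` is a hypothetical helper name):

```python
import threading


def lingering_threads():
    """Return live threads other than the main thread."""
    main = threading.main_thread()
    return [t for t in threading.enumerate() if t is not main and t.is_alive()]


# Print anything still running, with its daemon status.
for t in lingering_threads():
    print(t.name, "daemon" if t.daemon else "non-daemon")
```

If an unexpected name shows up here after the other script has finished, that would support the leftover-thread explanation.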