Real time audio buffer synth/Real time image smudge tool

Mederic

This topic is about investigating ways to achieve real-time performance in Pythonista in situations that are extremely timing dependent.
These situations are generally found in two fields: audio (synths, filters, sound processing) and image (smudge tool). What makes them difficult is that, unlike in usual real-time applications, the computer doesn’t just need your UI input and a few variables to update the internal state of the program and play/display whatever it should. It needs the actual data that is being played/displayed (the screen image or the audio output), which is a lot more data than a few hidden state variables. So while you’re hearing the audio/seeing the screen and deciding what to do next, the computer is also working directly on the current audio/screen data in order to compute the next thing you will hear/see. The challenge is to process these often large amounts of data fast enough to deliver the result to you seamlessly.

Here are the advances we made in the audio department (see the end of the post for the image part)
Real time audio processing in Pythonista:
At the lowest level, real time audio processing is achieved with a circular (also called ring) audio buffer (a kind of array). Basically, the code iteratively fills this buffer with the next bit of sound and sends it to your ears when you’re finished hearing the current one.
Most audio processing effects (filters, for instance) need the very data that you’re currently hearing to compute the next bit of sound. So before sending it to your ears, they copy that data and start refilling the buffer with the next part, so that it is ready before you’re done with the current one and come back for more sound.
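To make the idea concrete, here is a minimal pure-Python sketch of such a circular buffer. The class name RingBuffer and its methods are made up for illustration; this is not how the AudioRenderer wrapper is implemented, and a real audio callback would also have to deal with thread safety.

```python
class RingBuffer:
    """Fixed-size circular buffer of float samples (illustrative only)."""

    def __init__(self, capacity):
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.read_pos = 0    # next sample the audio output will consume
        self.write_pos = 0   # next free slot for the producer
        self.count = 0       # samples currently waiting to be played

    def write(self, samples):
        """Store as many samples as fit; return how many were accepted."""
        n = min(len(samples), self.capacity - self.count)
        for i in range(n):
            self.data[self.write_pos] = samples[i]
            self.write_pos = (self.write_pos + 1) % self.capacity
        self.count += n
        return n

    def read(self, n):
        """Return n samples, padding with silence on underrun."""
        out = []
        for _ in range(n):
            if self.count:
                out.append(self.data[self.read_pos])
                self.read_pos = (self.read_pos + 1) % self.capacity
                self.count -= 1
            else:
                out.append(0.0)  # nothing ready in time: heard as a glitch
        return out
```

The padding branch in read() is the situation the rest of this topic tries to avoid: the producer did not refill the buffer before the output came back for more.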
@JonB’s AudioRenderer wrapper (see below) is a great way to get access to such an audio buffer in Pythonista.
https://gist.github.com/0db690ed392ce35ec05fdb45bb2b3306

Here are my current modifications of @JonB’s files to get an antialiased sawtooth instead of a sine wave, a 4-pole filter you can control, unison, vibrato, chords (with several fingers) and a delay, all in one buffer/render method:

https://gist.github.com/medericmotte/d8e81b7e0961006d7026f16cc195682c

It’s set up to play one chord with number of notes = number of fingers on the screen (up to 4). You can control the filter by moving up and down the first finger having touched the screen.

It’s an inefficient implementation because all the work is done in strict real time, as if the control parameters were changing on a sample by sample basis. It’s at the edge of glitching (to see it, just set the filterNumbers to 16 and notice the glitches when creating filter sweeps). The solution for that is to compute the audio elsewhere, ahead of time, in bigger chunks corresponding to, say, a 60Hz rate (because that’s pretty much the rate at which touch_moved events are received anyway) and then progressively fill the circular buffer with the computed data.
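A rough sketch of that block-wise idea, under some assumptions: the control value is a one-pole filter coefficient, it is ramped linearly across each block (a choice made here to avoid audible steps), and block_size and render_block are hypothetical names, not functions from the gists.

```python
def block_size(sample_rate, control_rate=60):
    """Samples per control block, e.g. 44100 Hz / 60 Hz = 735."""
    return int(round(sample_rate / control_rate))


def render_block(source, state, coeff_from, coeff_to):
    """One-pole low-pass over one block, ramping the coefficient linearly.

    source:     list of input samples for this block
    state:      filter output left over from the previous block
    coeff_from: feedback coefficient used at the end of the previous block
    coeff_to:   coefficient requested by the latest touch event
    Returns (output_samples, new_state).
    """
    n = len(source)
    out = []
    for i, x in enumerate(source):
        # The ramp reaches coeff_to exactly on the last sample of the block.
        c = coeff_from + (coeff_to - coeff_from) * (i + 1) / n
        state = c * state + (1.0 - c) * x
        out.append(state)
    return out, state
```

Each returned block would then be written into the circular buffer, so the render callback only has to copy samples instead of computing them.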

Real time image smudge tool in Pythonista:
A real time smudge tool (see also my second post) works similarly to a filter or a reverb, only it processes regions of an image along a brush stroke rather than bits of sound along a sound stream. @JonB’s IOSurfaceWrapper (see below, and thanks to him) made it easy for me to code a real time smudging tool (lots of comments in there as well):

https://gist.github.com/medericmotte/37e43e477782ce086880e18f5dbefcc8

It can be interesting to take a look at my previous approach, especially to compare their speeds:

https://gist.github.com/medericmotte/a570381ca8adfcec6149da2510e81da2

The difference, on my device at least, seems small at first glance, but when you smudge in a very fast circular fashion (around a small circle) you will notice that with my previous approach, the blue cursor can’t keep up and ends up being on the opposite side of your finger’s circular motion, while with the current approach, the cursor is always perfectly in line with your finger.

JonB

so, it actually is possible to use the underlying AudioUnit parts of coreaudio, using ctypes.
https://gist.github.com/0db690ed392ce35ec05fdb45bb2b3306

This has been something I have wanted to do for a long time.
It is a very rough first cut, but you can override the render method of AudioRenderer to do what you want. ios calls this method at a high rate (on my ipad, about 40 fps). you are provided a buffer pointer and a number of samples (ios decides how long it should be), and you fill in the samples.

in this example, i create tones based on finger location, and despite horribly inefficient code, it manages to keep up in real time and provide gapless audio.
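As a sketch of what "fill in the samples" means, here is a stand-alone function that fills a buffer with one sine tone and keeps the phase continuous from one call to the next. The function name and signature are made up for illustration; in the gist the equivalent work happens inside the render method, and the buffer is a ctypes pointer rather than a Python list.

```python
import math


def render_sine(buffer, num_frames, phase, freq, sample_rate):
    """Fill buffer[0:num_frames] with a sine tone; return the updated phase."""
    d_phase = 2.0 * math.pi * freq / sample_rate  # phase step per sample
    for frame in range(num_frames):
        buffer[frame] = math.sin(phase)
        phase = (phase + d_phase) % (2.0 * math.pi)
    return phase
```

Carrying the returned phase into the next call is what keeps the tone gapless even though the system decides how many frames each call gets.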

Mederic

Thanks a lot! I am very impressed and I had no idea it was possible!

Just in case, as I mentioned real time image in my post, I actually coded a smudge tool in Pythonista and it is kind of real time but still laggy.

Basically I use a 2D-numpy array representation of my image and constantly “blend” the portion around the cursor’s prev_location on the portion around the cursor’s location. Then, at a given rate, I update the image of my ImageView by converting the numpy array to an ui.image.
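Stripped down to its core, one such blending step could look like the sketch below. It is a simplification that assumes a square brush, a constant strength and points that stay away from the image border; smudge_step is an illustrative name and this is not the code from the gist.

```python
import numpy as np


def smudge_step(image, prev_xy, cur_xy, radius, strength=0.5):
    """Blend the patch around prev_xy onto the patch around cur_xy, in place.

    image is a 2D float array; both points are assumed to lie at least
    `radius` pixels inside the image so the two patches have equal shape.
    """
    px, py = prev_xy
    cx, cy = cur_xy
    # Copy the source first: the two patches overlap during slow strokes.
    src = image[py - radius:py + radius + 1, px - radius:px + radius + 1].copy()
    dst = image[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    dst[...] = (1.0 - strength) * dst + strength * src
```

A softer brush would multiply strength by a per-pixel mask that fades towards the edge of the patch.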

From what I tested the lag seems to mostly come from the conversion.

I tried doing that with PIL Images instead of numpy but it wasn’t really faster. I also tried directly doing it in an ImageContext and it was actually slower.

My question is, is there a way to use Metal (or some other gpu computing API) with objc_utils and ctypes to do that? And will it be faster?

JonB

what were you using to render the image?

I have never found a great way to do realtime image updates. It may be possible with some low level library, but ios doesn't really allow access to screen buffers directly, except maybe in some low level video libraries (where you are passed a buffer, and expected to fill it). That deserves another look.

A few things I have found:

Converting to low res jpg is a lot faster than, say, bit accurate png. I use that method in my first attempt at a matplotlib pinch/pan view:
https://github.com/jsbain/objc_hacks/blob/master/MPLView.py (see updateplt). in that method, i have a thread that updates the ImageView using a reused BytesIO (reusing it saves a little time). updateplt is done in a thread, and has some locks to know when the conversion is complete, and basically tosses other calls, so that it updates at as fast a rate as possible without affecting ui responsiveness. Also, for the updates while moving, I use a very low resolution jpg for rendering, which is much much faster than doing bit accurate png. That allows for a pretty responsive feel, and maybe 20 fps or something, i forget. then it renders the full dpi after touch ends.
While working on porting an appleii simulator, i experimented with several ways of rendering to a ui.Image from a numpy array, very similar to what you want. here's an example speed compare:
https://gist.github.com/jsbain/1df982ee81e78ae8958b073fa7194a9c

At the time, I think I found that matplotlib.imsave was faster than PIL Image.fromarray, which is what is implemented in the screen.py -- though I think in the latest pythonista version, Pillow's fromarray is much faster. In the speedtest, a 300x300 is rendered to ui.Image at about 12 fps on my old Ipad3 with Image.fromarray, versus 6fps using matplotlib.imsave. I used a similar system that basically renders as fast as possible, and calls to update get queued/grouped if there is already an update in progress (you may be able to use screen.py directly, though probably would want to switch over to the faster method). I am actually using a custom view draw rather than an imageview, though I forget if there was a good reason for that.
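The "toss or group calls while an update is in progress" logic described above can be sketched roughly like this. CoalescingUpdater is a hypothetical name and this is not the code from MPLView.py or screen.py; in practice request() would be called from a worker thread so that the ui stays responsive.

```python
import threading


class CoalescingUpdater:
    """Run `render` one at a time; fold requests made meanwhile into one rerun."""

    def __init__(self, render):
        self.render = render           # the slow step, e.g. array -> image
        self.lock = threading.Lock()   # protects the two flags below
        self.running = False           # True while some caller is rendering
        self.pending = False           # True if a request arrived mid-render

    def request(self):
        """Ask for a redraw; returns False if it was folded into a running one."""
        with self.lock:
            if self.running:
                self.pending = True
                return False
            self.running = True
        try:
            while True:
                self.render()
                with self.lock:
                    if not self.pending:
                        self.running = False
                        return True
                    self.pending = False   # render once more for late requests
        except BaseException:
            with self.lock:
                self.running = False
            raise
```

However many requests arrive during one render, they cost a single extra render afterwards, which always shows the latest data.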

For your smudge application, one thing you might consider is to have an imageview, with a custom view on top (similar to the applepy screen above) that only renders the portion of the view that has been touched. i.e you keep track which pixels are dirty, and only render the bounding box of those dirty pixels. you would keep track of the corner of that bounding box, so that you can then use .draw() with the right pixel offsets. Then in the background, you would be rendering the "big" image, maybe when the finger lifts, and reset the dirty pixels.
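Keeping track of the dirty bounding box could look something like this sketch. DirtyRect and its methods are illustrative names, and a square brush stamp is assumed.

```python
class DirtyRect:
    """Accumulates the bounding box of every region touched since reset()."""

    def __init__(self):
        self.reset()

    def reset(self):
        # (left, top, right, bottom), with right and bottom exclusive.
        self.box = None

    def add(self, x, y, radius):
        """Grow the box to include a square brush stamp centred on (x, y)."""
        stamp = (x - radius, y - radius, x + radius + 1, y + radius + 1)
        if self.box is None:
            self.box = stamp
        else:
            l, t, r, b = self.box
            self.box = (min(l, stamp[0]), min(t, stamp[1]),
                        max(r, stamp[2]), max(b, stamp[3]))

    def clipped(self, width, height):
        """Return the box clipped to the image, or None if nothing is dirty."""
        if self.box is None:
            return None
        l, t, r, b = self.box
        return (max(l, 0), max(t, 0), min(r, width), min(b, height))
```

The left/top corner of the clipped box is the pixel offset to draw at, and reset() would be called once the big image has been re-rendered.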

A variation on this would be to divide the image up into small chunks, and render ui.Images for a chunk only when that section has been affected. then your draw() method would always ui.Image.draw all of the chunks at the proper locations. that avoids having to ever render the big image.

JonB

Hmm, looks like IOSurface backing a CALayer might be an easy way to do what we want here, without going through an intermediate image... that will be tonight's experiments.

Mederic

I will clean my code a little bit and post a link later.

I had already tried that trick with the small imageView around the cursor. The thing is, when doing big strokes with the smudge tool, the small imageView isn’t big enough (and making it bigger over time ends up causing lag, just like when there is no small imageView), so I had to test when the cursor leaves the small imageView area during the stroke and update the big image when that happens before moving the small imageView back to the cursor. For some reason, it didn’t really improve anything compared to not using a small imageView. Somehow, the big image updates were still expensive, and although I could make them happen less often by making the small imageView bigger, the small imageView updates would then cost more, and in the end, no real improvement.

However, instead of a small imageView, directly using a custom view allowed me to ask the code to “convert the dirty portion of the numpy array and draw it” in one line in the draw def. Somehow it improved things a lot, but still caused time glitches/lag when the big image updates were needed.

Now I am experimenting with several small custom views, basically relaying each other when the cursor leaves their respective areas, so that I only have to update the big image when the last small view has been used. I am using 8 views and It’s almost perfect.

I will try your variation though. It could definitely be perfect as well.

Btw, I use fromarray to render the image ;)

And I don’t know anything about IOSurface and CALayer, I am (kind of) new to this kind of libraries

Mederic

So I cleaned up my code. I did my best but probably didn’t respect some conventions...
I wrote a lot of comments and explanations though.

On my IPad Pro 12.9, it is close to real time, although not as reactive as the Procreate Smudge tool, which I find fantastic, but, in my opinion (and taste), still better than a lot of smudge tools I tried in different apps, so I am kind of happy with it :)

You can use the Apple Pencil by setting applePencil=True
You can see the debug mode by setting debug=True

https://gist.github.com/medericmotte/a570381ca8adfcec6149da2510e81da2

By the way, I tried the method where you split the canvas into several sub views in a grid; it seemed like having too many views at the same time also causes lag.

enceladus

Maybe try to use scene and shader.

Mederic

It might work but then I’d still have to reload the texture as the numpy array changes. But maybe it would be faster.

To avoid the constant reloading I would have to compute the smudge effect directly in the OpenGL code, but it has two issues for me:

The texture would have to be stored with float data because smudging int8 causes some ugly spots around the white areas.
I don’t know how I would change the texture in real time directly within the OpenGL code. Do you know a way to do that? I thought they were read-only here, but I do remember hearing about OpenGL image buffers, is it possible in Pythonista?

Mederic

There might be a simple way to do it with the render_to_texture function. I don’t know how fast it would be but I am gonna give it a try today.

enceladus

Look at Examples/games/BrickBreaker.py (particularly wavy option)

enceladus

FWIW my GitHub directory contains few basic examples on scene and shader. https://github.com/encela95dus/ios_pythonista_examples

Mederic

Yeah I’ll try that, but again, it’s the speed of render_to_texture() that will tell if it’s enough for real time. Because a function like wavy needs a texture of the image at frame n to display the image at frame n+1, but then I need to render that image to a texture so that the shader can process it and display the image at frame n+2, etc.

Mederic

Actually, now I think about it, the problem is that scene and shaders compute their display only at 60 fps, and I think it’s not enough because for fast strokes you need to compute more often than that (otherwise you will have holes or irregularities between the smudge spots).

In my code I use a while(true) loop to compute the smudging (outside of the ui class) and its rate is only limited by the (very short) time numpy takes to add arrays.
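For comparison, another common way to avoid holes in fast strokes, independent of how often the code runs, is to interpolate extra stamp positions between two consecutive touch locations. This is not what the loop above does; stamp_positions and the spacing parameter are illustrative assumptions.

```python
import math


def stamp_positions(prev_xy, cur_xy, spacing):
    """Points from prev_xy (exclusive) to cur_xy (inclusive), at most `spacing` apart."""
    (x0, y0), (x1, y1) = prev_xy, cur_xy
    dist = math.hypot(x1 - x0, y1 - y0)
    steps = max(1, math.ceil(dist / spacing))
    return [(x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
            for i in range(1, steps + 1)]
```

A smudge step would then be applied at each returned point in order, so a long jump between two touch events still produces a continuous trail.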

By the way, somehow I know that it’s not good to use while(True) loops that way, but I don’t know what the good practice is to do the equivalent at the same speed. Because of that loop, for example, right now when I close the ui window it doesn’t stop the code, and I need to do it manually with the cross in the editor. What should I do about that?

Mederic

@JonB :

So back to the topic of real time audio, I modified your code to have a sawtooth instead of a sine, and then implemented a simple lowpass filter. There is an unwanted vibrato sound happening in the background for high frequencies, which is probably an aliasing behavior due to the inability of the program to keep a perfect rate? I am not sure. If I set the sampleRate to 44100, the vibrato seems less important (which kind of supports my aliasing assumption? Again, not sure) but still noticeable. Interestingly, I tried sampleRate=88200 and the unwanted vibrato was gone. The thing is, when one changes the sampleRate, the filter actually behaves differently. Basically, taking a higher sampleRate with the same filter algorithm will tend to make its cutoff higher, so, for the comparison to be “fair”, with an 88200 sampleRate I replaced the 0.9 in the render method below by 0.95, and unfortunately, the unwanted vibrato was back :(

I also thought maybe it was a problem with the data precision and error accumulation so I tried scaling up the data in the render method and renormalizing it in the end for the buffer but that didn’t fix the issue.

To hear the unwanted vibrato with a 11000 sampleRate, all you need to do is add an attribute

self.z=[0,0]

in the AudioRenderer class and then change the render method this way (to have a filtered sawtooth):

	def render(self, buffer, numFrames, sampleTime):
		'''fill buffer with numFrames samples of a lowpass filtered sawtooth'''
		#print(self.sounds,self.theta,v.touches)
		#The scale factor was to try to win some precision with the data. Scale=1 means it doesn’t scale
		scale=1
		#z holds the state of the two lowpass stages (same list object as self.z)
		z=self.z
		for frame in range(numFrames):
			b=0
			for t in self.sounds:
				f,a=self.sounds[t]
				theta=self.theta[t]
				#original sine version: dTheta=2*math.pi*f/self.sampleRate
				dTheta=(f*scale)/self.sampleRate
				#original sine version: b+=math.sin(theta) * a
				#naive sawtooth going from -scale to scale
				b+=((theta%scale)*2-scale)*a
				theta += dTheta
				#original sine version: self.theta[t]=theta %(2*math.pi)
				self.theta[t]=theta%scale
			#two one-pole lowpass stages in series
			z[0]=0.9*z[0]+0.1*b
			z[1]=0.9*z[1]+0.1*z[0]
			buffer[frame]=z[1]/scale
		return 0
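
On the question of which coefficient makes the comparison "fair": each stage above has the form y[n] = a*y[n-1] + (1-a)*x[n], and a common approximation relates a to the cutoff by a = exp(-2*pi*fc/fs). Under that assumption, the coefficient matching 0.9 at 11000 Hz when running at 88200 Hz comes out around 0.987 rather than 0.95. The helper names below are illustrative, and the formula describes a single stage (the relation between sample rates carries over to the two-stage cascade, since both stages scale the same way).

```python
import math


def cutoff_hz(a, sample_rate):
    """Approximate cutoff of y[n] = a*y[n-1] + (1-a)*x[n], from a = exp(-2*pi*fc/fs)."""
    return -math.log(a) * sample_rate / (2.0 * math.pi)


def coeff_for(cutoff, sample_rate):
    """Feedback coefficient giving the requested cutoff at this sample rate."""
    return math.exp(-2.0 * math.pi * cutoff / sample_rate)


fc = cutoff_hz(0.9, 11000)      # cutoff of the original 0.9 at 11000 Hz
a_88k = coeff_for(fc, 88200)    # equivalent coefficient at 88200 Hz
```

This only addresses the cutoff shift; it does not by itself say where the unwanted vibrato comes from.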

JonB

@Mederic Re: rendering numpy arrays, iosurface/calayer is amazingly fast:

Here is an iosurface wrapper that exposes a numpy array (w x h x 4 channels) and a ui.View:
https://gist.github.com/87d9292b238c8f7169f1f2dcffd170c8

See the notes regarding using .Lock context manager, which is required.
Just manipulate the array inside a with s.Lock(), and it works just like you would hope.

On my crappy ipad3, I get > 100 fps when updating a 50x50 region, which is probably plenty fast.

edit: i see you are using float arrays. conversion from float to uint8 is kinda slow, so that is a problem.

JonB

@Mederic regarding while True:

doing while v.on_screen:
or at least checking on_screen is a good way to kill a loop once the view is closed.
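A tiny sketch of that pattern, written so it can run outside Pythonista: the liveness check is passed in as a callable. run_while and its parameters are hypothetical names.

```python
import time


def run_while(is_alive, step, idle=0.0):
    """Call step() repeatedly until is_alive() returns False."""
    iterations = 0
    while is_alive():
        step()
        iterations += 1
        if idle:
            time.sleep(idle)  # optionally yield so other threads get time
    return iterations
```

In Pythonista this would be called as something like run_while(lambda: v.on_screen, smudge_once), where v is the presented view and smudge_once stands for whatever one pass of the loop does.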

Mederic

Ok thank you.

I ran your code and it is very fast but I have a question (and as I am still not familiar with the libraries you use, it might take a while to figure out the answer on my own):

The printed fps is around 1000 on my IPad Pro.

Now, I computed the fps of my PythoniSmudge code and I realize it’s important to have two fps data here:

The computation fps of my while(True) loop was around 300
The fps of my Views (computed by incrementing an N every time a draw function is over) was 40

That is important because the first fps makes sure the smudge tool is internally computed continuously enough to avoid having irregularities and holes in the path on the final image (nothing to do with lag), (which is the case with computation fps = 300), and the second fps makes sure that my eye doesn’t see lag on the screen (which is the case as soon as view fps>30)

My question is, what does your fps=1000 compute exactly? It seems to only be the computation fps but maybe I am wrong and it somehow includes the view fps as a part of it, but I would really need to isolate the view fps because that is really what causes the sensation of lag.

If really 1000 IS the view fps, then it’s more than enough.

JonB

I believe it is the actual view FPS but you might want to increase N to get better timing. The redraw method should effectively block while data is copied over.

What you would do is have a single view, from the iosurface. You could try s.array[:,:,0]=imageArray, but that may be slow since it must copy the entire image.

Better would be to determine the affected box each touch_moved, then only copy those:

with s.Lock():
    s.array[rows,cols,0]=imageArray[rows,cols]

(Where rows and cols are indexes to affected pixels)

To keep monochrome, you would want your imageArray to be sized (r,c,1)
to allow broadcasting to work

with s.Lock():
    s.array[rows,cols,0:3]=imageArray[rows,cols]

This way you only copy over and convert the changed pixels each move.
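Here is a sketch of that partial copy that runs without Pythonista, using stand-ins: a plain uint8 numpy array in place of s.array and contextlib.nullcontext() in place of s.Lock(). It also assumes the float image holds values in [0, 1], which may not match your data; copy_dirty_region is an illustrative name.

```python
import contextlib

import numpy as np


def copy_dirty_region(surface_array, image, rows, cols, lock):
    """Convert and copy only image[rows, cols] into the RGB channels.

    surface_array: uint8 array shaped (h, w, 4), standing in for s.array
    image:         float array shaped (h, w, 1), values assumed in [0, 1]
    rows, cols:    slices covering the pixels touched by the last move
    lock:          context manager, standing in for s.Lock()
    """
    patch = image[rows, cols]                        # shape (r, c, 1)
    patch8 = (np.clip(patch, 0.0, 1.0) * 255.0).astype(np.uint8)
    with lock:
        surface_array[rows, cols, 0:3] = patch8      # broadcasts 1 -> 3 channels


# Stand-ins so the sketch runs anywhere.
surface = np.zeros((6, 8, 4), dtype=np.uint8)
image = np.ones((6, 8, 1))
copy_dirty_region(surface, image, slice(1, 3), slice(2, 5), contextlib.nullcontext())
```

The float to uint8 conversion happens on the small patch only and before taking the lock, so the lock is held just for the copy.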

JonB

By the way... You might get acceptable performance with your original code if you use pil2ui with a jpeg instead of png format during touch_moved, then switch over to the png during touch_ended.
Also, you might eke out some performance by using a single overlay view, but rendering N ui.images that are drawn during the view's draw method. That way you don't have the overhead of multiple views moving around. You would keep track of the pixel locations. See ui.Image.draw, which lets you draw into an image context. I think draw itself is fast, if you have the ui.Images already created.

That said, the iosurface approach should beat the pants off these methods.