Real time audio buffer synth/Real time image smudge tool
-
@JonB Ok, now I realize the issue was actually there even without the filter. I went back to my non-real-time synth, took out all the effects, played a simple high-pitched sawtooth, and the unwanted vibrato/“ghost frequency” was there. I am pretty sure now it is aliasing. I solved the problem using the PolyBLEP method; see Phelan Kane’s article and the code sample at the end:
http://metafunction.co.uk/all-about-digital-oscillators-part-2-blits-bleps/
Here are my current modifications of @JonB's audio processing buffer files, which you can get at:
https://gist.github.com/87d9292b238c8f7169f1f2dcffd170c8
Here are the attributes to add in the AudioRenderer class:
filtersNumber = 16
self.filtersBuffers = [0.0]*filtersNumber
# cutoffParam = 1 will mean no filtering
self.cutoffParam = 1
Here is the anti-aliased sawtooth and the 16 filters in the render method:
def render(self, buffer, numFrames, sampleTime):
    '''override this with a method that fills buffer with numFrames'''
    #print(self.sounds, self.theta, v.touches)
    fb = self.filtersBuffers
    for frame in range(numFrames):
        b = 0.0
        cut = self.cutoffParam
        for touch in self.sounds:
            f, a = self.sounds[touch]
            t = self.theta[touch]
            # replace 110.0 by f if you want to control the frequency
            # and see that there is no aliasing
            dt = 110.0/self.sampleRate
            t += dt
            t = t % 1
            saw = 2*t - 1
            self.theta[touch] = t
            if t < dt:
                t /= dt
                saw -= t + t - t*t - 1.0
            elif t > 1.0 - dt:
                t = (t - 1.0)/dt
                saw -= t*t + t + t + 1.0
            b += saw*1
            # the a control (from 0 to 1) is used to change the cutoff by setting:
            lerpfact = 0.2
            cut = (1 - lerpfact)*cut + lerpfact*a
        # set the first filter input to b = sawtooth wave
        input = b
        # loop over the filters in series:
        for f in range(len(fb)):
            fb[f] = (1 - cut)*fb[f] + cut*input
            input = fb[f]
        buffer[frame] = input
    self.filtersBuffers = fb
    self.cutoffParam = cut
    return 0
It’s set up to have a fat filtered 110 Hz saw bass (showing the filter works in real time, despite a few glitches), but if you replace 110 by f to control the frequency, you can check that there is no aliasing even at high frequencies.
You can find the modified files here:
https://gist.github.com/medericmotte/b523acbc1c446ca889e7471afa5a9b2f
-
@JonB How would I go about getting a stereo buffer with your code? Most of my sound is mono, but at the very end I like to add two delay effects, one per ear, with different parameters to give a sensation of space. So I would like to end the render loop with something like
buffer[frame] = [out1, out2]
or something similar.
-
Here is a cleaned-up idea, whereby there are different generators. Each generator has an amplitude and base frequency, and an internal numpy buffer, which is used as a circular buffer. So we can make use of numpy rather than filling one sample at a time, which should be more performant.
Samples get buffered based on a time stamp and get generated in touch_moved or update. But phase always increments on a per-sample basis, so if we fill the buffer, we just fill what we can. The render method of the audio unit then simply pops the number of requested samples out of the circular buffer and writes them into the audio unit buffer, so that part should never overrun.
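For illustration, here is a minimal sketch of the kind of circular buffer I mean (the class and method names are mine, for this example only, not the gist's actual code):

import numpy as np

class CircularBuffer:
    def __init__(self, size=16384):
        self.buf = np.zeros(size)
        self.read = 0    # index of oldest buffered sample
        self.count = 0   # number of buffered samples

    def push(self, samples):
        # Write as much as fits; extra samples are dropped (overrun).
        n = min(len(samples), len(self.buf) - self.count)
        w = (self.read + self.count) % len(self.buf)
        first = min(n, len(self.buf) - w)
        self.buf[w:w + first] = samples[:first]
        self.buf[:n - first] = samples[first:n]   # wraparound part
        self.count += n

    def pop(self, n):
        # Always return n samples; pad with silence on underrun.
        out = np.zeros(n)
        m = min(n, self.count)
        first = min(m, len(self.buf) - self.read)
        out[:first] = self.buf[self.read:self.read + first]
        out[first:m] = self.buf[:m - first]       # wraparound part
        self.read = (self.read + m) % len(self.buf)
        self.count -= m
        return out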
I still get some dropouts, though you can pre-charge the buffer more in touch_began. The debug text is showing number of buffer underruns, overruns, and current number of buffered samples.
This approach may result in some latency, but it's better at preventing tearing.
I offer a sine wave, sawtooth and triangle wave; my plan was then to implement a filter decorator/function that lets you define filter coeffs, but that is not done yet.
https://gist.github.com/jsbain/ed6a6956c43f3d8fd40092e93e49a007
The buffering, filtering and mixing are all done as doubles; conversion to float is at the end. I try to maintain phase continuity (though in retrospect I might be doing it wrong).
-
@Mederic for stereo,
streamFormat.mChannelsPerFrame = 1;
would be set to 2 instead of 1. Some of the other fields would then get multiplied by 2; see
https://developer.apple.com/documentation/coreaudio/audiostreambasicdescription
In my more recent code, above, buffer in render_callback would be cast to a pointer to c_float * (2 * inNumberFrames); when converting to an array, b in render could then be accessed as b[idxChannel, :] for one channel.
You could either create the generators in stereo, or have different generators filling each channel.
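Depending on the format flags, the 2-channel data is either interleaved ([L0, R0, L1, R1, ...]) or two contiguous planes. As a minimal sketch, assuming an interleaved format, here is one way to pack two mono render outputs into one stereo buffer (the function name is illustrative):

import numpy as np

def interleave(left, right):
    # Pack two equal-length mono float32 signals as [L0, R0, L1, R1, ...],
    # which is what an interleaved 2-channel stream format expects.
    out = np.empty(2*len(left), dtype=np.float32)
    out[0::2] = left
    out[1::2] = right
    return out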
-
@JonB Thanks. I noticed you haven’t antialiased your sawtooth, hence I still hear the unwanted vibrato/ghost frequency in the high frequencies (although maybe less dominant).
Here is my previous code with only the antialiased sawtooth (you will notice that the ghost frequency is gone):
https://gist.github.com/medericmotte/d99357919ce0ed658e5fa6e3b9d82121
And here is my current code with antialiased sawtooth, vibrato, unison, chords, filter, and delay all in one render method:
https://gist.github.com/medericmotte/d8e81b7e0961006d7026f16cc195682c
With this inefficient “all computation in the render method” implementation, on my device I am able to play one LFO vibrato and 4 notes at a time (4 fingers), each triggering 4 detuned unison voices, so 16 antialiased sawtooths in total plus one LFO sine, plus 4 one-pole IIR filters in series, and a 1-second delay with feedback, all in one render method, with sampleRate=44100.
It works without glitches at this setting, but if I set the filtersNumber attribute to, say, an unusually high 16 poles, some glitches can be heard while changing the cutoff.
My initial idea, when creating this topic, was to fill the circular buffer with the numbers computed by my already-programmed non-real-time synth, as they come (in 60 Hz chunks). As you said, it may or may not introduce a bit of latency depending on the complexity of the synth, but it will at least prevent the glitching as long as I set the latency high enough. That’s what I am going to do next.
Regarding the use of numpy in this context, before I created this thread I had actually implemented two versions of my non-real-time synth: one standard serialized, and the other using numpy in parallel.
For the parallel one, the saw/chord/unison part was easy to parallelize, but the trick was to get a parallel algorithm for the IIR filters and feedback delays. The problem is that their output depends on its own past values, so they can’t be parallelized as is. I managed to get a parallel algorithm by truncating their transfer function’s infinite series expansion. For instance, by approximating 1/(1 - az) = 1 + az + a^2 z^2 + a^3 z^3 + ... + a^n z^n + ... by 1 + az + a^2 z^2 + a^3 z^3 + ... + a^10 z^10, it becomes a (10th-order) FIR filter and can be implemented in parallel with numpy. A fast implementation of this approximation is the classic:
for i in range(approximationOrder):
    OutputVector = 1 + a*InputVector
    InputVector = OutputVector
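For example, here is a minimal self-contained numpy sketch of that truncation, with the one-sample delay written out explicitly (the function name and the zero initial condition are my own illustrative choices):

import numpy as np

def truncated_one_pole(x, a, order=10):
    # FIR approximation of y = x/(1 - a*z^-1): iterating y <- x + a*delay(y)
    # N times yields x + a*delay(x) + ... + a^N*delay^N(x), the truncated series.
    y = x.copy()
    for _ in range(order):
        delayed = np.empty_like(y)
        delayed[0] = 0.0        # zero initial condition (an assumption)
        delayed[1:] = y[:-1]    # one-sample delay
        y = x + a*delayed
    return y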
Here is a test code for my serialized synth (the saw hasn’t been antialiased yet in this code):
https://gist.github.com/medericmotte/5330028059e3e94198a14b4f87a9189e
Here is the equivalent for my parallelized synth:
https://gist.github.com/medericmotte/85b0d3f9eb7bb30c03f87e5c0eb16322
On my device, to compute a 5-second sound (3×4 sawtooths + 16 one-pole filters), the serialized synth takes 2.9 seconds and the parallelized synth takes 1.2 seconds (with an order-10 FIR approximation of each IIR).
So both are faster than real time with this simple setup, which makes me think that latency may not even be a problem in most cases.
-
@JonB My last post has changed quite a lot during the day so I hope you read the last version :) Sorry about that.
-
Your previous anti-aliased sawtooth wouldn't even play on my old device, whereas my numpy version basically plays with very few glitches (I need to push an updated version; I forgot to reset the buffer in touch_ended, and the version I posted might have been playing only one buffer instead of adding). You could also do numpy within the render method; it sounds like maybe that is what you have done.
I have not tried it, but I was thinking it might be acceptable to oversample, then use a half-band-type FIR filter (convolve should be the fastest way to implement filters in numpy), then decimate. You'd have to store enough old samples to pre-charge the filter.
In my mental model, one would have generators, filters, mixers, that do the work on their own circular buffers, then output samples on demand.
-
By the way, numpy has a roll function that I think does the same as your shift, but should be a bit faster because it's done in C. Both make a complete copy, which might be inefficient.
I think y = numpy.convolve(h, x, 'valid'), where h has the FIR filter coeffs, should be a lot faster than a Python loop. The filter would need to internally keep len(h) samples from the end of the previous signal. Also, depending on size, using FFT might beat convolve, though maybe not for these really small chunks.
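As a sketch of what that history-keeping could look like (class and names are illustrative, not tested code):

import numpy as np

class StreamingFIR:
    # Chunk-wise FIR filtering via np.convolve; keeping len(h)-1 samples
    # of history lets consecutive chunks join without edge transients.
    def __init__(self, h):
        self.h = np.asarray(h, dtype=np.float64)
        self.hist = np.zeros(len(self.h) - 1)

    def process(self, chunk):
        x = np.concatenate([self.hist, chunk])
        y = np.convolve(self.h, x, 'valid')  # output length == len(chunk)
        if len(self.h) > 1:
            self.hist = x[-(len(self.h) - 1):]
        return y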
-
Ok, I downloaded my code and ran it, and it still works on my device. So it could be a matter of device, but I would be very surprised if it were, because it’s a very small additional computation. In my code I set sampleRate=44100; maybe that’s the issue on your device. Just to check, could you take your working audiounittest.py file and, keeping sampleRate=11000, just replace
dTheta = 2*math.pi*f/self.sampleRate
b += math.sin(theta) * a
theta += dTheta
self.theta[t] = theta % (2*math.pi)
By
dTheta = 2*f/self.sampleRate
theta += dTheta
theta = theta % 1
saw = 2*theta - 1
self.theta[t] = theta
if theta < dTheta:
    theta /= dTheta
    saw -= theta + theta - theta*theta - 1.0
elif theta > 1.0 - dTheta:
    theta = (theta - 1.0)/dTheta
    saw -= theta*theta + theta + theta + 1.0
b += saw * a
because that was the only real difference. With an 11000 Hz sample rate the sawtooth will miss high harmonics and sound filtered, but there shouldn’t be additional inharmonic frequencies due to aliasing.
Also, the aliasing issue isn’t the same thing as the glitch issue. It’s really a second, inharmonic frequency playing in the background, but only at high frequencies, and it has nothing to do with overhead, since I could hear it on my non-real-time synth (which first records my finger’s motion on the screen into an array, and only then computes the whole sound at once).
Yeah, I thought about FFT and I agree, especially for very high-order filters and large audio chunks. Regarding convolution, I also agree; I don’t know numpy very well and wasn’t aware of the numpy.convolve method.
-
Your mental model is the modular environment model, and also the one used by Apple’s Core Audio high-level audio processing system.
One of my goals was to kind of bypass this model, to have the feeling of working directly on the audio data. But now that I think about it, what really bothered me initially was that in visual modular environments you can’t easily have aliases and you can’t do for loops. However, implementing your model would still allow loops and aliases in the code when creating and patching the modules, so I think I would dig it.
Also, it’s even more interesting since there is no text-based environment like ChucK or SuperCollider on the iPad, only visual modular environments.
About the numpy way: I tested my two versions with a smaller duration. Instead of 5 seconds I used 1/60 s. Here are the average results:
- serialized: 0.00995s
- numpy parallel, order-10 approximation (without convolve for the moment): 0.00814s
So on small chunks, the numpy approach still beats the serialized one even when convolve isn’t used.
-
No, your code works; my point was that on old hardware it is much, much slower, such that with all of the filtering and no buffering it was constantly overrunning (though it might also just have been that my Pythonista had gotten all screwy). When you were getting 300–1000 fps in the graphics, I was getting <60.
In my current approach, I am thinking it might also be a good idea to only compute samples to fill the buffer inside update, so that we can use numpy buffers that are as long as possible, minimizing the overhead. Though you could also do this in the render of the audiounit; there are ways, I think, to specify the minimum and maximum number of samples that the audiounit can ask for.
-
Ok, I was referring more to the antialiasing algorithm itself rather than the way I used it in the render method. I mean that if you use this algorithm in your numpy approach, it would probably kill the aliasing ;)
Edit: About numpy: with an order-10 approximation (truncating the transfer function at the z^10 term), it still beats the serialized approach by 0.002 s on a 1/60 s chunk (see above). But with order 30 it loses (without the convolve or roll methods, which I will try).
-
I think that for pure oscillator modules it might be simpler to store a wave table in the AudioRenderer and have the buffer loop around it, advancing by self.freq indices in the table. The filter (only for this oscillator) could then simply be the program updating the wave table (using numpy, for instance).
-
I have not run your latest stuff.. will try this weekend.
A few thoughts. First, you might consider generating the sawtooth via a weighted sum of sines, up to Nyquist (using coefficients from the Fourier expansion). Thus you can perfectly bandlimit, with a small number of additions and without the headache or distortion of filtering. I.e., for a 440 Hz fundamental, summing through the 6th harmonic gets you to 14480; the 7th would be 28960, which would fold over if sampling at 44100. So you get a pure tone by effectively adding 6 numbers, probably faster than trying to do a 20-tap filter, though at the expense of 6× the number of sin calls.
I suspect much of the aliasing you are getting is the sort that you can't filter out after sampling. When you go to 88 kHz, I'm guessing the hardware has limits in the 40 kHz range, so your filtering does help aliasing there.
A low-frequency sawtooth obviously requires more terms than a high-frequency one, so this is not a constant-time operation, but that's probably ok. You can do this as a list comprehension, which accelerates it, or you could generate t as a matrix, then multiply the row vector of amplitude weights times sin(matrix), which does the weighting and summing in one efficient numpy step.
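For instance, a minimal numpy sketch of that matrix approach (the function name is mine; the weights 2/(k·π) with alternating signs are the standard sawtooth Fourier coefficients):

import numpy as np

def bandlimited_saw(f0, n, fs=44100.0):
    # Harmonics k*f0 strictly below Nyquist.
    k = np.arange(1, int((fs/2)//f0) + 1)
    t = np.arange(n)/fs
    # One row of phases per harmonic; the matrix-vector product does the
    # weighting and summing in a single numpy step.
    phases = 2*np.pi*f0*np.outer(k, t)
    weights = (2/np.pi)*((-1)**(k + 1))/k
    return weights @ np.sin(phases)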
If you are then trying to modulate (multiplying by some low-frequency tone), there might be other considerations.
For arbitrary signals that you get out of nonlinear operations, one thought is that you might be able to fine-tune some filtering by oversampling, say 2 or 4×. Since numpy is efficient, that might still be cheap compared to the filter overhead. And this would eliminate any real aliasing for signals higher than the Nyquist of the output audio.
With numpy.convolve you'd end up throwing away samples, so you could do a custom convolve using a downsampled Toeplitz matrix. For oversample factor q, and FIR filter coeffs in h:
T = np.zeros((N, N*q))            # note: np.zeros takes a shape tuple
for i in range(N):
    T[i, q*i:q*i + len(h)] = h    # assumes len(h) <= q so each row fits
xf = T @ xs
(With plain numpy arrays, T @ xs, or equivalently T.dot(xs), already does a true matrix multiply, so no asmatrix conversion is needed.)
Thus the filtering op can be expressed as an N×(qN) times qN×1 matrix multiply, which should be efficient and vectorized in numpy. Your modified Toeplitz can be precomputed if the number of samples is fixed, which it might be possible to force the audiounit to do.
You have to manage the filter transients by saving len(h) samples of the end of the previous signal.
I think there is a rule of thumb that Ntaps = A/(22×B), where B = (fstop − fpass)/fs. So in this case, if we wanted 60 dB attenuation at fs/4
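For example, with A = 60 dB and an assumed transition band of width 0.1·fs (so B = 0.1), that rule gives Ntaps ≈ 60/(22 × 0.1) ≈ 27 taps; the transition width here is an illustrative assumption, not from the post.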
-
@JonB Sorry, not sure if you have seen my latest post, since you posted at the same time:
I think that for pure oscillator modules it might be simpler to store a wave table in the AudioRenderer and have the buffer loop around it, advancing by self.freq indices in the table. The filter (only for this oscillator) could then simply be the program updating the whole wave table (using numpy, for instance).
-
I thought about that.. you would be limited to specific frequencies, f=M/N*fs for integers M,N. If M is high enough, then yeah you eliminate all the sin calls.
-
I like the sum-of-sines method because this was my first understanding of sound synthesis, and for a long time I wanted to implement everything that way, with GPU computing. Then I learned about DSP and the whole poles-and-zeros thing, and kind of fell in love with it. But I might try the additive thing again with numpy.
As you said, though, it is a lot of sines for low frequencies, especially when you realize that sawtooths are the most beautiful in this range (for bass sounds etc.); with unison it multiplies the number of sines to add together, and with chords it’s even worse.
-
Ok, I see what you mean.. yes, each generator can produce an antialiased signal, either by direct synthesis (sum of sines) or by precomputing a filtered, oversampled version that can work at multiple fixed frequencies.
For linear mixing, just adding signals, that would be enough to produce clean sound, though maybe one has to watch out for clipping.
For x(t)*y(t)-type mixing, you get frequency multiplication, so you need to either prefilter the antialiased signals again, or oversample, mix, then filter/decimate. Or some combo... If you antialias first, you may miss real content: multiplying a 44000 Hz sine with a 44444 Hz sine produces real tones at 444 Hz and 88444 Hz, but if you antialias first you would get silence.
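(This is just the product-to-sum identity: sin(2π·f1·t)·sin(2π·f2·t) = ½[cos(2π(f2 − f1)t) − cos(2π(f1 + f2)t)], hence the components at f2 − f1 = 444 Hz and f1 + f2 = 88444 Hz.)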
-
Also, I am not totally sure the sum of sines is equivalent to DSP filters. I mean, in a stationary state it IS equivalent, but when you modulate the cutoff the output signal is in a non-stationary state, and I don’t think the additive synthesis is then equivalent to the DSP.
I am sure I wrote this, but I must have deleted it:
The polyBLEP method can really be seen as adding to a hard-angled sawtooth just the right residue to kill the hard angle and put a soft angle in its place. That’s also what filtering does, but here it’s not really that: it’s an analytic method, so it can be formally computed for any frequency without storing anything. It’s just a second-order polynomial added to the sawtooth.
With saw being the hard-angled sawtooth, theta its phase, and dTheta the frequency/sampleRate:
if theta < dTheta:
    theta /= dTheta
    saw -= theta + theta - theta*theta - 1.0
elif theta > 1.0 - dTheta:
    theta = (theta - 1.0)/dTheta
    saw -= theta*theta + theta + theta + 1.0
And you’re right about the x(t)*y(t) part. But that’s kind of an extreme case you took; most of the time, this kind of multiplication happens between an osc and an LFO.
Personally, for my usage, I’d like to stay minimalistic so not complicate the code too much to take into account extreme/rare situations. My goal is to tweak the code creatively during the music composing process, so I need a base-code that is simple and tweakable.
I will focus on the numpy pre-buffering idea and the convolve method. Right now I am worried about the fact that I would need to recompute the impulse response each time the filter parameters change...
If I use my 16-pole filter with an order-10 approximation of each one-pole IIR in it, it means I have to compute a 160-coefficient impulse response every time a parameter is changed. I wonder if that’s really manageable for fast filter sweeps. One solution might be to precompute these 160 coefs for each value of the filter’s parameters, so a 160×N matrix if only cutoff, but a 160×N×M matrix if cutoff + resonance...
I’ll probably end up not using convolve because of that. But I am planning to add an additive synthesis functionality, so that a special additive generator can emulate a sawtooth + filter. Even if it’s not mathematically equivalent when the cutoff is modulated, it’s probably musically interesting ;)
-
And for what you said about only having frequencies of the form f = M/N * fs: a solution is to keep track of the phase theta as a precise, continuous number (not necessarily a multiple of 1/fs), and then have the render method look into the wavetable, either at the closest index to theta (so table[int(theta*fs)] for a table of fs samples), or better, by interpolating between this index and the following value in the table. Either way, keeping track of the exact phase should give you any frequency; the table is just a lookup table.
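A minimal sketch of that interpolated lookup (the table length fs, the table name and the naive single-cycle sawtooth content are just for illustration):

import numpy as np

fs = 44100
# One naive sawtooth cycle, fs samples long.
table = 2*np.arange(fs)/fs - 1.0

def wavetable_sample(theta):
    # theta is the continuous phase in [0, 1); linear interpolation between
    # the two nearest table entries, wrapping at the end of the table.
    pos = theta*fs
    i = int(pos)
    frac = pos - i
    return (1 - frac)*table[i] + frac*table[(i + 1) % fs]

# Per sample, the oscillator just advances the phase:
# theta = (theta + freq/fs) % 1.0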