Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Real time audio buffer synth/Real time image smudge tool
-
Ok, I was referring more to the antialiasing algorithm itself than to the way I implemented it in the render method. I mean that if you used this algorithm in your numpy approach, it would probably kill the aliasing ;)
Edit: About numpy: with an order-10 approximation (truncating the transfer function at the z^10 term), it still beats the serialized approach by 0.002s on a 1/60s chunk (see above). But with order 30 it loses (without the convolve or roll methods, which I will try).
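To make "truncating the transfer function at the z^10 term" concrete for a single stage, here is a minimal sketch. It assumes a generic one-pole form y[n] = b*x[n] + a*y[n-1] with hypothetical coefficients a and b (the filter in this thread may be parameterized differently). Expanding H(z) = b/(1 - a z^-1) as a geometric series and keeping terms up to z^-order gives an FIR filter that np.convolve can apply in one vectorized call.

```python
import numpy as np

def one_pole_iir(x, a, b):
    """Exact recursion y[n] = b*x[n] + a*y[n-1], computed one sample at a time."""
    y = np.empty(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev = b * xn + a * prev
        y[n] = prev
    return y

def one_pole_fir(x, a, b, order=10):
    """Truncated version: H(z) ~ b * sum_{k=0}^{order} a^k z^-k, applied with np.convolve."""
    h = b * a ** np.arange(order + 1)   # first order+1 terms of the impulse response
    return np.convolve(x, h)[:len(x)]   # causal part, same length as the input
```

For |x| <= 1 the truncation error is bounded by |b|*|a|^(order+1)/(1-|a|). Poles close to the unit circle (low cutoffs) therefore need a higher order, at the cost of more taps.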
-
I think that for pure oscillator modules it might be simpler to store a wave table in the AudioRenderer and have the buffer loop around it, stepping self.freq indices through the table at each sample. The filter (only for this oscillator) could then simply be the program updating the wave table (using numpy, for instance).
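A minimal sketch of the integer-step wavetable idea. It assumes the table holds exactly one cycle and the oscillator advances a whole number of indices per sample; `step` plays the role of self.freq, and the function name is hypothetical. The resulting pitch is step*fs/len(table), so integer steps can only reach frequencies of that form.

```python
import numpy as np

def integer_step_block(table, step, start, n):
    """Read n samples from a one-cycle wavetable, advancing `step` indices per sample.

    Returns the block and the table index to resume from on the next call.
    """
    idx = start + step * np.arange(n)
    block = np.take(table, idx, mode='wrap')          # wrap around the end of the table
    return block, (start + step * n) % len(table)

fs = 44100
table = np.sin(2 * np.pi * np.arange(fs) / fs)        # one sine cycle stored in fs entries
block, next_start = integer_step_block(table, 440, 0, 1024)   # 440 * fs / len(table) = 440 Hz
```

With a table of fs entries, integer steps give exactly the integer frequencies in Hz. A shorter table trades frequency resolution for memory.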
-
I have not run your latest stuff yet; I will try this weekend.
A few thoughts. First, you might consider generating the sawtooth via a weighted sum of sines up to Nyquist (using coefficients from the Fourier expansion). That way you can bandlimit perfectly, with a small number of additions and without the headache or distortion of filtering. E.g. for a 440Hz fundamental, summing through the 6th harmonic gets you to 14480, while the 7th would be 28960, which would fold over if sampling at 44100. So you get a pure tone by effectively adding 6 numbers, probably faster than trying to do a 20-tap filter, though at the expense of 6x the number of sin calls.
I suspect much of the aliasing you are getting is this sort that you can't filter after sampling. When you go to 88kHz, I'm guessing the hardware has limits in the 40kHz range, so your filtering does help aliasing.
Obviously, a low-frequency sawtooth requires more terms than a high-frequency one, so this is not a constant-time operation, but that's probably ok. You can do this as a list comprehension, which speeds it up, or you could generate t as a matrix and then multiply the row vector of amplitude weights by sin(matrix), which does the weighting and summing in one efficient numpy step.
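A sketch of that matrix form. It assumes a rising sawtooth from -1 to 1, whose Fourier series is -(2/π)·Σ sin(2πkft)/k, and keeps only harmonics strictly below Nyquist. The function name is illustrative.

```python
import numpy as np

def bandlimited_saw(freq, fs, n, t0=0.0):
    """Additive sawtooth (rising, about -1..1) using every harmonic below fs/2."""
    nyquist = fs / 2
    k = np.arange(1, int(nyquist // freq) + 1)
    k = k[k * freq < nyquist]              # strictly below Nyquist, so nothing folds over
    weights = -2.0 / (np.pi * k)           # Fourier coefficients of the sawtooth
    t = t0 + np.arange(n) / fs             # sample times
    # (n, K) matrix of sin(2*pi*k*f*t); @ does the weighting and the sum in one step
    return np.sin(2 * np.pi * freq * np.outer(t, k)) @ weights
```

At 100 Hz and 44.1 kHz this builds about 220 columns, so the cost grows as the pitch drops, and a unison or chord multiplies it again.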
If you are then trying to modulate (multiplying by some low-frequency tone), there might be other considerations.
For arbitrary signals that you get out of nonlinear operations, one thought is you might be able to fine tune some filtering by oversampling, say 2 or 4x. Since numpy is efficient, that might still be small compared to filter overhead. And this would eliminate any real aliasing for signals higher than Nyquist of the output audio.
With numpy.convolve you'd end up throwing away samples, so you could do a custom convolve using a downsampled Toeplitz matrix. For oversampling factor q and FIR filter coefficients h:
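    T = np.zeros((N, N*q + len(h) - 1))  # one row per decimated output sample
    for i in range(N):
        T[i, q*i:q*i + len(h)] = h[::-1]  # time-reversed taps (same as h for a symmetric FIR)
    xf = T @ xs  # xs: len(h)-1 history samples followed by the N*q new samples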
(T and xs can stay plain numpy arrays, as long as the product uses the @ operator or np.dot; * would multiply element-wise.)
Thus, the filtering op can be expressed as a roughly Nx(qN) times (qN)x1 matrix multiply (plus len(h)-1 extra columns for the saved history), which should be efficient and vectorized in numpy. The modified Toeplitz matrix can be precomputed if the number of samples is fixed, which it might be possible to force the audio unit to do.
You have to manage the filter transients by saving the last len(h)-1 samples of the previous signal.
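The same idea as a small stateful class, as a sketch. It assumes a fixed block size and an FIR lowpass h designed elsewhere (cutoff below the output Nyquist); the class name is hypothetical. It carries the last len(h)-1 oversampled samples between calls so block edges produce no transients, and it only computes every q-th output of the convolution.

```python
import numpy as np

class DecimatingFIR:
    """FIR filter followed by keep-every-q-th-sample, as one matrix product per block.

    h: lowpass FIR taps; n_out: output samples per block (input blocks have n_out*q samples).
    """

    def __init__(self, h, n_out, q):
        self.h = np.asarray(h, dtype=float)
        self.q = q
        L = len(self.h)
        # Row i holds the time-reversed taps starting at column q*i, so
        # (T @ xp)[i] = sum_k h[k] * x[q*i - k] with xp = history + new block.
        self.T = np.zeros((n_out, n_out * q + L - 1))
        for i in range(n_out):
            self.T[i, q * i:q * i + L] = self.h[::-1]
        self.history = np.zeros(L - 1)   # last L-1 input samples of the previous block

    def process(self, x):
        xp = np.concatenate([self.history, x])
        self.history = xp[len(xp) - (len(self.h) - 1):]   # keep the tail for the next block
        return self.T @ xp
```

Block by block, the output matches np.convolve(x, h)[:len(x)][::q] on the concatenated input, without computing the discarded samples.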
I think there is a rule of thumb that Ntaps ≈ A/(22×B), where A is the stopband attenuation in dB and B=(fstop-fpass)/fs. So in this case, if we wanted 60 dB attenuation at fs/4
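The rule of thumb as a small helper. The example numbers are assumptions for illustration only: 2x oversampling of 44.1 kHz (fs = 88200), a passband edge at 20 kHz, and the stopband starting at fs/4 = 22050 Hz.

```python
import math

def estimate_fir_taps(atten_db, f_pass, f_stop, fs):
    """Rule-of-thumb FIR length N ~ A / (22 * B), with B = (f_stop - f_pass) / fs."""
    transition = (f_stop - f_pass) / fs
    return math.ceil(atten_db / (22 * transition))

print(estimate_fir_taps(60, 20000, 22050, 88200))   # 118
```

The narrow transition band is what drives the count up. Relaxing the passband edge to 18 kHz in the same example brings it down to 60 taps.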
-
@JonB Sorry, not sure if you have seen my latest post since you posted at the same time:
I think that for pure oscillator modules it might be simpler to store a wave table in the AudioRenderer and have the buffer loop around it, stepping self.freq indices through the table at each sample. The filter (only for this oscillator) could then simply be the program updating the whole wave table (using numpy, for instance).
-
I thought about that. You would be limited to specific frequencies, f = M/N*fs for integers M, N. If M is high enough, then yeah, you eliminate all the sin calls.
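A common way around that restriction, sketched here: keep the phase as a continuous fraction of a cycle and interpolate linearly between neighbouring table entries, so the per-sample increment freq/fs no longer has to land on a table index. The phase-in-cycles convention, the table size used below and the function name are illustrative assumptions.

```python
import numpy as np

def wavetable_block(table, freq, fs, n, phase=0.0):
    """Linearly interpolated wavetable oscillator; phase in cycles. Returns (block, next_phase)."""
    size = len(table)
    pos = ((phase + freq / fs * np.arange(n)) % 1.0) * size   # fractional table position
    i0 = np.floor(pos).astype(int)
    frac = pos - i0
    i0 %= size                     # guard against pos rounding up to exactly `size`
    i1 = (i0 + 1) % size           # neighbour, wrapping at the end of the table
    block = (1.0 - frac) * table[i0] + frac * table[i1]
    return block, (phase + freq / fs * n) % 1.0
```

Nearest-index lookup is the same code with frac set to zero. Interpolation matters most for small tables.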
-
I like the sum of sines method because this was my first understanding of sound synthesis and for a long time I wanted to implement everything that way and with gpu computing. Then I learned about DSP and the whole poles and zeros thing and kind of fell in love with it. But I might try the additive thing again with numpy.
But as you said, it is a lot of sines for low frequencies, especially when you realize that sawtooths are the most beautiful in this range (for bass sounds etc.), and unison multiplies the number of sines to add together, and with chords it’s even worse.
-
Ok, I see what you mean. Yes, each generator can produce an antialiased signal, either by direct synthesis (sum of sines) or by precomputing a filtered oversampled version that can work at multiple fixed frequencies.
For linear mixing, just adding signals, that would be enough to produce clean sound, though maybe one has to watch out for clipping.
For x(t)*y(t)-type mixing, you get frequency multiplication, so you need to either prefilter the antialiased signals again, or oversample, mix, then filter/decimate. Or some combination... If you antialias first, you may miss real content: multiplying a 44000 Hz sine with a 44444 Hz sine produces real tones at 444 Hz and 88444 Hz, but if you antialias first you would get silence.
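A quick numpy check of that example, assuming both sines are generated directly at an 8x oversampled rate (352.8 kHz, so the 88444 Hz sum tone also stays below that rate's Nyquist). With one second of signal, FFT bin indices equal frequencies in Hz. The product contains exactly the difference and sum tones, since sin(a)·sin(b) = ½[cos(a−b) − cos(a+b)].

```python
import numpy as np

def product_peaks(f1, f2, fs, n_peaks=2):
    """Frequencies (Hz) of the strongest components of sin(2*pi*f1*t) * sin(2*pi*f2*t)."""
    t = np.arange(fs) / fs                       # one second, so FFT bin k is k Hz
    y = np.sin(2 * np.pi * f1 * t) * np.sin(2 * np.pi * f2 * t)
    mag = np.abs(np.fft.rfft(y))
    return sorted(np.argsort(mag)[-n_peaks:].tolist())

print(product_peaks(44000, 44444, 8 * 44100))   # [444, 88444]
```

Lowpass filtering and decimating this product back to 44.1 kHz would keep the 444 Hz tone. Band-limiting each input to the 22.05 kHz output Nyquist first would remove both inputs, and the 444 Hz tone with them.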
-
Also, I am not totally sure the sum of sines is equivalent to DSP filters. I mean, in a stationary state it IS equivalent, but when you modulate the cutoff, the output signal is non-stationary, and I don’t think additive synthesis is then equivalent to the DSP.
I am sure I wrote that but I must have deleted it:
The polyBLEP method can really be seen as adding the right residual to a hard-cornered sawtooth, to kill the hard corner and put a soft one in its place. Filtering does something similar, but this is not really filtering: it’s an analytic method, so it can be computed formally for any frequency without storing anything. It’s just a second-order polynomial added to the sawtooth.
With saw the hard-cornered sawtooth, theta its phase, and dTheta = frequency/sampleRate:
    if theta < dTheta:
        theta /= dTheta
        saw -= theta + theta - theta*theta - 1.0
    elif theta > 1.0 - dTheta:
        theta = (theta - 1.0) / dTheta
        saw -= theta*theta + theta + theta + 1.0
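Since the correction only touches samples within one step of the wrap, it vectorizes well. A numpy sketch, assuming the naive sawtooth is 2*theta - 1 (rising from -1 to 1, the scaling this residual fits) and freq < fs/2 so the two masks never overlap. It also returns the phase to start the next buffer from; the function name is illustrative.

```python
import numpy as np

def polyblep_saw(freq, fs, n, phase=0.0):
    """Vectorized polyBLEP sawtooth; phase in cycles. Returns (block, next_phase)."""
    dt = freq / fs                                    # dTheta
    theta = (phase + dt * np.arange(n)) % 1.0
    saw = 2.0 * theta - 1.0                           # naive sawtooth, -1 .. 1
    m = theta < dt                                    # samples just after the wrap
    t = theta[m] / dt
    saw[m] -= t + t - t * t - 1.0
    m = theta > 1.0 - dt                              # samples just before the wrap
    t = (theta[m] - 1.0) / dt
    saw[m] -= t * t + t + t + 1.0
    return saw, (phase + dt * n) % 1.0
```

Passing the returned phase into the next call keeps consecutive buffers continuous.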
And you’re right about the x(t)*y(t) part, but you picked a rather extreme case. Most of the time, this kind of multiplication happens between an oscillator and an LFO.
Personally, for my usage, I’d like to stay minimalistic and not complicate the code too much to handle extreme/rare situations. My goal is to tweak the code creatively during the music composing process, so I need a base code that is simple and tweakable.
I will focus on the numpy pre-buffering idea and the convolve method. Right now I am worried about the fact that I would need to compute the impulse response each time the filter parameters are changed...
If I use my 16-pole filter with an order-10 approximation of each IIR one-pole filter in it, it means I have to compute a 160-coefficient impulse response every time a parameter is changed. I wonder if that’s really manageable for fast filter sweeps. One solution might be to precompute these 160 coefficients for each value of the filter’s parameters: a 160xN matrix if only the cutoff varies, but a 160xNxM array for cutoff+resonance...
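A sketch of that precomputation. It assumes each of the 16 stages is the same one-pole lowpass y[n] = (1-a)*x[n] + a*y[n-1], truncated at order 10; the actual 16-pole filter may use different coefficients per stage. Cascading FIR stages means convolving their taps, which gives 16*10 + 1 = 161 taps. A table over a grid of pole values then turns a cutoff change into a row lookup. The grid and all names are hypothetical.

```python
import numpy as np

def truncated_one_pole(a, order=10):
    """Taps of (1-a) / (1 - a z^-1) truncated after the z^-order term."""
    return (1.0 - a) * a ** np.arange(order + 1)

def cascade_ir(a, n_stages=16, order=10):
    """Impulse response of n_stages identical truncated stages in series."""
    h = np.array([1.0])
    stage = truncated_one_pole(a, order)
    for _ in range(n_stages):
        h = np.convolve(h, stage)       # series connection = convolution of the taps
    return h                            # length n_stages*order + 1

# One row per pole value (a grid standing in for cutoff settings)
poles = np.linspace(0.05, 0.95, 128)
ir_table = np.stack([cascade_ir(a) for a in poles])   # shape (128, 161)

def lookup_ir(a):
    """Nearest precomputed impulse response for pole value a."""
    return ir_table[np.abs(poles - a).argmin()]
```

A second parameter such as resonance adds another axis to the table, which is where the NxM growth comes from.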
I’ll probably end up not using convolve because of that. But I am planning to add an additive synthesis functionality so that a special additive generator can emulate a sawtooth+filter. Even if it’s not mathematically equivalent when the cutoff is modulated, it’s probably musically interesting ;)
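One way to set up such an additive "sawtooth + filter" generator, as a sketch: take the sawtooth's harmonic amplitudes and multiply each by the magnitude of a chosen response at that harmonic's frequency. The Butterworth-style magnitude 1/sqrt(1 + (f/fc)^(2*order)) is only a stand-in for whatever response shape is wanted. Phase is ignored, which is one reason this is not equivalent to the real filter when the cutoff moves.

```python
import numpy as np

def filtered_saw_partials(f0, fs, response):
    """Frequencies and amplitudes of a band-limited sawtooth shaped by response(f)."""
    k = np.arange(1, int((fs / 2) // f0) + 1)
    k = k[k * f0 < fs / 2]                       # harmonics strictly below Nyquist
    freqs = k * f0
    amps = -2.0 / (np.pi * k) * response(freqs)  # sawtooth coefficient times |H(f)|
    return freqs, amps

def smooth_lowpass(fc, order=16):
    """Hypothetical response shape: Butterworth-style magnitude with cutoff fc."""
    return lambda f: 1.0 / np.sqrt(1.0 + (f / fc) ** (2 * order))

freqs, amps = filtered_saw_partials(110.0, 44100, smooth_lowpass(800.0))
```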
-
And regarding what you said about only having frequencies of the form M/N * fs: a solution is to keep track of the phase theta as a precise, continuous number (not necessarily a multiple of 1/fs), and then have the render method look up the wavetable, either at the index closest to theta (so [theta*fs]), or better, by interpolating between this index and the following value in the table. Either way, keeping track of the exact phase should give you any frequency; the table is just a lookup table.
-
Maybe I misunderstood your filter. Are you high-pass or lowpass filtering?
I am not a digital music person, but I see now you are going for "supersaw" type effects, as opposed to "pure" sawtooth.
http://www.ghostfact.com/jp-8000-supersaw/
So high-pass filtering to keep the low frequencies clean is what you are after, meaning you need to change the filter as the frequency changes (or have a bank of filters that you select depending on frequency). BLEP is more about shaping things in the time domain. Still, doing things vectorised, an antialiased supersaw would be 7 tones × maybe 10 octaves at most, which might still be okay if you do it matrix-based (no loops).
Another approach would be to use GPU processing. It wouldn't be too hard to write a shader that computes hundreds of filters in parallel, and you just select the one that you need, though moving data in/out now becomes the limiter.
-
I am lowpass filtering. I’m not a huge fan of high frequencies in general; I love warm low ends.
What might be confusing in my code is that I don’t use standard algorithms. I have experimented a lot to find my own preferences in filtering and everything :) So my algo probably doesn’t correspond to standard stuff, but is still based on the zeros/poles theory.
There are several components in my personal synth:
- A pure antialiased sawtooth.
- A unison: basically duplicating the sawtooth above 4 or 6 times and detuning/dephasing the copies.
- A filter. I actually like the unconventional 16 poles because of the way you can get back an almost perfect sine by setting a low cutoff.
- A slight vibrato applied to all frequencies.
- A stereo delay which, combined with the vibrato, creates a very minimalist but beautiful and pure reverb effect (I don’t like convolution reverbs, too resonant).
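A sketch of the unison stage from the list above. It assumes voices spread evenly in cents around the played pitch with random start phases. The sine oscillator is only a placeholder for the antialiased sawtooth, and spread_cents, the voice count and the function names are illustrative.

```python
import numpy as np

def unison_freqs(freq, n_voices=6, spread_cents=20.0):
    """Detuned copies of freq, spread evenly from -spread_cents to +spread_cents."""
    if n_voices == 1:
        return np.array([freq])
    cents = np.linspace(-spread_cents, spread_cents, n_voices)
    return freq * 2.0 ** (cents / 1200.0)

def sine_osc(f, fs, n, phase):
    """Placeholder oscillator (phase in cycles); swap in the antialiased sawtooth."""
    return np.sin(2 * np.pi * (phase + f / fs * np.arange(n))), (phase + f / fs * n) % 1.0

def unison_block(osc, freq, fs, n, phases, spread_cents=20.0):
    """Average of len(phases) detuned voices; returns the block and updated phases."""
    freqs = unison_freqs(freq, len(phases), spread_cents)
    out = np.zeros(n)
    next_phases = np.empty(len(phases))
    for i, (f, ph) in enumerate(zip(freqs, phases)):
        block, next_phases[i] = osc(f, fs, n, ph)
        out += block
    return out / len(phases), next_phases   # divide by the voice count to keep the level

rng = np.random.default_rng()
phases = rng.random(6)                      # random start phases ("dephasing")
block, phases = unison_block(sine_osc, 110.0, 44100, 1024, phases)
```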
I am not necessarily going for a supersaw like the ones you hear in techno. I tend to play some parts with no delay/unison and just a dry low sawtooth with filter cutoff modulation, and some other parts with delay+unison to get the feeling of a big space and ensemble.
Not a fan of complex patches. I like the raw, minimalistic sensation of electric tension vibrating in wires. The sawtooth provides that feeling. The filter allows me to stay on the edge between « too soft » and « too aggressive » in a human, expressive way, to get that « breaking through the air » sound and that tension :)
So basically I don’t want to make the sound more interesting by the way its built but more by the way I play it and modulate it with expressivity. That’s why real time is important to me.
My musical genre is closer to movie music than techno, so not aggressive, lots of low ends and long chords. Kind of an orchestral synth feeling.
Actually, I’ve been doing sound synthesis in modular environments for a long time, and I know now what I like. The only part I was really missing was access to a circular buffer. Now I am just trying to polish the process to have as few glitches as possible and no aliasing, given what I like. I think it will come down to a bit of precomputing/latency and well-placed numpy or GPU computing, with the kind of ideas you mentioned.
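Since the circular buffer is what makes the delay possible, here is a minimal block-based feedback delay line as a sketch (one channel; a stereo version could run two of these with different delay times). It assumes each block is no longer than the delay, so a whole block can be read from the ring buffer before it is written back. The class name and parameters are illustrative.

```python
import numpy as np

class DelayLine:
    """Feedback delay: w[n] = x[n] + feedback*w[n-D], out[n] = x[n] + mix*w[n-D]."""

    def __init__(self, delay_samples, feedback=0.5, mix=0.5):
        self.buf = np.zeros(delay_samples)   # ring buffer holding the last D values of w
        self.pos = 0                         # ring position of the next block's first sample
        self.feedback = feedback
        self.mix = mix

    def process(self, x):
        x = np.asarray(x, dtype=float)
        d = len(self.buf)
        if len(x) > d:
            raise ValueError("block longer than the delay")
        idx = (self.pos + np.arange(len(x))) % d
        delayed = self.buf[idx]              # w[n-D] for every sample of the block
        self.buf[idx] = x + self.feedback * delayed
        self.pos = (self.pos + len(x)) % d
        return x + self.mix * delayed
```

Modulating the delay time with the vibrato would need fractional reads (interpolation), which this sketch leaves out.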
There is something beautiful in the zeros/poles theory of DSP. I like to think about filters in terms of placing the zeros and poles rather than in terms of actual cutoff frequency and resonance. In that approach, I am not really interested in the actual cutoff frequency, because I tend to modulate it a lot anyway... so not a huge fan of standardized lowpass filter algorithms.
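For thinking directly in terms of zeros and poles, here is a small sketch. It evaluates the magnitude response on the unit circle from their positions in the z-plane and converts them to difference-equation coefficients with np.poly. The function names are hypothetical, and the example placement (a zero at z = -1, a pole at z = 0.9, unit gain at DC) is only an illustration.

```python
import numpy as np

def pz_magnitude(zeros, poles, freqs, fs):
    """|H(e^{jw})| for H(z) = prod(z - zero) / prod(z - pole), scaled to 1 at DC.

    Assumes no zero sits at z = 1 (otherwise the DC gain is zero).
    """
    z = np.exp(2j * np.pi * np.asarray(freqs, dtype=float) / fs)   # points on the unit circle
    h = np.ones_like(z)
    for q in zeros:
        h *= z - q
    for p in poles:
        h /= z - p
    dc = np.prod([1 - q for q in zeros]) / np.prod([1 - p for p in poles])
    return np.abs(h / dc)

def pz_coefficients(zeros, poles):
    """Difference-equation coefficients (b, a) with unit DC gain; assumes equal counts."""
    b = np.real(np.poly(zeros))
    a = np.real(np.poly(poles))
    return b * a.sum() / b.sum(), a
```

For the example placement, pz_magnitude gives about 1 at 0 Hz and about 0 at 22050 Hz (fs = 44100), and pz_coefficients gives approximately b = [0.05, 0.05], a = [1, -0.9].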
But I do like the idea of defining the shape of the frequency response I want and summing sines according to it, because it’s another kind of freedom and because I also like additive synthesis anyway (organs, pads, that kind of stuff).
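Summing sines shaped by a chosen response can be done block by block, as a sketch. For each chunk of partials, it builds one matrix of instantaneous phases and weights and sums it with a matrix-vector product, carrying per-partial phases between buffers. Each partial still costs one sine evaluation per sample (N partials at 44.1 kHz means 44100·N sin calls per second), so the partial count is what to watch; the chunk size only bounds memory. The function name and chunk size are illustrative, and the timing on a given device would need to be measured.

```python
import numpy as np

def additive_block(freqs, amps, phases, fs, n, chunk=512):
    """Sum len(freqs) sinusoids over one block of n samples.

    freqs, amps, phases: 1-D arrays (Hz, linear gain, radians).
    Returns the block and the phases at the start of the next block.
    """
    t = np.arange(n) / fs
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)
    out = np.zeros(n)
    for s in range(0, len(omega), chunk):
        sl = slice(s, s + chunk)
        # (n, chunk) matrix of instantaneous phases, then weight-and-sum with @
        out += np.sin(np.outer(t, omega[sl]) + phases[sl]) @ amps[sl]
    next_phases = (phases + omega * n / fs) % (2 * np.pi)
    return out, next_phases
```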
-
@Mederic I realized that in my last post I was an idiot and confused harmonics with octaves. So, obviously, 10 tones is insufficient for a 20 Hz sawtooth...
Haven't read your latest post yet, but I just realized this...
-
Take a look at the generator code, you could still do all of your work there in one place, to take advantage of buffering.
One thing I don't quite get in your approach: if you're lowpass filtering after sampling, that should not affect aliasing. You are already aliased with a sawtooth, and the aliased components fold over and produce low-frequency garbage. If you added noise before filtering, I suppose it would help with quantization artifacts, but at 32 bits those are reasonably low, I think. The sawtooth smoothing with the polynomial would help with actual sampling aliasing. The filter is certainly shaping your sound, but it is not antialiasing... Am I missing something?
-
@JonB I never said I used the filter for antialiasing. Maybe I caused some confusion by bringing up the aliasing problem at the same time as the filter, but the two have nothing to do with each other in my code.
The only thing I use for antialiasing is the polyBLEP method (the polynomial residual), and it doesn’t just help with antialiasing, it works really well (although it doesn’t technically kill frequencies above Nyquist, it weakens them enough to be inaudible).
That being said, theoretically speaking, there actually is one (theoretically perfect) filter that WOULD antialias a sound with base frequency freq: by definition, the one with frequency response

$$g(n\,\mathrm{freq}) = \begin{cases} 1 & \text{if } n\,\mathrm{freq} \le \mathrm{Nyquist} \\ 0 & \text{otherwise.} \end{cases}$$

The corresponding signal the filter would need to convolve its input with is then

$$\mathrm{IR}(t) = \sin(\mathrm{freq}\,t) + \sin(2\,\mathrm{freq}\,t) + \sin(3\,\mathrm{freq}\,t) + \dots + \sin(\mathrm{Nyquist}\,t)$$

(here I assumed that Nyquist is a multiple of freq, and I omitted the $2\pi$ factors). Using $\sin(x) = \mathrm{Im}(e^{ix})$ and the geometric sum formula (again omitting the $2\pi$ factors), this becomes

$$\mathrm{IR}(t) = \frac{\sin\big((\mathrm{freq}+\mathrm{Nyquist})\,t/2\big)\,\sin(\mathrm{Nyquist}\,t/2)}{\sin(\mathrm{freq}\,t/2)}.$$
Anyway, it is a continuous-time function. In other words, you would have to convolve the filter’s input with a continuous-time function, as opposed to the discrete impulse responses DSP filters use. That’s why DSP filters can’t do (perfect) antialiasing, and that’s why I am not counting on them: even if oversampling would make them closer to continuous time, they would still be discrete-time by nature. (Also, the form of their transfer functions shows that their frequency response wraps around the unit circle, so any lowpass DSP filter ends up not killing some frequencies if you go high enough.)
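A quick numerical check of the closed form for IR(t), with the 2π factors omitted as in the derivation. freq = 3 and Nyquist = 30 are arbitrary test values chosen so that Nyquist is a multiple of freq, and the sample times avoid the zeros of sin(freq*t/2).

```python
import numpy as np

freq, nyquist = 3.0, 30.0                    # arbitrary test values, Nyquist a multiple of freq
t = np.linspace(0.1, 2.0, 50)                # stays clear of the zeros of sin(freq*t/2)
n = np.arange(1, int(nyquist / freq) + 1)    # harmonics 1 .. Nyquist/freq
direct = np.sin(freq * np.outer(t, n)).sum(axis=1)
closed = (np.sin((freq + nyquist) * t / 2) * np.sin(nyquist * t / 2)
          / np.sin(freq * t / 2))
print(np.allclose(direct, closed))           # True
```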
-
And regarding the sum of sines: for a 100Hz note you would need to go up to Nyquist = 22050Hz, so approximately 220 sines to sum for just one note. If you want 7 detuned saws playing one note in unison, you would need 7*220 = 1540 sines. If you want to play a chord of 4 notes with it in that range, you would need approximately 1540*4 = 6160 sines to sum at the same time, each multiplied element-wise by some frequency response function evaluated in parallel on a 6160-sized vector (the frequencies). And all that knowing that one might play notes quickly, so you can’t compute things too far in advance. Do you think it could work? You know more than me when it comes to computational time (I have a mathematical background and have done a lot of programming, but only as a hobby, so I still have a lot to learn).
-
@JonB I think I need to take a step back and put all these ideas into practice. There is a lot of information here, and I have a lot to learn (ctypes, objc_util, more about numpy, Apple Core Audio) and a lot of ideas to try (standard DSP, parallel approximation, convolution, sum of sines, pre-buffering). I feel like I need to catch up with all that before we keep thinking about new ideas (the more ideas we get, the more work it will take to try them all) ;)
Also, I will try to code it first myself (as a learning exercise) before looking more at your code (as a reference/solution). I learn better that way.
I will get back to you when all of this is done. I am under the impression that you are also interested in coding a synth with Pythonista (correct me if I am wrong), but it seems that your vision is a complete modular environment, while I am (right now) really trying to do something minimalistic and customized to my personal (and subjective) preferences, so we will probably end up with two different programs :)
-
I am not coding a synth. Mostly I have been interested in pushing the boundaries of high performance on pythonista, as a fun exercise, and trying to understand these various libraries. Making the iPad bleep is also kinda fun ;)
-
@JonB Coming back sooner than I thought :)
I ran an interesting experiment with your code, and there is a phenomenon I am not sure I understand correctly.
If you take your audiounittest.py code (the very first one) and play the sine while modulating the frequency very fast over the whole range, you will notice quantized modulation. Now take your other code, audiounittest2.py, set it to a sine, and do the same thing: the frequency modulation is perfectly smooth. Do you hear the difference? These are not really audio glitches like the ones caused by overload; the frequencies really are « quantized ».
Here is what I tried to fight that. In the render() method of audiounittest.py, I stored the last used frequency in the previous render() call in a self.previous_frequency attribute.
Then, at the beginning of render(), I assign self.previous_frequency to a prev_freq variable and the frequency from self.sound[touch_id] to a current_freq variable. Then, during the « for frame in range(numFrames) » loop, I interpolate between prev_freq and current_freq. It killed the frequency quantization effect. That’s the only way I could get that perfectly smooth modulation I was hearing in audiounittest2.py.
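A vectorized sketch of that fix, assuming a sine oscillator. Instead of a per-sample loop, the frequency is ramped linearly across the buffer and integrated into a running phase with np.cumsum, which keeps the phase continuous between buffers. prev_freq and cur_freq play the same role as in the description above; the function name is illustrative.

```python
import numpy as np

def sine_block_glide(prev_freq, cur_freq, phase, fs, n):
    """Sine block whose frequency ramps from prev_freq toward cur_freq (phase in radians)."""
    freqs = np.linspace(prev_freq, cur_freq, n, endpoint=False)   # per-sample frequency
    increments = 2 * np.pi * freqs / fs
    # phase at the start of each sample: running sum of the previous increments
    phases = phase + np.concatenate(([0.0], np.cumsum(increments[:-1])))
    next_phase = (phases[-1] + increments[-1]) % (2 * np.pi)
    return np.sin(phases), next_phase
```

The next buffer starts its ramp at cur_freq (its new prev_freq) and at next_phase, so there is no jump in either frequency or phase.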
The problem was solved, but I still wanted to understand what the issue was. I first thought the touch_moved() method might be slowed down in audiounittest.py by its less efficient implementation compared to audiounittest2.py, but I timed them and both touch_moved() handlers run at around 110Hz. In other words, the sounds[touch_id] attribute changes as often in audiounittest.py as in audiounittest2.py, so that doesn’t explain the quantization.
Here is my current guess.
In audiounittest2.py, you compute the samples perfectly without missing any frequency values because the touch_moved() method is actually calling the generator and sending to it the frequency as a parameter. So no frequency is missed by the generator.
On the other hand, in audiounittest.py, the render() method generally fills the buffer in much less time than the buffer’s own duration. In other words, there is always a period during which the render() method « waits » before filling the buffer again. The consequence is that when it fills the buffer, it only accesses the few frequency values occurring during that very short time and uses them to fill the whole buffer; during the waiting time, it misses all the other frequency values. By the way, I don’t think this falls under overload: to me, overload is the opposite (the render() method being too slow), whereas here it would be, in a sense, too fast.
What do you think about this? Am I understanding the issue correctly?
-
Check how many samples render is asking for (display numFrames). In my version, it asks for 1024 at a time at a 44100 sample rate. So if you move the frequency over 8000 Hz in half a second (about 22000 samples, i.e. roughly 22 buffers), it will be quantized into steps of about 8000/22 ≈ 364 Hz.
The buffered approach updates the buffer (up to the present time, at least as measured by time.perf_counter) on every touch_moved event, and within update(). So over 0.5 s you might get several hundred touch events, and updates every few milliseconds.
https://gist.github.com/6ccd9ad8ba95c373ec7d76ceaf9061bc has some minor corrections, adds some diagnostic printouts, and pulls all the ctypes garbage into a separate file.
So, even if you don't use the modular generator approach, you might consider using a custom generator (i.e. subclass ToneGenerator and handle the logic within the buffer-filling methods).
It is also theoretically possible to force the audiounit to ask for data more frequently. https://developer.apple.com/documentation/audiotoolbox/1534199-generic_audio_unit_properties/kaudiounitproperty_maximumframesperslice?language=objc
    maxFrames = c_uint32(256)
    err = AudioUnitSetProperty(toneUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, byref(maxFrames), sizeof(maxFrames))
However, this didn't work when I tried.
-
1024 as well for me. Please refer to audiounittest.py or audiounittest2.py, since they are both your versions :)
I agree with you about the 8000/22Hz computation but it’s only true if my assumption (the render() method filling the buffer way faster than the buffer’s duration) is true.
- If the render() method took 0.02s to fill a 0.02s buffer, then, since it reads the sounds[touch_id] attribute at each iteration of the « for frame in range(numFrames) » loop, it would read the frequency values in real time, at the right moments, and fill a 0.02s buffer with frequency values spanning 0.02s of modulation. There should then be no more quantization than in audiounittest2.py: touch_moved updates occurred at 110Hz even for the fastest moves in both programs, so the quantization would happen at a 110Hz rate, which is not noticeable and sounds smooth. (Even if you remove the computation in the update part of audiounittest2.py, you won’t notice quantization.)
- If, on the contrary, the render() method takes 0.001s to fill a 0.02s buffer, then, even though it reads the sounds[touch_id] attribute at each iteration of the loop, it does so within a 0.001s window, so it only sees the frequency values from that short window and uses them to fill the whole 0.02s buffer. It is then almost as if it only got the first frequency of each 0.02s chunk and used it for the whole chunk, resulting in a 1/0.02 = 50Hz quantization. This is noticeable on fast modulation (like 30fps vs 15fps in graphics when there is fast motion, I guess).
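A small simulation of the second case, assuming the render callback effectively reads the frequency once at the start of each 1024-sample buffer. A linear 8000 Hz sweep over half a second then turns into a staircase of 22 steps of 8000 × 1024/22050 ≈ 372 Hz, close to the 8000/22 Hz estimate. The function name and sweep values are illustrative.

```python
import numpy as np

def held_sweep(f_start, f_end, duration, fs=44100, block=1024):
    """Compare a linear sweep with the same sweep read once per render buffer."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    f_true = f_start + (f_end - f_start) * t / duration    # the finger's movement
    block_start = (np.arange(n) // block) * block          # first sample of each buffer
    f_held = f_true[block_start]                           # value seen when render() runs
    return f_true, f_held

f_true, f_held = held_sweep(200.0, 8200.0, 0.5)
steps = np.diff(np.unique(f_held))
print(len(steps) + 1, steps.mean())   # 22 buffers, steps of about 371.5 Hz
```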
Sorry for the redundancy but I wanted to be more precise in what I meant.
Also, I do intend to have a prebuffered approach in the future, I am just studying the differences in order to learn and figure out more exactly how the render() thread works.