Chair of Multimedia Telecommunications and Microelectronics - Audio Research Group

AM/FM Modeling of Harmonic Sounds

Demonstration: mute horn

Download all the audio files from this page in a single ZIP archive (2.3MB).

This example analyzes the raspy, aggressive, growling sound of a mute horn. The challenge lies in the rapid variations of the tone, which are probably caused by moisture condensed in the mouthpiece. These variations appear as rapid modulations in the spectrogram (especially in the 4th note). A harmonically constrained sinusoidal model represents these varying partials through amplitude and frequency variations of sinusoidal tracks. Since such a representation requires a high frame rate, there is audible degradation if the trajectories are smoothed too much. The AM/FM model, by contrast, is able to re-synthesize this sound quite accurately with the help of the prototype signal.
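To make the role of the frame rate concrete, the following is a minimal sketch of harmonic additive synthesis driven by subsampled F0 and amplitude trajectories. It is not the code used for the examples on this page: the function names `interpolate` and `synthesize_harmonics`, the use of linear interpolation between control points, and the omission of the prototype signal and of any residual noise are all assumptions made for illustration.

```python
import math

def interpolate(control, hop, n_samples):
    """Linearly interpolate control-rate values (one per `hop` samples) to audio rate.

    Assumption: linear interpolation; the original analysis may smooth differently.
    """
    out = []
    last = len(control) - 1
    for n in range(n_samples):
        pos = n / hop
        i = min(int(pos), last)
        j = min(i + 1, last)
        frac = pos - i
        out.append((1.0 - frac) * control[i] + frac * control[j])
    return out

def synthesize_harmonics(f0_track, amp_tracks, hop, n_samples, sample_rate=44100):
    """Sum harmonics k = 1, 2, ... of a time-varying fundamental.

    f0_track   : F0 in Hz, one value per `hop` samples
    amp_tracks : one amplitude trajectory per harmonic, same control rate
    """
    f0 = interpolate(f0_track, hop, n_samples)
    amps = [interpolate(a, hop, n_samples) for a in amp_tracks]
    out = [0.0] * n_samples
    phase = 0.0  # running phase of the fundamental, in radians
    for n in range(n_samples):
        for k, a in enumerate(amps, start=1):
            out[n] += a[n] * math.sin(k * phase)
        # Integrate the instantaneous frequency to obtain the phase.
        phase += 2.0 * math.pi * f0[n] / sample_rate
    return out
```

With `hop = 1000` at 44.1 kHz, the trajectories are updated only about 44 times per second, so any modulation faster than that is lost between control points. This is the kind of smoothing the paragraph above describes.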

Original sound (WAV file, 44.1kHz, 16bit, 370kB)

Reconstructed sound (WAV file, 44.1kHz, 16bit, 313kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:1000
(A phase-incoherent baseline heterodyne analysis, acting as a mock-up of a perfect harmonic sinusoidal model without residual noise.)

Reconstructed sound (WAV file, 44.1kHz, 16bit, 313kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:1000 + prototype signal. NOTE: no residual noise is modeled in this example.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 313kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:1000 + a noise residual modeled by 10th-order WLPC.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 313kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:1000 + prototype signal. As above, the noise residual is modeled by 10th-order WLPC.


Reconstructed sound (WAV file, 44.1kHz, 16bit, 314kB) obtained from the SMS technique with a frame rate of about 44 Hz (hop = 1000 samples) and no residual noise.


Reconstructed sound (WAV file, 44.1kHz, 16bit, 314kB) obtained from the SMS technique (as above) + background noise modeled using 10th-order WLPC. We used our own noise model in this example, since the traditional LPC-based model in the SMS software produced too many artifacts.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 312kB) obtained from the LORIS technique with a bandwidth association region width of 250Hz. The "distill" option of the LORIS software was apparently unsuccessful in its attempt at constraining the partials to harmonic overtones of the fundamental. Note also that the amount of noise has been over-estimated.
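For readers unfamiliar with the heterodyne analysis mentioned for the first reconstruction, here is a minimal sketch of the general idea: the signal is shifted down by the frequency of one harmonic and then low-pass filtered, leaving an estimate of that harmonic's amplitude envelope. The function name `heterodyne_amplitude`, the fixed (constant) analysis frequency, and the moving-average low-pass filter are assumptions for illustration; the analysis used for the examples above may differ in all of these respects.

```python
import cmath
import math

def heterodyne_amplitude(signal, freq, sample_rate, window):
    """Estimate the amplitude envelope of the component near `freq`.

    Assumptions: `freq` is constant, and the low-pass filter is a plain
    moving average over `window` samples (ideally a whole number of periods).
    """
    # Shift the component at `freq` down to 0 Hz.
    shifted = [
        x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
        for n, x in enumerate(signal)
    ]
    envelope = []
    for start in range(len(shifted) - window + 1):
        mean = sum(shifted[start:start + window]) / window
        # A real sinusoid of amplitude A leaves A/2 at 0 Hz, hence the factor 2.
        envelope.append(2.0 * abs(mean))
    return envelope

# Usage: a 441 Hz sinusoid of amplitude 0.8 has a period of exactly
# 100 samples at 44.1 kHz, so a 100-sample window spans one full period.
sample_rate = 44100
tone = [0.8 * math.sin(2.0 * math.pi * 441.0 * n / sample_rate) for n in range(400)]
estimate = heterodyne_amplitude(tone, 441.0, sample_rate, window=100)
```

Keeping only the magnitude of the filtered signal and discarding its phase is one way an analysis of this kind can end up phase incoherent; the envelope can then be subsampled (for example 1:1000) before resynthesis.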