Chair of Multimedia Telecommunications and Microelectronics - Audio Research Group

AM/FM Modeling of Harmonic Sounds

Demonstration: violin arco

Download all the audio files from this page in a single ZIP archive (3.2MB).

A screechy and quite noisy violin sound (D5, played arco) is analyzed in this example. This kind of sound may not seem very challenging for SM or AM/FM modeling; the difficulty, however, lies in accurately estimating the frequencies of weak and noisy partials. At a low frame rate of fs/1000, the individual partial parameters can be regarded as control signals that are sampled with insufficient density, which leads to audible deviations from the original pitch. Nevertheless, the differences in this demo are quite subtle.
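To illustrate why a frame rate of fs/1000 can be too sparse for a pitch control signal, here is a minimal Python sketch. It is not the analysis code used for this demo: the vibrato rate and depth, the linear interpolation between frames, and all function names are assumptions chosen only for illustration.

```python
import math

FS = 44100           # sampling rate of the demo files (Hz)
HOP = 1000           # subsampling factor 1:1000, i.e. frame rate FS / HOP = 44.1 Hz
F0_NOMINAL = 587.33  # approximate D5 in Hz (equal temperament, A4 = 440 Hz)


def f0_track(n, vibrato_hz=6.0, depth_cents=30.0):
    """Hypothetical instantaneous F0 (Hz) at sample n: D5 with a sinusoidal vibrato.

    The vibrato rate and depth are illustrative assumptions, not measured values.
    """
    cents = depth_cents * math.sin(2.0 * math.pi * vibrato_hz * n / FS)
    return F0_NOMINAL * 2.0 ** (cents / 1200.0)


def subsample(track, num_samples, hop=HOP):
    """Keep one control value every `hop` samples."""
    return [track(n) for n in range(0, num_samples + 1, hop)]


def interpolate(frames, n, hop=HOP):
    """Linearly interpolate the subsampled control signal back to sample n.

    Linear interpolation is an assumption; past the last frame the value is held.
    """
    k, r = divmod(n, hop)
    if k + 1 >= len(frames):
        return frames[-1]
    a = r / hop
    return (1.0 - a) * frames[k] + a * frames[k + 1]


def max_error_cents(num_samples=FS):
    """Largest pitch deviation (in cents) between the true and the rebuilt F0."""
    frames = subsample(f0_track, num_samples)
    worst = 0.0
    for n in range(num_samples):
        err = abs(1200.0 * math.log2(interpolate(frames, n) / f0_track(n)))
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    print(f"frame rate: {FS / HOP:.1f} Hz")
    print(f"max pitch deviation: {max_error_cents():.2f} cents")
```

Under these assumptions the deviation stays well below the vibrato depth, which is consistent with the differences being subtle; faster or noisier parameter variations would be reproduced less faithfully at the same frame rate.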

For comparison, we again show the sounds reconstructed with the spectral modeling synthesis (SMS) technique by X. Serra. Please note that while the SMS model is harmonic (in the sense that tracking is guided by the detected F0), it generates incoherent frequency trajectories that must be encoded individually. We also show a comparison with the bandwidth-enhanced sinusoidal model (LORIS) by K. Fitz, which generates bandwidth-enhanced partials modulated by random noise. For a fair comparison, we performed a partial selection operation (the "distill" command of the LORIS software) that constrains the partials to harmonic multiples of the fundamental frequency. The background noise is modeled using a 10th-order warped LPC (WLPC) model.

Original sound (WAV file, 44.1kHz, 16bit, 455kB)

Reconstructed sound (WAV file, 44.1kHz, 16bit, 370kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:1000
(A phase incoherent, baseline heterodyne analysis, acting as a mock-up of a perfect harmonic sinusoidal model, without residual noise)
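
As a rough illustration of what "synthesis based on F0 + Harmonic Envelope" means, the following Python sketch drives a bank of harmonically related oscillators from a per-sample F0 track and per-harmonic amplitude tracks. It is a generic additive-synthesis sketch, not the code used to produce the files on this page. The function name, the argument layout, and the zero initial phase are assumptions, and the control signals are assumed to have already been interpolated back to the audio rate.

```python
import math


def synthesize_harmonic(f0, amps, fs=44100):
    """Additive synthesis of a harmonic sound (illustrative sketch).

    f0   : list of instantaneous fundamental frequencies in Hz, one per sample
    amps : list of amplitude tracks; amps[k][n] is the amplitude of
           harmonic k + 1 at sample n
    fs   : sampling rate in Hz

    The original phases are not used: every harmonic starts at phase zero,
    so the waveform shape of the input is not preserved.
    """
    two_pi = 2.0 * math.pi
    out = []
    phase = 0.0  # running phase of the fundamental, in radians
    for n, f in enumerate(f0):
        # Harmonic k + 1 runs at (k + 1) times the phase of the fundamental.
        sample = sum(a[n] * math.cos((k + 1) * phase) for k, a in enumerate(amps))
        out.append(sample)
        # Integrate the instantaneous frequency to advance the phase.
        phase += two_pi * f / fs
        if phase >= two_pi:
            phase -= two_pi  # wrapping by 2*pi leaves every integer harmonic unchanged
    return out


if __name__ == "__main__":
    n = 4410
    f0 = [587.33] * n                                # constant D5, for illustration only
    amps = [[1.0 / (k + 1)] * n for k in range(5)]   # five harmonics with 1/k amplitudes
    y = synthesize_harmonic(f0, amps)
    print(len(y), max(abs(v) for v in y))
```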

Reconstructed sound (WAV file, 44.1kHz, 16bit, 455kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:1000 + prototype signal. NOTE: even though no residual background noise is modeled in this example, the resynthesized sound reproduces a fair amount of the mechanical noise, especially the attack noise.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 455kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:1000 + a noise residual modeled by 10th-order WLPC.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 455kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:1000 + prototype signal. As above, the noise residual is modeled by 10th-order WLPC.


Reconstructed sound (WAV file, 44.1kHz, 16bit, 457kB) obtained from the SMS technique with a frame rate of about 44 Hz (hop = 1000 samples) and no residual noise. Note the pitch inaccuracy due to unreliable frequency estimation. Also note that the waveform is not preserved, since the SMS technique discards the phase information.


Reconstructed sound (WAV file, 44.1kHz, 16bit, 457kB) obtained from the SMS technique (as above) + background noise modeled using 10th-order WLPC. We used our own noise model in this example, since the traditional LPC-based model in the SMS software produced too many artifacts.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 455kB) obtained from the LORIS technique with a bandwidth association region width of 100 Hz and partials constrained to harmonic multiples of the fundamental. Note that, again, the estimated amount of noise is quite inappropriate.