Download all the audio files from this page in a single ZIP archive (0.9MB).
This demo shows the high temporal resolution of the representation, which makes it possible to re-synthesize sounds with transients
much more accurately than any other model that relies on analysis with overlapped frames and synthesis from interpolated
partial parameters. The analyzed sound is a single note of an acoustic guitar. Such a sound consists of sinusoidal partials
of almost constant frequency and a significant amount of non-stationary noise, responsible for the initial pluck.
Please observe that the baseline heterodyne analysis ("SM mockup") exhibits a significant pre-echo artifact due to the
long build-up time of the narrowband filters employed for the analysis of individual partials. A very similar artifact may
be observed in the SMS and LORIS models. This pre-echo is significantly suppressed in the full AM-FM model, which employs
the residual signal conveying cancelling terms that are to a certain extent common to all partials.
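The link between narrowband filtering and pre-echo can be illustrated with a minimal sketch. It is not the analysis code used for the sounds on this page: the test signal, the filter type (a delay-compensated windowed-sinc FIR low-pass), the 50 Hz cutoff and the filter length are all assumptions chosen for illustration. A single partial with an abrupt onset is shifted to baseband and low-pass filtered; the recovered envelope starts rising before the true onset.

```python
import numpy as np
from scipy.signal import firwin

fs = 44100                     # sampling rate, as for the sounds on this page
f0 = 220.0                     # hypothetical partial frequency
onset = int(0.05 * fs)         # hypothetical onset, 50 ms into the signal

# Hypothetical test signal: silence, then a decaying sinusoid with an abrupt onset.
n = np.arange(int(0.15 * fs))
x = np.zeros(n.size)
t = (n[onset:] - onset) / fs
x[onset:] = np.exp(-8.0 * t) * np.cos(2 * np.pi * f0 * t)

def heterodyne_envelope(x, f, fs, cutoff_hz=50.0, numtaps=2001):
    """Complex envelope of the partial at frequency f (heterodyne analysis sketch)."""
    k = np.arange(x.size)
    baseband = x * np.exp(-2j * np.pi * f * k / fs)   # shift the partial to 0 Hz
    h = firwin(numtaps, cutoff_hz, fs=fs)             # narrowband linear-phase low-pass
    # mode="same" centres the symmetric filter, i.e. compensates its group delay,
    # so the filter "sees" (numtaps - 1) / 2 samples into the future.
    return 2.0 * np.convolve(baseband, h, mode="same")

env = heterodyne_envelope(x, f0, fs)
pre_echo = np.abs(env[:onset])   # envelope energy before the true onset
```

Plotting `np.abs(env)` against `np.abs(x)` shows the envelope ramping up over roughly the half-length of the filter before the onset. Narrowing the cutoff lengthens the filter response and therefore the pre-echo.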
The addition of synthetic noise (modeled using a 10th-order warped LPC)
compensates for the lack of pluck noise in the deterministic synthesis; however, it is not able to mask the pre-echo.
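For readers unfamiliar with warped LPC, the following is a minimal sketch of the textbook autocorrelation method, in which each unit delay is replaced by a first-order all-pass section. It is not the noise model used for the examples on this page. The function name `warped_lpc` is hypothetical, and the warping coefficient `lam=0.756` is an assumption (a value commonly quoted for approximating the Bark scale at 44.1 kHz); the page does not state which value was used.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def warped_lpc(x, order=10, lam=0.756):
    """Warped predictor coefficients a[1..order], autocorrelation method (sketch).

    lam is the all-pass warping coefficient (assumed value); lam = 0 gives plain LPC.
    """
    x = np.asarray(x, dtype=float)
    r = np.empty(order + 1)
    d = x.copy()
    for k in range(order + 1):
        r[k] = np.dot(x, d)                         # warped autocorrelation at lag k
        # All-pass section D(z) = (z^-1 - lam) / (1 - lam z^-1) replaces a unit delay.
        d = lfilter([-lam, 1.0], [1.0, -lam], d)
    # Solve the symmetric Toeplitz normal equations R a = r[1:].
    return solve_toeplitz(r[:-1], r[1:])

# Sanity check on a synthetic AR(2) process x[n] = 1.5 x[n-1] - 0.8 x[n-2] + e[n]:
# with lam = 0 the estimate should come out close to [1.5, -0.8].
rng = np.random.default_rng(0)
x_ar = lfilter([1.0], [1.0, -1.5, 0.8], rng.standard_normal(20000))
a_plain = warped_lpc(x_ar, order=2, lam=0.0)
a_warped = warped_lpc(x_ar, order=10)               # 10th-order warped model
```

Synthesizing noise from these coefficients additionally requires a warped all-pole filter driven by white noise, which is left out of this sketch.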
For comparison, we also show the sounds reconstructed with the spectral modeling synthesis technique
(SMS) by X. Serra, and with the bandwidth-enhanced
sinusoidal model (LORIS) by K. Fitz, which generates bandwidth-enhanced
partials modulated by random noise. For a fair comparison, we performed a partial selection operation (the "distill" command of the
LORIS software) that reduces the number of partials and constrains them to overtones of the fundamental frequency.
Original sound (WAV file, 44.1kHz, 16bit, 137kB)
Reconstructed sound (WAV file, 44.1kHz, 16bit, 137kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:500 (a baseline incoherent heterodyne analysis, acting as a mock-up of a perfect harmonic sinusoidal model, without residual noise). Please observe the pre-echo artifact.
Reconstructed sound (WAV file, 44.1kHz, 16bit, 137kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:500 + prototype signal. NOTE: no residual noise is modeled in this example. Please observe that the pre-echo is suppressed.
Reconstructed sound (WAV file, 44.1kHz, 16bit, 137kB) obtained from synthesis based on F0 + Harmonic Envelope subsampled 1:500 + a noise residual modeled by 10th-order warped LPC (a mock-up of a harmonic sinusoidal + noise model).
Reconstructed sound (WAV file, 44.1kHz, 16bit, 137kB) obtained from synthesis based on instantaneous F0 + Harmonic Envelope subsampled 1:500 + prototype signal. As above, the noise residual is modeled by 10th-order WLPC.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 140kB) obtained from the SMS technique with a frame rate of 88Hz (hop = 500 samples) and no residual noise. Observe the pre-echo artifact. Also, please note that while this model is harmonic (in the sense that tracking is guided by the detected F0), it generates incoherent frequency trajectories that must be encoded individually. The chaotic frequency variations of high-order partials are caused by estimation problems at low SNR and by tracking errors. These variations are not very audible in this particular example since the high-order partials are very weak.

Reconstructed sound (WAV file, 44.1kHz, 16bit, 140kB) obtained from the SMS (as above) + background noise modeled using 10th-order WLPC. We used our own noise model in this example, since the traditional LPC-based model in the SMS software produced too many artifacts. Note that the pre-echo artifact is not masked by the noise.
Reconstructed sound (WAV file, 44.1kHz, 16bit, 138kB) obtained from the LORIS technique with a bandwidth association region width of 100Hz and partials constrained to harmonics. The amount of noise is apparently over-estimated. This is the best result we could obtain from the LORIS software. If you know how to obtain a better result for this sound, please let us know.
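The 1:500 subsampling and the 88Hz frame rate quoted in the list above describe the same hop of 500 samples at 44.1kHz. The short sketch below works out that arithmetic and shows, on a hypothetical ideal step envelope, why parameters sampled once per hop and then interpolated cannot follow an onset that falls between two frames. Linear interpolation and the onset position are assumptions made for illustration; the compared systems may interpolate differently.

```python
import numpy as np

fs = 44100                              # sampling rate of the examples
hop = 500                               # subsampling factor / hop size in samples

frame_rate_hz = fs / hop                # 88.2 frames per second
frame_spacing_ms = 1000.0 * hop / fs    # about 11.3 ms between frames

# Hypothetical ideal envelope: an instantaneous onset in the middle of a hop.
n = np.arange(4 * hop)
onset = 1250
env = (n >= onset).astype(float)

# Keep one envelope value per hop, then rebuild the envelope by linear interpolation.
frame_pos = n[::hop]                    # frames at samples 0, 500, 1000, 1500
frames = env[::hop]
env_interp = np.interp(n, frame_pos, frames)

# The interpolated envelope already rises at sample 1000, i.e. 250 samples
# (about 5.7 ms) before the true onset: a small pre-echo caused by interpolation alone.
early_rise = env_interp[1100] - env[1100]
```

This interpolation smear adds to the pre-echo caused by the analysis filters or windows themselves.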