Chair of Multimedia Telecommunications and Microelectronics - Audio Research Group

SinMod - audio sinusoidal modeling toolbox for Matlab

About Sinusoidal Modeling

Sinusoidal modeling and hybrid sinusoidal + noise (S+N) modeling are well-established signal processing frameworks applicable to speech and audio analysis, time and pitch scaling, enhancement, restoration, source separation, automatic recognition, watermarking, compression, and synthesis [1-11]. Within an S+N model, the signal is represented as a sum of quasi-sinusoids with continuously varying magnitudes and frequencies (the deterministic component) plus a stochastic component (noise) whose short-time power spectral envelope changes over time.

To obtain such a representation, the input signal is split into overlapping segments called frames. In each frame, the magnitude spectrum is computed, quasi-sinusoidal partials are detected as distinct spectral peaks, and their parameters (frequency, amplitude and phase) are estimated. A tracking algorithm then links these peaks across neighboring frames, taking various continuity criteria into account, and forms sinusoidal trajectories. The signal can be reconstructed from the trajectory data by generating quasi-sinusoidal components with continuously varying parameters, interpolated between the estimated and linked values.

Subtracting the reconstructed signal from the original yields a residual. The residual contains mostly the non-sinusoidal part of the signal and is usually modeled as non-stationary noise with time-varying intensity and spectral envelope. The spectral envelope may be modeled as the response of an all-pole filter, estimated with linear predictive coding (LPC).
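The per-frame analysis step described above can be illustrated with a short MATLAB sketch. It is not taken from SinMod: it assumes a periodic Hann window, simple local-maximum peak picking that keeps only peaks within 20 dB of the strongest bin, and parabolic interpolation of the log-magnitude spectrum, which is one common way of refining peak frequency and amplitude. The windowed frame is circularly rotated by half its length before the FFT so that the measured phase refers to the frame center. All variable names and parameter values are illustrative.

```matlab
% Illustrative per-frame peak analysis (a sketch, not the SinMod implementation).
fs = 8000;                                   % sampling rate in Hz (demo value)
L  = 1024;                                   % frame length in samples (demo value)
n  = (0:L-1)';
x  = 0.8*cos(2*pi*440/fs*n + 0.3) + 0.3*cos(2*pi*1230/fs*n);   % synthetic test frame

w  = 0.5 - 0.5*cos(2*pi*n/L);                % periodic Hann window, sum(w) = L/2
X  = fft(circshift(x.*w, -L/2));             % rotation makes phases refer to the frame center
lmag = 20*log10(abs(X(1:L/2)) + eps);        % log-magnitude of the positive frequencies

% Keep local maxima within 20 dB of the strongest bin (suppresses Hann sidelobes)
thr = max(lmag) - 20;
c   = lmag(2:end-1);
k   = find(c > lmag(1:end-2) & c >= lmag(3:end) & c > thr) + 1;   % 1-based bin indices

% Parabolic interpolation of the log-magnitude around each peak bin
yl = lmag(k-1);  yc = lmag(k);  yr = lmag(k+1);
p  = 0.5*(yl - yr) ./ (yl - 2*yc + yr);      % fractional peak offset in bins
f  = (k - 1 + p) / L;                        % normalized frequency (0..0.5)
a  = 10.^((yc - 0.25*(yl - yr).*p)/20) / (sum(w)/2);   % linear amplitude estimate
ph = angle(X(k));                            % phase at the frame center (nearest bin)
```

With these settings the two partials are recovered to within a small fraction of a bin; a real analyzer would add zero padding, adaptive thresholds and a sinusoidality test.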
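The all-pole envelope of the residual can be sketched for a single frame with the autocorrelation (LPC) method. This is an assumed, simplified illustration rather than SinMod's noise model: it solves the normal equations directly in base MATLAB (the Signal Processing Toolbox function lpc performs an equivalent Levinson-Durbin computation), and it uses a synthetic AR(2) process as a stand-in for a real residual frame. The order p and all variable names are hypothetical.

```matlab
% Sketch: all-pole (LPC) model of one residual frame, autocorrelation method.
N = 20000;
a_true = [1 -1.3 0.8];                      % stable AR(2) model used to fake a residual
res = filter(1, a_true, randn(N,1));        % stand-in residual frame (white noise through 1/A(z))

p = 2;                                      % LPC order (assumed value)
r = zeros(p+1,1);
for lag = 0:p                               % biased autocorrelation estimate
    r(lag+1) = sum(res(1:N-lag) .* res(1+lag:N)) / N;
end
R = toeplitz(r(1:p));                       % p-by-p autocorrelation matrix
c = R \ r(2:p+1);                           % solve the normal equations
a_lpc = [1; -c];                            % A(z) = 1 - sum_i c_i z^-i
g = r(1) - c' * r(2:p+1);                   % prediction error power (noise gain)

% Spectral envelope |H| = sqrt(g)/|A| and a synthetic noise frame with that envelope
nfft  = 512;
env   = sqrt(g) ./ abs(fft(a_lpc, nfft));
noise = filter(sqrt(g), a_lpc, randn(N,1));
```

In an S+N analyzer this would be repeated frame by frame on the residual, giving one coefficient vector and one gain per frame.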


[Figure: original signal spectrogram; reconstructed signal spectrogram (after pitch scaling of sinusoidal trajectories)]

References

  1. R. J. McAulay, T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 4, 1986.
  2. J. O. Smith, X. Serra, "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation", Proc. International Computer Music Conference (ICMC), 1987.
  3. G. Peeters, X. Rodet, "SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum", Proc. International Computer Music Conference (ICMC), Beijing.
  4. K. Fitz, L. Haken, "Sinusoidal modeling and manipulation using Lemur", Computer Music Journal, vol. 20, no. 4, Winter 1996.
  5. M. Macon, L. Jensen, J. Oliverio, M. Clements, E. George, "A Singing Voice Synthesis System Based on Sinusoidal Modeling", Proc. ICASSP'97, vol. 1, 1997.
  6. T. Verma, T. H. Y. Meng, "Time Scale Modification Using a Sines + Transients + Noise Model", Proc. DAFx'98, Barcelona, 1998.
  7. H. Purnhagen, B. Edler, Ch. Ferekidis, "Object-Based Analysis/Synthesis Audio Coder for Very Low Bit Rates", Paper 4747, Proc. 104th AES Convention, 1998.
  8. T. Tolonen, "Methods for Separation of Harmonic Sound Sources Using Sinusoidal Modeling", Paper 4958, Proc. 106th AES Convention, May 1999.
  9. L. Girin, S. Marchand, "Watermarking of speech signals using the sinusoidal model and frequency modulation of the partials", Proc. ICASSP'04, vol. 1, 2004.
  10. J. Jensen, J. H. L. Hansen, "Speech enhancement using a constrained iterative sinusoidal model", IEEE Trans. on Speech and Audio Processing, vol. 9, no. 7, 2001.
  11. T. Heittola, A. Klapuri, T. Virtanen, "Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation", Proc. ISMIR 2009.

About the toolbox

The sinusoidal modeling Matlab toolbox offered on this webpage is an open-source project intended strictly for research and educational purposes. The package is still under development and is offered as a pre-release that may contain occasional bugs. We welcome comments, bug reports, and suggestions for further development.

The SinMod toolbox allows you to perform various audio signal modeling tasks:

The most important features of this implementation are:

The toolbox also features an advanced, multi-criterion, highly parameterized offline tracking utility that applies many different rules to obtain the best possible sinusoidal tracking results. Accurate and reliable tracking is essential for high-quality resynthesized output.

Download

Download the toolbox package (124kB).
Download a set of important utilities (72kB).

The SinMod Toolbox requires MATLAB 7.6 (R2008a) or later and the Signal Processing Toolbox.
In addition, to access all functions of the Toolbox, you will need the following external public domain toolboxes and utilities:

Should you have any trouble obtaining these, do not hesitate to contact us.

How to use for plain analysis-transformation-synthesis

The complete sequence of audio signal analysis and synthesis is controlled by the function

[synsig,model] = sinmod_ansyn(signal,config,pitch,speed)

The required arguments are a vector of samples (signal) and a structure of configuration parameters (config). The two optional arguments define simple transformations applied during synthesis: the pitch scaling ratio (pitch) and the time stretching ratio (speed). The outputs are the resynthesized signal (synsig) and a structure (model) containing all the parameters and data needed to resynthesize the signal.
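To illustrate what pitch scaling and time stretching act on, here is a hypothetical sketch of additive resynthesis from trajectory matrices. It is not SinMod's synthesis routine, and its conventions are assumptions: rows are trajectories and columns are frames, frequencies are normalized (0..0.5), every trajectory is active in all frames (births and deaths are not handled), frequencies are multiplied by pitch, and a stretch value greater than 1 lengthens the output by enlarging the synthesis hop. The variable stretch is not necessarily the same convention as the speed argument of sinmod_ansyn.

```matlab
% Hypothetical additive resynthesis with pitch scaling and time stretching.
H       = 256;        % analysis hop in samples (assumed value)
pitch   = 1.5;        % pitch scaling ratio: all frequencies are multiplied by it
stretch = 2;          % time stretching ratio: >1 makes the output longer (assumed convention)

nFrames = 20;
F = [0.05*ones(1,nFrames); 0.10*ones(1,nFrames)];   % two steady partials (normalized freq.)
A = [0.50*ones(1,nFrames); 0.20*ones(1,nFrames)];   % their linear amplitudes

Hs   = round(H*stretch);          % synthesis hop in samples
tFr  = (0:nFrames-1)*Hs;          % frame positions on the output time axis
tOut = 0:tFr(end);                % output sample indices
y    = zeros(size(tOut));
for i = 1:size(F,1)
    fi = interp1(tFr, pitch*F(i,:), tOut);   % linearly interpolated frequency per sample
    ai = interp1(tFr, A(i,:), tOut);         % linearly interpolated amplitude per sample
    ai(fi >= 0.5) = 0;                       % mute partials pushed above Nyquist
    phi = 2*pi*cumsum(fi);                   % integrate frequency to obtain phase
    y = y + ai .* cos(phi);
end
```

Because phase is obtained by integrating the interpolated frequency, the output stays continuous even though the frame spacing changes; the measured phases are ignored here, which is a common simplification when time stretching.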

The configuration structure must contain a strictly defined set of fields. To prepare such a structure, call

config = sinmod_conf(freq_res,time_res)

This sets up the config structure automatically, using the parameter values defined in sinmod_conf.m. You may edit sinmod_conf.m to change these default values. The function also has two optional arguments, which override the default analysis frame length and frame offset (hop). After calling sinmod_conf, you may also modify any parameter value from the command line.
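As a rough guide to choosing the frame length and hop, the following arithmetic relates them to frequency resolution, time step and frame count. The framing convention (frames lying entirely inside the signal, no zero padding) is an assumption for illustration; SinMod may pad or position its frames differently.

```matlab
% Arithmetic sketch: what a given frame length and hop imply (assumed framing convention).
fs = 44100;            % sampling rate in Hz
L  = 2048;             % analysis frame length in samples
H  = 512;              % frame offset (hop) in samples
N  = 5*fs;             % length of a 5-second signal in samples

df       = fs/L;                   % FFT bin spacing: 21.533203125 Hz
frameDur = L/fs;                   % frame duration: about 46.4 ms
hopDur   = H/fs;                   % time step between frames: about 11.6 ms
nFrames  = floor((N - L)/H) + 1;   % frames fitting entirely in the signal: 427
```

A longer frame gives finer frequency resolution at the cost of time resolution; a smaller hop gives denser trajectories at the cost of more data.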

A call to sinmod_ansyn performs a sequence of consecutive analysis-synthesis steps:

The optional second output of sinmod_ansyn (model) is a structure of model parameters with the following fields:

data: [struct] containing full matrices with the raw results of signal analysis:
f, a, ph, fm, am, cnf, g, f0

trj: [struct] containing sparse matrices with the parameters arranged into sinusoidal trajectories:
f, a, ph, fm, am, cnf, z, g

lpc_coeffs: [double matrix] containing the LPC/WLPC coefficients of the noise residual

lpc_gain: [double vector] containing the noise energy in each frame

In the above list, the matrices with sinusoidal data contain, respectively:

f - frequencies, normalized to the sampling rate (0..0.5)
a - amplitudes (linear scale)
ph - phases (principal value)
fm - frequency chirp rates (per sample)
am - amplitude slew rates (dB per sample)
cnf - likelihood measure for the detection of a sinusoid
g - group membership information (used in harmonic analysis of groups of partials)
z - binary flags marking zombie nodes in trajectories
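The following sketch shows how the parameters of a single partial in a single frame could be turned into a waveform and into physical units. The local model s(n) = a*10^(am*n/20)*cos(ph + 2*pi*(f*n + fm*n^2/2)), with n counted in samples from a frame reference point, is an assumption consistent with the units listed above, not a documented SinMod formula; all numeric values are illustrative.

```matlab
% Sketch: evaluate one partial of one frame under an assumed local model
%   s(n) = a*10^(am*n/20) * cos(ph + 2*pi*(f*n + fm*n^2/2)),  n = samples from frame center.
fs = 44100;                          % sampling rate, used only for unit conversion
f  = 0.01;  a = 0.5;  ph = pi/4;     % normalized frequency, linear amplitude, phase
fm = 1e-6;                           % chirp rate: change of normalized frequency per sample
am = 0.01;                           % amplitude slew rate in dB per sample

n = (-512:511)';                                          % local time axis in samples
s = a*10.^(am*n/20) .* cos(ph + 2*pi*(f*n + fm*n.^2/2));  % waveform of the partial

f_hz   = f*fs;                       % 441 Hz
a_db   = 20*log10(a);                % about -6.02 dB
f_inst = (f + fm*n)*fs;              % instantaneous frequency along the frame, in Hz
a_inst = a*10.^(am*n/20);            % amplitude envelope along the frame
```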

To display the data, you can simply use the plot command, for example:

plot(model.data.f','.');

or, for trajectories (hiding the zero-valued entries):

f = full(model.trj.f); f(f==0) = NaN; plot(f','.-');

For a closer study of sinusoidal data, trajectory data, zombie nodes, etc., it is always helpful to display them in the context of the signal spectrogram. For this purpose, you may use

sinmod_compare(signal,frame_length,frame_hop,data1,style1,data2,style2,...)

The sinmod_compare function draws your data in the same way as the plot command, but over the signal spectrogram. It is important to pass the same frame length and hop as used in the sinusoidal analysis, since these parameters determine how the data points are aligned to the centers of frames. Another advantage of sinmod_compare over plot is that it handles the zero-valued entries in trajectories, which are not shown. Apart from the usual line style specifiers (such as 'k.', 'w.-', '*', etc.), an additional style given by the string 'o-x' draws each trajectory as a line that starts with an o (birth) and ends with an x (death).
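The alignment and birth/death markers described above can be sketched as follows. This is a hypothetical illustration, not the code of sinmod_compare: it assumes that rows of a trajectory matrix are trajectories and columns are frames, that frame k (1-based) starts at sample (k-1)*hop with its center frame_length/2 samples later, and that each trajectory occupies one contiguous run of nonzero entries. Plotting is left as comments so the computation can run without a display.

```matlab
% Hypothetical sketch: place trajectory points at frame centers, hide zeros, find births/deaths.
fs = 8000;  frame_length = 512;  frame_hop = 128;   % analysis settings (assumed values)

T = [0    0.10 0.11 0.12 0    0;        % trajectory 1: frames 2-4 (normalized frequency)
     0.20 0.20 0    0    0    0;        % trajectory 2: frames 1-2
     0    0    0    0.30 0.31 0.32];    % trajectory 3: frames 4-6
nFrames = size(T,2);
tCenter = ((0:nFrames-1)*frame_hop + frame_length/2) / fs;   % frame centers in seconds

Tplot = full(T);                        % full() also accepts sparse trajectory matrices
Tplot(Tplot == 0) = NaN;                % NaN points are not drawn by plot

active = double(T ~= 0);
[~, birth] = max(active, [], 2);                  % first active frame of each row
[~, idxR]  = max(fliplr(active), [], 2);
death = nFrames - idxR + 1;                       % last active frame of each row

rows   = (1:size(T,1))';
tBirth = tCenter(birth);
fBirth = T(sub2ind(size(T), rows, birth)) * fs;   % birth frequencies in Hz

% plot(tCenter, Tplot'*fs, '.-'); hold on
% plot(tBirth, fBirth, 'o'); plot(tCenter(death), T(sub2ind(size(T), rows, death))*fs, 'x')
```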

How to use for data compression

For data compression experiments, you may use a dedicated function that runs a complete analysis-coding-decoding-resynthesis cycle:

[bits,synsig,model] = sinmod_codec(signal,config)
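One ingredient of parametric coding is the quantization of trajectory parameters. The sketch below is hypothetical and is not the SinMod codec: it assumes uniform quantization of amplitudes in dB (1.5 dB steps over a -60..0 dB range) and of frequencies on a logarithmic scale (1/48 octave steps above an assumed floor), with a fixed-length code per value. It only shows how quantization step sizes translate into a bit count and a bounded reconstruction error; the cost of coding trajectory positions (births, deaths) is ignored.

```matlab
% Hypothetical sketch: scalar quantization of active trajectory points and a bit count.
F = sparse([0.010 0.0101 0.0102; 0.020 0 0.0203]);   % normalized frequencies, 0 = no partial
A = sparse([0.50  0.45   0.40;   0.10  0 0.12]);     % linear amplitudes
[row, col, fv] = find(F);                            % only active points are coded
av = full(A(sub2ind(size(A), row, col)));

% Amplitudes: uniform steps in dB over an assumed [-60, 0] dB range
ampStepDb = 1.5;
aDb   = min(max(20*log10(av), -60), 0);
qa    = round((aDb + 60) / ampStepDb);               % integer indices 0..40
bitsA = ceil(log2(60/ampStepDb + 1));                % 41 levels -> 6 bits

% Frequencies: uniform steps on a log2 scale above an assumed floor fmin
fmin    = 1e-3;
stepOct = 1/48;                                      % quarter-semitone steps
qf      = round(log2(fv/fmin) / stepOct);
bitsF   = ceil(log2(floor(log2(0.5/fmin)/stepOct) + 1));   % 431 levels -> 9 bits

% Decoder side: reconstruct values from the indices
ar = 10.^((qa*ampStepDb - 60)/20);
fr = fmin * 2.^(qf*stepOct);
totalBits = numel(fv) * (bitsA + bitsF);             % fixed-length code for all active points
```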

PLEASE NOTE: THIS PAGE IS UNDER CONSTRUCTION