This paper presents a new approach to coding audio signals whose short-time spectrum is sparse, i.e., whose energy is concentrated in a few narrowband peaks. Typical examples are solo instrument parts with distinct harmonics and singing voices, especially in high registers. Such signals are often degraded by standard perceptual codecs, which assume a quite different spectral energy distribution. Our approach uses spectral peak detection and tracking to follow the energy bands as they evolve in the time-frequency (TF) plane. Individual bands are isolated and shifted down in frequency so that each is well approximated by only a few MDCT coefficients. Moreover, we interleave the transform coefficients to exploit their similarities across bands and thereby increase coding efficiency. Experimental results show that, for signals with sparse spectra, coding efficiency improves significantly. The technique may be considered a complementary coding tool for the MPEG-4 HE-AAC codec.
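To make the peak detection and tracking step more concrete, the following is a minimal sketch of one generic way to do it. It is not the method used in this paper. It assumes a periodic Hann window, local-maximum picking in the dB magnitude spectrum with a relative threshold, parabolic interpolation for sub-bin frequency estimates, and greedy nearest-neighbour linking of peaks between consecutive frames. The function names `detect_peaks` and `link_peaks` and the parameters `rel_threshold_db` and `max_jump_hz` are hypothetical and chosen only for illustration.

```python
import numpy as np


def detect_peaks(frame, fs, rel_threshold_db=-25.0):
    """Hypothetical helper: return peak frequencies (Hz) found in one frame.

    A bin counts as a peak if it is a strict local maximum of the dB magnitude
    spectrum and lies within rel_threshold_db of the strongest bin. The default
    of -25 dB stays above the first Hann sidelobe (about -31.5 dB).
    """
    n = len(frame)
    # Periodic Hann window (assumption; any tapered window would do).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    spectrum = np.abs(np.fft.rfft(frame * window))
    mag_db = 20.0 * np.log10(spectrum + 1e-12)  # small offset avoids log(0)

    # Strict local maxima (interior bins only, so idx - 1 and idx + 1 exist).
    is_peak = (mag_db[1:-1] > mag_db[:-2]) & (mag_db[1:-1] > mag_db[2:])
    idx = np.nonzero(is_peak)[0] + 1
    # Discard weak maxima relative to the strongest bin.
    idx = idx[mag_db[idx] >= mag_db.max() + rel_threshold_db]

    # Parabolic interpolation through the peak bin and its two neighbours.
    a, b, c = mag_db[idx - 1], mag_db[idx], mag_db[idx + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    return (idx + offset) * fs / n


def link_peaks(prev_freqs, curr_freqs, max_jump_hz):
    """Hypothetical helper: greedily match peaks of two consecutive frames.

    Returns (i_prev, j_curr) index pairs; each current peak is used at most
    once, and a match is accepted only if the frequency jump is small enough.
    """
    curr = np.asarray(curr_freqs, dtype=float)
    pairs, used = [], set()
    for i, f in enumerate(prev_freqs):
        if curr.size == 0:
            break
        dist = np.abs(curr - f)
        for j in used:  # current peaks already claimed by another track
            dist[j] = np.inf
        j = int(np.argmin(dist))
        if dist[j] <= max_jump_hz:
            pairs.append((i, j))
            used.add(j)
    return pairs
```

For a frame containing harmonics at 500, 1000 and 1500 Hz (sampled at 8 kHz, 1024-sample frame), `detect_peaks` should return frequencies close to those three values, and `link_peaks` can then connect them to slightly shifted peaks in the next frame. A greedy matcher is used here only for brevity; more robust trackers (e.g., ones that allow tracks to start and end, or that use optimal assignment) are common in sinusoidal modelling.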