This paper describes a scalable extension of the AVC coder. The design goal is to keep modifications of the bitstream syntax and semantics as small as possible and, wherever possible, to avoid technologies that are not already present in the existing structure of the AVC codec. The coder combines spatial and temporal scalability. It consists of two motion-compensated sub-coders that encode a video sequence and produce two bitstreams corresponding to two different levels of spatial and temporal resolution. Each sub-coder has its own prediction loop with independent motion estimation. The system employs adaptive interpolation. The interpolation-dependent modes are carefully embedded into the mode hierarchy of the AVC coder, so that the resulting codes correspond to the mode probabilities.
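
As an illustration only, the two-layer structure can be sketched in a few lines of Python. This is not the coder itself: the 2x2 averaging filter, the pixel-replication interpolator, the factor of two in both space and time, the SAD criterion, and all function names (`downsample2`, `upsample2`, `split_layers`, `choose_prediction`) are assumptions made for the sketch. Frames are plain lists of rows of integer samples.

```python
def downsample2(frame):
    """Halve the resolution by averaging each 2x2 block (assumed filter)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


def upsample2(frame):
    """Interpolate back to double resolution by pixel replication (assumed)."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def split_layers(frames):
    """Return (base, enhancement) inputs for the two sub-coders.

    Base layer: every second frame at half resolution (assumed factors).
    Enhancement layer: all frames at full resolution.
    """
    base = [downsample2(f) for f in frames[::2]]
    return base, frames


def sad(a, b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def choose_prediction(current, temporal_pred, base_recon):
    """Pick the enhancement-layer predictor with the lower SAD.

    Candidates: a temporal prediction from the enhancement layer's own loop,
    or the interpolated base-layer reconstruction. Ties go to "temporal".
    """
    candidates = {
        "temporal": temporal_pred,
        "interpolated": upsample2(base_recon),
    }
    mode = min(candidates, key=lambda m: sad(current, candidates[m]))
    return mode, candidates[mode]
```

In a real coder the choice would be made per block and signalled in the bitstream; the sketch selects one mode for a whole frame only to keep the example short.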