HD Video Transcoding Strategies using Multicore Media Processors: Part 2 – Flexible Architecture

Delivering video across a variety of platforms involving multiple codecs can be efficiently handled by multicore media processors. Part Two explains the architectural requirements for flexible processing.

By Bahman Barazesh, Senior Technical Manager, and George Kustka, Senior Video Architect, LSI Corporation

Video/Imaging DesignWire
(4/12/2010 8:30:02 AM)

Spatial Partitioning

Another approach is to partition the picture into multiple macroblock groups (MB groups), or slices, and assign each group to a DSP core. Because video-compression standards exploit spatial redundancy within the picture to achieve compression, there are multiple dependencies between MB groups. These dependencies must be taken into account, and tasks must be coordinated so that data from adjacent groups (generally above and to the left) is available before processing of the current MB group starts. One way to minimize the interdependencies is to define the MB groups as independent slices, each of which can be coded as a single NALU (Network Abstraction Layer unit). We refer to this approach as the multiple-slice architecture. It lends itself well to a scalable multicore device, and it benefits from the support built into modern video standards for minimizing dependencies between slices. A corollary benefit is that video decoders can do a better job of concealing artifacts caused by transmission errors and lost data packets. The multiple-slice architecture also suits a multiple-device design in which several multicore devices are connected through a high-speed interconnect such as sRIO or PCIe®.
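
As a rough illustration of the partitioning step, the following Python sketch divides the macroblock rows of a picture into contiguous slices, one per core. It assumes 16x16 macroblocks (as in H.264) and one slice per core; the function name and the even-split policy are hypothetical and are not taken from the authors' implementation.

```python
MB_SIZE = 16  # assumption: 16x16 luma macroblocks, as in H.264


def partition_mb_rows(height, num_cores):
    """Split a picture's macroblock rows into contiguous slices, one per core.

    Returns a list of (core, first_row, end_row) tuples, where end_row is
    exclusive. Rows that do not divide evenly go to the lowest-numbered cores.
    """
    mb_rows = -(-height // MB_SIZE)  # ceiling division: 1080 lines -> 68 rows
    base, extra = divmod(mb_rows, num_cores)
    slices = []
    start = 0
    for core in range(num_cores):
        count = base + (1 if core < extra else 0)
        slices.append((core, start, start + count))
        start += count
    return slices


# Hypothetical example: a 1080-line picture shared by four DSP cores.
for core, first, end in partition_mb_rows(1080, 4):
    print(f"core {core}: MB rows {first}..{end - 1}")
```

Splitting on whole macroblock rows keeps each slice contiguous in raster order, which is one simple way to keep every slice self-contained; a real encoder might instead balance slices by expected coding complexity.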

In this architecture, the encoder implementation is split between several DSP cores with one or more slices assigned to a specific DSP core. One core can be assigned to provide certain centralized functions, such as rate control and scene change detection, in addition to other coding tasks.
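
One way to picture a centralized function such as rate control is a single core that divides each frame's bit budget among the slices handled by the other cores. The Python sketch below assumes a proportional split based on a per-slice complexity estimate; the function name, the complexity measure, and the policy itself are hypothetical illustrations, not the rate-control algorithm used by the authors.

```python
def split_frame_budget(frame_bits, slice_complexities):
    """Divide a frame's bit budget among slices in proportion to complexity.

    frame_bits: total bits the rate controller allows for this frame.
    slice_complexities: one non-negative number per slice (hypothetical
    measure, e.g. a sum of absolute differences from a pre-analysis pass).
    """
    total = sum(slice_complexities)
    n = len(slice_complexities)
    if total == 0:
        # No complexity information: fall back to an even split.
        budgets = [frame_bits // n] * n
    else:
        budgets = [frame_bits * c // total for c in slice_complexities]
    # Give any rounding remainder to the first slice so the sum is exact.
    budgets[0] += frame_bits - sum(budgets)
    return budgets


# Hypothetical example: four slices, the last one the most complex.
print(split_frame_budget(400_000, [1, 1, 2, 4]))
```

Keeping this decision on one core means the slice encoders only need to receive a target, rather than exchange statistics with each other.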

Figure 3 shows one example of spatial partitioning for a multiple-slice encoder. The DSP cores receive raw video in YUV format over the sRIO interface from another processor, which may implement a decoder for a different standard. This architecture takes advantage of the sRIO's flexibility to dynamically assign slices decoded in one multicore media processor to another multicore media processor for processing.
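
To get a feel for how much raw video crosses the device-to-device link, the following Python sketch estimates the data rate of an uncompressed stream. The article says only "YUV format"; 4:2:0 chroma subsampling, 8 bits per sample, and 1920x1080 at 30 frames per second are assumptions made here for illustration.

```python
def raw_yuv420_rate(width, height, fps, bits_per_sample=8):
    """Return (bytes per frame, bits per second) for raw YUV 4:2:0 video.

    Assumes 4:2:0 subsampling: one full-size luma plane plus two chroma
    planes at a quarter of the luma resolution, i.e. 1.5 samples per pixel.
    """
    samples_per_frame = width * height * 3 // 2
    bytes_per_frame = samples_per_frame * bits_per_sample // 8
    bits_per_second = bytes_per_frame * 8 * fps
    return bytes_per_frame, bits_per_second


frame_bytes, bps = raw_yuv420_rate(1920, 1080, 30)
print(f"{frame_bytes} bytes per frame, {bps / 1e6:.1f} Mbit/s")
```

Under these assumptions a single 1080p30 stream needs roughly 750 Mbit/s of payload bandwidth before any protocol overhead, which is why a high-speed interconnect matters when decoded pictures are handed from one device to another.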

Figure 3: Multiple-Slice Processing for HD Encoder

The sRIO offers a high level of connection flexibility. Combining it with efficient DMA channels enables these operations:

  • Pipelining the transfer of data with video processing
  • Coordinating execution
  • Sharing data via shared memory (if the DSP cores are on the same device) or via the high-speed sRIO interface (if the DSP cores are on different devices)
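
The benefit of pipelining transfers with processing can be estimated with a simple timing model. The Python sketch below compares a serial schedule with an overlapped one, assuming the DMA engine can fetch the next block while the core works on the current one, and that every block has the same transfer and processing time. The function names and the timing values are hypothetical.

```python
def serial_time(num_blocks, t_xfer, t_proc):
    """Total time when each block is transferred, then processed, in turn."""
    return num_blocks * (t_xfer + t_proc)


def pipelined_time(num_blocks, t_xfer, t_proc):
    """Total time when the transfer of block k+1 overlaps processing of block k.

    Only the first transfer and the last processing step are not overlapped;
    in between, each step costs the slower of the two operations.
    """
    if num_blocks == 0:
        return 0
    return t_xfer + (num_blocks - 1) * max(t_xfer, t_proc) + t_proc


# Hypothetical example: 4 blocks, 2 ms to transfer and 5 ms to process each.
print(serial_time(4, 2, 5), "ms serial")
print(pipelined_time(4, 2, 5), "ms pipelined")
```

When the transfer is shorter than the processing time, as in this example, the pipelined schedule hides almost all of the transfer cost behind computation.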

Figure 4: High-Throughput, Low-Latency Multicore Device Interconnect Example

Figure 4 depicts connectivity options for scaling processing capability across multiple devices, which allows more complex video processing or a larger number of transcoding channels. An sRIO switch gives more flexibility for device-to-device communication, but the switch can be omitted if the processing flow stays between neighboring devices. Compared with PCIe switches, sRIO switches typically offer lower cost, lower latency, and higher performance.