This dissertation proposes a new Motion Estimation Specific Instruction-set Processor, called MESIP, for implementing low-power, high-performance Motion Estimation (ME) algorithms. ME is widely used for inter prediction, which exploits temporal similarity between frames, in multimedia codecs such as MPEG-2/4, H.263, H.264/AVC, and HEVC. MESIP therefore provides a solution for a key block shared by many multimedia codecs.
The proposed MESIP has two major advantages over existing ME processors. First, MESIP can handle multiple candidate points with a single dedicated Sum of Absolute Differences (SAD) instruction. Existing SAD instructions calculate the SAD result for only one candidate point, so the number of SAD instructions required is proportional to the complexity of the search pattern. This is a weakness compared with ME Application Specific Integrated Circuit (ASIC) architectures, because each individual SAD instruction requires extra setup operations. With the proposed SAD instructions, MESIP achieves performance comparable to that of ME ASICs. Second, MESIP supports the proposed new search scan orders, which improve data reusability. The smart snake scan method with a Reconfigurable Register Array (RRA) shows the best data reusability for the Full Search (FS) algorithm, but the size of the RRA is an obstacle to implementation. The simplified snake scan with the Optimized Sub-region Partition (OSP) method reduces the size of the RRA while keeping the same data reusability as the smart snake scan. Fast search algorithms require a different scan order from that of FS, since the search is performed only at selected candidate points. The proposed Center Biased Search Scan (CBSS) order offers an efficient RRA update strategy and reduces redundant data loading compared with existing search scan orders such as raster and snake scan.
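The difference between per-candidate and multi-candidate SAD evaluation can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `sad`, `sad_multi`, and `best_candidate`, the list-of-lists frame layout, and the five-point candidate pattern are hypothetical and are not taken from the MESIP instruction set.

```python
# Illustrative software model only; it does not reproduce MESIP instruction semantics.

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the reference
    block displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def sad_multi(cur, ref, bx, by, candidates, n):
    """Hypothetical multi-candidate call: one invocation returns the SAD of
    every candidate, so the block position and size are set up only once."""
    return [sad(cur, ref, bx, by, dx, dy, n) for dx, dy in candidates]


def best_candidate(cur, ref, bx, by, candidates, n):
    """Return (minimum SAD, motion vector) over the candidate list."""
    sads = sad_multi(cur, ref, bx, by, candidates, n)
    best = min(range(len(candidates)), key=lambda i: sads[i])
    return sads[best], candidates[best]


# Synthetic 12 x 12 frames: the current frame equals the reference frame
# sampled one pixel to the right, so the true motion vector is (1, 0).
ref_frame = [[(7 * x + 13 * y) % 251 for x in range(12)] for y in range(12)]
cur_frame = [[ref_frame[y][min(x + 1, 11)] for x in range(12)] for y in range(12)]

# Assumed five-point pattern: centre plus four neighbours.
pattern = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
print(best_candidate(cur_frame, ref_frame, 4, 4, pattern, 4))
```

In this model a single-candidate interface would call `sad` once per point of the pattern, whereas `sad_multi` is invoked once for the whole pattern; the arithmetic is identical, and only the number of invocations (and hence the setup work per invocation) differs.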
In addition, MESIP provides efficient program control schemes, such as dynamic pipeline control and Hardware (HW) loop acceleration, for complicated ME algorithms. Existing ME processors focus on efficient parallel operation architectures. To support complex ME algorithms efficiently, however, it is necessary to investigate not only parallel operations but also program control schemes. The proposed dynamic pipeline control scheme reduces the pipeline stalls caused by HW accelerators. Four loop-specific instructions and their dedicated architecture support efficient HW loop operations. In particular, early-termination conditions can be implemented with a single dedicated instruction.
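As a rough illustration of what an early-termination condition inside a search loop looks like in software, the sketch below stops accumulating a candidate's SAD as soon as the partial sum can no longer beat the best SAD found so far. This particular condition is an assumption chosen for illustration, and the names `sad_early_exit` and `search` are hypothetical; the abstract does not state which conditions MESIP's instruction implements.

```python
# Illustrative sketch only; the termination rule is assumed, not taken from MESIP.

def sad_early_exit(cur_block, ref_block, best_so_far):
    """Accumulate the SAD row by row. Return None as soon as the partial
    sum reaches best_so_far, because the candidate can no longer win."""
    total = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        total += sum(abs(c - r) for c, r in zip(cur_row, ref_row))
        if total >= best_so_far:
            return None  # early termination
    return total


def search(cur_block, ref_blocks):
    """Return (best SAD, index of the best candidate block)."""
    best, best_idx = float("inf"), -1
    for i, ref_block in enumerate(ref_blocks):
        s = sad_early_exit(cur_block, ref_block, best)
        if s is not None:
            best, best_idx = s, i
    return best, best_idx
```

In software, the comparison and branch inside the inner loop are paid on every row of every candidate, which is the kind of loop-control overhead that a dedicated loop instruction would be expected to absorb.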
To implement these features of MESIP, a flexible and reconfigurable Processing Element (PE) architecture and data arrangement schemes are also proposed. The flexible and reconfigurable PE architecture can shift the reference pixel data left, right, up, and down within the PE array. In particular, the left shift amount can be 4, 2, or 1 pixels per clock cycle. Moreover, through a special data path for data reversing, the proposed PE architecture supports not only the left-side search but also the right-side search with the same architecture. The proposed data arrangement scheme handles the increased data bandwidth required by the new PE architecture and, at the same time, simplifies address calculation. The MESIP architecture, implemented with the IBM 90 nm library, consists of 192k gates. At a clock frequency of 200 MHz, MESIP achieves real-time 1920 x 1080 ME at 30 frames/s. Simulation results show that the proposed MESIP reduces the number of required instructions by up to 18.9% compared with existing ME processors. Moreover, MESIP shows performance comparable to that of ME ASICs in terms of size, processing ability, and power consumption. Hence, MESIP is well suited to low-power, high-performance programmable ME implementation.
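To put the real-time figure in perspective, the following back-of-the-envelope sketch estimates the average cycle budget per block implied by the stated clock frequency, resolution, and frame rate. The 16 x 16 block size and the helper name `cycle_budget` are assumptions made for illustration; the abstract does not state the block size used.

```python
import math


def cycle_budget(width, height, fps, clock_hz, block=16):
    """Return (blocks per frame, average clock cycles available per block).
    The 16 x 16 block size is an assumption, not a figure from the abstract."""
    blocks_x = math.ceil(width / block)
    blocks_y = math.ceil(height / block)  # 1080 rows round up to 68 block rows
    blocks_per_frame = blocks_x * blocks_y
    cycles_per_block = clock_hz / (blocks_per_frame * fps)
    return blocks_per_frame, cycles_per_block


blocks, cycles = cycle_budget(1920, 1080, 30, 200_000_000)
print(blocks, round(cycles))  # 8160 blocks per frame, roughly 817 cycles per block
```

Under these assumptions, all search, SAD, and control operations for one block must fit within roughly 817 cycles on average.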