low power h.264 architectures for mobiles

Upload: vivek-guntumudugu

Post on 02-Jun-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 low power h.264 architectures for mobiles

    1/79

  • 8/10/2019 low power h.264 architectures for mobiles

    2/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    reduce the computational load by allowing us to disable the hardware that processes the

    truncated bits. >hile previous studies focused on fi!ed-bloc5 si6e % 1# 8 # pi!els2, very

    little wor5 has been done to study the effect of pi!el truncation for smaller bloc5 si6es. The

    latest %&'-( standard, %&'-( )V*+H.(, allows variable bloc5 si6e for motion

    estimation 1V7:%2. ;efines #8#, # 8 B, B 8 #, B 8 B, B 8 (, ( 8 B, and ( 8 ( bloc5

    si6es. )t smaller bloc5 partitions, a better prediction is achieved for objects with comple!

    motion. "$

    Truncating pi!els at a #8# bloc5 si6e results in acceptable performance as shown in the

    literature. However, at smaller bloc5 si6es, the number of pi!els involved during motion

    prediction is reduced. ;ue to the truncation error, there is a tendency for smaller bloc5s to

    yield matched candidates, which could lead to the wrong motion vector. Thus, truncating

    pi!els using smaller bloc5s results in poor prediction.

    >e have implemented a low-power algorithm and architecture for % using pi!el truncation

    for smaller bloc5 si6es. The search is performed in two steps< #2 truncation mode and 2

    refinement mode. This method reduces the computational cost and memory access without

    significantly degrading the prediction accuracy. In this project, we perform an in-depth

    analysis of this techniAue and e!tend the techniAue to a complete H.( system. "#$

    1.1.Introdut!on "#out Co$%r&''!on(

    *ompression is the process of reducing the si6e of the data sent, thereby, reducing the

    bandwidth reAuired for the digital representation of a signal. %any ine!pensive video and

    audio applications are made possible by the compression of signals. *ompression technology

    can result in reduced transmission time due to less data being transmitted. It also decreases

    the storage reAuirements because there is less data. However, signal Auality, implementation

    comple!ity, and the introduction of communication delay are potential negative factors that

    should be considered when choosing compression technology."$

    Video and audio signals can be compressed because of the spatial, spectral, and

    temporal correlation inherent in these signals. :patial correlation is the correlation between

    neighboring samples in an image frame. Temporal refers to correlation between samples in

  • 8/10/2019 low power h.264 architectures for mobiles

    3/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    different frames but in the same pi!el position. :pectral correlation is the correlation between

    samples of the same source from multiple sensors.

    1.2. Pr"t!") I$%ort"n& o* I$"+& o$%r&''!on(

    Typical values of image or video without any compression would be in the range of

    some 'iga 7ytes 1of some #.C/hrs video2. To store this much data, it will not be practical and

    of course, it is not possible for transmissions also 1as it reAuires high bandwidth rates2. :o, in

    order to store or transmit the data, we need to compress the data, so as to meet the

    reAuirements."#$

    fig

  • 8/10/2019 low power h.264 architectures for mobiles

    4/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    The latest video compression standard %&'-( )V*+H.( gives /0 improvement

    in compression efficiency compared to previous standard. However, the coding gain comes at

    the e!pense of increased computational comple!ity at the encoder. %otion estimation 1%2

    has been

    Identified as the main source of power consumption in video encoders. It consumes /34/0

    of the total power used in video compression. The introduction of variable bloc5 si6e

    partitions and multiple reference frames in the standard result in increased of computational

    load and memory bandwidth during motion prediction."$

    7loc5-based %otion stimation 1%2 has been widely adopted by the industry due to

    its simplicity and ease of implementation. ach frame is partitioned into # 8 # pi!els,

    5nown as macro bloc5s 1%7s2.

    9ull-search % predicts the current %7 by finding the candidate that gives the

    minimum sum of absolute difference 1:);2, as followshere * 15, l2is the current %7, and ? 1i @ 5, j @ l2is the candidate %7 located in

    the search window within the previously encoded frame. 9rom above eAuation, the power

    consumption in % is affected by the number of candidates and the total computation to

    calculate the matching cost. Thus, the power can be reduced by minimi6ing these parameters .

    "#$

    Pixel truncation 1usually, a pi!el will be represented with B-bits, here, instead of using

    B-bits for pi!el representation, we use less number of pi!els2 can be used to reduce the

    computational load by allowing us to disable the hardware that processes the truncated bits.

    >hile previous studies focused on fi!ed-bloc5 si6e % 1# 8 # pi!els2, very little wor5 has

    been done to study the effect of pi!el truncation for smaller bloc5 si6es. The latest %&'-(

    standard, %&'-( )V*+H.(, allows variable bloc5 si6e for motion estimation 1V7:%2.

    )t smaller bloc5 partitions, a better prediction is achieved for objects with comple! motion.

  • 8/10/2019 low power h.264 architectures for mobiles

    5/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    Truncating pi!els at a #8# bloc5 si6e results in acceptable performance. However,

    at smaller bloc5 si6es, the number of pi!els involved during motion prediction is reduced.

    ;ue to the truncation error, there is a tendency for smaller bloc5s to yield matched

    candidates, which could lead to the wrong motion vector. Thus, truncating pi!els using

    smaller bloc5s results in poor prediction. "$

    The search is performed in two stepse now define a cost function based on the %ean )bsolute rror

    1%)2 or %ean )bsolute ;ifference 1%);2 or :um of )bsolute ;ifferences 1:);2. The

    matching bloc5 will be ?1!@i,y@j2 for whichM,is minimi6ed, henceforth i, j defines the

    motion vector."($

    It is important to notice that the %) is not the only option for matching criteria, as

    one can use %ean :Auare rror and other e!pressions as well. %) is, however, often

    selected due to its computational simplicity. It is also worth mentioning that when the

    minimum %) has a value higher than a given threshold the bloc5 is said Lnot foundM. That

    means that the motion estimation failed and that bloc5 is to be encoded without e!ploiting

    temporal redundancy with regard to the ? reference frame. >e can observe this phenomena

    when there is a scene change or a new object is inserted in the scene between the reference

    frame and the current frame, or yet if motion goes beyond the search area "*p# p$."($

    %otion estimation doesnt have to be performed in a single reference frame. In some

    cases bi-directional motion estimation is performed. That means that other than loo5ing for a

    macro bloc5 in a previous reference frame ?, the bloc5 is also sought for in a reference frame

    9 in the future. That is very useful for the case where a new object is inserted into the scene,as itll be found in a future reference frame 9 even tough not being present in a previous

  • 8/10/2019 low power h.264 architectures for mobiles

    11/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    reference frame ?. %ore generally, multiple frames in the past and future can be used by a

    motion estimator, but more often we either use a single frame in the past or one in the past

    and one in the future, as described herein."($

    )s we have seen, the temporal prediction techniAue used in %&' video is based on

    motion estimation. The basic premise of motion estimation is that in most cases, consecutive

    video frames will be similar e!cept for changes induced by objects moving within the frames.

    In the trivial case of 6ero motion between frames 1and no other differences caused by noise,

    etc.2, it is easy for the encoder to efficiently predict the current frame as a duplicate of the

    prediction frame. >hen this is done, the only information necessary to transmit to the

    decoder becomes the syntactic overhead necessary to reconstruct the picture from the original

    reference frame. >hen there is motion in the images, the situation is not as simple.

    4.4. A)+or!t/$ *or Mot!on E't!$"t!on(

    The motion estimation is the most computational intensive procedure for standard

    video compression. To see5 a match for a macro bloc5 in a " -p, p$ search region leads to 1 p

    @ #2 search locations, each of which reAuires M & Npi!els to be compared, where %, E

    give the si6e of the source macro bloc5. *onsideringFthe number of reference frames being

    considered in the matching process, such procedure reaches a total of 1p@ #2 8MN8F

    e!ecutions of the

    %ean )bsolute rror e!pression< O *1! @ 5, y @ l2 3 ?1! @ i @ 5, y @ j @ l2 O,

    >hich involves C operations per pi!el. *onsidering a standard #8# macro bloc5

    withp=# andF-lin a V')-si6ed video stream 1(/8(B/ pi!els2 we have (/8C/ = #//

    macro bloc5s per frame. That leads to a grand total of B.(#% operations for a single frame.

    In a C/-fps video stream that would entail C.CC' operations per second for motion

    estimation alone. If bi-directional motion estimation is in place wed reach (./('

    operations per second. "C$

    The scheme described herein is refereed to as exhaustive search'If a video needs not

    to be processed in real time it may be the best option as the e!haustive search ensures that

    well find the best match for every macro bloc5. In some cases, however, we wont be able to

  • 8/10/2019 low power h.264 architectures for mobiles

    12/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    afford an e!haustive search. :ome algorithms have been developed aiming at finding a sub-

    optimal match in much less time than the e!haustive search.

    The temporal prediction techniAue used in %&' video is based on motion

    estimation. The basic premise of motion estimation is that in most cases, consecutive video

    frames will be similar e!cept for changes induced by objects moving within the frames. In the

    trivial case of 6ero motion between frames 1and no other differences caused by noise, etc.2, it

    is easy for the encoder to efficiently predict the current frame as a duplicate of the prediction

    frame. >hen this is done, the only information necessary to transmit to the decoder becomes

    the syntactic overhead necessary to reconstruct the picture from the original reference frame.

    >hen there is motion in the images, the situation is not as simple."($

    :hows an e!ample of a frame with stic5 figures and a tree. The second half of this

    figure is an e!ample of a possible ne!t frame, where panning has resulted in the tree moving

    down and to the right, and the figures have moved farther to the right because of their own

    movement outside of the panning. The problem for motion estimation to solve is how to

    adeAuately represent the changes, or differences, between these two video frames.

    4.7. Mot!on E't!$"t!on E8"$%)&? 1%w2

    %odulesme-sad me-dpc

    )rea 'ates &ower )rea 'ates &ower

    & /.4/ #C## B. /.C #B .C#)dder tree /.#C / .C /./( # /.44

    *omparator Gnit /.## ##4 #. /./4 #C# /.B

    ;ecision Gnit /.#/ #44/ /.( /./ #C/C /./

    Total #.( C4#4B C.// /. #//C/4 (.

    The table shown that meRsads area is dominated by & 1C02. Thus, with the

    significantly smaller area for &, the meRdpc will reAuire less area than the meRsad. The

    overall meRdpc reAuires (0 of the meRsad area.

    7ased on the above analysis, we propose two types of architectures for the %

    computation unit that can perform low-resolution and full-resolution searches. These are

    meRsplit and meRcombine as shown in fig 1C1a2 and 1b22.

    %eRsplit implememts meRsad and meRdpc as two seaparte modules as shown in fig

    C1a2. during low resolution search, meRsad is switched off, while the meRdpc is used to

    perform the search. The second step usesthe meRsad, while the meRdpc is switched off. This

    architecture allows only the necessary bit si6e to be used during different search modes.

  • 8/10/2019 low power h.264 architectures for mobiles

    20/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    >hile potential power savings is possible, this architecture reAuires additional area for the

    adder tree, comparator and decision unit to support the low-resolution search.

    o$%ut"t!on") un!t *!+ @0 @" $&'%)!t

    Co$%ut"t!on") un!t *!+@0 @# $&o$#!n&d

    ;ue to the functions of the adder tree, the comparator and the decision units are

    similar for both meRsad and meRdpc, and meRcombined shares these units during low

  • 8/10/2019 low power h.264 architectures for mobiles

    21/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    resolution search and full pi!el resolution is shown in fig 1C21b2. this architecture results in a

    much smaller area compared to meRsplit. However higher power consumption is e!pected

    during the low-resolution search because the adder tree, comparator and decision unit operate

    at higher bit si6e than needed."$

    6.1.1.M&$or? "r/!t&tur&@S&"r/ Ar&" M&$or? Or+"n!3"t!on

    *onventional % architecture implements the :) memory using single post static

    random access memoru 1:?)%2 with one pi!el 1B-bit2 per word. To implement the two-step

    search, we need to access the first two %:7s for each pi!el during the first search and B-bits

    in the second stage. Thus, the pi!els need to be stored to allow two reading modes. 9or this,three types of memory architectures are proposed. These are 1a2 B-bit memory 1memB2, 1b2 -

    bit and B-bit memoty 1memB2, and 1c2 B-bit memoty with prearranged data and transposed

    register 1memBpre2 as shown in fig 1(2."C$

    F!+ @6.1.:) memory arrangement 1a2 memB 1b2 memB2 and 1c2 memBpre

    %emB stores the data in the same way as in the conventional %. >e access B-bit

    data during both low-resolution and refinement stage. However, during the low resolution

    search, the lower si! bits are not used by the &. 7ecause the memory is accessed during the

  • 8/10/2019 low power h.264 architectures for mobiles

    22/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    low-resolution and the refinement stage, it results in higher memory bandwidth than the

    conventional % architecture.

    To overcome the problem in memB, memB uses two types of memory< -bit and B-

    bit. The - bit memory stores the first two %:7s of each datum, and the B-bit memory stores

    the complete full pi!el bitwidth. ;uring the low-resolution search the data from the -bit

    memory are accessed. This allows only the reAuired bits to be accessed without wasting any

    power during low-resolution. In the refinement stage, the B-bit memory is read into the &s.

    )lthough this architecture potentially reduce the memory bandwidth and power consumption,

    it needs an additional area for the -bit memory."C$

    In memBpre, the data is prearranged before storing them in B-bit memory. 9our pi!els

    are grouped together, and then transposed according to their bit position, as shown in fig 12.

    F!+ @6.2.storing B-bit pi!el in B-bit memory< 1a2 conventional arrangement and 1b2 memBpre.

    ;uring the low-resolution search, we read only the memory locations that stores the

    first two %:7s of the original pi!els. Thus the total memory is accessed during the low-

    resolution is one-fourth of the conventional full pi!el access."C$

    In full resolution search, we read four memory locations that contain the first up to

    eighth bits in four cloc5 cycles. ;elay buffers as shown in fig 1(2realigns these words to

    match the original B-bit pi!el. 7y prearranging the pi!els this way, we can use the same

    memory si6e as in the conventional full search while retaining the ability to access the first

    two %:7s, as well as the full bit resolution. The drawbac5 of this approach is that it needs

    additioanlal circuity to transpose and realign the pi!els during the motion prediction. The

    estimated bandwidth for the above three memory architectures are shown in table 12.

  • 8/10/2019 low power h.264 architectures for mobiles

    23/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    TABLE @2

    MEMOR BANDWIDTH FOR DIFFERENT ARCHITECTURES

    Lo-r&'o)ut!on H!+/-r&'o)ut!on

    %emB E>H S B-7ITE

    (

    ./! B-bit

    %emB E>H S -7ITE

    (

    ./! B-bit

    %emBpre E>H S -7ITE

    (

    ./! B-bit

    The memories for the search area data consume non-trivial hardware cost and power

    dissipation. 9or instance, the memory modules of the -; design occupy almost /0 die

    area. In fact, the memory organi6ation is one critical issue for the system design, especially in

    H.( which applies multiple reference frames. In this section, one memory mapping

    algorithm is proposed to reduce the memory partition number in our architecture.

    *onseAuently, the hardware cost and the power consumption are both optimi6ed.

  • 8/10/2019 low power h.264 architectures for mobiles

    24/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    fig

  • 8/10/2019 low power h.264 architectures for mobiles

    25/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    9or e!ample, he search width is (B and the search height is C. :o, the search areais

    C8( pi!els. Fne pi!el is e!tended in both vertical and hori6ontal directions. Thus, the

    memory capacity for the search area is (8(B pi!els. >hen # &'s are configured, the

    memory is divided into ( logic partitions, which are labeled as LD/M, LD#M, LDM and LDCM, as

    illustrated in 9ig. 1a2. ach solid line rectangle represents one #-pi!el wide and (B-pi!el

    high logic partition. The % processing includes three stages. In the first stage, the area

    covered by slash pattern is searched, so LD/M and LD#M are active. In the second stage, the sub

    search area is moved #-pi!el in hori6on. The rectangle with bac5slash pattern includes these

    searching candidates. LD#M and LDM are active in this stage. In the last phase, LDM and LDCM

    are used and the rectangle filled with dot represents this sub search area. The intuitive

    approach is implementing these logic partitions with four memory modules. ach module is

    (B>8#Bb. 7ut this method causes low memory IF utili6ation, which is just (B0. >hen the

    search width is increased, the memory utili6ation becomes even worse."C$

    1a2 Dogic %emory &artitions 1b2 &hysical %emory &artitions.

  • 8/10/2019 low power h.264 architectures for mobiles

    26/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    Fig 6.416-PEG Design Layout

    7ased on the proposed mapping algorithm, just two memory modules are reAuired for

    the search area data storage, as shown in 9ig. 1b2. ach memory module is #-pi!el wide

    and 4-pi!el high. D and D/ are stac5ed up and mapped to L%/M. DC and D# are mapped to

    L%#M in the same way. The output pi!els dispatched to &'s come from the outputs of L%/M

    and L%#M. In the first search stage, the read pointers of L%/M and L%#M both begin from row

    /. The # most significant pi!els 1%:&2 of L?9M come from L%/RFM and the rest # least

    significant pi!els 1D:&2 come from L%#RF"#

  • 8/10/2019 low power h.264 architectures for mobiles

    27/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    6.1.2. Pro&''!n+ &)&$&nt(

    ) generic term used to reference a hardware element that e!ecutes a stream of

    instructions. The conte!t defines what unit of hardware is considered a processing element1e.g., core, processor, and computer2. *onsider a cluster of :%& wor5stations. In some

    programming environments, each wor5station is viewed as e!ecuting a single instruction

    streamN in this case, a processing element is a wor5station. ) different programming

    environment running on the same hardware, however, may view each processor or core of the

    individual wor5stations as e!ecuting an individual instruction streamN in this case, the

    processing element is the processor or core rather than the wor5station."#$

    To integrate more general-purpose &s and to ma5e a practical vision chip which can

    be used for real systems, we have developed a new and much simpler architecture called

    :C& 1:imple and :mart :ensory &rocessing lements2 . >e introduce details of the

    architecture below.

    The bloc5 diagram of the whole chip is shown in below 9ig.. ach & is directly

    connected to a photo-detector, an output circuit, and its four neighboring &s. The input

    Image signals are )+;-converted and transmitted in parallel to all &s. The instruction codes

    from the e!ternal pins are transmitted to all the &s and processed simultaneously 1:I%;

    type processing2. The resulting data are transmitted to the output circuit and the feature

    Auantities are e!tracted and transmitted to the e!ternal pins."#$

  • 8/10/2019 low power h.264 architectures for mobiles

    28/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    Strutur& o* t/& PE

    The bloc5 diagram of the & is shown in 9ig.. ach & consists of an )DG, local

    memory and three registers. The )DG ta5es charge of the calculation and the memory data

    recording and I+F. Two registers, called ) register and 7 register, read data from the memory

    and the )DG performs an operation on the data. Then the result is once fetched by the

    register and is written into the memory again. This process is defined as a single cycle and by

    performing several cycles you can process various 5inds of algorithms."#$

    The bloc5 diagram of the )DG is shown in 9ig.C. The )DG can process one of #/

    logical and B arithmetic operations at one time. They are all binary operations and multi-bits

    operations are processed by repeating single operations serially.

    The local memory has -bit address space and consists of a (-bit ?)%, and an B-bit

    memory-mapped I+F connected to a sensor, an output circuit, four-neighboring &s and

    ground. ach bit can be randomly accessed. The address map is shown in Table#."#$

  • 8/10/2019 low power h.264 architectures for mobiles

    29/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    The function of the )DG and the si6e of the memory are proved to be enough for most

    early visual processing algorithms which are often used in visual applications.

  • 8/10/2019 low power h.264 architectures for mobiles

    30/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    The sum of absolute differences may be used for a variety of purposes, such as object

    recognition, the generation of disparity maps for stereo images, and motion estimation for

    video compression.

    )n algorithm used for video compression that adds up the absolute differences

    between corresponding elements in the macro bloc5s of video frames. >hen coding or

    compressing video, the similarities between video frames could be used to the best advantage

    to achieve better compression ratios. Gsing the usual coding techniAues on moving objects

    within a video scene diminish the compression efficiency because they only consider the

    pi!els located at the same position in the video frames. %otion estimation and the :);

    algorithm are used to capture such movements more accurately for better compression

    efficiency."C$

    6.1.0. Add&r tr&&(

    Fne that adds, especially a computational device that performs arithmetic addition. a

    small machine that is used for mathematical calculations.

    C!ru!t D&'r!%t!on(

    This circuit counts the number of active 1#2 bits in the input word 1also 5nown as the

    Hamming weight2. Fbviously, a single full-adder can be used to calculate the Hamming

    weight of a three bit input word. 9or larger input word width, a tree of adders can be used to

    combine the intermediate sums from the previous stages.

    *lic5 the input switches or type the W/W..W4W and WaW, WbW bind 5eys to play with the circuit.

    :ee the ne!t applet for an animated demonstration of the Hamming weight adder-tree.

    To 5eep the circuit layout readable, the carry-in inputs of the multi-bit adders are tied

    to 'E; 1logical /2 in the e!ample. Eaturally, these carry-in inputs could be connected to

    additional input pins, which would allow for #-bits of input with the same number of adders.

    )s # is also the highest number that one three-bit adder can generate, a #-bit bit counter

    would obviously reAuire another adder, etc.

    http://en.wikipedia.org/wiki/Macroblockhttp://en.wikipedia.org/wiki/Absolute_differencehttp://en.wikipedia.org/wiki/Pixelhttp://en.wikipedia.org/wiki/Lp_spacehttp://en.wikipedia.org/wiki/Macroblockhttp://en.wikipedia.org/wiki/Absolute_differencehttp://en.wikipedia.org/wiki/Pixelhttp://en.wikipedia.org/wiki/Lp_space
  • 8/10/2019 low power h.264 architectures for mobiles

    31/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    6.1.4. o$%"r"tor un!t(

    D&*!n!t!on' o* o$%"r"tor(

    In electronics, a comparator is a device which compares two voltages or currents and

    switches its output to indicate which is larger.

    )ny device for comparing a physical property of two objects, or an object with a

    standardN )n electronic device that compares two voltages, currents or streams of

    data.

    ) machine used for loo5ing for paralla! motion, proper motion, asteroids, or variable

    stars by Auic5ly alternating between viewing two photographic plates from twodifferent times.

    http://tams-www.informatik.uni-hamburg.de/applets/hades/webdemos/20-arithmetic/80-bitcount/bitcount-animation.htmlhttp://en.wikipedia.org/wiki/Object_recognition_(computer_vision)http://en.wikipedia.org/wiki/Object_recognition_(computer_vision)http://en.wikipedia.org/wiki/Binocular_disparityhttp://en.wikipedia.org/wiki/Computer_stereo_visionhttp://en.wikipedia.org/wiki/Motion_estimationhttp://en.wikipedia.org/wiki/Video_compressionhttp://tams-www.informatik.uni-hamburg.de/applets/hades/webdemos/20-arithmetic/80-bitcount/bitcount-animation.html
  • 8/10/2019 low power h.264 architectures for mobiles

    32/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    O%-"$% >o)t"+& o$%"r"tor(

    ) simple op-amp comparator

    )n operational amplifier1op-amp2 has a well balanced difference input and a very

    high gain. The parallel in the characteristics allows the op-amps to serve as comparators in

    some functions.

    ) standard op-amp operating in open loop configuration 1without negative feedbac52 can be

    used as a comparator. >hen the non-inverting input 1V@2 is at a higher voltage than the

    inverting input 1V-2, the high gain of the op-amp causes it to output the most positive voltage

    it can. >hen the non-inverting input 1V@2 drops below the inverting input 1V-2, the op-amp

    outputs the most negative voltage it can. :ince the output voltage is limited by the supply

    voltage, for an op-amp that uses a balanced, split supply, 1powered by X V :2 this action can be

    writtenhen the voltages are nearly

    eAual, the output voltage will not fall into one of the logic levels, thus analog signals willenter the digital domain with unpredictable results. To ma5e this range as small as possible,

    the amplifier cascade is high gain. The circuit consists of mainly 7ipolar transistorse!cept

    perhaps in the beginning stage which will li5ely be field effect transistors. 9or very high

    freAuencies, the input impedanceof the stages is low. This reduces the saturation of the slow,

    large &-E junctionbipolar transistors that would otherwise lead to long recovery times. 9ast

    small :chott5y diodes, li5e those found in binary logic designs, improve the performance

    significantly though the performance still lags that of circuits with amplifiers using analog

    signals. :lew rate has no meaning for these devices. 9or applications in flash );*s the

    http://en.wikipedia.org/wiki/Slew_ratehttp://en.wikipedia.org/wiki/Propagation_delayhttp://en.wikipedia.org/wiki/File:Op-Amp_Comparator.svghttp://en.wikipedia.org/wiki/Operational_amplifierhttp://en.wikipedia.org/wiki/Operational_amplifierhttp://en.wikipedia.org/wiki/Slew_ratehttp://en.wikipedia.org/wiki/Propagation_delay
  • 8/10/2019 low power h.264 architectures for mobiles

    34/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    distributed signal across B ports matches the voltage and current gain after each amplifier, and

    resistors then behave as level-shifters.

    The D%CC4 accomplishes this with an open collectoroutput. >hen the inverting

    input is at a higher voltage than the non inverting input, the output of the comparator

    connects to the negative power supply. >hen the non inverting input is higher than the

    inverting input, the output is WfloatingW 1has a very high impedanceto ground2.

    Out%ut t?%&(

    fighen output currents are light, output

    voltages of *%F: rail-to-rail comparators, which rely on a saturated %F:9T, rangecloser

    to the rails than their bipolar counterparts.

    Fn the basis of outputs, comparators can also be classified as open drain or push3pull.

    *omparators with an open-drain output stage use an e!ternal pull up resistor to a positive

    supply that defines the logic high level. Fpen drain comparators are more suitable for mi!ed-

    voltage system design. :ince the output is high impedance for logic level high, open draincomparators can also be used to connect multiple comparators on to a single bus. &ush pull

    http://en.wikipedia.org/wiki/File:Comparators_stuuf.jpghttp://en.wikipedia.org/wiki/File:Comparators_stuuf.jpghttp://en.wikipedia.org/wiki/File:Comparators_stuuf.jpghttp://en.wikipedia.org/wiki/Transistor-transistor_logichttp://en.wikipedia.org/wiki/CMOShttp://en.wikipedia.org/wiki/Bipolar_transistorhttp://en.wikipedia.org/wiki/P-n_junctionhttp://en.wikipedia.org/wiki/Schottky_diodehttp://en.wikipedia.org/wiki/Flash_ADChttp://en.wikipedia.org/wiki/File:Comparators_stuuf.jpghttp://en.wikipedia.org/wiki/Hysteresishttp://en.wikipedia.org/wiki/Transistor-transistor_logichttp://en.wikipedia.org/wiki/CMOShttp://en.wikipedia.org/wiki/Analog_to_digital_converterhttp://en.wikipedia.org/wiki/Bipolar_transistorhttp://en.wikipedia.org/wiki/Field_effect_transistorhttp://en.wikipedia.org/wiki/Impedancehttp://en.wikipedia.org/wiki/P-n_junctionhttp://en.wikipedia.org/wiki/Schottky_diodehttp://en.wikipedia.org/wiki/Flash_ADC
  • 8/10/2019 low power h.264 architectures for mobiles

    35/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    output does not need a pull up resistor and can also source current unli5e an open drain

    output.

    In%ut >o)t"+& r"n+&(

    The input voltages must stay the limits specified by the manufacturer. arly integrated

    comparators, li5e the D%### family, and certain high-speed comparators li5e the D%##4

    family, reAuire input voltage ranges substantially lower than the power supply voltages 1X#

    V vs. CV2."#$Rail*to*railcomparators allow any input voltages within the power supply

    range. >hen powered from a bipolar 1dual rail2 supply,

    V: - ZV@,V- Z V: @

    or, when powered from a uni-polar TTD+*%F:power supplyhen a video decoder restores a video by decoding the bit stream frame by frame,

    decoding must always start with an I-frame. &-frames and 7-frames, if used, must be decoded

    together with the reference frame1s2.

    In the H.( baseline profile, only I-and &-frames are used. This profile is ideal for

    networ5 cameras and video encoders since low latency is achieved because 7-frames are not

    used.

  • 8/10/2019 low power h.264 architectures for mobiles

    47/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    F!+ur&-4. >ith difference coding 1used in most video compression standards including

    H.(2, only the first image 1I-frame2 is coded in its entirely. In the two following images 1&-

    frames2, references are made to the first picture for the static elements i.e., the house and the

    moving parts, i.e., the running man, is coded using motion vectors, thus reducing the amount

    of information that is sent and stored.

    The amount of encoding can be further reduced if detection and encoding of

    differences is based on bloc5s of pi!els 1macro bloc5s2 rather than individual pi!elsN

    therefore, bigger area are compared and only bloc5s that are significantly different are

    decoded. The overhead associated with indicating the location of areas to be changed is also

    reduced.

    ;ifference coding, however, would not significantly reduce data if there was a lot of

    motion in a video. Here, techniAues such as bloc5 based motion compensation can be used.

    7loc5 based motion compensation ta5es into account that much of what ma5es up a newframe in a video seAuence can be found in an earlier frame, but perhaps in a difference

    location. This techniAue divides a frame in to a series of macro bloc5s. 7loc5 by bloc5, a new

    frame-for instance, a &-frame can be composed or predicted by loo5ing for a matching bloc5

    in a reference frame. If a match is found, the encoder simply codes the position where the

    matching bloc5 is to be found in the reference frame. *oding the motion vector, as it is

    called, ta5es up fewer bits than if the actual content of a bloc5 were to be coded.

  • 8/10/2019 low power h.264 architectures for mobiles

    48/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    F!+ur&-

  • 8/10/2019 low power h.264 architectures for mobiles

    49/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    The ;igital Video 7roadcast project 1;V72 approved the use of H.(+)V* for

    broadcast television in late //(.

    The )dvanced Television :ystems *ommittee 1)T:*2 standards body in the Gnited

    :tates approved the use of H.(+)V* for broadcast television in Quly //B, although the

    standard is not yet used for fi!ed )T:* broadcasts within the Gnited :tates. It has also been

    approved for use with the more recent )T:*-%+H 1%obile+Handheld2 standard, using the

    )V* and :V* portions of H.(.

    )V*H; is a high-definition recording format designed by :ony and &anasonic that uses

    H.( 1conforming to H.( while adding additional application-specific features and

    constraints2.

    )V*-Intra is an intraframe-only compression format, developed by &anasonic.

    The **TV 1*lose *ircuit TV2 or Video :urveillance mar5et has included the technology

    in many products. &rior to this technology, the compression formats used within the industryWs

    ;V?s ;igital Video ?ecorders were generally low Aualities in compression capability. >ith

    the application of the H.( compression technology to the video surveillance industry, the

    Auality of the video recordings became substantially improved. :tarting in //B, some in the

    surveillance industry promoted the H.( technology as synonymous with Phigh AualityP

    video.

  • 8/10/2019 low power h.264 architectures for mobiles

    50/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    allows modest improvements in bit rate and Auality in most scenes. 7ut in certain types of

    scenes, such as those with repetitive motion or bac5-and-forth scene cuts or uncovered

    bac5ground areas, it allows a significant reduction in bit rate while maintaining clarity."#C$

    Variable bloc5-si6e motion compensation 1V7:%*2 with bloc5 si6es as large as #8#

    and as small as (8(, enabling precise segmentation of moving regions. The supported luma

    prediction bloc5 si6es include #8#, #8B, B8#, B8B, B8(, (8B, and (8(, many of which

    can be used together in a single macro bloc5. *hroma prediction bloc5 si6es are

    correspondingly smaller according to the chroma sub sampling in use.

    The ability to use multiple motion vectors per macro bloc5 1one or two per partition2 with

    a ma!imum of C in the case of a 7 macro bloc5 constructed of # (8( partitions. Themotion vectors for each B8B or larger partition region can point to different reference

    pictures.

    The ability to use any macro bloc5s type in 7-frames, including I-macro bloc5s, resulting

    in much more efficient encoding when using 7-frames. "#C$

    %&'-( ):&.

    :i!-tap filtering for derivation of half-peal lima sample predictions, for sharper sub pi!el

    motion-compensation. uarter-pi!el motion is derived by linear interpolation of the helpful

    values, to save processing power.

    uarter-pi!el precision for motion compensation, enabling precise description of the

    displacements of moving areas. 9or chroma the resolution is typically halved both vertically

    and hori6ontally. Therefore the motion compensation of chroma uses one-eighth chroma pi!el

    grid units.

    >eighted prediction, allowing an encoder to specify the use of a scaling and offset when

    performing motion compensation, and providing a significant benefit in performance in

    special cases]such as fade-to-blac5, fade-in, and cross-fade transitions. This includes

    implicit weighted prediction for 7-frames, and e!plicit weighted prediction for &-frames.

    :patial prediction from the edges of neighboring bloc5s for PintraP coding, rather than the

    P;*P-only prediction found in %&'- &art and the transform coefficient prediction found

    http://en.wikipedia.org/wiki/Digital_Video_Broadcastinghttp://en.wikipedia.org/wiki/United_Stateshttp://en.wikipedia.org/wiki/ATSC-M/Hhttp://en.wikipedia.org/wiki/Digital_Video_Recordershttp://en.wikipedia.org/wiki/Inter_framehttp://en.wikipedia.org/wiki/Video_compression_picture_types#Bi-predictive_pictures_.28or_slices.29http://en.wikipedia.org/wiki/Digital_Video_Broadcastinghttp://en.wikipedia.org/wiki/Advanced_Television_Systems_Committeehttp://en.wikipedia.org/wiki/United_Stateshttp://en.wikipedia.org/wiki/United_Stateshttp://en.wikipedia.org/wiki/ATSC-M/Hhttp://en.wikipedia.org/wiki/AVCHDhttp://en.wikipedia.org/wiki/Sonyhttp://en.wikipedia.org/wiki/Panasonichttp://en.wikipedia.org/wiki/AVC-Intrahttp://en.wikipedia.org/wiki/Video_compression#Intraframe_vs_interframe_compressionhttp://en.wikipedia.org/wiki/Panasonichttp://en.wikipedia.org/wiki/CCTVhttp://en.wikipedia.org/wiki/Video_Surveillancehttp://en.wikipedia.org/wiki/Digital_Video_Recordershttp://en.wikipedia.org/wiki/Inter_framehttp://en.wikipedia.org/wiki/Video_compression_picture_types#Bi-predictive_pictures_.28or_slices.29
  • 8/10/2019 low power h.264 architectures for mobiles

    51/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    in H.Cv and %&'-( &art . This includes luma prediction bloc5 si6es of #8#, B8B,

    and (8( 1of which only one type can be used within each macro bloc52."#C$

    Dossless macro bloc5 coding features includingeb"n&d )o o't *&"tur&'(

    #. 9ive devices with #//^ to #.% system gates. 9rom to C I+Fs with pac5age and density migrationC. Gp to (B^bits of bloc5 ?)% and up to C#^bits of distributed ?)%(. Gp to C embedded #BK#B multipliers for high performance ;:& applications. Gp to eight digital cloc5 managers

    Co't '">!n+ '?'t&$ !nt&r*"&' "nd 'o)ut!on'(

    #. :upports for Silin! plot form flash as well as commodity serial 1:&I2 and byte wide

    flash memory for configuration. asy to implement to interfaces to ;;? memoryC. :upports for #Bcommon I+F standards including &*I CC+, &*I-S, mini-DV;: and

    ?:;:

    Indu'tr?-)&"d!n+ d&'!+n too)' "nd IP'(

    #. I: design tools to shorten the design and verification. Hundreds of pre-verified, pre-optimi6ed intellectual property 1I&2 cores and reference

    designsC. *hip scope pro system debugging environment

    E"'? to u'&: )o o't FP=A d&>&)o%$&nt '?'t&$(

    #. *omplete :partan-C starter 5it available for only _#(4 G:;2. Includes S*C:// 9&'), :&I flash, C %7 ;;? memory, and support for G:7 .0

    9.

  • 8/10/2019 low power h.264 architectures for mobiles

    60/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    I* Silin! :partan-C 9&'), #//^ gates connectorsG:7 &ort, Hirose 9S, 9our

    #-pin p-mod connectors, V'), &:+, and serial programming ;iligent G:7 port providing

    board power, programming and data transfers.

    F!+ 9.2 ( !)!n8 S%"rt"n 0E FP=A @N&8?'-2

    The Ee!ys- is a powerful digital system design platform built around a Silin!

    :partan C 9&'). >ith #%bytes of fast :;?)% and #%bytes of 9lash ?F%, the Ee!ys-

    is ideally suited to embedded processors li5e Silin!Ws C-bit ?I:* %icro bla6e. The on-

    board high-speed G:7 port, together with a collection of I+F devices, data ports, and

    e!pansion connectors, allow a wide range of designs to be completed without the need for

    any additional components.

    F&"tur&'(

  • 8/10/2019 low power h.264 architectures for mobiles

    61/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    #. Silin! :partan-C 9&'), //^ or #//^ gate

    . G:7 port providing board power, device configuration, and high-speed data

    transfers

    C. >or5s with I:+>eb pac5 and ;^

    (. #%7 fast %icron &:;?)%

    . #%7 Intel :trata 9lash 9lash ?

    . Silin! &latform 9lash ?F%

    . High-efficiency switching power supplies 1good for battery-powered applications

    B. /%H6 oscillator, plus a soc5et for a second oscillator

    4. 9&') I+Fs routed to e!pansion connectors 1one high-speed Hirose 9S connector

    with (C signals and four ! &-mod connectors2

    #/. )ll I+F signals are :; and short-circuit protected, ensuring a long operating life in

    any environment.

    ##. Fn-board I+F includes eight D;s, four-digit seven-segment display, four

    pushbuttons, eight slide switches

    #. :hips in a ;V; case with a high-speed G:7 cable

    S%&!*!"t!on'(

  • 8/10/2019 low power h.264 architectures for mobiles

    62/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    Su%%)? >o)t"+& )&>&)' n&&''"r? *or %r&'&r>!n+ t/& RAM ont&nt'(

    S?$#o) D&'r!%t!on M!n Un!t'

    V;?IET V**IET Devel reAuired to retain ?)% data #./ V

    V;?)GS V**)GS Devel reAuired to retain ?)% data ./ V

    A#'o)ut& $"8!$u$ r"t!n+'(

    S?$#o) D&'r!%t!on Cond!t!on' M!n M"8 Un!t'

    V**IET Inter supply voltage -/. #.C V

    V**)GS )u!iliary :upply Voltage -/. C.// V

    V**F Futput driver supply voltage -/. C. V

    V?9 Input reference Voltage -/. V**F@/.1#2 V

    VIE1#,,C,(2

    Voltage applied to all

    user I+F pins and ;ual 3purpose pins

    ;river in

    a high-impedance state

    *ommercial -/.4 (.( V

    Industrial -/.B (.C V

    Voltage applied to alldedicated pins

    )ll temp. ranges -/. V**)GS@/.1C2 m)

    II^ Input clamp current perI+F pin

    -/.V` VIE`1 V**F@ /.V2 -- X#// V

    V:; lectrostatic ;ischarge

    Voltage

    Human body model --- X/// V

    *harged device model --- X// V

    %achine model --- X// V

    TQ Qunction temperature --- # /*

    T:T' :torage temperature - #//*

    Po&r 'u%%)? '%&!*!"t!on'(

    Su%%)? >o)t"+& t/r&'/o)d' *or %o&r-on r&'&t(

    S?$#o) D&'r!%t!on M!n M"8 Un!t'V**IET Threshold for the V**IETsupply /.( #./ V

  • 8/10/2019 low power h.264 architectures for mobiles

    63/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    V**)GST Threshold for the V**)GSTsupply /.B ./ V

    V**FT Threshold for the V**F7an5 supply /.( #./ V

    . SCREEN SHOTS OF ,HDL CODE SIMULATION

    St&%' to '!$u)"t& "nd du$% t/& %ro+r"$ !nto "n FP=A S%"rt"n-0E G!t(

    "#$ Fpen Silin!4. software icon on the des5 top.

  • 8/10/2019 low power h.264 architectures for mobiles

    64/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "$ *lic5 on 9ID option on the tool bar and select a new project.

    1)n empty window will be opened2

    "C$ >rite the program and save it.

    "($ *lic5 on synthesi6e-S:T and chec5 the synta!.

    "$ )fter chec5ing the synta! successfully, write the test bench program.

    "$ 9or test bench waveform, open the test bench program and chec5 the synta! and

    simulate the behavioral model. 1Timing diagram will be appeared2.

    "$ Eow connect the 9&') :partan-C 5it to the system.

    "B$ *lic5 on user constraints, assign pac5age pins.

    "4$ Ee!t, generate post-synthesis simulator and generate programming file.

    "#/$ *lic5 on ;I'IDET software icon--adopt--e!port.

    "##$ Here assign the program to the program chain and initiali6e it.

    "#$ Then the program will be dumped Into 9&') :partan-C 5it, and output will be

    displayed as indication of D;s.

    FLOW CHARTS

    F)o /"rt *or SAD F)o(

    7uffering *urrent frame

  • 8/10/2019 low power h.264 architectures for mobiles

    65/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    ?econstruct previous frame

    9etch pi!els from currentframe and previous frame

    &rocess the pi!els1subtraction2

    )bsolute the differences

    )ccumulate theabsolute differences

    *ompare for minimum :);

    ;ecide %VR?F>,%VR*FD

    F)o /"rt *or DPC F)o(

    7uffering *urrent frame

    ?econstruct previous frame

  • 8/10/2019 low power h.264 architectures for mobiles

    66/79

  • 8/10/2019 low power h.264 architectures for mobiles

    67/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "$ In behavioral simulation, open test bench program-save it-Silin! I: simulator and

    chec5 the synta!-simulate behavioral model 1then we can get the timing diagram2

  • 8/10/2019 low power h.264 architectures for mobiles

    68/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "C$ )fter simulating the behavioral model we can get the timing diagram as shown below

  • 8/10/2019 low power h.264 architectures for mobiles

    69/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "($ Eow generate the programming file and chec5 it out.

  • 8/10/2019 low power h.264 architectures for mobiles

    70/79

  • 8/10/2019 low power h.264 architectures for mobiles

    71/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "$ )fter connecting the hardware to the *&G, firstly, we have to initiali6e the chain.

  • 8/10/2019 low power h.264 architectures for mobiles

    72/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "$ )fter initiali6ation, add source file to the chain.

  • 8/10/2019 low power h.264 architectures for mobiles

    73/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "B$ Eow run the programming chain, then the program will be dumped into 9&') :partan-C ^IT.:o the output will be displayed as indication of D;s.

  • 8/10/2019 low power h.264 architectures for mobiles

    74/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    1.CONCLUSION

  • 8/10/2019 low power h.264 architectures for mobiles

    75/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    This paper has represented a method to reduce the computational cost and memory

    access for V7:% using pi!el truncation. V7:% is a new coding techniAue and provides

    more accurate predictions compared to traditional fi!ed bloc5 si6e motion estimation. >ith

    97:%, if an %7 consists of two objects with different motion directions, the coding

    performance of this %7 is worse. Fn the other hand, for the same condition, the %7 can be

    divided into smaller bloc5s in order to fit the different motion directions with V7:%. Hence

    the coding performance is improved.

    However, for motion prediction using a smaller bloc5 si6es, pi!el truncation reduces the

    motion prediction accuracy. :o here we have proposed a two-step search to improve the

    frame prediction using pi!el truncation. Fur method reduces the total computation and

    memory accuracy compared to the conventional method without significantly degrading the

    picture Auality. The results theoretically show that the proposed architectures are able to save

    up to C0energy compared to the full search conventional full-search % architecture, which

    is eAuivalent to (/0 energy saving over the conventional H.( system. This ma5es such

    architecture attractive for H.( application in future mobile devices.

    11. REFERENCES

  • 8/10/2019 low power h.264 architectures for mobiles

    76/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "# )dvanced video coding for generic audiovisual services,ITG-T ?ecommendation H.(

    I:F+I* #((4-#/ 1%&'-(2 )V*, //.

    "$ *- *hen, :- *hien,->. Huang. T-* *hen. >ang, and D-' *hen, L)nalysis

    andarchitecture design of variable bloc5 si6e motion estimation for H.(+)V*,M I Trans.

    "C$ -D.7ahari, T.)rslan, and ).T.edogan, Llow computation and memory access for

    variable bloc5 si6e motion estimation using pi!el truncationM, In proc. I wor5shop signal

    process. :yst. // pp.B#-B.

    "($ -D, He, *-.Tsui, ^-^. *hen, and %.D. Diou, low power VD:I design for motion

    estimation using adaptive pi!el truncationM I Trans, vol. #/, no. , pp. 4-B, aug

    ///.

    "$ -D.7ahari, T.)rslan, and ).T.edogan, Llow power hardware architecture for V7:%

    using pi!el truncationM, In proc. I wor5shop signal process. :yst. // pp.B#-B.

    Hyderabad, andhrapradesh.

    "$ 7.Eatrajan, V.7has5aran, and ^.*onstantitudes, LDow comple!ity bloc5 based motion

    estimation via one-bit transformM, I Trans, vol, , no.(, pp. /-/.

    )dvanced video coding for generic audiovisual services,ITG-T ?ecommendation H.(

    I:F+I* #((4-#/ 1%&'-(2 )V*, //.

    "$ ).rtur5 and :.rtur5, Ltwo-bit transform for binary bloc5 motion estimationM I

    Trans, *ircuits :yst. Video Technol., vol. #, no., pp. 4CB-4(, Qul.//.

    "B$ :.Dee, Q.%. ^im, and :-I *hae, LEew motion estimation algorithm using adaptively

    Auanti6ed low bit resolution image and its VD:I architecture for %&' video encodingM

    I Trans. *irsuits syst. Viseo technol. Vol.B, no. , pp. C(-((. Fct #44B.

    "4$ .*han and :.^ung, L%ulti level pi!el differece classification methodsM, in proc,

    Ilnt, *onf. Image processs, vol.C. >ashington ;.*., #44, pp.-.

  • 8/10/2019 low power h.264 architectures for mobiles

    77/79

    LOW-POWER H.264 ARCHITECTURES FOR MOBILE COMMUNICATION

    "#/$ V.'.%oshnyaga, L%sb truncation scheme for low-power video processorsM, in &roc,

    I lnt.:ymp.*ircuits :yst., vol. ( Frlando, 9D #444, pp. 4#-4(.

    "##$ %-Q. *hen, D-'. *hen, T-;.*hiueh, and -&. Dee, L) new bloc5 matching criterion for

    motion estimation and its implementationM, I Trans. *ircuits :yst. Video Technol., vol. ,

    no.C pp. C#-C.

    "#$ :.:harma, &.%ishra, :. :awant, *.&. %ammen, and V.%. 'adre, L&re-decision strategy

    for coded+non-coded %7s in %&'(M, in &roc, Dnt. *onf. :ignal &rocess. *omman

    //(1:&*F%2, 7angalore, India //(, pp./#-/.

    "#C$ T.*.*hen, -H.*hen, :-9. Tsai, :..*hien, and D.'.*hen, L 9ast algorithm and

    architecture design of low-power integer motion estimation for H.(+)V*M, I Trans.

    *ircuits :yst. Video Tchnol, vol. #, no. , pp. B-, may //.

    "#($ %.%iyama, Q.%iya5oshi, . urada, ^.Dmamura, H.Hashimoto, and %.oshimoto, L )

    sub-mw %&'( motion estimation process core for mobile video applivationM, I Trans.

    *icuits :yst. Vol. C4, no. 4, pp. #-#/. :ep.//(.

    "#$ Video coding for Dow 7it ?ate *ommunication, 9eb. #44B.

    "#$ Information Technology *oding of )udiF-Visul Fbjects- &art N Visual, I:F+I*

    #((4- , #444.

    "#$ :.:rinivasan, Q.Hsu, T.Holcomb, ^.%i5erjee, :.D. ?egunathan, 7.Din, Q.Diang, %-

    *.Dee, and Q.?ibas-corbera, L>indows media video 4< over view and applicationsM, :ignal

    process< image commun., vol.#4, pp.B#-B. :ep.//(.

    "#B$ ;raft-ITG-T, ?ecommendations and fian draft international standard of joint video

    specification, %ay //C.

    "#4$ ^.%.nag, %.T.:un, and D.>u, L ) family of VD:I designs for the motion

    compensation bloc5 matching algorithm,M I Trans. *ircuits :yst. Vol.C, no.#/. pp. #C#-

    #C. Fct #4B4.

  • 8/10/2019 low power h.264 architectures for mobiles

    78/79

  • 8/10/2019 low power h.264 architectures for mobiles

    79/79