comp format

Upload: mrdickus

Post on 01-Jun-2018


  • 8/9/2019 Comp Doc File Format


    OpenOffice.org's Documentation of the

    Microsoft Compound Document

    File Format

    Author Daniel Rentz ✉ mailto:[email protected]  http://sc.openoffice.org

    License Public Documentation License

    Contributors

    Other sources Hyperlinks to Wikipedia (http://www.wikipedia.org) for various extended information

    Mailing list ✉ mailto:dev@sc.openoffice.org

    Subscription ✉ mailto:dev[email protected]

    Download PDF http://sc.openoffice.org/compdocfileformat.pdf  XML http://sc.openoffice.org/compdocfileformat.odt

    Project started 2001-Aug-30

    Last change 2007-Aug-07

    Revision 1.5



    Contents

    1 Introduction
        1.1 License Notices
        1.2 Abstract
        1.3 Used Terms, Symbols, and Formatting

    2 Storages and Streams

    3 Sectors and Sector Chains
        3.1 Sectors and Sector Identifiers
        3.2 Sector Chains and SecID Chains

    4 Compound Document Header
        4.1 Compound Document Header Contents
        4.2 Byte Order
        4.3 Sector File Offsets

    5 Sector Allocation
        5.1 Master Sector Allocation Table
        5.2 Sector Allocation Table

    6 Short-Streams
        6.1 Short-Stream Container Stream
        6.2 Short-Sector Allocation Table

    7 Directory
        7.1 Directory Structure
        7.2 Directory Entries

    8 Example


    1 Introduction

    1.1 License Notices

    1.1.1 Public Documentation License Notice

    The contents of this Documentation are subject to the Public Documentation License Version 1.0 (the "License"); you may only use this Documentation if you comply with the terms of this License. A copy of the License is available at http://www.openoffice.org/licenses/PDL.html.

    The Original Documentation is "OpenOffice.org's Documentation of the Microsoft Compound Document File Format".

    The Initial Writer of the Original Documentation is Sun Microsystems, Inc. Copyright © 2003. All Rights Reserved.

    See title page for Author contact and Contributors.

    All Trademarks are properties of their respective owners.

    1.1.2 Wikipedia

    Wikipedia Disclaimer: http://en.wikipedia.org/wiki/Wikipedia:General_disclaimer

    1.2 Abstract

    This document contains a description of the binary format of Microsoft Compound Document files.

    Compound document files are used to structure the contents of a document in the file. It is possible to divide the data into several streams, and to store these streams in different storages in the file. This way compound document files support a complete file system inside the file: the streams are like files in a real file system, and the storages are like sub directories.


    1.3 Used Terms, Symbols, and Formatting

    • References

    A reference to another chapter is symbolised by a little arrow: ➜1.1.

    • Examples

    An example is indented and marked with a light-grey border.

    This is an example.

    • Numbers and Strings

    Numerical values are shown in several number systems:

    Number system  Marking       Example
    Decimal        None          1234
    Hexadecimal    Trailing "H"  1234H
    Binary         Trailing "2"  1001₂

    Constant strings are enclosed in quotation marks. They may contain specific values (control characters, unprintable characters). These values are enclosed in angle brackets.

    Example of a string containing a control character: "abcdef<01H>ghi".

    • Content Listings

        • The term "Not used" means: Ignore the data on import, and write zero bytes on export. The same applies for unmentioned bits in bit fields.

        • The term "Unknown" describes data fields with fixed but unknown contents. On export, these fields have to be written as shown.

        • At several places a variable is introduced, which represents the value of this field for later use, e.g. in formulas. An example can be found in ➜4.1.

    • Formulas

    Important formulas are shown in a light-grey box.


    2 Storages and Streams

    Compound document files work similarly to real file systems. They contain a number of independent data streams (like files in a file system) which are organised in a hierarchy of storages (like sub directories in a file system).

    Storages and streams are named. The names of all storages and streams that are direct members of a storage must be different. Names of streams or storages that are members of different storages may be equal.

    Each compound document file contains a root storage that is the direct or indirect parent of all other storages and streams.

    Example of a storage/stream hierarchy. The names of all direct members of a storage must be different, but it is possible that two different storages contain a stream named "Stream1".

    Root Storage
        Storage1
            Stream1
        Storage2
            Stream1
            Stream2
            Stream3
        Stream1
        Stream2
        Stream3
        Stream4


    3 Sectors and Sector Chains

    3.1 Sectors and Sector Identifiers

    All streams of a compound document file are divided into small blocks of data, called sectors. Sectors may contain internal control data of the compound document or parts of the user data.

    The entire file consists of a header structure (the compound document header, ➜4.1) and a list of all sectors following the header. The size of the sectors is set in the header and is then fixed for all sectors.

    HEADER
    SECTOR 0
    SECTOR 1
    SECTOR 2
    SECTOR 3
    SECTOR 4
    SECTOR 5
    SECTOR 6
    ⋮

    Sectors are enumerated simply by their order in the file. The (zero-based) index of a sector is called sector identifier (SecID). SecIDs are signed 32-bit integer values. If a SecID is not negative, it must refer to an existing sector. If a SecID is negative, it has a special meaning. The following table shows all valid special SecIDs:

    SecID  Name                Meaning
    -1     Free SecID          Free sector, may exist in the file, but is not part of any stream
    -2     End Of Chain SecID  Trailing SecID in a SecID chain (➜3.2)
    -3     SAT SecID           Sector is used by the sector allocation table (➜5.2)
    -4     MSAT SecID          Sector is used by the master sector allocation table (➜5.1)
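The special SecIDs can be captured as named constants. The following is a minimal sketch; the names `SpecialSecID` and `describe_secid` are illustrative and not part of the format description.

```python
from enum import IntEnum


class SpecialSecID(IntEnum):
    """Negative SecIDs with a reserved meaning (values from the table above)."""
    FREE = -1
    END_OF_CHAIN = -2
    SAT = -3
    MSAT = -4


def describe_secid(sec_id: int, sector_count: int) -> str:
    """Return a readable label for a SecID, rejecting values the format does not allow."""
    if sec_id >= 0:
        # A non-negative SecID must refer to an existing sector.
        if sec_id >= sector_count:
            raise ValueError(f"SecID {sec_id} refers to a sector that does not exist")
        return f"sector {sec_id}"
    try:
        return SpecialSecID(sec_id).name
    except ValueError:
        raise ValueError(f"{sec_id} is not a valid special SecID") from None


print(describe_secid(3, 7))
print(describe_secid(-2, 7))
```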


    3.2 Sector Chains and SecID Chains

    The list of all sectors used to store the data of one stream is called sector chain. The sectors may appear unordered and may be located at different positions in the file. Therefore an array of SecIDs, the SecID chain, specifies the order of all sectors of a stream. A SecID chain is always terminated by a special End Of Chain SecID with the value -2 (➜3.1).

    Example: A stream consists of 4 sectors. The SecID chain of the stream is [1, 6, 3, 5, -2]. See ➜4.3 on how to calculate the file offset of a sector from its SecID.

    HEADER
    SECTOR 0
    SECTOR 1
    SECTOR 2
    SECTOR 3
    SECTOR 4
    SECTOR 5
    SECTOR 6
    ⋮

    The SecID chain for each stream is built up from the sector allocation table (➜5.2), with exception of short-streams (➜6) and the following two internal streams:

    • the master sector allocation table (➜5.1), which builds its SecID chain from itself (each sector contains the SecID of the following sector), and

    • the sector allocation table itself, which builds its SecID chain from the master sector allocation table.


    4 Compound Document Header

    The compound document header (simply "header" in the following) contains all data needed to start reading a compound document file.

    4.1 Compound Document Header Contents

    The header is always located at the beginning of the file, and its size is exactly 512 bytes. This implies that the first sector (with SecID 0) always starts at file offset 512.

    Contents of the compound document header structure:

    Offset  Size  Contents
    0       8     Compound document file identifier: D0H CFH 11H E0H A1H B1H 1AH E1H
    8       16    Unique identifier (UID) of this file (not of interest in the following, may be all 0)
    24      2     Revision number of the file format (most used is 003EH)
    26      2     Version number of the file format (most used is 0003H)
    28      2     Byte order identifier (➜4.2): FEH FFH = Little-Endian, FFH FEH = Big-Endian
    30      2     Size of a sector in the compound document file (➜3.1) in power-of-two (ssz), real sector size is sec_size = 2^ssz bytes (minimum value is 7 which means 128 bytes, most used value is 9 which means 512 bytes)
    32      2     Size of a short-sector in the short-stream container stream (➜6.1) in power-of-two (sssz), real short-sector size is short_sec_size = 2^sssz bytes (maximum value is sector size ssz, see above, most used value is 6 which means 64 bytes)
    34      10    Not used
    44      4     Total number of sectors used for the sector allocation table (➜5.2)
    48      4     SecID of first sector of the directory stream (➜7)
    52      4     Not used
    56      4     Minimum size of a standard stream (in bytes, minimum allowed and most used size is 4096 bytes), streams with an actual size smaller than (and not equal to) this value are stored as short-streams (➜6)
    60      4     SecID of first sector of the short-sector allocation table (➜6.2), or -2 (End Of Chain SecID, ➜3.1) if not extant
    64      4     Total number of sectors used for the short-sector allocation table (➜6.2)
    68      4     SecID of first sector of the master sector allocation table (➜5.1), or -2 (End Of Chain SecID, ➜3.1) if no additional sectors used
    72      4     Total number of sectors used for the master sector allocation table (➜5.1)
    76      436   First part of the master sector allocation table (➜5.1) containing 109 SecIDs
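The header layout maps directly onto fixed-offset reads. The following is a minimal sketch using Python's `struct` module; the `Header` tuple, its field names and `parse_header` are hypothetical names chosen for illustration, and counts are assumed to be unsigned while SecIDs are read as signed values.

```python
import struct
from typing import NamedTuple, Tuple

HEADER_SIZE = 512
FILE_IDENTIFIER = bytes.fromhex("D0CF11E0A1B11AE1")


class Header(NamedTuple):
    revision: int
    version: int
    sec_size: int            # 2 ** ssz
    short_sec_size: int      # 2 ** sssz
    sat_sector_count: int
    dir_first_secid: int
    min_stream_size: int
    ssat_first_secid: int
    ssat_sector_count: int
    msat_first_secid: int
    msat_sector_count: int
    msat_head: Tuple[int, ...]  # the 109 SecIDs stored in the header


def parse_header(data: bytes) -> Header:
    """Decode the 512-byte compound document header (offsets as in the table above)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header must be 512 bytes long")
    if data[:8] != FILE_IDENTIFIER:
        raise ValueError("not a compound document file")
    # The byte order identifier selects the struct prefix for all other fields.
    byte_order = data[28:30]
    if byte_order == b"\xFE\xFF":
        e = "<"
    elif byte_order == b"\xFF\xFE":
        e = ">"
    else:
        raise ValueError("unknown byte order identifier")
    revision, version = struct.unpack_from(e + "HH", data, 24)
    ssz, sssz = struct.unpack_from(e + "HH", data, 30)
    if ssz < 7 or sssz > ssz:
        raise ValueError("implausible sector sizes")
    sat_count, dir_first = struct.unpack_from(e + "Ii", data, 44)
    min_size, ssat_first, ssat_count, msat_first, msat_count = struct.unpack_from(
        e + "IiIiI", data, 56)
    msat_head = struct.unpack_from(e + "109i", data, 76)
    return Header(revision, version, 1 << ssz, 1 << sssz, sat_count, dir_first,
                  min_size, ssat_first, ssat_count, msat_first, msat_count, msat_head)
```

A caller would typically pass the first 512 bytes of the file, for example `parse_header(f.read(512))`.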


    4.2 Byte Order

    All data items containing more than one byte may be stored using the Little-Endian or Big-Endian method¹, but in real world applications only the Little-Endian method is used. The Little-Endian method stores the least significant byte first and the most significant byte last. This applies to all data types like 16-bit integers, 32-bit integers, and Unicode characters.

    Example: The 32-bit integer value 13579BDFH is converted into the Little-Endian byte sequence DFH 9BH 57H 13H, or to the Big-Endian byte sequence 13H 57H 9BH DFH.
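The conversion in this example can be reproduced with Python's built-in integer methods. This is only an illustration of the two byte orders; the variable names are arbitrary.

```python
value = 0x13579BDF

# Least significant byte first (the order used in practice).
little = value.to_bytes(4, "little")
# Most significant byte first.
big = value.to_bytes(4, "big")

print(little.hex(" ").upper())
print(big.hex(" ").upper())

# Reading the bytes back with the matching byte order restores the value.
restored = int.from_bytes(little, "little")
```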

    4.3 Sector File Offsets

    With the values from the header it is possible to calculate a file offset from a SecID:

    sec_pos(SecID) = 512 + SecID · sec_size = 512 + SecID · 2^ssz

    Example with ssz = 10 and SecID = 5:

    sec_pos(SecID) = 512 + SecID · 2^ssz = 512 + 5 · 2^10 = 512 + 5 · 1024 = 5632.
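As a small sketch, the formula translates into a one-line function. The name `sec_pos` follows the formula above; the rejection of negative SecIDs is an added assumption, since special SecIDs do not denote a file position.

```python
HEADER_SIZE = 512


def sec_pos(sec_id: int, ssz: int) -> int:
    """File offset of the sector with the given SecID, for sector size 2 ** ssz."""
    if sec_id < 0:
        raise ValueError("special (negative) SecIDs have no file offset")
    return HEADER_SIZE + sec_id * (1 << ssz)


print(sec_pos(5, 10))
```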

    ¹ For more information see http://en.wikipedia.org/wiki/Endianness.

    5 Sector Allocation

    5.1 Master Sector Allocation Table

    The master sector allocation table (MSAT) is an array of SecIDs of all sectors used by the sector allocation table (SAT, ➜5.2), which finally is needed to read any other stream in the file. The size of the MSAT (number of SecIDs) is equal to the number of sectors used by the SAT. This value is stored in the header (➜4.1).

    The first 109 SecIDs of the MSAT are contained in the header too. If the MSAT contains more than 109 SecIDs, additional sectors are used to store the following SecIDs. In that case the header contains the SecID of the first sector used for the MSAT (otherwise it contains the special End Of Chain SecID with the value -2, ➜3.1).

    The last SecID in each sector of the MSAT refers to the next sector used by the MSAT. If no more sectors follow, the last SecID is the special End Of Chain SecID with the value -2 (➜3.1).

    Contents of a sector of the MSAT (sec_size is the size of a sector in bytes, see ➜4.1):

    Offset        Size          Contents
    0             sec_size - 4  Array of (sec_size - 4) / 4 SecIDs of the MSAT
    sec_size - 4  4             SecID of the next sector used for the MSAT, or -2 if this is the last sector

    The last sector of the MSAT is not necessarily used completely. Unused space is filled with the special Free SecID with the value -1 (➜3.1). The MSAT is built up by concatenating all SecIDs from the header and the additional MSAT sectors.

    Example: A compound document file contains a SAT that needs 300 sectors to be stored. The header specifies a sector size of 512 bytes. This implies that a sector is able to store 128 SecIDs. The MSAT consists of 300 SecIDs (number of sectors used for the SAT). The first 109 SecIDs are stored in the header. The remaining 191 SecIDs of the MSAT need two additional sectors. In this example the first sector of the MSAT may be sector 1, which contains the next 127 SecIDs of the MSAT (the 128th SecID points to the next MSAT sector), and the second sector of the MSAT may be sector 6, which contains the remaining 64 SecIDs.

    HEADER     SecID of first sector of the MSAT = 1
    SECTOR 0
    SECTOR 1   SecID of next sector of the MSAT (last SecID in this sector) = 6
    SECTOR 2
    SECTOR 3
    SECTOR 4
    SECTOR 5
    SECTOR 6   SecID of next sector of the MSAT (last SecID in this sector) = -2
    ⋮
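The concatenation described above might be sketched as follows. `build_msat` and the `read_sector` callback are hypothetical names; the sketch assumes Little-Endian data and simply drops every Free SecID, which is a simplification of "unused space is filled with -1".

```python
import struct
from typing import Callable, List, Sequence

FREE_SECID = -1
END_OF_CHAIN_SECID = -2


def build_msat(header_secids: Sequence[int], first_msat_secid: int,
               read_sector: Callable[[int], bytes], sec_size: int) -> List[int]:
    """Concatenate the MSAT part from the header with all additional MSAT sectors."""
    msat = [s for s in header_secids if s != FREE_SECID]
    per_sector = sec_size // 4
    seen = set()
    sec_id = first_msat_secid
    while sec_id != END_OF_CHAIN_SECID:
        # Guard against invalid SecIDs and loops in a damaged file.
        if sec_id < 0 or sec_id in seen:
            raise ValueError("corrupt MSAT sector chain")
        seen.add(sec_id)
        values = struct.unpack(f"<{per_sector}i", read_sector(sec_id))
        # All SecIDs but the last belong to the MSAT; the last one links to the next sector.
        msat.extend(s for s in values[:-1] if s != FREE_SECID)
        sec_id = values[-1]
    return msat
```

Applied to the example above, 109 SecIDs come from the header, 127 from sector 1 and 64 from sector 6, giving the expected 300 entries.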


    5.2 Sector Allocation Table

    The sector allocation table (SAT) is an array of SecIDs. It contains the SecID chain (➜3.2) of all user streams (except short-streams, ➜6) and of the remaining internal control streams (the short-stream container stream, ➜6.1, the short-sector allocation table, ➜6.2, and the directory, ➜7). The size of the SAT (number of SecIDs) is equal to the number of existing sectors in the compound document file.

    5.2.1 Reading the Sector Allocation Table

    The SAT is built by reading and concatenating the contents of all sectors given in the MSAT (➜5.1). The sectors have to be read according to the order of the SecIDs in the MSAT.

    Contents of a sector of the SAT (sec_size is the size of a sector in bytes, see ➜4.1):

    Offset  Size      Contents
    0       sec_size  Array of sec_size / 4 SecIDs of the SAT

    5.2.2 Using the Sector Allocation Table

    When building a SecID chain (➜3.2) for a specific stream, the current position (array index) in the SAT array refers to the current sector, while the SecID contained at this position specifies the following sector in the sector chain.

    The SAT may contain special Free SecIDs with the value -1 (➜3.1) at any position. These sectors are not used by a stream. The position referring to the last sector of a stream contains the special End Of Chain SecID with the value -2. Sectors used by the SAT itself are not chained, but are marked with the special SAT SecID with the value -3. Finally, sectors used by the MSAT are marked with the special MSAT SecID with the value -4.

    The entry point of a SecID chain has to be obtained somewhere else, e.g. from the directory entry (➜7.2) of a user stream, or from the header (➜4.1) for internal control streams such as the short-sector allocation table (➜6.2), or the directory stream itself (➜7.1).

    Example: A compound document file contains one sector needed for the SAT (sector 1) and two streams. Sector 1 contains the SecID array of the SAT shown below. The SAT contains the special SAT SecID (value -3) at position 1, which marks this sector as being part of the SAT.

    One stream is the internal directory stream. In this example the header may specify that it starts with sector 0. The SAT contains the SecID 2 at position 0, the SecID 3 at position 2, and the SecID -2 at position 3. Therefore the SecID chain of the directory stream is [0, 2, 3, -2], and the directory stream is stored in 3 sectors.

    The directory contains (amongst others) the entry of a user stream that may start with sector 10. This results in the SecID chain [10, 6, 7, 8, 9, -2] for this stream.

    Array indexes           0   1   2   3   4   5   6   7   8   9   10
    SAT contents (SecIDs)   2   -3  3   -2  -1  -1  7   8   9   -2  6
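Following a chain through the SAT is a simple loop. The sketch below uses the SAT array from the example; `secid_chain` is an illustrative name, and the loop guard against cycles is an added safety measure that the format description does not require.

```python
from typing import List, Sequence

END_OF_CHAIN_SECID = -2


def secid_chain(sat: Sequence[int], first_secid: int) -> List[int]:
    """Return the sectors of a stream in order, without the trailing End Of Chain SecID."""
    chain = []
    seen = set()
    sec_id = first_secid
    while sec_id != END_OF_CHAIN_SECID:
        # The index into the SAT is the current sector, the value is the next one.
        if not 0 <= sec_id < len(sat) or sec_id in seen:
            raise ValueError(f"corrupt SecID chain at SecID {sec_id}")
        seen.add(sec_id)
        chain.append(sec_id)
        sec_id = sat[sec_id]
    return chain


SAT = [2, -3, 3, -2, -1, -1, 7, 8, 9, -2, 6]
print(secid_chain(SAT, 0))   # directory stream
print(secid_chain(SAT, 10))  # user stream
```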


    6 Short-Streams

    Whenever a stream is shorter than a specific length (specified in the header, ➜4.1), it is stored as a short-stream. Short-streams do not directly use sectors to store their data, but are all embedded in a specific internal control stream, the short-stream container stream.

    6.1 Short-Stream Container Stream

    The short-stream container stream is stored like any other (long) user stream: The first used sector has to be obtained from the root storage entry in the directory (➜7.2), and its SecID chain (➜3.2) is contained in the SAT (➜5.2). The data of all sectors used by the short-stream container stream are concatenated in order of its SecID chain. In the next step this stream is virtually divided into short-sectors, similar to sectors in the main compound document file (➜3.1), but without a header structure. Therefore the first short-sector (with SecID 0) is always located at offset 0 inside the short-stream container stream. The size of the short-sectors is contained in the header (➜4.1). With this information it is possible to calculate an offset in the short-stream container stream from a SecID:

    short_sec_pos(SecID) = SecID · short_sec_size = SecID · 2^sssz

    Example with sssz = 6 and SecID = 5:

    short_sec_pos(SecID) = SecID · 2^sssz = 5 · 2^6 = 5 · 64 = 320.

    6.2 Short-Sector Allocation Table

    The short-sector allocation table (SSAT) is an array of SecIDs and contains the SecID chains (➜3.2) of all short-streams, similar to the sector allocation table (➜5.2) that contains the SecID chains of standard streams.

    The first SecID of the SSAT is contained in the header (➜4.1), the remaining SecID chain is contained in the SAT. The SSAT is built by reading and concatenating the contents of all sectors.

    Contents of a sector of the SSAT (sec_size is the size of a sector in bytes, see ➜4.1):

    Offset  Size      Contents
    0       sec_size  Array of sec_size / 4 SecIDs of the SSAT

    The SSAT is used similarly to the SAT (➜5.2), with the difference that the SecID chains refer to short-sectors in the short-stream container stream (➜6.1).
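Reading a short-stream combines the offset formula with a walk through the SSAT. The following sketch assumes the short-stream container stream has already been assembled into one `bytes` object; `short_sec_pos` follows the formula above, while `read_short_stream` and its parameters are hypothetical names.

```python
from typing import Sequence

END_OF_CHAIN_SECID = -2


def short_sec_pos(sec_id: int, sssz: int) -> int:
    """Offset of a short-sector inside the short-stream container stream."""
    return sec_id * (1 << sssz)


def read_short_stream(container: bytes, ssat: Sequence[int],
                      first_secid: int, size: int, sssz: int) -> bytes:
    """Collect the short-sectors of one short-stream and cut the result to its size."""
    short_sec_size = 1 << sssz
    data = bytearray()
    seen = set()
    sec_id = first_secid
    while sec_id != END_OF_CHAIN_SECID:
        if not 0 <= sec_id < len(ssat) or sec_id in seen:
            raise ValueError(f"corrupt short-sector chain at SecID {sec_id}")
        seen.add(sec_id)
        pos = short_sec_pos(sec_id, sssz)
        data += container[pos:pos + short_sec_size]
        sec_id = ssat[sec_id]
    # The last short-sector is usually only partly used.
    return bytes(data[:size])


print(short_sec_pos(5, 6))
```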


    7 Directory

    7.1 Directory Structure

    The directory is an internal control stream that consists of an array of directory entries (➜7.2). Each directory entry refers to a storage or a stream in the compound document file (➜2). Directory entries are enumerated in order of their appearance in the stream. The zero-based index of a directory entry is called directory entry identifier (DirID).

    DIRECTORY ENTRY 0
    DIRECTORY ENTRY 1
    DIRECTORY ENTRY 2
    DIRECTORY ENTRY 3
    ⋮

    The position of a directory entry will not change as long as the referred storage or stream exists in the compound document. This implies that the DirID of a storage or stream never changes, regardless of how many other objects are inserted into or removed from the compound document. If a storage or stream is removed, the corresponding directory entry is marked as empty. There is a special directory entry at the beginning of the directory (with the DirID 0). It represents the root storage and is called root storage entry.

    The directory organises direct members (storages and streams) of each storage in a separate red-black tree². In short, nodes in a red-black tree have to fulfil all of the following conditions:

    • The root node is black.

    • The parent of a red node is black.

    • The paths from the root node to all leaves contain the same number of black nodes.

    • The left child of a node is less than the node, the right child is greater.

    But note that not all implementations follow these rules. The safest way to read directory entries is to ignore the node colours and to rebuild the red-black tree from scratch.

    Example: Taking the example from ➜2, the directory would have the following structure:

    • The root storage is represented by the root storage entry. It does not have a parent directory entry, therefore there are no other entries that can be organised in a red-black tree.

    • All members of the root storage ("Storage1", "Storage2", "Stream1", "Stream2", "Stream3", and "Stream4") are inserted into a red-black tree. The DirID of the root node of this tree is stored in the root storage entry.

    • The storage "Storage1" contains one member "Stream1", which is inserted into a separate red-black tree. The directory entry of "Storage1" contains the DirID of "Stream1".

    • The storage "Storage2" contains three members "Stream1", "Stream2", and "Stream3". These directory entries are organised in a separate red-black tree. The directory entry of "Storage2" contains the DirID of the root node of this tree.

    ² See http://en.wikipedia.org/wiki/Red_black_tree.


    As a result, each directory entry contains up to three DirIDs: The first is the DirID of the left child of the red-black tree containing this entry, the second is the DirID of the right child in the tree, and (if this entry is a storage) the third is the DirID of the root node of another red-black tree containing all sub streams and sub storages.

    Nodes are compared by name to decide whether they become the left or right child of another node:

    • A node is less than another node if its name is shorter, and greater if its name is longer.

    • If both names have the same length, they are compared character by character (case insensitive).

    Examples: The name "VWXYZ" is less than the name "ABCDEFG" because the length of the former name is shorter (regardless of the fact that the character V is greater than the character A). The name "ABCDE" is less than the name "ABCFG" because the lengths of both names are equal, and comparing the names shows that the fourth character of the former name is less than the fourth character of the latter name.
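The ordering rule can be written as a comparison function. This is a sketch only: `compare_names` is an illustrative name, and the use of `str.upper()` for the case-insensitive step is an assumption, since the text does not say how case is folded.

```python
from functools import cmp_to_key


def compare_names(a: str, b: str) -> int:
    """Return -1, 0 or 1 according to the directory ordering of two entry names."""
    # Shorter names always sort first.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal length: compare character by character, ignoring case.
    ua, ub = a.upper(), b.upper()
    return (ua > ub) - (ua < ub)


members = ["Storage1", "Storage2", "Stream1", "Stream2", "Stream3", "Stream4"]
print(sorted(members, key=cmp_to_key(compare_names)))
```

Sorting the root storage members of the earlier example this way puts all "Stream" names before the longer "Storage" names.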


    7.2 Directory Entries

    7.2.1 Directory Entry Structure

    The size of each directory entry is exactly 128 bytes. The formula to calculate an offset in the directory stream from a DirID is as follows:

    dir_entry_pos(DirID) = DirID · 128

    Contents of the directory entry structure:

    Offset  Size  Contents
    0       64    Character array of the name of the entry, always 16-bit Unicode characters, with trailing zero character (results in a maximum name length of 31 characters)
    64      2     Size of the used area of the character buffer of the name (not character count), including the trailing zero character (e.g. 12 for a name with 5 characters: (5+1) · 2 = 12)
    66      1     Type of the entry: 00H = Empty, 01H = User storage, 02H = User stream, 03H = LockBytes (unknown), 04H = Property (unknown), 05H = Root storage
    67      1     Node colour of the entry: 00H = Red, 01H = Black
    68      4     DirID of the left child node inside the red-black tree of all direct members of the parent storage (if this entry is a user storage or stream, ➜7.1), -1 if there is no left child
    72      4     DirID of the right child node inside the red-black tree of all direct members of the parent storage (if this entry is a user storage or stream, ➜7.1), -1 if there is no right child
    76      4     DirID of the root node entry of the red-black tree of all storage members (if this entry is a storage, ➜7.1), -1 otherwise
    80      16    Unique identifier, if this is a storage (not of interest in the following, may be all 0)
    96      4     User flags (not of interest in the following, may be all 0)
    100     8     Time stamp of creation of this entry (➜7.2.3). Most implementations do not write a valid time stamp, but fill up this space with zero bytes.
    108     8     Time stamp of last modification of this entry (➜7.2.3). Most implementations do not write a valid time stamp, but fill up this space with zero bytes.
    116     4     SecID of first sector or short-sector, if this entry refers to a stream (➜7.2.2), SecID of first sector of the short-stream container stream (➜6.1), if this is the root storage entry, 0 otherwise
    120     4     Total stream size in bytes, if this entry refers to a stream (➜7.2.2), total size of the short-stream container stream (➜6.1), if this is the root storage entry, 0 otherwise
    124     4     Not used
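A directory entry can be decoded with a few fixed-offset reads. The sketch below assumes Little-Endian data and decodes the name as UTF-16; `DirEntry`, its field names and `parse_dir_entry` are hypothetical, and the time stamps and user flags are skipped for brevity.

```python
import struct
from typing import NamedTuple

DIR_ENTRY_SIZE = 128


class DirEntry(NamedTuple):
    name: str
    entry_type: int   # 0 = empty, 1 = user storage, 2 = user stream, 5 = root storage
    colour: int       # 0 = red, 1 = black
    left_dirid: int
    right_dirid: int
    root_dirid: int   # root node of the members' tree, if this entry is a storage
    first_secid: int
    size: int


def parse_dir_entry(dir_stream: bytes, dir_id: int) -> DirEntry:
    """Decode the directory entry with the given DirID from the directory stream."""
    pos = dir_id * DIR_ENTRY_SIZE
    raw = dir_stream[pos:pos + DIR_ENTRY_SIZE]
    if dir_id < 0 or len(raw) < DIR_ENTRY_SIZE:
        raise ValueError(f"DirID {dir_id} is outside the directory stream")
    (name_size,) = struct.unpack_from("<H", raw, 64)
    if name_size > 64 or name_size % 2:
        raise ValueError("invalid name buffer size")
    # name_size counts bytes and includes the trailing zero character.
    name = raw[:max(name_size - 2, 0)].decode("utf-16-le")
    entry_type, colour, left, right, root = struct.unpack_from("<BBiii", raw, 66)
    first_secid, size = struct.unpack_from("<iI", raw, 116)
    return DirEntry(name, entry_type, colour, left, right, root, first_secid, size)
```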

    7.2.2 Starting Position of a Stream

    The directory entry of a stream contains the SecID of the first sector or short-sector containing the stream data. All streams that are shorter than a specific size given in the header (➜4.1) are stored as short-streams and thus inserted into the short-stream container stream (➜6.1). In this case the SecID specifies the first short-sector inside the short-stream container stream, and the short-sector allocation table (➜6.2) is used to build up the SecID chain (➜3.2) of the stream.


    7.2.3 Time Stamp

    The time stamp field is an unsigned 64-bit integer value that contains the time elapsed since 1601-Jan-01 00:00:00 (Gregorian calendar³). One unit of this value is equal to 100 nanoseconds (10^-7 seconds). That means that each second the time stamp value will be increased by 10 million units.

    When calculating the date from a time stamp, the correct rules of leap year handling have to be respected⁴:

    • a year divisible by 4 is a leap year,

    • with the exception that a year divisible by 100 is not a leap year (e.g. 1900 was no leap year),

    • with the exception that a year divisible by 400 is a leap year (e.g. 2000 was a leap year).

    Example: The time stamp value is 01A5E403C2D59C00H.

    Calculation step                 Formula                                                       Result
    Conversion to decimal                                                                          t0 = 118,751,670,000,000,000
    Fractional amount of a second    rfrac = t0 modulo 10^7                                        rfrac = 0
    Remaining entire seconds         t1 = t0 / 10^7                                                t1 = 11,875,167,000
    Seconds in a minute              rsec = t1 modulo 60                                           rsec = 0
    Remaining entire minutes         t2 = t1 / 60                                                  t2 = 197,919,450
    Minutes in an hour               rmin = t2 modulo 60                                           rmin = 30
    Remaining entire hours           t3 = t2 / 60                                                  t3 = 3,298,657
    Hours in a day                   rhour = t3 modulo 24                                          rhour = 1
    Remaining entire days            t4 = t3 / 24                                                  t4 = 137,444
    Entire years from 1601-Jan-01    ryear = 1601 + number of full years in t4                     ryear = 1601 + 376 = 1977
    Remaining days in year 1977      t5 = t4 - (number of days from 1601-Jan-01 to 1977-Jan-01)    t5 = 137,444 - 137,331 = 113
    Entire months from 1977-Jan-01   rmonth = 1 + number of full months in t5                      rmonth = 1 + 3 = 4 = April
    Remaining days in month April    t6 = t5 - (number of days from 1977-Jan-01 to 1977-Apr-01)    t6 = 113 - 90 = 23
    Resulting day of month April     rday = 1 + t6                                                 rday = 1 + 23 = 24

    The final result is 1977-Apr-24 01:30:00. Guess what it is.
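With a date/time library the manual steps collapse into one addition. The sketch below uses Python's `datetime` module; `timestamp_to_datetime` is an illustrative name, and the result is truncated to microseconds because that is the finest resolution `datetime` offers. Time stamps beyond the year 9999 are not representable and raise `OverflowError`.

```python
from datetime import datetime, timedelta

# Start of the time stamp epoch: 1601-Jan-01 00:00:00.
EPOCH = datetime(1601, 1, 1)


def timestamp_to_datetime(timestamp: int) -> datetime:
    """Convert a time stamp in units of 100 nanoseconds into a datetime."""
    if not 0 <= timestamp < 1 << 64:
        raise ValueError("time stamp must be an unsigned 64-bit value")
    # 10 units of 100 ns make one microsecond; leap years are handled by datetime.
    return EPOCH + timedelta(microseconds=timestamp // 10)


print(timestamp_to_datetime(0x01A5E403C2D59C00))
```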

    ³ See http://en.wikipedia.org/wiki/Gregorian_calendar.

    ⁴ See http://en.wikipedia.org/wiki/Leap_year for some background information.

    ⁵ You may use your favourite date/time manipulation library to perform the following steps.


    8 Example

    This chapter shows a possible way to open a compound document file. The file that is processed here is a simple spreadsheet document in Microsoft Excel file format, written by OpenOffice.org Calc.

    8.1 Compound Document Header

    The first step is to read the compound document header (➜4.1). The first 512 bytes of the file may look like this:

    00000000H  D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00
    00000010H  00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00
    00000020H  06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
    00000030H  0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00
    00000040H  01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00
    00000050H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000060H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000070H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000080H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000090H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000A0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000B0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000C0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000D0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000E0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000000F0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000100H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000110H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000120H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000130H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000140H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000150H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000160H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000170H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000180H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000190H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001A0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001B0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001C0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001D0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001E0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000001F0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

    1) 8 bytes containing the fixed compound document file identifier:

    00000000H  D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00

    2) 16 bytes containing a unique identifier, followed by 4 bytes containing a revision number and a version number. These values can be skipped:

    00000000H  D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00
    00000010H  00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00

    3) 2 bytes containing the byte order identifier. It should always consist of the byte sequence FEH FFH:

    00000010H  00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00

    4) 2 bytes containing the size of sectors, 2 bytes containing the size of short-sectors. Both values are stored as exponents of 2: the sector size is 2^9 = 512 bytes, and the short-sector size is 2^6 = 64 bytes here:

    00000010H  00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00
    00000020H  06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

    5) 10 bytes without valid data, can be ignored:

    00000020H  06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

    6) 4 bytes containing the number of sectors used by the sector allocation table (➜5.2). The SAT uses only one sector here:

    00000020H  06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00

    7) 4 bytes containing the SecID of the first sector used by the directory (➜7). The directory starts at sector 10 here:

    00000030H  0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00
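The steps above can be sketched in Python as follows. This is a minimal sketch and not a complete reader: the names `HEADER`, `FILE_ID` and `parse_header` are made up for illustration, the byte string holds only the first 80 bytes of the example file, and the field offsets are assumed to follow the header layout described in ➜4.1. The fields read after step 7 (minimum size of standard streams, first SecID and sector count of the SSAT) are included only because later parts of the example refer to them.

```python
import struct

# First 80 bytes of the example header (file offsets 00H to 4FH).
HEADER = bytes.fromhex(
    "D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 3B 00 03 00 FE FF 09 00"
    "06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00"
    "0A 00 00 00 00 00 00 00 00 10 00 00 02 00 00 00"
    "01 00 00 00 FE FF FF FF 00 00 00 00 00 00 00 00"
)

FILE_ID = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_header(data):
    """Read the header fields used in this example (hypothetical helper)."""
    if data[0:8] != FILE_ID:                       # step 1: file identifier
        raise ValueError("not a compound document file")
    # Step 2: unique identifier, revision and version are skipped.
    if data[28:30] != b"\xFE\xFF":                 # step 3: byte order
        raise ValueError("unexpected byte order identifier")
    ssz, sssz = struct.unpack_from("<HH", data, 30)                # step 4
    # Step 5: 10 bytes without valid data are skipped.
    sat_sectors, dir_first = struct.unpack_from("<ii", data, 44)   # steps 6, 7
    # Assumed layout of the following fields, see the lead-in.
    min_std_size, ssat_first, ssat_sectors = struct.unpack_from("<Iii", data, 56)
    return {
        "sector_size": 1 << ssz,          # stored as exponent of 2
        "short_sector_size": 1 << sssz,   # stored as exponent of 2
        "sat_sectors": sat_sectors,
        "dir_first_sec_id": dir_first,
        "min_standard_stream_size": min_std_size,
        "ssat_first_sec_id": ssat_first,
        "ssat_sectors": ssat_sectors,
    }


info = parse_header(HEADER)
for key, value in info.items():
    print(key, value)
```

All multi-byte values are read as little-endian here, which is what the byte order identifier FEH FFH of the example indicates.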

    8.4 Short-Sector Allocation Table

    The SSAT (➜6.2) starts at sector 2 and consists only of this one sector, as specified in the header. This is in line with the SAT that contains the End Of Chain SecID at position 2. The SecID chain of the SSAT is therefore [2, -2]. Sector 2 starts at file offset 00000600H = 1536 (➜4.3) and may look like this:

    00000600H  01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
    00000610H  05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
    00000620H  09 00 00 00 0A 00 00 00 0B 00 00 00 0C 00 00 00
    00000630H  0D 00 00 00 0E 00 00 00 0F 00 00 00 10 00 00 00
    00000640H  11 00 00 00 12 00 00 00 13 00 00 00 14 00 00 00
    00000650H  15 00 00 00 16 00 00 00 17 00 00 00 18 00 00 00
    00000660H  19 00 00 00 1A 00 00 00 1B 00 00 00 1C 00 00 00
    00000670H  1D 00 00 00 1E 00 00 00 1F 00 00 00 20 00 00 00
    00000680H  21 00 00 00 22 00 00 00 23 00 00 00 24 00 00 00
    00000690H  25 00 00 00 26 00 00 00 27 00 00 00 28 00 00 00
    000006A0H  29 00 00 00 2A 00 00 00 2B 00 00 00 2C 00 00 00
    000006B0H  2D 00 00 00 FE FF FF FF 2F 00 00 00 FE FF FF FF
    000006C0H  FE FF FF FF 32 00 00 00 33 00 00 00 34 00 00 00
    000006D0H  35 00 00 00 FE FF FF FF FF FF FF FF FF FF FF FF
    000006E0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000006F0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000700H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000710H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000720H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000730H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000740H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000750H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000760H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000770H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000780H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    00000790H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007A0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007B0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007C0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007D0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007E0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
    000007F0H  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

    This results in the following SecID array for the SSAT:

    Array indexes   0   1   2   3   4   5   6   7   8   9  10  11  ...  41  42  43  44  45  46  47  48  49  50  51  52  53  54
    SecID array     1   2   3   4   5   6   7   8   9  10  11  12  ...  42  43  44  45  -2  47  -2  -2  50  51  52  53  -2  -1

    All short-sectors starting with sector 54 are not used (special Free SecID with value -1).
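Computing the file offset of a sector and decoding its contents into a SecID array might look like the following sketch. The helper names `sector_offset` and `decode_sec_ids` are hypothetical, the sector size of 512 bytes is the one of this example, and the raw sector is rebuilt from the values of the dump instead of being typed in as hexadecimal text, only to keep the listing short.

```python
import struct

HEADER_SIZE = 512   # the header occupies the first 512 bytes of the file
SECTOR_SIZE = 512   # sector size of this example
FREE = -1           # special Free SecID
END_OF_CHAIN = -2   # special End Of Chain SecID


def sector_offset(sec_id):
    """File offset of the sector with the given SecID."""
    return HEADER_SIZE + sec_id * SECTOR_SIZE


def decode_sec_ids(sector):
    """Interpret a raw sector as an array of signed 32-bit little-endian SecIDs."""
    count = len(sector) // 4
    return list(struct.unpack("<%di" % count, sector))


# Rebuild the bytes of sector 2 from the values shown in the dump.
example_ids = list(range(1, 46)) + [-2, 47, -2, -2, 50, 51, 52, 53, -2]
example_ids += [FREE] * (SECTOR_SIZE // 4 - len(example_ids))
raw_sector = struct.pack("<128i", *example_ids)

sec_ids = decode_sec_ids(raw_sector)
first_free = sec_ids.index(FREE)

print(hex(sector_offset(2)))   # file offset of the SSAT sector
print(sec_ids[44:49])          # entries around the first End Of Chain SecID
print(first_free)              # index of the first unused short-sector
```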

    7) 16 bytes containing a unique identifier, followed by 4 bytes containing additional flags, and two time stamps, 8 bytes each, containing the creation time and last modification time of the storage (➜7.2.3). This data can be skipped:

    00001650H  10 08 02 00 00 00 00 00 C0 00 00 00 00 00 00 46
    00001660H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00001670H  00 00 00 00 03 00 00 00 80 0D 00 00 00 00 00 00

    4 bytes without valid data, can be skipped:

    00001670H  00 00 00 00 03 00 00 00 80 0D 00 00 00 00 00 00

    8.5.2 Second Directory Entry

    The second directory entry (with DirID 1) may look like this:

    00001680H  57 00 6F 00 72 00 6B 00 62 00 6F 00 6F 00 6B 00
    00001690H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    000016A0H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    000016B0H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    000016C0H  12 00 02 00 02 00 00 00 04 00 00 00 FF FF FF FF
    000016D0H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    000016E0H  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    000016F0H  00 00 00 00 00 00 00 00 51 0B 00 00 00 00 00 00

    Important data is highlighted. The name of this entry is "Workbook", it represents a stream, the DirID of the left child node is 2, the DirID of the right child node is 4, the SecID of the first sector is 0, and the stream size is 00000B51H = 2897 bytes. The stream is shorter than 4096 bytes, therefore it is stored in the short-stream container stream.
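Reading the fields of such a 128-byte directory entry can be sketched as follows. The names `ENTRY` and `parse_dir_entry` are hypothetical, and the field offsets are assumed to follow the directory entry layout described in ➜7.2 (name at offset 0, name size at 64, type and node colour at 66 and 67, the three DirIDs at 68, SecID of the first sector at 116, stream size at 120).

```python
import struct

# The 128 bytes of the second directory entry (file offsets 1680H to 16FFH).
ENTRY = bytes.fromhex(
    "57 00 6F 00 72 00 6B 00 62 00 6F 00 6F 00 6B 00"
    + "00" * 48
    + "12 00 02 00 02 00 00 00 04 00 00 00 FF FF FF FF"
    + "00" * 32
    + "00 00 00 00 00 00 00 00 51 0B 00 00 00 00 00 00"
)


def parse_dir_entry(entry):
    """Extract the fields discussed in the text (hypothetical helper)."""
    (name_size,) = struct.unpack_from("<H", entry, 64)
    # The size counts bytes and includes the trailing zero character.
    name = entry[:max(name_size - 2, 0)].decode("utf-16-le")
    entry_type, colour = entry[66], entry[67]
    left, right, first_member = struct.unpack_from("<iii", entry, 68)
    first_sec_id, stream_size = struct.unpack_from("<iI", entry, 116)
    return {
        "name": name,
        "type": entry_type,
        "colour": colour,
        "left": left,
        "right": right,
        "first_member": first_member,
        "first_sec_id": first_sec_id,
        "stream_size": stream_size,
    }


fields = parse_dir_entry(ENTRY)
print(fields)
```

A DirID of -1 (bytes FF FF FF FF) stands for "none", which is why the DirID of the first member reads as -1 for this stream entry.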

    8.5.3 Remaining Directory Entries

    The remaining directory entries are read similar to the examples above, resulting in the following directory:

    DirID  Name                      Type    DirID of    DirID of     DirID of      SecID of      Stream  Allocation
                                             left child  right child  first member  first sector  size    table
    0      Root Entry                root    none        none         1             3             3456    SAT
    1      Workbook                  stream  2           4            none          0             2897    SSAT
    2      <01H>CompObj              stream  3           none         none          46            73      SSAT
    3      <01H>Ole                  stream  none        none         none          48            20      SSAT
    4      <05H>SummaryInformation   stream  none        none         none          49            312     SSAT
    5                                empty
    6                                empty
    7                                empty
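The child links in this table form the tree of members of the root storage. A minimal sketch of walking that tree in order (left subtree, node, right subtree) is shown below. The dictionary `DIRECTORY` and the function `members` are made up for illustration, "none" is represented by the DirID -1, and the leading control characters of the stream names are written as Python escapes.

```python
NONE = -1  # DirID value used for "none"

# DirID: (name, left child, right child, first member), copied from the table.
DIRECTORY = {
    0: ("Root Entry", NONE, NONE, 1),
    1: ("Workbook", 2, 4, NONE),
    2: ("\x01CompObj", 3, NONE, NONE),
    3: ("\x01Ole", NONE, NONE, NONE),
    4: ("\x05SummaryInformation", NONE, NONE, NONE),
}


def members(directory, storage_id):
    """Return the DirIDs of all direct members of a storage, in tree order."""
    result = []

    def visit(dir_id):
        if dir_id == NONE:
            return
        _name, left, right, _first = directory[dir_id]
        visit(left)
        result.append(dir_id)
        visit(right)

    # Start at the root node of the storage's member tree.
    visit(directory[storage_id][3])
    return result


order = members(DIRECTORY, 0)
print(order)
print([DIRECTORY[d][0] for d in order])
```

The recursion is fine for a sketch; a reader for untrusted files would also need to guard against cyclic child links.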

    All streams in this example are shorter than 4096 bytes (the minimum size of standard streams, specified in the header ➜4.1), therefore they are stored in the short-stream container stream, and the SSAT is used to build the SecID chains of the streams.

    DirID  Stream name                    Allocation table  SecID chain
    0      Short-stream container stream  SAT               [3, 4, 5, 6, 7, 8, 9, -2]
    1      Workbook                       SSAT              [0, 1, 2, 3, 4, 5, ..., 43, 44, 45, -2]
    2      <01H>CompObj                   SSAT              [46, 47, -2]
    3      <01H>Ole                       SSAT              [48, -2]
    4      <05H>SummaryInformation        SSAT              [49, 50, 51, 52, 53, -2]
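Building such a SecID chain means following the entries of the allocation table from the first SecID until the End Of Chain SecID is reached. The sketch below does this for the SSAT of this example. The names `SSAT` and `follow_chain` are hypothetical, the array is written out from the values of the SSAT sector, and the loop check is an extra safeguard not mentioned in the text.

```python
END_OF_CHAIN = -2
FREE = -1

# SecID array of the SSAT of this example (128 entries in one 512-byte sector).
SSAT = list(range(1, 46)) + [-2, 47, -2, -2, 50, 51, 52, 53, -2]
SSAT += [FREE] * (128 - len(SSAT))


def follow_chain(table, first_sec_id):
    """Return all SecIDs of a chain, without the trailing End Of Chain SecID."""
    chain = []
    seen = set()
    sec_id = first_sec_id
    while sec_id != END_OF_CHAIN:
        # Reject free or out-of-range entries and loops in a damaged table.
        if sec_id < 0 or sec_id >= len(table) or sec_id in seen:
            raise ValueError("corrupt SecID chain")
        seen.add(sec_id)
        chain.append(sec_id)
        sec_id = table[sec_id]
    return chain


print(follow_chain(SSAT, 46))        # <01H>CompObj
print(follow_chain(SSAT, 48))        # <01H>Ole
print(follow_chain(SSAT, 49))        # <05H>SummaryInformation
print(len(follow_chain(SSAT, 0)))    # number of short-sectors of Workbook
```

The same function works for the SAT, because both tables store the SecID of the next sector at the array index of the current sector.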

    8.6 Short-Stream Container Stream

    The short-stream container stream is read by concatenating all sectors specified in the SecID chain of the root storage entry in the directory. In this example, the sectors 3, 4, 5, 6, 7, 8, and 9 have to be read in this order, resulting in a stream with a size of 3584 bytes (7 sectors of 512 bytes each).
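Locating a short-sector inside the file can then be sketched as below. This assumes that the container stream occupies the sectors 3 to 9 (seven sectors, enough for the 3456 bytes given in the root storage entry) and that short-sector n starts at stream offset n times 64 inside the container, as described for short-streams in ➜6.1. The names `CONTAINER_CHAIN` and `short_sector_file_offset` are made up for illustration.

```python
HEADER_SIZE = 512        # size of the compound document header
SECTOR_SIZE = 512        # sector size of this example
SHORT_SECTOR_SIZE = 64   # short-sector size of this example

# Assumed SecID chain of the short-stream container stream (see lead-in).
CONTAINER_CHAIN = [3, 4, 5, 6, 7, 8, 9]


def short_sector_file_offset(short_sec_id):
    """Map a short-sector to its physical position in the file."""
    # Virtual position of the short-sector inside the container stream.
    stream_offset = short_sec_id * SHORT_SECTOR_SIZE
    # Which sector of the container holds it, and where inside that sector.
    index, within = divmod(stream_offset, SECTOR_SIZE)
    return HEADER_SIZE + CONTAINER_CHAIN[index] * SECTOR_SIZE + within


container_size = len(CONTAINER_CHAIN) * SECTOR_SIZE
print(container_size)
print(hex(short_sector_file_offset(0)))    # first short-sector of "Workbook"
print(hex(short_sector_file_offset(46)))   # first short-sector of <01H>CompObj
```

Working on file offsets avoids copying the container into memory first; concatenating the seven sectors and slicing the result would be the more literal reading of the text.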

    9 Glossary

    Term                                    Description                                                           Chapter

    Byte order                              The order in which single bytes of a bigger data type are             ➜4.2
                                            represented or stored.
    Compound document                       File format used to store several objects in a single file;           ➜1.2
                                            objects can be organised hierarchically in storages and streams.
    Compound document header                Structure in a compound document containing initial settings.         ➜4.1
    Control stream                          Stream in a compound document containing internal control data.       ➜5, ➜6, ➜7
    Directory                               List of directory entries for all storages and streams in a           ➜7.1
                                            compound document.
    Directory entry                         Part of the directory containing relevant data for a storage or a     ➜7.2
                                            stream.
    Directory entry identifier (DirID)      Zero-based index of a directory entry.                                ➜7.1
    Directory stream                        Sector chain containing the directory.                                ➜7.1
    DirID                                   Zero-based index of a directory entry (short for "directory entry     ➜7.1
                                            identifier").
    End Of Chain SecID                      Special sector identifier used to indicate the end of a SecID chain.  ➜3.1
    File offset                             Physical position in a file.                                          ➜4.3
    Free SecID                              Special sector identifier for unused sectors.                         ➜3.1
    Header                                  Short for "compound document header".                                 ➜4.1
    Master sector allocation table (MSAT)   SecID chain containing sector identifiers of all sectors used by      ➜5.1
                                            the sector allocation table.
    MSAT                                    Short for "master sector allocation table".                           ➜5.1
    MSAT SecID                              Special sector identifier used to indicate that a sector is part      ➜3.1
                                            of the master sector allocation table.
    Red-black tree                          Tree structure used to organise direct members of a storage.          ➜7.1
    Root storage                            Built-in storage that contains all other objects (storages and
                                            streams) in a compound document.
    Root storage entry                      Directory entry representing the root storage.                        ➜7.1
    SAT                                     Short for "sector allocation table".                                  ➜5.2
    SAT SecID                               Special sector identifier used to indicate that a sector is part      ➜3.1
                                            of the sector allocation table.
    SecID                                   Zero-based index of a sector (short for "sector identifier").         ➜3.1
    SecID chain                             An array of sector identifiers (SecIDs) specifying the sectors        ➜3.2
                                            that are part of a sector chain, and thus enumerating all sectors
                                            used by a stream.
    Sector                                  Part of a compound document with fixed size that contains any kind    ➜3.1
                                            of stream (user stream or control stream) data.

    Sector allocation table (SAT)           Array of sector identifiers containing the SecID chains of all        ➜5.2
                                            user streams and a few internal control streams.
    Sector chain                            An array of sectors that forms a stream as a whole.                   ➜3.2
    Sector identifier (SecID)               Zero-based index of a sector.                                         ➜3.1
    Short-sector                            Part of the short-stream container stream with fixed size that        ➜6.1
                                            contains one part of a short-stream.
    Short-sector allocation table (SSAT)    Array of sector identifiers containing the SecID chains of all        ➜6.2
                                            short-streams.
    Short-stream                            A user stream shorter than a specific size.                           ➜6
    Short-stream container stream           An internal stream that contains all short-streams.                   ➜6.1
    SSAT                                    Short for "short-sector allocation table".                            ➜6.2
    Storage                                 Part of a compound document used to separate streams into
                                            different groups, similar to directories in a file system.
    Stream                                  Part of a compound document containing user data or internal
                                            control data, similar to files in a file system.
    Stream offset                           Virtual position in a stream.                                         ➜6.1, ➜7.2
    Time stamp                              Value specifying date and time.                                       ➜7.2.3
    User stream                             Stream in a compound document containing user data.                   ➜2
