2010-2-19 University of Toronto 1
A 1 Cycle-Per-Byte XML Accelerator
Zefu Dai, Nick Ni and Jianwen Zhu
Presented by Zefu Dai
University of Toronto

What is XML
Extensible Markup Language
A platform-independent tool for data exchange and representation
Widely used in:
- Web service
- Database system
- Scientific application
- …
<?xml version="1.0" encoding="UTF-8"?>
<!-- this is an example xml document -->
<University>
  <Department name="ECE">
    <Students>
      <freshman>310</freshman>
      <sophomore>298</sophomore>
      <junior>213</junior>
      <senior>178</senior>
      <graduate>86</graduate>
      …
    </Students>
    <Professors>
      <professor name="Mike" field="network"/>
      …
    </Professors>
  </Department>
  …
</University>

Performance Threat: XML Parsing
70 minutes to load a 3 GB XML file, 26x slower than loading plain text
More than 1 s per bank transaction: how many transactions per day?
On average, 175 K instructions to parse 1 KB of XML data (IBM XML4C)
With network speeds reaching tens of Gbps, XML parsing, rather than the network, becomes the performance bottleneck

Previous work
Cycles Per Byte (CPB) = average number of cycles to process each byte of XML data
Multi-core Acceleration
- Requires a pre-parsing pass, done sequentially
- 30 CPB on a 4-core processor
SIMD Acceleration
- Without in-memory tree construction and validation
- 6-15 CPB
Hardware Accelerator
- Most commercial products do not reveal performance metrics and design details
- 10-40 CPB

Our Design
Causes of the parsing slowdown
- Text-based data stream
- Variable-length string comparison
- Poor memory performance due to streaming and memory back-tracing
An XML parsing accelerator implemented in FPGA
- Fixed-length string operations
- Optimized circuits for string comparison
- Stallable pipeline optimized for the common case
- Data structures for high-bandwidth on-chip memory
Achieves 1 CPB processing speed and saturates a 1 Gbps Ethernet link while running at 125 MHz
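The link-saturation claim follows from simple arithmetic: at one byte per cycle, a 125 MHz clock consumes 125 million bytes, or 1 Gbit, per second. A minimal sketch of that calculation (the function name is illustrative and not taken from the slides):

```python
def throughput_gbps(clock_hz: float, cycles_per_byte: float) -> float:
    """Sustained throughput in Gbps for a design needing `cycles_per_byte` cycles per input byte."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e9  # 8 bits per byte, 1e9 bits per Gbit


# 125 MHz at 1 CPB, the operating point quoted on the slide.
accelerator = throughput_gbps(125e6, 1.0)
print(accelerator)
```

The same function shows why a higher CPB needs a proportionally faster clock to keep up with the same link.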

Outline
Background
High-level architecture
Design Details
Evaluation

Tasks of XML Parser
Well-formed Checking
- Check if the document conforms to XML syntax rules
Schema Validation
- Check if the document conforms to XML semantic rules specified in DTD or Schema files
DOM Construction
- Capture the parent-child relationships between elements and attributes and store them in memory in Document Object Model (DOM) format

Well-formed Checking example
Has a unique root element
Elements must be closed and nested properly
Unique attributes within an element
…
<?xml version="1.0" encoding="UTF-8"?>
<!-- this is an example xml document -->
<University>
  <Department name="ECE">
    <Students>
      <freshman>310</freshman>
      <sophomore>298</sophomore>
      <junior>213</junior>
      <senior>178</senior>
      <graduate>86</graduate>
      …
    </Students>
    <Professors>
      <professor name="Mike" field="network"/>
      …
    </Professors>
  </Department>
  …
</University>
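The three rules listed on this slide can be observed with any conforming software parser. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an illustration of what the checks reject; it is not the accelerator's logic, and the sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": '<University><Department name="ECE"/></University>',
    "two roots": "<a/><b/>",                      # violates the unique root rule
    "bad nesting": "<a><b></a></b>",              # elements not nested properly
    "duplicate attribute": '<a x="1" x="2"/>',    # attribute repeated in one element
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(label, ok)
```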

XML Schema Example
Specify permitted child elements/attributes
Specify type of content
Specify occurrence limit
…
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="University">
    <xs:complexType>
      <xs:element name="Department" minOccurs="2">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Students">
              <xs:complexType>
                <xs:all>
                  <xs:element name="freshman" type="xs:string"/>
                  <xs:element name="sophomore" type="xs:string"/>
                  <xs:element name="junior" type="xs:string"/>
                  <xs:element name="senior" type="xs:string"/>
                  <xs:element name="graduate" type="xs:string"/>
                </xs:all>
              </xs:complexType>
            </xs:element>
            <xs:element name="Professors" type="professorType"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:complexType>
  </xs:element>
</xs:schema>
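To make the three kinds of schema rules concrete, here is a toy validator. It is only a sketch: Python's standard library has no XSD validator, so the rules are written by hand as a hypothetical table (`CHILD_RULES`, `LEAF_ELEMENTS`) loosely mirroring the schema above, and only a subset of the elements is covered.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical rule table: parent -> {child: (min_occurs, max_occurs)}; None = unbounded.
CHILD_RULES = {
    "University": {"Department": (2, None)},
    "Department": {"Students": (1, 1), "Professors": (1, 1)},
    "Students": {"freshman": (1, 1), "junior": (1, 1)},
    "Professors": {},
}
# Leaf elements whose content must be a non-empty string (assumed content rule).
LEAF_ELEMENTS = {"freshman", "junior"}


def validate(elem, errors=None):
    """Collect rule violations for `elem` and its descendants."""
    errors = [] if errors is None else errors
    if elem.tag in LEAF_ELEMENTS:
        if not (elem.text or "").strip():
            errors.append(f"<{elem.tag}>: empty content")
        return errors
    rules = CHILD_RULES.get(elem.tag)
    if rules is None:
        errors.append(f"<{elem.tag}>: no rule found")
        return errors
    counts = Counter(child.tag for child in elem)
    for tag in counts:
        if tag not in rules:                      # permitted-children rule
            errors.append(f"<{elem.tag}>: child <{tag}> not permitted")
    for tag, (lo, hi) in rules.items():           # occurrence-limit rule
        n = counts[tag]
        if n < lo or (hi is not None and n > hi):
            errors.append(f"<{elem.tag}>: <{tag}> occurs {n} times")
    for child in elem:
        if child.tag in rules:
            validate(child, errors)
    return errors


dept = ("<Department><Students><freshman>310</freshman><junior>213</junior>"
        "</Students><Professors/></Department>")
one_department = ET.fromstring(f"<University>{dept}</University>")
two_departments = ET.fromstring(f"<University>{dept}{dept}</University>")
print(validate(one_department))
print(validate(two_departments))
```

A document with a single Department violates the occurrence limit, while two Departments pass all three checks.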

DOM Construction
Create an in-memory tree structure for XML
Provide application access through tree operations
[Figure: DOM tree for the example document]
Root
  Element: University
    Element: Department
      Attribute: Name
        Text: ECE
      Element: Students
        Element: junior
          Text: 213
        …
      Element: Professors
        Element: professor
          Attribute: Name
            Text: Mike
          Attribute: Field
            Text: network
        …
    Element: Department
      …
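A software DOM offers the same kind of tree access. The sketch below uses Python's standard `xml.dom.minidom` on a shortened version of the example document; the `dump` helper is illustrative only and says nothing about how the accelerator lays the tree out in DRAM.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<University><Department name="ECE">'
    "<Students><junior>213</junior></Students>"
    '<Professors><professor name="Mike" field="network"/></Professors>'
    "</Department></University>"
)


def dump(node, depth=0):
    """Return an indented listing of elements, attributes and text nodes."""
    pad = "  " * depth
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(f"{pad}Element {node.tagName}")
        for name, value in sorted(node.attributes.items()):
            lines.append(f"{pad}  Attribute {name} = {value}")
    elif node.nodeType == node.TEXT_NODE:
        lines.append(f"{pad}Text {node.data}")
    for child in node.childNodes:
        lines.extend(dump(child, depth + 1))
    return lines


tree_lines = dump(doc.documentElement)
print("\n".join(tree_lines))

# Tree operations: walk from the root to the first Department and read its attribute.
department = doc.documentElement.firstChild
print(department.getAttribute("name"))
```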

Top Level Diagram
[Figure: block diagram of the accelerator. Recoverable labels:]
Input: XML Doc from Ethernet (1 Gbps)
Pipeline stages: Well-formed Checking Stage, Schema Validation Stage, DOM Construction Stage
Blocks: Character Scanner, Token Extractor, Token Handler, Rule Match Unit, Rule Check Unit, DOM Constructor, Write Buffer, XML Cycle Buffer, Memory Controller (to DRAM), two FIFOs
Tables: RNT, RHT, RCT
Bus widths shown: 8b, 32b, 64b, 128b, 256b

Top Level Diagram (animation, slides 17-22)
[Figure: the same block diagram, tracing the example input <Elem attr='xyz'>content</Elem> through the stages. Labels that survive from the animation frames: the tokens Elem, attr, xyz, content; the hash codes H(Elem), H(attr); rule name; rule content.]

Recurring Idioms (Dwarfs)
Identified 3 recurring computational idioms (referred to as Dwarfs)
- One-to-one String Matching
- One-to-many String Membership Test
- One-to-many String Search
These idioms are one of the major reasons for the low parsing performance

Dwarf I: One-to-one String Matching
Tests if a subject string equals a reference string
Example: correct nesting
The string is variable-length
- Not efficient on conventional architectures
Solution: memory stack
- Convert variable-length string comparison to fixed-length character comparison
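The stack idea can be sketched in software as follows. This is an analogy under stated assumptions, not the hardware design: the slides do not describe the stack layout, so the `chars`/`starts` pair and the event format are hypothetical. The point is that a closing tag is checked one character at a time against the name stored on top of the stack, so every comparison has a fixed width.

```python
def check_nesting(events):
    """events: iterable of ("open" | "close", tag_name). Returns True if nesting is correct."""
    chars = []    # character stack holding the names of all open elements
    starts = []   # offset in `chars` where each open name begins
    for kind, name in events:
        if kind == "open":
            starts.append(len(chars))
            chars.extend(name)
        else:
            if not starts:
                return False          # closing tag with nothing open
            start = starts.pop()
            stored = chars[start:]
            if len(stored) != len(name):
                return False
            # One single-character comparison per step.
            for expected, got in zip(stored, name):
                if expected != got:
                    return False
            del chars[start:]         # pop the matched name
    return not starts                 # every element must be closed


good = [("open", "a"), ("open", "b"), ("close", "b"), ("close", "a")]
bad = [("open", "a"), ("open", "b"), ("close", "a"), ("close", "b")]
print(check_nesting(good), check_nesting(bad))
```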

Dwarf II: One-to-many String Membership Test
Tests if a subject string equals any member of a set of reference strings
Example: unique attribute within an element
String comparison against all previously arrived attributes belonging to the same element
- Expensive memory back-tracing
Solution: Bloom Filter
- Achieved in one memory lookup
<student name="john" gender="m" hobby="guitar" field="math">

Dwarf III: One-to-many String Search
"Finds" a subject string among a set of reference strings (different from just "testing" membership)
Example: search for the corresponding schema rule
String comparison against all candidates
- Non-deterministic lookup time
Solution: Balance Routing Table Scheme
- Achieved in one memory lookup

Dwarf II: Bloom Filter
Example: attribute name uniqueness checking
Common case: attribute name is unique
- Filter out obvious cases using the Bloom Filter
- Look up a bit array instead of comparing strings
Uncommon case: attribute name may already exist
- Stall the entire design
- Do all necessary string comparisons to confirm the existence of the incoming string
- Assumption: low occurrence rate (high cost)

Solution II: Bloom Filter
For each attribute name:
- Generate N independent hash codes
- Look up the bit array
- Update the bit array
Example: <student name="john" gender="m" hobby="guitar" field="math">
Bit array 0 0 0 0 0 0 0 0 0 0    Current set = {}
Bit array 0 1 0 0 0 0 1 0 0 0    Current set = {name}
Bit array 0 1 0 1 0 0 1 0 1 1    Current set = {name, gender, hobby}
Input = field, with Current set = {name, gender, hobby} and bit array 0 1 0 1 0 0 1 0 1 1. Two outcomes are possible:
- Unique! (at least one looked-up bit is 0)
- False Positive! (all looked-up bits happen to be 1 although field is not in the set)
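The walk-through above can be sketched in software. The class below is a minimal illustration under stated assumptions: the hash functions (slices of a SHA-256 digest), the default sizes, and the `stalls` counter are all hypothetical stand-ins, since the slides do not specify the hardware hash functions. A positive lookup triggers the slow path (the "stall"), where real string comparisons decide between a true duplicate and a false positive.

```python
import hashlib


class AttributeUniquenessChecker:
    """Bloom-filter front end with an exact string-comparison fallback (illustrative)."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes          # at most 8 with the 4-byte digest slices used below
        self.array = [0] * bits
        self.names = []               # consulted only on the slow confirmation path
        self.stalls = 0               # how often the slow path was taken

    def _positions(self, name):
        # Assumed hash family: independent 4-byte slices of one SHA-256 digest.
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.hashes)]

    def add(self, name):
        """Return True if `name` is new for this element, False if it is a duplicate."""
        positions = self._positions(name)
        positive = all(self.array[p] for p in positions)
        duplicate = False
        if positive:                  # uncommon case: confirm with string comparisons
            self.stalls += 1
            duplicate = name in self.names
        if not duplicate:
            for p in positions:
                self.array[p] = 1
            self.names.append(name)
        return not duplicate


checker = AttributeUniquenessChecker()
outcomes = [checker.add(n) for n in ["name", "gender", "hobby", "field", "name"]]
print(outcomes, checker.stalls)
```

With a deliberately tiny bit array every lookup after the first becomes a false positive, yet the answers stay correct because the fallback resolves them; only the stall count grows.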

Bloom Filter Implementation
Implement the Bloom Filter algorithm in a pipeline
- An attribute name usually has multiple characters
- Allow multiple processing cycles for each attribute name
[Figure: Bloom Filter pipeline. Stages: Hash code Generating Stage, Bit Array Indexing Stage, Matching Stage. Labels: Input character, Hash Code Generator, hash codes h1, h2, …, hk, bit array with axis labels 0 … 31 and 0 … k, Attribute name end, Addr_valid, Data_valid, update, Output: positive]

Experimental Setup
Software XML parsers test
Hardware and software platform:
- Intel Core 2 Quad Q9300 (2.5 GHz, 6 MB L2 Cache)
- 2 GB DDR2-800 Memory
- Debian Linux 2.6.18-6 x86-64
- GNU C 4.1.2
Tested XML parsing libraries:
- Xerces-c 2.8.0 x86-64
- Libxml2
- DOM4J-1.6
- JAVA API for XML Processing (JAXP) 1.6.0
XML Parsing Accelerator testbed
[Figure: testbed block diagram. Labels: Laptop, UDP, 1 Gbps SGMII, Ethernet MAC, asyn_fifo, XML Engine, MC, DDR2 Memory, UART, Display, cmd, data, 8b buses, clocks 125 MHz and 200 MHz, Xilinx Virtex-5 XC5VSX50T]

Benchmarks
Group              Benchmark   XML Size (KB)  XSD Size (KB)  Source
DOM Parsing        Security    3              -              Intel Corporation
                   Structure   12             -              codesynthesis
                   Tpox        15             -              tpox
                   Hl7         136            -              hl7-testharness
                   Qedeq       211            -              qedeq.org
                   Xmark       116,000        -              xml-benchmark
Schema Validation  CustomInfo  1              2              Intel Corporation
                   CDCatalog   105            2              w3schools
                   Workflow    13             10             qedeq.org

Test Results
Metric: Raw Throughput (Gbps)
Benchmark    JAXP   DOM4J  Libxml2  Xerces-c  XPA    XPAmax
Security     0.199  0.059  0.294    0.100     1.000  1.040
Structure    0.274  0.110  0.202    0.091     1.000  1.040
Tpox         0.292  0.099  0.264    0.124     1.000  1.040
Hl7          0.415  0.189  0.360    0.128     1.000  1.040
Qedeq        0.481  0.221  0.338    0.133     1.000  1.040
Xmark        0.550  0.256  0.416    0.187     1.000  1.040
Average_par  0.373  0.158  0.314    0.127     1.000  1.040
CustomInfo   0.062  -      0.107    0.054     1.000  1.040
CDCatalog    0.128  -      0.232    0.113     1.000  1.040
Workflow     0.227  -      0.396    0.185     1.000  1.040
Average_vld  0.161  -      0.283    0.134     1.000  1.040
Average_all  0.267  0.158  0.299    0.131     1.000  1.040

Test Results
Metric: Cycles Per Byte
Benchmark    JAXP   DOM4J  Libxml2  Xerces-c  XPA
Security     100.6  339.7  67.9     201.0     1.0
Structure    73.1   181.3  99.1     220.5     1.0
Tpox         68.5   201.3  75.9     161.0     1.0
Hl7          48.2   106.0  55.6     155.8     1.0
Qedeq        41.5   90.4   59.2     150.6     1.0
Xmark        36.4   78.0   48.0     106.7     1.0
Average_par  53.6   126.9  63.6     157.2     1.0
CustomInfo   321.8  -      186.2    373.7     1.0
CDCatalog    156.5  -      86.3     176.8     1.0
Workflow     88.3   -      50.4     108.3     1.0
Average_vld  124.4  -      70.6     148.8     1.0
Average_all  75.0   126.9  66.9     152.9     1.0
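The two result tables appear to be linked by the clock frequency of each platform. The sketch below converts throughput to CPB, assuming the software numbers were computed against the 2.5 GHz clock of the test CPU and the accelerator numbers against its 125 MHz clock; the slides do not state this explicitly, but the converted values agree with the table to within rounding of the published throughput figures.

```python
def cycles_per_byte(clock_hz: float, throughput_gbps: float) -> float:
    """Cycles spent per input byte at a given clock rate and sustained throughput."""
    bytes_per_second = throughput_gbps * 1e9 / 8
    return clock_hz / bytes_per_second


CPU_HZ = 2.5e9    # Intel Core 2 Quad Q9300, from the experimental setup slide
FPGA_HZ = 125e6   # accelerator clock, from the design slide

libxml2_security = cycles_per_byte(CPU_HZ, 0.294)   # table lists 67.9
xpa_security = cycles_per_byte(FPGA_HZ, 1.000)      # table lists 1.0
print(round(libxml2_security, 1), round(xpa_security, 1))
```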

Scalability Examination
Bloom Filter efficiency
- Test the Attribute Name Uniqueness circuit with generated test files
- Count the number of false positives
                          Google Key Words     Wikipedia Key Words
Bloom Filter  Bit_Array   4k    8k    16k      4k    8k    16k
2 Hash Func.  64b         1     66    509      6     129   502
              256b        0     5     60       1     8     56
              1kb         0     1     6        1     2     2
              2kb         0     0     1        0     0     0
3 Hash Func.  256b        0     0     14       1     3     9
              1kb         0     0     1        0     0     0
              2kb         0     0     0        0     0     0

Implementation Cost
Target Device: Xilinx Virtex-5 XC5VSX50T
Logic Utilization  Slice Register  Slice LUT   Block RAM
XPA                4455 (13%)      6594 (20%)  13 (11%)
MC                 1960 (6%)       1683 (5%)   5 (3%)
EMAC               927 (2%)        712 (2%)    3 (2%)
UART               151 (1%)        187 (1%)    2 (1%)
TOTAL              7493 (22%)      9176 (28%)  23 (17%)
XC5VSX50T          32640           32640       132

Conclusion
FPGA is a valid contender in XML processing
- Low clock frequency requirement to achieve high throughput
- Scalable to process large XML documents
- Moderate hardware cost to achieve high performance
Future work
- Full conformance to the XML specification
Questions?