2011 11-mozcamp-111115062121-phpapp02

Posted on 03-Aug-2015


PDF.JS

Julian Viereck

@jviereck
jviereck.dev@gmail.com

Bespin / Skywriter

Ace

Firefox DevTools

ETH Zurich

PDF.JS

?

Overview

• What is PDF.JS

• How PDF is structured

• Processing in PDF.JS

• Images & Fonts

• Infrastructure

• Problems & Todos

• Demo

What is PDF.JS

• building a faithful & efficient PDF renderer

• HTML5 technology experiment

• no native code

• secure (runs in the web sandbox)

• Mozilla Labs project, open source

Most vulnerable programs

Source: http://www.csis.dk/en/csis/news/3321

How PDF is structured

PDF file:

• Header: PDF version

• Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)

• xRef Table: mapping objID ⇔ byte offset

• Trailer: root objID, xRef byte offset (root obj = document catalog, which refs the pages)
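The trailer-then-xRef lookup can be sketched in JavaScript. This is a minimal sketch, not PDF.JS code: `parseXref` is a hypothetical helper, and it assumes a classic uncompressed cross-reference table with a single subsection, held as an ASCII string. Real files may use xref streams and incremental updates, which it does not handle.

```javascript
// Locate the xRef table through the trailer's "startxref" offset, then map
// each in-use object number to its byte offset. Assumes a classic,
// uncompressed xref table with one subsection (no xref streams or updates).
function parseXref(pdfText) {
  const tail = /startxref\s+(\d+)\s+%%EOF\s*$/.exec(pdfText);
  if (!tail) throw new Error('startxref not found');
  const xrefOffset = Number(tail[1]);

  const lines = pdfText.slice(xrefOffset).split(/\r\n|\r|\n/);
  if (lines[0].trim() !== 'xref') throw new Error('no xref table at offset');

  // Subsection header: first object number and number of entries.
  const [first, count] = lines[1].trim().split(/\s+/).map(Number);
  const offsets = new Map();
  for (let i = 0; i < count; i++) {
    // Entry: 10-digit byte offset, 5-digit generation, 'n' (in use) or 'f' (free).
    const [offset, , type] = lines[2 + i].trim().split(/\s+/);
    if (type === 'n') offsets.set(first + i, Number(offset));
  }

  // The trailer dictionary names the root object (the document catalog).
  const root = /trailer\s*<<[\s\S]*?\/Root\s+(\d+)\s+\d+\s+R/.exec(pdfText.slice(xrefOffset));
  return { offsets, root: root ? Number(root[1]) : null };
}

// Build a tiny PDF in memory so the byte offsets are correct by construction.
// (String offsets equal byte offsets here because the file is ASCII-only.)
const objects = [
  '1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n',
  '2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n',
];
let pdf = '%PDF-1.4\n';
const objOffsets = [];
for (const obj of objects) { objOffsets.push(pdf.length); pdf += obj; }
const xrefStart = pdf.length;
pdf += 'xref\n0 3\n0000000000 65535 f \n';
for (const off of objOffsets) pdf += String(off).padStart(10, '0') + ' 00000 n \n';
pdf += 'trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n' + xrefStart + '\n%%EOF\n';

const { offsets, root } = parseXref(pdf);
```

With the offsets in hand, a reader can jump straight to any object (here `pdf.slice(offsets.get(2))` starts with `2 0 obj`) instead of scanning the whole file.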

Processing in PDF.JS

• get a plain Uint8Array via XHR2, build a Stream

• doc = new PDFDoc(stream): read xRef, root object

• page = doc.getPage(N)

• page.startRendering(graphics)

• read & convert all PDF cmds ➟ IR (PartialEvaluator)

• load required objects (fonts, images)

• graphics.executeIR(IR) (CanvasGraphics)

IR = Intermediate Representation
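The first step, wrapping the downloaded bytes in a stream, might look like the sketch below. In a browser the bytes would come from an XHR2 request with `xhr.responseType = 'arraybuffer'` and `new Uint8Array(xhr.response)`. The `Stream` class here is a simplified, hypothetical stand-in (a read cursor over a `Uint8Array`), not the actual PDF.JS class.

```javascript
// Simplified byte stream over a Uint8Array: a read cursor plus helpers
// that a lexer/parser (e.g. one reading the xRef table) can build on.
class Stream {
  constructor(bytes, start = 0, length = bytes.length - start) {
    this.bytes = bytes;
    this.start = start;
    this.end = start + length;
    this.pos = start;
  }
  get length() { return this.end - this.start; }
  // Returns the next byte, or -1 at end of stream.
  getByte() {
    return this.pos < this.end ? this.bytes[this.pos++] : -1;
  }
  // Looks at the next byte without consuming it.
  peekByte() {
    return this.pos < this.end ? this.bytes[this.pos] : -1;
  }
  // Returns up to n bytes as a subarray view (no copy).
  getBytes(n) {
    const stop = Math.min(this.pos + n, this.end);
    const out = this.bytes.subarray(this.pos, stop);
    this.pos = stop;
    return out;
  }
  // Moves the cursor to an offset relative to the stream start,
  // e.g. a byte offset taken from the xRef table.
  moveTo(offset) {
    this.pos = Math.min(this.start + offset, this.end);
  }
  // Reads bytes up to (not including) the next LF or CR as a Latin-1 string.
  readLine() {
    let s = '';
    let b;
    while ((b = this.getByte()) !== -1 && b !== 0x0a && b !== 0x0d) {
      s += String.fromCharCode(b);
    }
    // Treat CRLF as a single line break.
    if (b === 0x0d && this.peekByte() === 0x0a) this.pos++;
    return s;
  }
}
```

Using subarray views rather than copies keeps random access (jumping to xRef offsets) cheap even for large files.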

Why IR?

• “get page 2” ➟ PartialEvaluator reads Data and builds drawing cmds: draw(obj#3, dict.x, dict.y)

• Graphics gets the drawing cmds, but they still refer to Data: obj#3? dict.x, .y?

• Graphics has to look them up in Data: obj#3 = “foo”, x = 20, y = 30

• Graphics draws on canvas

Problem: Processing

• Extracting data is slow (compressed streams)

• Transforming data (images) is slow

• Sometimes there are a lot of objects on a page

➡ Freezes the UI

➡ Use a Web Worker

➡ :( no direct memory access, only postMessage
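The postMessage limitation can be illustrated with `structuredClone`, which applies the same structured-clone algorithm that `postMessage` uses (available in modern browsers and Node.js 17+; it did not exist in 2011). The sketch shows that data crossing to or from a worker is copied, class instances lose their methods, and functions cannot be sent at all. `Dict` is a hypothetical stand-in for a parsed PDF dictionary.

```javascript
// Hypothetical stand-in for a parsed PDF dictionary with a lookup method.
class Dict {
  constructor(map) { this.map = map; }
  get(key) { return this.map[key]; }
}

// Mimics what happens to a message passed through postMessage:
// the value is structured-cloned, never shared.
function sendAcrossThreads(value) {
  try {
    return { ok: true, value: structuredClone(value) };
  } catch (e) {
    // Functions (and some host objects) cannot be cloned and throw.
    return { ok: false, error: e.name };
  }
}

const dict = new Dict({ x: 20, y: 30 });
const sent = sendAcrossThreads(dict);
// The copy keeps the data but not the prototype, so .get is gone,
// and it is a different object from the original.
console.log(sent.value.map.x, typeof sent.value.get, sent.value === dict);

const withCallback = sendAcrossThreads({ draw() {} });
console.log(withCallback.ok);
```

This is why the worker cannot hand the main thread live objects to query later: whatever crosses the boundary has to be plain, self-contained data.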

MainThread ⇄ Web Worker

• MainThread sends the PDF data to the Web Worker (the worker gets its own copy of the Data)

• MainThread: “get page 2”

• Web Worker: PartialEvaluator builds draw(obj#3, dict.x, dict.y) from Data

• ... and resolves it to draw(“foo”, 20, 30): the IR

• IR cmds are posted back to the MainThread

• Graphics (MainThread) draws on canvas
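The split above can be sketched as a worker-side step that resolves object references and dictionary lookups into plain values (the IR), and a main-thread step that replays those commands. This is an illustrative sketch under assumptions: the `{ fn, args }` command format, the `{ ref }`/`{ key }` argument markers, and the recording target are hypothetical, and `structuredClone` (Node.js 17+ or a modern browser) stands in for the `postMessage` hop.

```javascript
// Hypothetical command format: { fn, args }, where an arg is either a plain
// value, an object reference { ref: n }, or a dictionary lookup { key: 'x' }.
function resolveArg(arg, xref, dict) {
  if (arg !== null && typeof arg === 'object') {
    if ('ref' in arg) return xref.get(arg.ref); // obj#3 -> "foo"
    if ('key' in arg) return dict[arg.key];     // dict.x -> 20
  }
  return arg; // already plain data
}

// Worker side (PartialEvaluator role): build IR whose args are plain values,
// so it survives the structured clone done by postMessage.
function buildIR(cmds, xref, dict) {
  return cmds.map(({ fn, args }) => ({
    fn,
    args: args.map((a) => resolveArg(a, xref, dict)),
  }));
}

// Main-thread side (Graphics role): replay each IR command on a target object.
function executeIR(ir, gfx) {
  for (const { fn, args } of ir) {
    if (typeof gfx[fn] !== 'function') throw new Error('unknown IR command: ' + fn);
    gfx[fn](...args);
  }
}

// Data from the slide: obj#3 = "foo", x = 20, y = 30.
const xref = new Map([[3, 'foo']]);
const dict = { x: 20, y: 30 };
const cmds = [{ fn: 'draw', args: [{ ref: 3 }, { key: 'x' }, { key: 'y' }] }];

const ir = buildIR(cmds, xref, dict);   // in the Web Worker
const received = structuredClone(ir);   // stands in for postMessage
const calls = [];
executeIR(received, { draw: (...a) => calls.push(a) }); // on the MainThread
console.log(JSON.stringify(calls)); // [["foo",20,30]]
```

Because the IR carries only resolved values, the main thread never needs access to the worker's copy of the Data.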

    5 0 obj
    << /Length 8 0 R >>
    stream
    /GS1 gs
    /F0 12 Tf
    BT
    100 700 Td
    (Hello World!) Tj
    ET
    50 600 m
    400 600 l
    S
    endstream
    endobj

➟ PartialEvaluator (+ xRef, catalog, resources) ➟ IR:

    setGState: [ LW: 10 ]
    dependency: [ font0 ]
    setFont: font0, 12
    beginText
    moveText: 100, 700
    showText: “Hello World!”
    endText
    moveTo: 50, 600
    lineTo: 400, 600
    stroke

➟ Graphics
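A toy version of this PartialEvaluator step: tokenize the content stream, map each PDF operator to a named IR command, and resolve resource names through the page resources. It is a minimal sketch assuming only the operators in the example and literal strings without escapes or nesting. The IR command names follow the slide; the `resources` shape (with `/GS1` holding a line width of 10) is an assumption chosen to reproduce the slide's output, not PDF.JS source.

```javascript
// PDF operator -> IR command name (only the operators used in the example).
const OPS = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', ET: 'endText',
  Td: 'moveText', Tj: 'showText', m: 'moveTo', l: 'lineTo', S: 'stroke',
};

// Split a content stream into names (/F0), literal strings ((...)),
// numbers and operators. No escapes, nested parens, arrays or dicts.
function* tokenize(src) {
  const re = /\/([^\s\/\[\]()<>]+)|\(([^)]*)\)|(-?\d*\.?\d+)|([A-Za-z*'"]+)/g;
  for (const m of src.matchAll(re)) {
    if (m[1] !== undefined) yield { value: m[1] };              // name
    else if (m[2] !== undefined) yield { value: m[2] };         // string
    else if (m[3] !== undefined) yield { value: Number(m[3]) }; // number
    else yield { op: m[4] };                                    // operator
  }
}

// PDF syntax is postfix: operands accumulate until an operator arrives.
function evaluate(src, resources) {
  const ir = [];
  const dependencies = [];
  let operands = [];
  for (const tok of tokenize(src)) {
    if (tok.op === undefined) { operands.push(tok.value); continue; }
    const fn = OPS[tok.op];
    if (!fn) throw new Error('unsupported operator: ' + tok.op);
    let args = operands;
    if (tok.op === 'gs') {
      // Resolve the ExtGState resource name to its parameters.
      args = [resources.ExtGState[operands[0]]];
    } else if (tok.op === 'Tf') {
      // Resolve the font resource name; the font must be loaded before drawing.
      const font = resources.Font[operands[0]];
      if (!dependencies.includes(font)) dependencies.push(font);
      args = [font, operands[1]];
    }
    ir.push({ fn, args });
    operands = [];
  }
  return { dependencies, ir };
}

// Content stream from the slide; resource shape is assumed.
const content = '/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S';
const resources = { ExtGState: { GS1: { LW: 10 } }, Font: { F0: 'font0' } };
const result = evaluate(content, resources);
```

Collecting `dependencies` separately mirrors the slide's `dependency: [ font0 ]` entry: the fonts can be loaded before any IR command that needs them is executed.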

Images

• JPEG streams:

• DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));

• If not a JPEG stream:

• read bytes, convert from the image's colorspace to RGB

• imgData = ctx.getImageData(0, 0, width, height) (ctx = the canvas 2D context)

• fillWithPixelData(bytes, imgData)

• ctx.putImageData(imgData, 0, 0)
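Both image paths can be sketched in plain JavaScript. `bytesToString` and `fillWithPixelData` appear on the slide but their bodies are not shown; the versions below are hypothetical implementations assuming 8-bit RGB pixel data and an `ImageData`-like target with an RGBA `data` array. `btoa` is global in browsers and in Node.js 16+.

```javascript
// Turn raw bytes into a "binary string" (one char per byte) so btoa can encode it.
function bytesToString(bytes) {
  let s = '';
  // Work in chunks to stay under the argument-count limit of fromCharCode.apply.
  for (let i = 0; i < bytes.length; i += 0x8000) {
    s += String.fromCharCode.apply(null, bytes.subarray(i, i + 0x8000));
  }
  return s;
}

// JPEG path: hand the compressed bytes to the browser's decoder via a data URL.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + btoa(bytesToString(bytes));
}

// Non-JPEG path: copy 3-byte RGB pixels into the 4-byte RGBA layout of ImageData.
function fillWithPixelData(rgb, imgData) {
  const out = imgData.data;
  const pixels = imgData.width * imgData.height;
  for (let i = 0, j = 0; i < pixels; i++, j += 4) {
    out[j] = rgb[i * 3];
    out[j + 1] = rgb[i * 3 + 1];
    out[j + 2] = rgb[i * 3 + 2];
    out[j + 3] = 255; // fully opaque
  }
}
```

In a page, the target would typically come from `ctx.createImageData(width, height)` (or `getImageData` as on the slide) and be drawn with `ctx.putImageData(imgData, 0, 0)`.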

JPEG, but...

• no native support for CMYK JPEG

➡ use a JS implementation

• no native support for JPEG 2000

➡ use Emscripten: C library ➟ JS

‣ works, but not very performant
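Once a JS decoder has produced CMYK samples, they still need converting to RGB for the canvas. The sketch below uses a common naive formula; it is an assumption for illustration, not the conversion PDF.JS uses, and it ignores color profiles and file-specific details such as inverted components in some Adobe-produced CMYK JPEGs.

```javascript
// Naive CMYK -> RGB with all components in 0..255:
//   R = 255 * (1 - C) * (1 - K), G = 255 * (1 - M) * (1 - K), B = 255 * (1 - Y) * (1 - K)
// No color management; Uint8ClampedArray rounds and clamps the results.
function cmykToRgb(cmyk) {
  const pixels = cmyk.length / 4;
  const rgb = new Uint8ClampedArray(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    const c = cmyk[4 * i] / 255;
    const m = cmyk[4 * i + 1] / 255;
    const y = cmyk[4 * i + 2] / 255;
    const k = cmyk[4 * i + 3] / 255;
    rgb[3 * i] = 255 * (1 - c) * (1 - k);
    rgb[3 * i + 1] = 255 * (1 - m) * (1 - k);
    rgb[3 * i + 2] = 255 * (1 - y) * (1 - k);
  }
  return rgb;
}
```

A per-pixel loop like this runs on every image, which is part of why pure-JS decoding paths are slower than the browser's native decoders.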

Fonts

• There are lots of different font formats!

• fonts are converted to OpenType

• use CSS: @font-face { font-family: 'font0'; src: url(data:font/opentype;base64, ...); }

• some fonts can't be converted :(

• paint them
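The `@font-face` step can be sketched as a helper that turns converted OpenType bytes into a CSS rule like the one above. `fontFaceRule` is a hypothetical helper; in a page, the resulting string could be added to a stylesheet with `sheet.insertRule(rule, sheet.cssRules.length)`, after which text can be drawn with e.g. `ctx.font = '12px font0'`. The `font/opentype` MIME type follows the slide.

```javascript
// Base64-encode raw bytes via a binary string (btoa: browsers, Node.js 16+).
function bytesToBase64(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return btoa(s);
}

// Build a rule such as:
// @font-face { font-family: 'font0'; src: url(data:font/opentype;base64,...); }
function fontFaceRule(name, otfBytes) {
  return "@font-face { font-family: '" + name + "'; src: url(data:font/opentype;base64," +
    bytesToBase64(otfBytes) + '); }';
}
```

Embedding the font as a data URL avoids a network round trip: the converted bytes never leave the page.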

Fonts

• Type I: convert to Type II

• Type II: “use directly”

• Type III: paint ourselves

• CID: convert to Type II

Still need to repair fonts!

Infrastructure

• Using GitHub:

• Issue Tracker

• Pull Requests

• Wiki

• Update gh-pages on every push

• Testing:

• In a pull request: “@pdfjsbot test”

• Runs tests on an EC2 instance

Infrastructure

• AreWePdfYet?

• Take the top 100 PDFs from Google

• render the first 5 pages of each

• compare to Preview

• http://people.mozilla.com/~bdahl/corpusreport/test/ref/
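A reference comparison of the kind described above could be as simple as counting differing pixels between two renderings of the same page. This is a hypothetical sketch: the slides do not say how the comparison is done, and `countDifferentPixels` and its per-channel `tolerance` are assumptions.

```javascript
// Count pixels whose RGBA channels differ by more than `tolerance`
// between a rendered page and a reference image of the same size.
function countDifferentPixels(actual, reference, tolerance = 0) {
  if (actual.length !== reference.length) throw new Error('size mismatch');
  let diff = 0;
  for (let i = 0; i < actual.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - reference[i + c]) > tolerance) {
        diff++;
        break; // count each pixel at most once
      }
    }
  }
  return diff;
}
```

A small tolerance absorbs antialiasing noise between renderers, so only real rendering differences are flagged.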

Todo = Help :)

Worker Canvas

'Read-Only' Memory Web Worker

Faster Canvas Rendering

CMYK JPEG

JPEG 2000

Font Load Event

WebPrint API

XHR Range Support

Font Support

SVG Backend

(text selection [Gecko])

“HTML5” Backend

Search | Selection | Copy

Input Forms

More Parts Of Spec

Improve Viewer

Perf & Memory Analysis

Improve Test Infrastructure

More Testing!

More Testing

• use the PDF.JS extension!

• http://mozilla.github.com/pdf.js/extensions/firefox/pdf.js.xpi

• report broken PDFs!

• help us categorize issues

Feedback Feature

Demo

Github: https://github.com/mozilla/pdf.js

Twitter: @pdfjs

Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics

IRC: irc.mozilla.org #pdfjs

Engineering Weekly Call:

Thursday - 10:00am PDT, 17:00 UTC

Readme | Issues | Wiki

Q & A
