PDF Generation with iText
Presented by Greg Holling
What is iText?
Java/C# library
Open source
Generates PDF on-the-fly
Servlet- and JSP-friendly
  PDF can be generated by a servlet
Supports lots of PDF functionality
  Bookmarks, watermarks
  PDF forms
  Digital signatures
The Good
Mature library
Deep & broad PDF support
Open source
Easy to create “preview” PDF from servlet/JSP
Active user base & mailing list
Training & consulting available
Lots of online examples
Book: “iText in Action” (preview 2nd edition)
The Bad
Javadoc (and code comments) are often sparse
Need the book to use iText effectively
eBook costs $35, and has problems
  Current edition is somewhat dated
  New edition is incomplete (MEAP)
  Explanations are buried in examples
  Some information is difficult to find
The Ugly
Multiple ways to accomplish something
  Sort of like Unix, but...
  Sometimes one works, and the other doesn't
Responses on mailing list often begin with “did you read the book?”
  Or the corollary: “That method isn't intended to be used that way...”
Example: getYLine()
  Javadoc says “gets the Y Line”
  Book: no description, only a source example
Background
Goal: Brochures for community college students
  Students create brochures
  Admin customizes brochure look & feel
  Pricing: Monthly subscription for college
Web-based
  Deployed on Windows Server 2003
  Original plan: shrinkwrap
    Complex distribution and pricing
    Inexperienced sysadmins
Future: mobile deployment (students)
Software Stack
JDK 1.6 + Servlet/JSP
Tomcat 6.0.28
Apache Commons FileUpload 1.2.1
iText 2.0.8
JDOM
opencsv 2.1
JUnit 4.8.2
[TagSoup, HtmlCleaner, Flying Saucer, MongoDB]
DEMO
Student interface
  Generated PDF
Administrative interface
  PDF preview
Two Different Worlds
                            | Display (esp. web)        | Print
Color Model                 | RGB                       | CMYK, Pantone
Photo Representation        | jpg/tiff/png              | jpg/eps
Explicit Layout Control     | Nice to have              | Critical
Fonts                       | Arial/Helvetica/Whatever  | Helvetica Bold, NOT Bold Helvetica
Photos                      | Highly compressed (<100k) | Uncompressed (1 MB+)
Sizes                       | Points                    | DPI; point ~ 1/72”
Leading                     | Huh???                    | Depends on context
Whitespace                  | Dynamic layout            | Design element
Graphics (e.g. dotted line) | Whatever is there         | Very precise segment length, endcap shape, ...
y=0                         | Top of page               | Bottom of page
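The size and y=0 differences are easy to get wrong in code. The following is a minimal sketch of the arithmetic, assuming a page measured in points and a screen layout measured in pixels at a known DPI. PageUnits and its methods are hypothetical names for illustration and are not part of iText.

```java
/** Hypothetical helper (not an iText class): screen-style values to PDF page values. */
final class PageUnits {
    /** PDF page coordinates use 72 points per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Converts inches to points. Uses float because iText units are float. */
    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    /** Converts a pixel length at the given screen DPI to points. */
    static float pixelsToPoints(float pixels, float dpi) {
        return pixels * POINTS_PER_INCH / dpi;
    }

    /** Turns a distance measured down from the top of the page into a y measured up from the bottom. */
    static float flipY(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float pageHeight = inchesToPoints(11f);           // 11-inch page = 792 points
        float yFromTop = pixelsToPoints(96f, 96f);        // 96 pixels at 96 DPI = 72 points
        System.out.println(flipY(yFromTop, pageHeight));  // prints 720.0
    }
}
```

An element placed one inch below the top edge on screen ends up at y = 720 on an 11-inch PDF page, since y grows upward from the bottom.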
iText “Hello World”
Cookie cutter steps:
  Create a new Document object
    initializes margins, other generic properties
  Create a PdfWriter
    associates a document with a file/stream
    Stream can be a ServletOutputStream
  Open the document
    prepares for writing
  Add content
  Close the document
iText Key Classes
Document – margins, orientation, etc.
PdfReader – reads an existing PDF
PdfWriter – low-level output
  Can be written to ByteArrayOutputStream (BAOS) / ServletOutputStream
PdfContentByte – “layer”, for low-level output
  Can be overlaid
PdfStamper – adds content to existing PDF
PdfCopy – combines pages from PDFs
More Key Classes
Element – logical element
Chunk – StringBuffer containing font info
Phrase – ArrayList of Chunk, includes leading
Paragraph – Phrases + newline + alignment
List, ListItem – bulleted list
Anchor – hypertext link
ColumnText – a column of text & images
PdfPTable / PdfPCell – a table
Fonts
Two primary font classes:
  BaseFont
    Font name, embedded?, font file name
  Font
    Font size, other modifiers
Font is used by most text-related classes
BaseFont is used by PdfWriter
Font constructor takes a BaseFont or Font.FontFamily object
  BaseFont for embedded fonts
  FontFamily for predefined fonts
Predefined Fonts
All PDF readers are required to handle these
Readers may substitute a similar font
  Helvetica => Arial, e.g.
  Use embedded fonts to avoid substitution
No space penalty for using these in PDF
Fonts:
  Courier, Helvetica, Symbol, Times, ZapfDingbats
  Bold & Italic variants for all except Symbol and ZapfDingbats
Leading
Pronounced like “sledding”
Origin: lead separator inserted above a line
PDF (iText): spacing above a line of text
Aliases: line spacing
Note: 1 inch = 72 points (approx.)
Note: Spacing before a paragraph is different than leading
Can be specified in points or % of font size
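The two ways of specifying leading (points or % of font size) come down to simple arithmetic. A small sketch, assuming leading is treated as the vertical space each line of text consumes. LeadingMath and its methods are hypothetical and not part of iText.

```java
/** Hypothetical helper (not an iText class) for line-spacing arithmetic. */
final class LeadingMath {
    /** Converts leading given as a percentage of the font size into points. */
    static float fromPercent(float fontSizePt, float percent) {
        return fontSizePt * percent / 100f;
    }

    /** Number of whole lines that fit in a column of the given height. */
    static int linesThatFit(float columnHeightPt, float leadingPt) {
        return (int) Math.floor(columnHeightPt / leadingPt);
    }

    public static void main(String[] args) {
        float leading = fromPercent(12f, 150f);           // 150% of 12-point text
        System.out.println(leading);                      // prints 18.0
        System.out.println(linesThatFit(144f, leading));  // prints 8 (2-inch column)
    }
}
```

The same percentage gives a different point value for every font size, which is one reason leading "depends on context".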
Embedded Fonts
Obtain font information from a file
  Adobe Type 1 (.afm, .pfm, .pfb), TrueType (.ttf), OpenType (.otf)
  OpenType gives the best cross-platform behavior
Font file is specified in BaseFont.createFont()
Embedding increases PDF size
  Only the glyphs used in PDF are embedded
  Size increase may still be significant, esp. CJK
Watch for licensing restrictions
Hypertext Links
Can be included in PDF
To create:
  Create a Chunk with the appropriate font color
  Chunk.setAction(new PdfAction(...));
  Embed the Chunk in a Paragraph or other iText element
Graphics
PdfContentByte can create rudimentary graphics
  Line segments, solid or dashed
    Color, line end/cap style, dash style
  Filled or unfilled polygons
    Fill color/tint can be specified
All units are relative to the edge of the page
stroke() renders the graphics
  Nothing is rendered until stroke() is called
NOTE: LineSeparator can be used for a horizontal line in the Document
iText and Java2D
PdfTemplate.createGraphicsShapes() returns a Graphics2D object
  Can be passed to a paint() method
The template object can be passed to PdfContentByte.addTemplate()
  Allows arbitrary Java2D graphics in PDF
AffineTransform can be passed to some iText methods:
  addImage()
  setTextMatrix()
  Image/text scaling, rotation, transformation
Images
iText class: com.itextpdf.text.Image
Image formats:
  JPEG[2000], GIF, PNG, BMP, WMF, TIFF, JBIG2
Color models: RGB, CMYK
  NOTE: imageio throws an exception when reading CMYK images
Operations: scaling, transparency, masking
  NOTE: Scaling doesn't reduce image quality or size
    Just affects rendering
    Big image files => big PDFs
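Because scaling only changes how an image is rendered, every pixel of the original still travels inside the PDF. The sketch below shows how to estimate the resolution a scaled image ends up with on paper. ImageResolution is a hypothetical helper, not an iText class, and the pixel counts are made-up illustration values.

```java
/** Hypothetical helper (not an iText class): print resolution of a scaled image. */
final class ImageResolution {
    /** Pixels per inch when pixelWidth pixels are drawn across renderedWidthPt points. */
    static float effectiveDpi(int pixelWidth, float renderedWidthPt) {
        float renderedInches = renderedWidthPt / 72f;
        return pixelWidth / renderedInches;
    }

    public static void main(String[] args) {
        // A 3000-pixel-wide photo scaled to 2 inches (144 points) on the page.
        System.out.println(effectiveDpi(3000, 144f)); // prints 1500.0
        // The same photo resampled to 600 pixels wide before it is added.
        System.out.println(effectiveDpi(600, 144f));  // prints 300.0
    }
}
```

If the effective DPI is far above what the printer can use, the extra pixels only add to the file size.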
ColumnText
Logical column, positioned explicitly on the page
  Rectangular or complex shape
  Content is added top-to-bottom
go() renders content
  Nothing happens until go()
Can be used to make sure content will fit
  go(true) simulates output
  go(false) or go() renders content
PDF Preview in Servlet
PdfWriter.getInstance() takes an OutputStream argument
  Can be any OutputStream
  Including ServletOutputStream
This allows servlet to generate a preview PDF
  PdfWriter => ServletOutputStream
    Small PDFs only
  temp file => ServletOutputStream
    More flexibility, can be used for larger files
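The temp-file route can be sketched with the JDK alone. This is only an illustration under assumptions: it uses java.nio.file, which needs Java 7 or later (newer than the JDK 1.6 in the stack above), a ByteArrayOutputStream stands in for the ServletOutputStream, and TempFileStreamer is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: streams a finished PDF temp file to any OutputStream. */
final class TempFileStreamer {
    /** Copies the file to the stream, always deletes the file, returns bytes sent. */
    static long sendAndDelete(Path tempPdf, OutputStream out) throws IOException {
        try {
            long sent = Files.copy(tempPdf, out); // copies in chunks, not one big array
            out.flush();
            return sent;
        } finally {
            Files.deleteIfExists(tempPdf);        // do not leave previews on disk
        }
    }

    /** Round trip with placeholder bytes; returns the sink size, or -1 if the file survived. */
    static int demo() {
        try {
            Path temp = Files.createTempFile("preview", ".pdf");
            Files.write(temp, "%PDF placeholder".getBytes(StandardCharsets.US_ASCII));
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the servlet stream
            sendAndDelete(temp, sink);
            return Files.exists(temp) ? -1 : sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " bytes sent"); // prints "16 bytes sent"
    }
}
```

In a servlet, the generated PDF would be written to the temp file first, and the response stream would take the place of the in-memory sink.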
PdfStamper
Adds content to an existing PDF
Can read and write stream or byte array
  Allows chaining of PDF generation ops
Content can be written on top or underneath
Useful for:
  Table of contents
  “Page x of y” in header/footer
  Watermarks or “Confidential” notation
General iText Cautions
72 points = 1” (approx)
Units are float, not double
Font + bold modifier ≠ bold font
Spacing before paragraph ≠ leading
Watch font licensing restrictions
Images are automatically centered & resized if they reside in a PdfPCell
∑ Image size => PDF size (approx)
  Scaling images doesn't affect PDF size
Beware HTML caching, especially IE
PDF Size
Big issue for this project
Two primary things affect PDF size:
  Images
    scaling doesn't affect size/resolution
  Embedded fonts
First example PDF was 10 MB+
  Rejected by email server
  5+ second download
Changing image size/resolution => 300k PDF
Moral: Use small, low-res images
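One way to follow the moral is to resample images before they are handed to iText. The sketch below uses only Java2D and assumes an RGB source image (recall that imageio fails on CMYK input). ImageShrinker is a hypothetical helper and not the approach confirmed by the talk.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: shrinks an RGB image before it is embedded in a PDF. */
final class ImageShrinker {
    /** Resamples the image to targetWidth pixels, keeping the aspect ratio. */
    static BufferedImage shrink(BufferedImage source, int targetWidth) {
        if (source.getWidth() <= targetWidth) {
            return source; // never upscale: it adds bytes without adding detail
        }
        int targetHeight = Math.max(1,
                Math.round(source.getHeight() * (float) targetWidth / source.getWidth()));
        BufferedImage result =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose(); // release the graphics context
        }
        return result;
    }

    public static void main(String[] args) {
        BufferedImage photo = new BufferedImage(1200, 800, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = shrink(photo, 300);
        System.out.println(small.getWidth() + " x " + small.getHeight()); // prints 300 x 200
    }
}
```

The shrunken image has one sixteenth of the original pixel count here, and the PDF size drops roughly in proportion.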
IE Browser Caching
GET requests only
Symptoms: page not cleared
Workaround: Use POST or HTTP headers
  Also consider session.invalidate()
  Note: doesn't help with tabs
JSP workaround:
<%
response.setHeader("Cache-Control", "no-cache");
response.setHeader("Pragma", "no-cache");
response.setDateHeader("Expires", -1);
%>
References
iText website: http://www.itextpdf.com/
Book: http://www.itextpdf.com/book/
Examples (from the book): http://www.itextpdf.com/examples/index.php