Approaches to document/report generation
DESCRIPTION
Presents approaches for programmatically creating Office files. Targeted at developers. Presented at http://osdc.com.au/talks/generating-documents-tools-and-techniques
TRANSCRIPT
![Page 1: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/1.jpg)
Document Generation: Do’s and Don’ts
Jason Harrop, Plutext Pty Ltd
![Page 2: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/2.jpg)
www.docx4java.org
Where I’m coming from…
• docx4j is an ASLv2 library for (Microsoft) Open XML office documents (docx, pptx, xlsx)
• My company Plutext sponsors that project
• docx4j started in 2007
![Page 3: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/3.jpg)
Since its introduction in 2007, docx4j has become quite popular.
![Page 4: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/4.jpg)
Comparables
| tool | Open XML SDK | docx4j | POI | Aspose |
|---|---|---|---|---|
| vendor | Microsoft | Plutext | Apache | Aspose |
| language | .NET (C# etc) | Java | Java | Java |
| cost | free | free | free | expensive |
| open source | no | yes (ASL v2) | yes (ASL v2) | no |
| marshalling framework | .NET | JAXB (even MOXy) | XML Beans | JAXB |
![Page 5: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/5.jpg)
![Page 6: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/6.jpg)
Choose your hub format; import/export from/to others
[Diagram: a hub format with import/export arrows; node labels: XHTML, docx?, docx, XHTML, PDF?]
• If you need to replicate the appearance of existing Office documents, using the Microsoft formats as your “hub” will avoid lots of pain
• If you can, work with the OpenXML formats, not the legacy binary ones, or Word 2003 XML, or Word HTML
• LibreOffice/OpenOffice is a useful tool for conversion, driven by JODConverter
![Page 7: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/7.jpg)
Open XML
• standardised via ECMA 376 and ISO/IEC 29500
• includes XSD
– can generate strongly typed classes
Two ways to work with a file:
– Open → Unzip → Alter XML
– Open → Unzip → Unmarshal → Manipulate objects
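The "open, unzip, alter XML" route can be sketched with nothing but a zip library and an XML parser. The example below is a minimal illustration in Python's standard library, not docx4j code; `make_package`, `alter_text` and `read_text` are hypothetical helper names, and the toy package holds only `word/document.xml`, so it is not a complete docx that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DOCUMENT_PART = "word/document.xml"


def make_package(text):
    """Build a toy zip holding only the main document part (not a full docx)."""
    xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(DOCUMENT_PART, xml)
    return buffer.getvalue()


def alter_text(package_bytes, old, new):
    """Open the package, unzip each part, alter the document XML, zip it back up."""
    ET.register_namespace("w", W)  # keep the conventional "w" prefix on output
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            data = source.read(name)
            if name == DOCUMENT_PART:
                root = ET.fromstring(data)
                for t in root.iter(f"{{{W}}}t"):
                    if t.text:
                        t.text = t.text.replace(old, new)
                data = ET.tostring(root, encoding="utf-8")
            target.writestr(name, data)
    return out.getvalue()


def read_text(package_bytes):
    """Return the concatenated text of all w:t elements in the document part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(DOCUMENT_PART))
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))
```

The second route replaces the raw XML step with classes generated from the XSD, so that code manipulates typed objects instead of elements and attributes.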
![Page 8: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/8.jpg)
Authoring time vs. generation time
What skills do authors need?
[Diagram labels: data, docx, HTML]
![Page 9: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/9.jpg)
Approach 1:- Variable replacement.
This approach can also be used for pptx, xlsx
![Page 10: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/10.jpg)
What could be simpler?
![Page 11: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/11.jpg)
Ummm… not so fast. Word can split a placeholder across several runs because of:
1. spelling/grammar proofing
2. rsid
3. run formatting
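Why these three matter can be shown in a few lines. The sketch below assumes a `${name}` placeholder convention and a hand-written paragraph of the kind Word may save, with the placeholder split over three runs by a proofing marker and an rsid attribute. `replace_variables` is a hypothetical helper, and it is deliberately crude: it moves all text into the first run, so any formatting on later runs is lost.

```python
import re
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# "${name}" is spread over three runs, so it never appears as one string.
PARAGRAPH = """
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t>Dear ${</w:t></w:r>
  <w:proofErr w:type="spellStart"/>
  <w:r w:rsidR="00A1B2C3"><w:t>name</w:t></w:r>
  <w:proofErr w:type="spellEnd"/>
  <w:r><w:t>},</w:t></w:r>
</w:p>
"""

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def replace_variables(paragraph, values):
    """Join the text of every run, substitute, and store it in the first run."""
    texts = paragraph.findall(".//w:t", NS)
    if not texts:
        return
    joined = "".join(t.text or "" for t in texts)
    # Unknown variables are left untouched rather than blanked out.
    replaced = PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), joined
    )
    texts[0].text = replaced
    for t in texts[1:]:
        t.text = ""  # simplification: formatting of these runs no longer applies


def paragraph_text(paragraph):
    """Return the visible text of the paragraph."""
    return "".join(t.text or "" for t in paragraph.findall(".//w:t", NS))
```

A plain string search for `${name}` in the raw XML finds nothing here, which is why naive search-and-replace on document.xml tends to fail on documents that have been edited in Word.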
![Page 12: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/12.jpg)
Look for a solution which maintains integrity
• Typically a Word Add-In or macro which ensures integrity
• This suggestion applies to approaches #2 and #3 as well
![Page 13: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/13.jpg)
Additional requirement: repeating data (list items, table rows)
• can be done using some convention, for example:
  [#list developers as developer] ${developer.name}[/#list]
• many systems invent their own (e.g. HotDocs)
• but the FreeMarker or Velocity template language can be used to do this:
– http://freemarker.sourceforge.net/
– http://velocity.apache.org/
• for example:
– XDocReport (FreeMarker or Velocity; open source)
• (this templating approach can also be used with OpenOffice documents)
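To make the list convention concrete, here is a toy expander for the `[#list … as …]` form shown above, applied to plain text. It is not FreeMarker and not what XDocReport does internally; `expand_lists` is a hypothetical function that handles only flat, non-nested lists and `${item.field}` references.

```python
import re

LIST_BLOCK = re.compile(
    r"\[#list (\w+) as (\w+)\](.*?)\[/#list\]", re.DOTALL
)
VARIABLE = re.compile(r"\$\{(\w+)\.(\w+)\}")


def expand_lists(template, data):
    """Repeat the body of each [#list] block once per item in data."""

    def expand(match):
        collection, item_name, body = match.groups()
        parts = []
        for item in data[collection]:

            def substitute(var):
                # Only resolve references to the loop variable of this block.
                if var.group(1) != item_name:
                    return var.group(0)
                return str(item[var.group(2)])

            parts.append(VARIABLE.sub(substitute, body))
        return "".join(parts)

    return LIST_BLOCK.sub(expand, template)
```

For example, with `{"developers": [{"name": "Ada"}, {"name": "Linus"}]}` the template `[#list developers as developer]${developer.name}; [/#list]` expands to `Ada; Linus; `. A real engine also has to cope with the directive being split across runs and with repeating whole table rows, which is where the add-in or library support earns its keep.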
![Page 14: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/14.jpg)
Additional requirement: conditional content
• for example, XDocReport uses:
– [#if (FreeMarker)
– #if( (Velocity)
![Page 15: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/15.jpg)
Additional requirement: images
• Now it is starting to get a bit trickier, because inserting an image requires:
– adding an image part to the docx package
– making a note of its rel id
– replacing the placeholder with the image XML, including the rel id
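The "rel id" step is the least obvious of the three, so here is a minimal sketch of it. It assumes the image bytes have already been written to a part such as `word/media/image1.png` and works on the text of `word/_rels/document.xml.rels`. `add_image_relationship` is a hypothetical helper. The drawing XML that finally references the id (through `r:embed`) and the `[Content_Types].xml` entry for the image extension are not covered.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def add_image_relationship(rels_xml, target):
    """Add an image relationship and return (updated rels XML, new rel id)."""
    ET.register_namespace("", REL_NS)  # serialise without a prefix
    root = ET.fromstring(rels_xml)
    used = {rel.get("Id") for rel in root}
    # Pick the first rIdN that is not already taken in this part.
    n = 1
    while f"rId{n}" in used:
        n += 1
    rel_id = f"rId{n}"
    ET.SubElement(
        root,
        f"{{{REL_NS}}}Relationship",
        {"Id": rel_id, "Type": IMAGE_TYPE, "Target": target},
    )
    return ET.tostring(root, encoding="unicode"), rel_id
```

The returned id is what the image XML that replaces the placeholder must carry; ids are only unique within one relationships part, so headers and footers each need their own.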
![Page 16: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/16.jpg)
Approach 2:- MERGEFIELD and other fields
• Fields are a long standing feature of Word, included in the Open XML specification
• so lots of documents use this (aka mail merge)
• Various other useful field types, e.g. IF
• A partial solution to the integrity problems of Approach 1
![Page 17: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/17.jpg)
But, two unpleasant XML hybrids (simple and complex)
Simple field:
  <w:fldSimple w:instr=" MERGEFIELD name ">
    <w:r> <w:t>«name»</w:t> </w:r>
  </w:fldSimple>
Complex field (each part sits in its own run; runs cannot be nested):
  <w:r> <w:fldChar w:fldCharType="begin"/> </w:r>
  <w:r> <w:instrText xml:space="preserve"> MERGEFIELD name </w:instrText> </w:r>
  <w:r> <w:fldChar w:fldCharType="separate"/> </w:r>
  <w:r> <w:t>«name»</w:t> </w:r>
  <w:r> <w:fldChar w:fldCharType="end"/> </w:r>
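Any code that processes merge fields has to recognise both forms. The sketch below, with the hypothetical helper `merge_field_names`, collects the field names from a paragraph in document order. It assumes unquoted field names and no nested fields, and it joins `w:instrText` fragments because Word may split the instruction over several runs.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MERGEFIELD = re.compile(r"\s*MERGEFIELD\s+(\S+)")


def w(name):
    """Return the namespace-qualified tag or attribute name."""
    return f"{{{W}}}{name}"


def merge_field_names(paragraph):
    """Return MERGEFIELD names from simple and complex fields, in order."""
    instructions = []
    current = None  # instruction fragments of the complex field being read
    for el in paragraph.iter():
        if el.tag == w("fldSimple"):
            instructions.append(el.get(w("instr"), ""))
        elif el.tag == w("fldChar"):
            kind = el.get(w("fldCharType"))
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                # The instruction is everything between "begin" and "separate".
                instructions.append("".join(current))
                current = None
        elif el.tag == w("instrText") and current is not None:
            current.append(el.text or "")
    names = []
    for instruction in instructions:
        match = MERGEFIELD.match(instruction)
        if match:
            names.append(match.group(1))
    return names
```

Replacing the field result is the same exercise in reverse: for the complex form, the runs between "separate" and "end" hold the text to overwrite.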
![Page 18: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/18.jpg)
Approach 3:- Content controls
![Page 19: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/19.jpg)
Much nicer XML, and XPath binding
<w:sdt>
  <w:sdtPr>
    <w:alias w:val="name"/>
    <w:tag w:val="od:xpath=ribxv"/>
    <w:id w:val="13144269"/>
    <w:dataBinding w:xpath="/oda:answers/oda:answer[@id='name_Wt']"/>
  </w:sdtPr>
  <w:sdtContent>
    <w:r> <w:t>«name»</w:t> </w:r>
  </w:sdtContent>
</w:sdt>
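What the binding buys you can be sketched as follows: read the XPath from `w:dataBinding`, evaluate it against the data XML, and write the result into the control's content. This is an illustration in Python's standard library, not how Word or docx4j implement it. `resolve` and `apply_binding` are hypothetical helpers, and the namespace URI given for the `oda` prefix is an assumption; in a real document the prefix is declared on the binding itself.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {
    "w": W,
    "oda": "http://opendope.org/answers",  # assumed URI for the oda prefix
}


def resolve(xpath, root):
    """Evaluate a simple absolute XPath whose steps all carry a prefix.

    ElementTree only supports relative paths, so the first step is checked
    against the root element and the remainder is evaluated relative to it.
    """
    first, _, rest = xpath.lstrip("/").partition("/")
    prefix, _, local = first.partition(":")
    if root.tag != f"{{{NS.get(prefix)}}}{local}":
        return None
    return root.find(rest, NS) if rest else root


def apply_binding(sdt, data_root):
    """Copy the bound value into the first w:t of the content control."""
    binding = sdt.find("w:sdtPr/w:dataBinding", NS)
    text = sdt.find("w:sdtContent//w:t", NS)
    if binding is None or text is None:
        return False
    node = resolve(binding.get(f"{{{W}}}xpath"), data_root)
    if node is None:
        return False
    text.text = node.text
    return True
```

Because the run sits inside `w:sdtContent` rather than being the placeholder itself, proofing marks and rsid changes inside the control do not break the link between the control and its data.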
![Page 20: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/20.jpg)
Content controls are nice
• Better solution integrity-wise
• Can bind via XPath to arbitrary XML
• handles images
• since Word 2007
• can nest, so repeats/conditions work well
– unlike Approaches 1 & 2
– table row friendly
• w:tag supports arbitrary data
… but unique to Open XML. (Could/should a revised ODF support similar?)
![Page 21: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/21.jpg)
Repeats/conditions
• applies to content inside
• w:dataBinding doesn’t support these
• so create your own semantics
• OpenDoPE is one way
• use w:tag for implementation
• need an editing tool to insert repeats/conditions
– for OpenDoPE, there are Word Add-Ins designed for technical and non-technical users
• at generation time, need code to support them
– docx4j does this, and other Open XML libraries could be extended to support them
• can support complex documents (nested repeats etc)
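The generation-time side of "create your own semantics" can be sketched as below. The tag value `repeat=<key>` is a made-up convention for this example, not the OpenDoPE syntax, and `expand_repeats` is a hypothetical helper: it clones whatever sits inside a tagged content control once per data item and fills every `w:t` in the clone with that item.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
REPEAT_PREFIX = "repeat="  # hypothetical tag convention for this sketch


def expand_repeats(root, data):
    """Repeat the content of each tagged control once per item in data."""
    for sdt in list(root.iter(f"{{{W}}}sdt")):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue
        value = tag.get(f"{{{W}}}val", "")
        if not value.startswith(REPEAT_PREFIX):
            continue
        items = data.get(value[len(REPEAT_PREFIX):], [])
        # Detach the template children, then append one clone set per item.
        template = list(content)
        for child in template:
            content.remove(child)
        for item in items:
            for child in template:
                clone = copy.deepcopy(child)
                for t in clone.iter(f"{{{W}}}t"):
                    t.text = str(item)
                content.append(clone)
```

A production implementation would bind each clone's inner controls to the matching data node instead of overwriting text, which is what makes nested repeats and table rows workable.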
![Page 22: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/22.jpg)
Choose your poison
• docx4j supports all three approaches
– but content controls are strongly recommended
• other libraries offer more or less support for each approach
![Page 23: Approaches to document/report generation](https://reader033.vdocument.in/reader033/viewer/2022061218/54816f4bb379596f2b8b5cd0/html5/thumbnails/23.jpg)
Thanks!