epub - a workshop for beginners

EPUB a Workshop by Beat Oderbolz

Upload: beat-oderbolz

Post on 05-Dec-2014


DESCRIPTION

Presented at the ADL Working Group Meeting. Vienna, Austria: 11/06/2012

TRANSCRIPT

Page 1: EPUB - a workshop for beginners

EPUB a Workshop by Beat Oderbolz

Presenter
Presentation Notes
ADL WG Meeting in Vienna, November 2012. Beat Oderbolz
Page 2: EPUB - a workshop for beginners

Standard from the IDPF (International Digital Publishing Forum)
An arrangement of several other standards, mainly: XHTML, CSS, XML and some more.
3 parts, addressing
• Content (OPS - Open Publication Structure)
• Package Metadata (OPF - Open Packaging Format)
• Archive (OCF - OEBPS Container Format)
It is powerful, straightforward, and non-proprietary

What is EPUB?

Presenter
Presentation Notes
EPUB is a standard from the International Digital Publishing Forum. It is an arrangement of several other standards (mainly: XHTML, CSS, XML, NCX, DCMI). There are three parts, addressing: content, package metadata, and archive (OPS, OPF, and OCF). It is powerful, straightforward, and non-proprietary. The IDPF is the trade and standards organization for the digital publishing industry. Its members include organizations such as the American Booksellers Association, the American Library Association, Apple, Google, and Adobe.
Page 3: EPUB - a workshop for beginners

EPUB - Structure

+ mimetype: text file containing “application/epub+zip”
+ META-INF: folder
+ OEBPS: folder (Open eBook Publication Structure)

TASKS
• Get the file Example.epub from the folder Workpackage-EPUB.
• Change its extension from .epub to .zip.
• Expand it and open the expanded folder.

Presenter
Presentation Notes
An EPUB file is a group of files that conform to the OPS/OPF (Open Publication Structure/Open Packaging Format) standards and are wrapped in a ZIP file. If you unzip an EPUB file you will find three elements inside: a text file called mimetype, a folder named META-INF, and another folder (in most cases named OEBPS).

The mimetype file must be an ASCII text document that contains the string “application/epub+zip”. It must also be uncompressed, unencrypted, and the first file in the ZIP archive. This file gives applications a more reliable way to identify the type of the file than the .epub extension alone. In other words, it tells an application what type of data it has to handle.

Also, there must be a folder named META-INF, which contains the required file container.xml. This XML file points to the file defining the contents of the book, the content.opf file.
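The mimetype rules above can be checked with a few lines of Python. This is only a sketch using the standard zipfile module; the function names check_mimetype and build_demo_epub are made up for this example, and the demo archive is not a complete EPUB. It does not test for encryption.

```python
import io
import zipfile

EXPECTED_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return a list of problems with the mimetype entry (empty list = OK)."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        # Rule: mimetype must be the first file in the ZIP archive.
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first file in the archive")
            return problems
        first = entries[0]
        # Rule: mimetype must be stored without compression.
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        # Rule: mimetype must contain exactly the expected string.
        if archive.read(first) != EXPECTED_MIMETYPE:
            problems.append("mimetype does not contain application/epub+zip")
    return problems


def build_demo_epub(mimetype_first=True):
    """Build a tiny in-memory archive for demonstration (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        if mimetype_first:
            archive.writestr("mimetype", EXPECTED_MIMETYPE,
                             compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
        if not mimetype_first:
            archive.writestr("mimetype", EXPECTED_MIMETYPE,
                             compress_type=zipfile.ZIP_STORED)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_mimetype(build_demo_epub()))
    print(check_mimetype(build_demo_epub(mimetype_first=False)))
```

To check a real file, pass its path instead of the demo buffer, for example check_mimetype("Example.epub").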
Page 4: EPUB - a workshop for beginners

EPUB - Structure

+ mimetype: text file containing “application/epub+zip”
+ META-INF: folder
    + container.xml: points to the content.opf file
+ OEBPS: folder (Open eBook Publication Structure)

Presenter
Presentation Notes
Also, there must be a folder named META-INF, which contains the required file container.xml. This XML file points to the file defining the contents of the book, the content.opf file, which lies in the OEBPS folder.
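To see how container.xml points to the content.opf file, the following sketch reads the path out of a sample container.xml with Python's standard XML parser. The sample content is typical of what an EPUB 2 editor writes but is an assumption here, not copied from the workshop files; find_opf_path is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the elements in container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Assumed sample of a typical META-INF/container.xml.
SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_opf_path(container_xml):
    """Return the path of the OPF file named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.get("full-path")


if __name__ == "__main__":
    print(find_opf_path(SAMPLE_CONTAINER))
```

The full-path attribute is relative to the root of the archive, which is why it includes the OEBPS folder name.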
Page 5: EPUB - a workshop for beginners

EPUB - Structure

+ mimetype: text file containing “application/epub+zip”
+ META-INF: folder
    + container.xml: points to the content.opf file
+ OEBPS: folder (Open eBook Publication Structure)
    + content.opf: metadata, file manifest, linear reading order (spine)
    + toc.ncx: hierarchical table of contents
    + Text: folder
    + Images: folder
    + Styles: folder

Presenter
Presentation Notes
Apart from mimetype and META-INF/container.xml, the other files (OPF, NCX, XHTML, CSS and image files) are traditionally put in a directory named OEBPS (Open eBook Publication Structure).

The OPF (Open Packaging Format) specification's purpose is to "...[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the EPUB." This is accomplished by two XML files with the extensions .opf and .ncx.

The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order (“spine”):
• The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required (though many more are available): title, language, and identifier (for example an ISBN).
• The manifest element lists all the files contained in the package. All XHTML content documents, stylesheets, images or other media, embedded fonts, and the NCX file should be listed here. Only the .opf file itself, the container.xml, and the mimetype file are not included.
• The spine element lists all the XHTML content documents in their linear reading order. Any content document that can be reached through linking or the table of contents must be listed as well.

The NCX file (Navigation Control file for XML applications), traditionally named toc.ncx, contains the hierarchical table of contents for the EPUB file.
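The relationship between manifest and spine is easier to see in a concrete file. The sketch below parses a small, made-up content.opf (file names follow the folder layout of this workshop, the identifier is a placeholder) and resolves the spine into the list of chapter files in reading order. reading_order is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Made-up minimal EPUB 2 package file for illustration.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">urn:uuid:hypothetical-example-id</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="Text/chapter_001.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="Text/chapter_002.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="Styles/styles.css" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def reading_order(opf_xml):
    """Return the content document paths in the order given by the spine."""
    root = ET.fromstring(opf_xml)
    # The manifest maps each item id to a file path.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", NS)}
    # The spine refers to manifest items by id, in reading order.
    return [hrefs[ref.get("idref")]
            for ref in root.findall("opf:spine/opf:itemref", NS)]


if __name__ == "__main__":
    print(reading_order(SAMPLE_OPF))
```

Note that the spine never names files directly. It only lists ids, so every spine entry must have a matching manifest item.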
Page 6: EPUB - a workshop for beginners

EPUB - Structure

+ mimetype: text file containing “application/epub+zip”
+ META-INF: folder
    + container.xml: points to the content.opf file
+ OEBPS: folder (Open eBook Publication Structure)
    + content.opf: metadata, file manifest, linear reading order (spine)
    + toc.ncx: hierarchical table of contents
    + Text: folder with the chapters (xhtml files)
        + chapter_001.xhtml
        + chapter_002.xhtml
        + chapter_003.xhtml
        + …
    + Images: folder with the pictures (png / jpg / gif)
        + image_001.png
        + image_002.jpg
        + image_003.gif
        + …
    + Styles: folder with the CSS stylesheet(s)
        + styles.css

Presenter
Presentation Notes
The folders inside the OEBPS folder contain the actual resources of your EPUB:
• Chapters in the form of XHTML files
• Images, which can be JPGs, PNGs or GIFs
• Stylesheets
I know, this was a lot of technical info. Luckily you do not have to worry too much about that, because the EPUB editor we will use takes care of all that stuff. But if you are trying to «reverse engineer» an EPUB to see how it was done, it is nice to know where to look…
Page 7: EPUB - a workshop for beginners

EPUB - Sigil

Presenter
Presentation Notes
As far as I know, Sigil is the only EPUB editor that edits EPUB files directly, without any unzipping and rezipping. In its current version Sigil supports EPUB2 but lets you slip in some EPUB3 stuff (mostly concerning the stylesheet). For the moment, that’s enough for us.

An EPUB in Sigil:
• Left side: «Book Browser», the content of the OEBPS folder (remember?): content.opf, toc.ncx, and the folders Text, Styles, Images (and some more)
• Middle: main editing pane, showing the WYSIWYG view and/or the HTML code of the chapters (contents of the Text folder)
• Right side: table of contents, where you can define which titles will appear in your eBook’s index (the toc.ncx). The TOC uses the <h> tags to create a hierarchy.
• On top: some formatting tools
• Under «Edit» in the menu bar (not visible in this screenshot) you can find a quite powerful search & replace engine.
Page 8: EPUB - a workshop for beginners

EPUB - Sigil

TASKS
• Please open the file Example-for-Sigil.epub in Sigil
• Confirm that it contains:
    • 3 chapters
    • 1 big image in the first chapter, 1 small image in the second
    • 1 stylesheet linked to the chapters

Page 9: EPUB - a workshop for beginners

EPUB – Sigil, HTML & CSS

The general form of an HTML element
<tag attribute1="value1" attribute2="value2">content</tag>
<a href="http://www.w3schools.com" target="_blank">link</a>

CSS - Cascading Style Sheets
selector { property1: value1; property2: value2; ...; propertyn: valuen; }
p { font-weight: bold; margin-left: 1em; margin-right: 1em; text-align: justify; }

Presenter
Presentation Notes
HTML: HTML documents are composed entirely of HTML elements that, in their most general form, have three components: a pair of tags (a "start tag" and an "end tag"), some attributes within the start tag, and finally any textual and graphical content between the start and end tags. Everything you see in an EPUB eBook is between HTML tags. HTML provides the basic structure of our eBook.

CSS: Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation semantics (the look and formatting) of a document written in a markup language (HTML in our case). CSS is designed primarily to enable the separation of document content (written in HTML or a similar markup language) from document presentation, including elements such as the layout, colors, and fonts. With a cleverly constructed stylesheet you can change the look of your whole EPUB eBook in a moment.
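The three components of an HTML element (start tag with attributes, content, end tag) can be made visible with Python's built-in html.parser module. This is only an illustrative sketch; the class name ElementParts and the function split_element are invented for this example.

```python
from html.parser import HTMLParser


class ElementParts(HTMLParser):
    """Collect start tags (with attributes), content, and end tags in order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.parts.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        self.parts.append(("content", data))

    def handle_endtag(self, tag):
        self.parts.append(("end", tag))


def split_element(markup):
    """Return the parts of a piece of HTML markup as a list of tuples."""
    parser = ElementParts()
    parser.feed(markup)
    parser.close()
    return parser.parts


if __name__ == "__main__":
    link = '<a href="http://www.w3schools.com" target="_blank">link</a>'
    for part in split_element(link):
        print(part)
```

For the link element the parser reports one start tag named a with the two attributes href and target, the content "link", and the matching end tag.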
Page 10: EPUB - a workshop for beginners

EPUB – Sigil, HTML & CSS

www.w3schools.com HTML / HTML5 / CSS / JavaScript / XML / … Tutorials & References

O’Reilly Pocket References For HTML/XHTML and CSS

Presenter
Presentation Notes
For in-depth information and tutorials about HTML, HTML5 and CSS go to www.w3schools.com. Or, if you prefer a printed reference, get the O’Reilly pocket references for CSS and HTML (also available as eBooks from Amazon…).
Page 11: EPUB - a workshop for beginners

EPUB – Sigil, HTML & CSS

TASKS
Make the following changes to the EPUB:
• Change the color of the <h1> tag
• Reduce the indent of the main text to zero
• In the stylesheet, create a class .bold (use .italic as basis)
• Change all text in italic to bold
• Create your own class(es) and use them on the content
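In the workshop the change from italic to bold is done with Sigil's search & replace. For comparison, here is a sketch of the same idea as a Python function. It assumes the text is marked up with class="italic" as in the task and that class attributes use double quotes; swap_class is a hypothetical helper, not part of Sigil.

```python
import re


def swap_class(xhtml, old, new):
    """Replace the class name `old` with `new` inside class="..." attributes."""

    def fix(match):
        # Split the attribute value into class names and swap exact matches,
        # so a class like "italic-note" is left untouched.
        names = [new if name == old else name
                 for name in match.group(2).split()]
        return match.group(1) + " ".join(names) + '"'

    return re.sub(r'(class=")([^"]*)"', fix, xhtml)


if __name__ == "__main__":
    chapter = '<p class="italic">Hi</p><p class="note italic">x</p><p>italic</p>'
    print(swap_class(chapter, "italic", "bold"))
```

Only the class attributes change. The word "italic" in the body text of the last paragraph stays as it is, which is the advantage over a plain text replace.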

Presenter
Presentation Notes
Page 12: EPUB - a workshop for beginners

3 basic possibilities: - Write directly into Sigil

- Copy / Paste

- Import HTML file

EPUB – Getting Text into Sigil

Presenter
Presentation Notes
There are 3 basic ways to get text into Sigil. The most straightforward one: write your text directly into Sigil (corrections, minor updates, additions, etc.). In most cases you will not write an eBook from scratch (and even if you do, you probably won’t do it in Sigil) but create an eBook out of existing content. There are two ways to get big chunks of text into Sigil: copy / paste and importing HTML files.
Page 13: EPUB - a workshop for beginners

EPUB – Getting Text into Sigil

TASKS
• Open the folder Import into Sigil
• Copy/paste the text of the first page of the PDF file A short History of NATO.pdf into the chapter section0003.xhtml
• Import the HTML file NATO – The Washington Treaty.html into your eBook

Presenter
Presentation Notes
Let’s have a look at the code we got here: the PDF doesn’t look so bad, but the HTML import seems to be a big mess…
Page 14: EPUB - a workshop for beginners

• Clear out unnecessary code and text

• Link the stylesheet into your new chapter

• Headlines inside the correct <h> tags

• Body-text inside <p> tags (no <div>, no <span>)

• Format the text using the stylesheet formats

• Add formats to the stylesheet if needed

• Validate your EPUB and fix Errors

EPUB – Cleaning up & Formatting

Presenter
Presentation Notes
If you import HTML files that you previously downloaded from the internet, you will see that you imported a lot of code you do not need, so there is a lot of clean-up to do. The same problem appears when you copy and paste from HTML pages, but to a much lesser degree. So, after importing text into Sigil, you have some cleaning to do:
• Clear out all the code that you obviously don’t need
• Copy and paste the link to the stylesheet into your chapter
• Make sure that all the text is inside <p> tags (no <div>, no <span>); use search and replace
• Make sure that all of your headlines have the correct <h> tag
• Format the text to your liking using the stylesheet formats
• Add formats to the stylesheet if needed
• Use the EPUB validator of Sigil and fix the remaining errors
I normally use copy and paste, import small chunks, and clean them up right away, so I do not get lost in an ocean of content and code.
Note: Apple’s «Pages» can export to .epub.
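The search-and-replace step for <div> and <span> tags can be sketched with regular expressions. The Python function below is an illustration of the idea, not what Sigil does internally; clean_markup is a made-up name. It assumes simple, non-nested <div> blocks, since nested ones would produce nested <p> tags, which are not valid and need manual fixing.

```python
import re


def clean_markup(xhtml):
    """Turn <div> blocks into plain <p> blocks and strip <span> wrappers."""
    # Opening <div ...> becomes <p>; any attributes (class, style) are dropped.
    xhtml = re.sub(r"<div\b[^>]*>", "<p>", xhtml)
    # Closing </div> becomes </p>.
    xhtml = re.sub(r"</div\s*>", "</p>", xhtml)
    # Opening and closing <span> tags are removed; their text content stays.
    xhtml = re.sub(r"</?span\b[^>]*>", "", xhtml)
    return xhtml


if __name__ == "__main__":
    messy = '<div class="x"><span style="color:red">Hello</span> world</div>'
    print(clean_markup(messy))
```

The same patterns can be typed into a regex-capable search & replace dialog. Run the validator afterwards, because a pattern-based clean-up cannot tell whether the result is well-formed.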
Page 15: EPUB - a workshop for beginners

EPUB – Cleaning up & Formatting

TASKS
Clean up the code of the two imports and reformat it.

1. Clear out unnecessary code and text

2. Link the stylesheet into your new chapter(s)

3. Headlines inside the correct <h> tags

4. All Body-text inside <p> tags (no <div>, no <span>)

5. Format the text using the stylesheet formats

6. Add formats to the stylesheet if needed

7. Validate your EPUB and fix Errors

Presenter
Presentation Notes
HTML import:
• In WYSIWYG view, remove all the text you do not need. In most cases it makes sense to remove all formatting first.
• Remove all the unnecessary code
• Use search and replace to change tags
• Reformat the text using your stylesheet formats
Page 16: EPUB - a workshop for beginners

EPUB formatting is like HTML coding in the early days…

…only worse.

Presenter
Presentation Notes
EPUB formatting is like HTML coding in the early days… only worse.
Page 17: EPUB - a workshop for beginners

EPUB – The Golden Rule of Formatting

Keep it simple!

Stay away from elaborate design ideas

Use a stylesheet with a limited set of styles

Clean code is nice code

Presenter
Presentation Notes
EPUB formatting is like HTML coding in the early days… only worse. Actually, it is not that bad if you follow one simple golden rule: Keep it simple! This means:
- Stay away from elaborate design ideas, or you will need the "bang head here" plaque.
- Use a stylesheet with a limited set of styles! Although it seems like additional work when you fill in your texts, you will be glad you took it upon yourself, at the latest when your boss decides that every text in italics should be bold instead...
- Clean code is nice code, and easier to maintain. So try to get rid of all the validation errors, even if they do not seem to cause any problems (unless you really know what you are doing…).
Anyway, to be sure that your content looks all right on a specific reading device, you will have to test it on that device. You have no idea what a certain reader will do to your formats.
Page 18: EPUB - a workshop for beginners

• Pictures / Coverpage

• Creating a TOC

• Metadata

• Advanced Stuff
    • Embedding Fonts
    • Using EPUB3

EPUB – Sigil: the Remains of the Day…

Page 19: EPUB - a workshop for beginners

EPUB – Sigil: the Remains of the Day…

TASKS
• Import the pictures sign_warning.png & IntroToEPUB-Cover.png into Sigil
• Use sign_warning.png somewhere in your eBook: aligned on the left side, with text floating around it.
• Use IntroToEPUB-Cover.png as a Cover Image for your eBook
• Enter some more titles - <h1>, <h2> and <h3> and create a TOC
• Enter the mandatory metadata
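The mandatory metadata are title, language, and identifier. Sigil's metadata editor writes them into content.opf for you. If you want to check a file by hand, a sketch like the following lists which of the three are missing. The sample package file and the name missing_metadata are made up for this example, and the sample deliberately leaves out the language.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"
REQUIRED = ("title", "language", "identifier")

# Made-up package file with the language missing on purpose.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Intro to EPUB</dc:title>
    <dc:identifier id="BookId">urn:uuid:hypothetical-example-id</dc:identifier>
  </metadata>
  <manifest/>
  <spine/>
</package>"""


def missing_metadata(opf_xml):
    """Return the required Dublin Core elements that are absent or empty."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF}}}metadata")
    missing = []
    for name in REQUIRED:
        element = None if metadata is None else metadata.find(f"{{{DC}}}{name}")
        # An element that exists but contains only whitespace counts as missing.
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_metadata(SAMPLE_OPF))
```

This only checks that the three elements are present. Sigil's validator performs the full set of checks.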

Page 20: EPUB - a workshop for beginners

Or: How to get your eBook onto reading devices

EPUB - Distribution

Presenter
Presentation Notes
• Upload the eBook to a cloud service like Dropbox or Google Drive and download it from there (you need a client on your mobile device)
• Upload the eBook to ILIAS and use the browser of your mobile device to download it (beware: not all browsers recognise the .epub format!)
• Send the eBook via email and use your mobile device’s email app to download it
• Connect your mobile device to your computer and use calibre to load the eBook onto it
<Additional things to try out: distribution via email or cloud; make an eBook from a Wikipedia article; have a look at Intro to NATO.epub; show «Yellow Submarine» and «Life on Earth»>
Page 21: EPUB - a workshop for beginners

www.w3schools.com HTML / HTML5 / CSS / JavaScript / XML / … Tutorials & References

Resources…

http://code.google.com/p/sigil/ Multi-platform EPUB ebook editor «Sigil» Download & Documentation

http://calibre-ebook.com/ Multi-platform and open source e-book format conversion & library management application «Calibre» Download & Documentation

Presenter
Presentation Notes
Btw: Just last week Sigil had a major update (our version here: 0.5.3, newest version 0.6.0). The main changes concern the code editor, which has improved a lot in usability and functionality!