How to Produce an eBook in ePub Format

The Scent of an ebook

An ebook will never smell as good as that old leather-bound volume of classics on grandfather’s bookshelf; but the ebook format, residing inside an ebook reader, will let you carry that entire volume in your pocket.

Before we get into the creation of an ebook, it helps to get a feel for what browsing one is like. Download the Mozilla add-on for reading ePub books, and point your Firefox browser to one of the many sites that host DRM-free ePub content (see References). With this add-on, you can not only read ePub ebooks, but also build a repository of your favourite ebooks, either as saved local copies, or as bookmarks to Internet repositories.

ePub is an open ebook format developed and maintained by the International Digital Publishing Forum (IDPF). It is a zipped container format; if you change the extension from .epub to .zip, you can open it with an archive manager. Among other things, it contains a manifest of the contents, metadata information, and the book’s content as files in the XHTML 1.1 format.
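
To see the container for yourself without renaming anything, a few lines of Python are enough. This is only a minimal sketch: the function name list_epub_entries and the file name book.epub are our own placeholders, not part of any ePub tool.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of the files stored inside an ePub container."""
    # An ePub file is an ordinary zip archive, so the standard zipfile
    # module can open it without changing the extension.
    with zipfile.ZipFile(epub_path) as container:
        return container.namelist()
```

Calling list_epub_entries("book.epub") on a typical ebook should show a mimetype entry, a META-INF/container.xml file, the manifest, and the XHTML content files.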

We will not go into the intricacies of the format here — rather, we will concentrate on a workflow to produce an ePub ebook that passes standard-compliance checks. You can learn more about the ePub format from the IDPF website.

Preparing the content

Since the content of the book is stored as XHTML, the shortest way to an ePub book should start from a good WYSIWYG editor for XHTML. We will begin with OpenOffice.org Writer, because it gives us a familiar tool to work with and maintain the original source of our ebooks. Our workflow lets you concentrate on your book, rather than fiddle with the format. Another advantage of starting from OOo Writer is that ODF serves as a good parent format from which you can repurpose your content as a PDF document, a Web page, or (as in this case) an ePub ebook.

But for those of you who wear see-through wrist watches to admire the mechanisms as much as to read the time, we will briefly discuss a more direct approach, later on, which assumes a knowledge of the ePub format and XHTML/CSS. That said, you can use any word-processing tool here that generates good HTML or XHTML in the end.

The first tip is to keep things basic and uncomplicated. Too many bells and whistles can irritate the reader, and even confound the ebook reader software or device. All our tips apply to OOo Writer, unless otherwise noted.

  • ebook title: In OOo Writer, use the document properties form (under File > Properties…) to enter the name of the document or ebook. This becomes the title of the ebook you are producing.
  • Headings: Use at most two levels of headings: one for chapter headings, and another for subheadings within chapters (if you need them). In OOo Writer, as a style guideline, choose Heading 1 as the style for chapter headings, and Heading 2 for subheadings. These translate to the HTML elements <h1> and <h2>, respectively, in the ebook. The rest of the text can be Body Text or the equivalent. This discipline also helps generate a correct “Table of Contents” for the ebook.
  • Tables: Tables can be inserted, too. Make sure you modify the table properties and set the table width to relative, with values of 80 per cent or 90 per cent.
  • Images: Images add spice to the otherwise monotonous text of an ebook. The cover image is by far the most important one, as it helps readers distinguish a book in their ebook library. The recommended size for cover images is 590×750 pixels at 72 DPI or better. Images can be in the JPEG or PNG format. For images in your text, make sure they are resized beforehand, and are then inserted at their “original size”. In the Crop tab in Picture Properties, check the box to keep the picture at its original size. In the Options tab, enter Alternate Text for your image; this goes into the alt attribute of the XHTML <img> element. You might find it convenient to resize images to within 30 per cent of the page width, and align them to the left, with text flowing to the right. Feel free to experiment and choose what works best for you and your ebook software.
  • Generating HTML: Once you have completed composing your work in OOo Writer, use the Save As option to save it as HTML. The HTML that Writer generates is anything but XHTML 1.1-compliant; hence, one more step is needed to convert it to XHTML 1.1 before you are ready to move on to the final stage of producing your ePub ebook.
  • Converting to XHTML: Conversion from HTML to XHTML is easy, thanks to the nice folks who have written html2xhtml, a neat tool for precisely this purpose. The tool's website also provides an option to convert documents online. Assuming that you have downloaded, compiled, and installed this tool (and believe me, this is the better option while experimenting), you can use the following command for the conversion:
    html2xhtml -t 1.1 <input_file> -o <output_file>
  • XHTML validation: Once you have your XHTML 1.1 source, test it in your browser, to check if the styling and layout appear as you intended them to. It might be necessary to tweak the XHTML manually to ensure the text flows around a graphic. Finally, validate your XHTML 1.1 using the W3C Markup Validation Service. The result should be nothing short of an unqualified Pass with the Doctype as XHTML 1.1, and Encoding as UTF-8! If you do not get a Pass, or if the Doctype and Encoding are not detected correctly, it is better to iron things out before you proceed further.
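
If you repeat the conversion often, the last two steps can be scripted. The sketch below wraps the html2xhtml command quoted above and adds a rough well-formedness check to run before the online validation. The function names are our own, and the check is only a first filter: it confirms that the file parses as XML, not that it is valid XHTML 1.1, and it may reject named entities such as &nbsp; that a full validator would accept.

```python
import subprocess
import xml.etree.ElementTree as ET


def html2xhtml_command(input_file, output_file):
    """Build the html2xhtml invocation quoted in the article."""
    return ["html2xhtml", "-t", "1.1", input_file, "-o", output_file]


def convert(input_file, output_file):
    """Run html2xhtml; assumes the tool is installed and on the PATH."""
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(html2xhtml_command(input_file, output_file), check=True)


def is_well_formed(xhtml_text):
    """Return True if the text parses as XML.

    Well-formedness is necessary for valid XHTML 1.1, but not sufficient.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError:
        return False
    return True
```

A file that fails is_well_formed is not worth uploading to the validator yet; a file that passes still needs the full validation described above.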

Calibre

We will use Calibre, the ebook management system, to produce our ebook. Calibre is a very capable set of software tools, with both command-line and graphical user interfaces, which can produce ebooks in a wide variety of formats, convert them from one format to another, push ebooks to major ebook reader devices, and manage your collection of ebooks. However, as we agreed earlier, we are going to stick to the straight and narrow path, and use an almost default subset of Calibre features, which is adequate to produce our ebook.

Calibre should be available in your distribution’s repositories. Go ahead and install it. After you have invoked Calibre and completed the initial setup, including specifying a location for your ebook repository, you can download some ePub books into your home directory. Next, add these to the Calibre repository by clicking Add Books, the first button on the toolbar. Once these books have been added, you can browse them using the View button, or edit/view their metadata with the eponymous toolbar button.

Producing your ebook

Let’s return to our task, after the brief digression of getting to know Calibre. We now have our book source as validated XHTML 1.1 with any accompanying images, all in one folder. Click Add Books in Calibre, and add the XHTML 1.1 source file to Calibre, which, for its part, collects the XHTML and associated images, and zips them into a single archive. It should now appear as a new entry in the list on the Calibre home screen.

You are now ready for the final step of converting your content to ePub. Select the newly added entry, and click the Convert E-books button to start the conversion process.

Populating the metadata

The conversion window pops up, showing the metadata entry form. Enter the name of the book, its author, the tags denoting its content or genre, and a brief summary of the book. An important field is the Author Sort field. If you want authors sorted by their last name in your ebook library list, then you should re-enter the author’s name in the Author Sort field in the “last-name, first-name” format.
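
For a large batch of books, the “last-name, first-name” form can be derived mechanically. The helper below is a deliberately naive sketch of our own (it is not how Calibre computes the field): it treats the last word as the surname, so names with particles, suffixes, or multi-word surnames still need to be entered by hand.

```python
def author_sort(name):
    """Turn 'First [Middle] Last' into 'Last, First [Middle]'."""
    parts = name.split()
    if len(parts) < 2:
        # A single-word name has nothing to reorder.
        return name.strip()
    # Naive assumption: the last word is the surname.
    return f"{parts[-1]}, {' '.join(parts[:-1])}"
```

For example, author_sort("Jane Austen") gives "Austen, Jane".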

You can also upload the cover image that you had prepared earlier. If you do not upload a cover image and leave all cover-related settings at their defaults, Calibre automatically generates a default cover for your book.

Detecting structure

Most other aspects of conversion, such as “Look & Feel”, “Structure Detection”, etc., are best left at their default values; nothing needs to be done with them. However, there is work to be done on the Table of Contents screen. Normally, Calibre auto-detects the chapters in a document, using a sophisticated XPath expression that looks for the words “Chapter”, “Part”, “Book” or the class="chapter" attribute. So, if all your chapter titles contain one of these words, you have nothing to worry about.

However, to make chapter detection more generic, open the Table of Contents screen, and enter //h:h1 and //h:h2 as the XPath expressions for Level 1 TOC and Level 2 TOC, respectively. This falls in well with our document style guideline of using Heading 1 and Heading 2 as Level 1 and Level 2 headings, respectively. Also, bump up the Chapter threshold number to a value larger than the number of chapters in your ebook.
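
Before converting, you can preview what these two expressions will pick up from your XHTML source. The sketch below walks the document and lists every h1 and h2 in order; it assumes that the h: prefix in the Calibre expressions stands for the XHTML namespace, and the function name outline is our own.

```python
import xml.etree.ElementTree as ET

# Namespace assumed to be bound to the h: prefix in //h:h1 and //h:h2.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def outline(xhtml_text):
    """Return (level, title) pairs for every h1 and h2, in document order."""
    root = ET.fromstring(xhtml_text)
    levels = {f"{{{XHTML_NS}}}h1": 1, f"{{{XHTML_NS}}}h2": 2}
    entries = []
    for element in root.iter():
        level = levels.get(element.tag)
        if level is not None:
            # Join nested text (for example inside <em>) and collapse
            # runs of whitespace into single spaces.
            title = " ".join("".join(element.itertext()).split())
            entries.append((level, title))
    return entries
```

If the list returned for your book does not match the chapters and subheadings you expect, fix the heading styles in the source document rather than in Calibre.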

Final lap

Before you click OK, remember to choose the Output Format as ePub on the top right of the screen. Calibre will start the conversion process, and flag its completion. Your ePub ebook should be stored in your Calibre library (remember the location you specified earlier) in a folder named after its author, and will have the extension .epub.

There is one last thing to be done to tweak the metadata of the ebook. I could not get my version of Calibre (0.6.13) to populate the language attribute automatically. If your version has a similar problem, then you can use the command-line tool that is part of the Calibre installation. Issue the following command and you are done:

ebook-meta <ePub file> -l en
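
To confirm that the language attribute really made it into the file, you can read it back from the package metadata. This is a sketch under the usual ePub layout: META-INF/container.xml points to the package document, which carries the language as a Dublin Core dc:language element. The function name epub_languages is our own.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def epub_languages(epub_path):
    """Return the dc:language values declared in an ePub's package document."""
    with zipfile.ZipFile(epub_path) as container:
        # container.xml names the package (OPF) document via a rootfile element.
        container_xml = ET.fromstring(container.read("META-INF/container.xml"))
        rootfile = container_xml.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None or not rootfile.get("full-path"):
            raise ValueError("container.xml names no package document")
        package = ET.fromstring(container.read(rootfile.get("full-path")))
    return [
        element.text.strip()
        for element in package.iter(f"{{{DC_NS}}}language")
        if element.text and element.text.strip()
    ]
```

An empty list means the language is still missing, and the ebook-meta command above is worth running again.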

Checking standards compliance

It is very important to check your ePub ebook for standards compliance. This will help your ebook fulfil its potential by being rendered correctly on a wide variety of ebook readers. Open the ePub validation site in your browser, upload your ebook, and start the validation. Again, nothing short of an unqualified Pass should be your target. Congratulations! You have created your DRM-free ePub ebook!
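
A quick local check can catch container-level mistakes before you upload anything, which is handy if you have edited the archive by hand. The sketch below tests only three basic requirements of the ePub container (the first entry must be an uncompressed file named mimetype, it must contain application/epub+zip, and META-INF/container.xml must exist); it is no substitute for the full validation, and the function name container_problems is our own.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def container_problems(epub_path):
    """Return a list of basic container problems; an empty list means none found."""
    problems = []
    with zipfile.ZipFile(epub_path) as container:
        entries = container.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            # The mimetype entry must be stored, not compressed.
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if container.read("mimetype") != EPUB_MIMETYPE:
                problems.append("'mimetype' does not contain application/epub+zip")
        if "META-INF/container.xml" not in container.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```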

Serving ePub ebooks

Now that you know how to create your own ebooks, and also how to build a repository of other ebooks that you might have downloaded, would it not be logical to make them available over the intranet to your colleagues (of course, keeping in mind the applicable licensing conditions)?

Calibre has one more surprise in its feature list; it has a Web server, based on CherryPy, which listens on port 8080 on localhost. Bring up the Preferences window, and select the Content Server screen. There is very little to change here, except providing a login ID and password (recommended) before clicking on Start Server.

Once the server is up, point your browser to http://localhost:8080 and see your collection of books arranged on a Web page. Since you already have the Firefox add-on to read ePub ebooks, it is just a matter of clicking an ebook hyperlink, and reading it right in your browser.

A direct approach

Sigil is a WYSIWYG editor for ePub. It lets you work directly and interactively with the XHTML in the ePub container. Before using Sigil, it is worth reading the documentation and its tutorial at the wiki, which not only introduces you to the ePub format, but also guides you on how to create an ebook, edit its metadata, and customise its TOC. Working directly with HTML gives you greater flexibility, but can also bog you down in mark-up details.

You can import text or HTML into Sigil, and start editing it after a four-step process that covers editing the TOC, editing the metadata, and confirming the images and any tables. You can also author the book directly using the Sigil editor. Once you have clean XHTML after following the steps described earlier, you can take either the Sigil or the Calibre route.

However, a hybrid approach might serve your purpose in most cases. You can use the Calibre-based workflow we described earlier to produce your ebook, and later use Sigil to customise it to your requirements, by adding further metadata fields or tuning the placement of images. Irrespective of the approach you use, remember to run a validation test on your completed work.

Epilogue

An ePub ebook produced with our workflow passes the ePub validation tests, and renders well in Calibre’s own ebook reader, as well as in the Firefox ePub add-on. Your mileage may vary, depending on your ebook reader.

Once you are comfortable with producing ePub ebooks with this workflow, and are fascinated enough to continue further, you can experiment with the ebook production process. The Calibre command-line interface lets you automate the conversion process. You can also try out things such as adding custom CSS for a unique look-and-feel, placing images as part of chapter headings and TOC items, or creating custom title pages.
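
As a starting point for such automation, the sketch below assembles an ebook-convert command line that mirrors the choices made in the conversion window. The option names are those documented for recent Calibre releases and may differ in older versions, so check ebook-convert --help on your installation first. We assume that --toc-threshold corresponds to the Chapter threshold field; the function names and the file names in the usage note are placeholders of our own.

```python
import subprocess


def ebook_convert_command(source, target, title, author, toc_threshold):
    """Build an ebook-convert invocation mirroring the GUI settings used earlier."""
    return [
        "ebook-convert", source, target,  # output format follows the target extension
        "--title", title,
        "--authors", author,
        "--language", "en",
        "--level1-toc", "//h:h1",  # chapter headings (Heading 1)
        "--level2-toc", "//h:h2",  # subheadings (Heading 2)
        # Assumed equivalent of the "Chapter threshold" field in the GUI.
        "--toc-threshold", str(toc_threshold),
    ]


def convert_to_epub(source, target, title, author, toc_threshold=100):
    """Run the conversion; assumes Calibre's command-line tools are on the PATH."""
    subprocess.run(
        ebook_convert_command(source, target, title, author, toc_threshold),
        check=True,
    )
```

A call such as convert_to_epub("book.xhtml", "book.epub", "My Book", "Jane Doe") would then replace the clicks described in the sections above.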

Now, dear author-publisher, have you caught the scent of that ebook?

References

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
