Pdf

From MEPIS Documentation Wiki

Revision as of 16:10, 17 April 2012 by Jerry bond (Talk | contribs)


Introduction

The Portable Document Format or PDF was developed by Adobe Systems in 1993. The main purpose of the effort was to develop a document format that would be independent of the application software, the operating system, and the computer hardware. When creating a new document, one usually uses an editing application such as a word processor, a paint program, a spreadsheet program, etc., all of which support formats that are designed for the editing process. In general, conversion to PDF is viewed as a last step in this document generation process; i.e., the dissemination of a final version of the document. Accordingly, most such editing applications have the capability of easily saving the document or drawing as a PDF file. A desirable characteristic of a final version format such as PDF is a limitation on the ability to change the document. Consequently, the PDF format tends to discourage easy editing once a document is converted to a PDF file.

It can be argued that PDF has slowly become the de facto standard for document exchange. In 2008 the formerly proprietary format was published as an open ISO standard (ISO 32000-1), further increasing the use of PDF as a document standard. As the number of documents using the PDF format has increased over the years, so has the need to be able to readily edit such documents. Adobe Systems addresses this need with Adobe Acrobat, which is designed to read, write, and, to a large extent, edit PDF files. A popular non-Adobe PDF editor is Foxit. Both are commercial products that can be purchased for the Microsoft Windows operating system, so it may be possible to run Acrobat and Foxit using Wine, a compatibility layer for running Windows applications on Linux. Although the exact installation and possible limitations of doing this are beyond the scope of this document, more information can be obtained from the WineHQ website. The only Adobe Systems PDF application native to Linux is the Acrobat Reader, which as the name implies can only read PDF files. Likewise, the only Foxit application native to Linux is the Foxit Reader. Although there is no single comprehensive Linux solution for editing PDF files, there is a somewhat disjointed series of tools that, when used together, can provide a fairly good solution to the problem of editing PDF documents.

Readers

As discussed in the previous section, proprietary readers are available to Linux users. You will be required to agree to an End User License Agreement in order to use the software.

  • As a product of Adobe Systems, Acrobat is probably the most feature rich reader currently available. It can be installed from the Mepis repositories using Synaptic or, if you want the latest version, it can be downloaded and installed from the Adobe Systems website as a .deb file.
  • The Foxit Reader is not currently available through the Mepis repositories, but can be downloaded from the Foxit website and installed as a .deb file.
  • There is also a Firefox Adobe Reader plugin provided by the adobereader-enu package that can be downloaded from this site. In the dropdown list of step 3 you can select a .deb instead of the .bin that you normally get at the regular download site.

The standard Mepis 8.5 installation installs the Okular document viewer as the default PDF reader. You can also find several open-source PDF readers in the Mepis repositories, such as KPDF (PDF viewer for KDE), Evince (uses Gnome libraries), or ePDFView (uses poppler libraries).

Editing PDF files

There are a series of tools available to the Linux user that can be used to edit PDF files. Unfortunately, no one tool is optimal for all PDF documents. You will find that the editing results can vary widely based on a range of factors such as the version of the PDF format being modified, the rendering engine, the options with which the original PDF file was created, what fonts were used to create the document, what system fonts are available, etc. Consequently, the user may have to try more than one approach to editing a PDF document to achieve the best results. The following is a summary of the three most common solutions to PDF editing in Linux. It is recommended that when editing a PDF file, the user try each of these approaches to determine which provides the best rendering for a given document. As an aside, it should be noted that most PDF documents are an assemblage of text and images where the text can be directly edited. However, some PDF files are comprised solely of images (usually page scans) in which no direct text editing is possible. In such a case, the user will need to edit the text in the same manner as editing an image.

Solution 1: PDFedit

PDFedit is a native Linux application based on the xpdf library with a Qt 3.x-based Graphical User Interface (GUI) that is currently being ported to Qt 4. It is not a word processing-type editor as such, but rather is designed to allow direct editing access to the raw PDF file code. As a result, it is oriented toward the advanced user with knowledge of PDF file code constructs, and it supports extensive user-customized scripting based on the ECMAScript scripting language. For most users, this means there is a fairly steep learning curve to become proficient in the more advanced features. However, there are some basic GUI functions already installed for the casual user that are useful for small edits such as limited text changes or filling out form fields. PDFedit can be installed through Synaptic. For more information, you can visit the PDFedit website, the PDFedit Wiki, and the PDFedit Wikipedia entry.

Solution 2: OpenOffice Draw with PDF Import Extension

OpenOffice.org (OOo) is a complete open source office suite originally based on StarOffice and is comprised of Writer (word processor), Draw (drawing program), Calc (spreadsheet program), Base (database program), and Impress (presentation editor). The OOo suite can be downloaded and installed using Synaptic. Once OOo is installed, you will also have to download the OpenOffice PDF Import Extension in order to edit PDF files with OOo. To install the extension, first open OpenOffice Draw and select Tools -> Extension Manager. Click on the Add button, select the Sun PDF Import Extension, agree to the license, and close the window after installation is complete. You can now directly edit PDF documents by opening them with OOo Draw.

Solution 3: Inkscape with PDF Tool Kit

Inkscape is an open source vector-based drawing program (similar to Illustrator or CorelDraw) that loads and saves a subset of the SVG (Scalable Vector Graphics) file format and can be installed using Synaptic.

Method A: Open PDF File Directly with Inkscape

PDF files can be opened directly with Inkscape, but only one page at a time. If you are working with a single-page document, you just open it, make the edits, and save the file back to PDF format. If you load a multi-page PDF document, however, Inkscape will ask which page of the document you want to open. You will need to open and save each page of the document as a separate PDF file, leaving you with an individual file for each page. To re-assemble these individual files into a single document, you will need to use PDF Toolkit (pdftk). Pdftk, which can be installed using Synaptic, is an open source set of tools for manipulating PDF documents. In this case, we use pdftk to re-assemble the pages of the document by entering the following command in the console...

  pdftk page1.pdf page2.pdf page3.pdf cat output merged_document.pdf

...where page1.pdf, page2.pdf, etc. are the names of the individual page files and merged_document.pdf is the name of the re-assembled output file.
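For documents with many pages, typing every file name becomes tedious, and a shell wildcard such as page*.pdf would place page10.pdf before page2.pdf. The following is a minimal sketch of one way around this, assuming the pages were saved as page1.pdf, page2.pdf, and so on without gaps in the numbering. The function name merge_pages and the PDFTK variable (an override for dry runs, e.g. PDFTK=echo) are hypothetical and not part of pdftk itself.

```shell
#!/bin/sh
# Sketch: merge page1.pdf, page2.pdf, ... from a directory in numeric order.
# Assumption: pages are numbered from 1 with no gaps; the loop stops at the
# first missing number.
merge_pages() {
    dir=$1    # directory holding the page files
    out=$2    # name of the re-assembled output file
    set --    # reuse the positional parameters as the ordered file list
    i=1
    while [ -f "$dir/page$i.pdf" ]; do
        set -- "$@" "$dir/page$i.pdf"
        i=$((i + 1))
    done
    if [ "$#" -eq 0 ]; then
        echo "no page files found in $dir" >&2
        return 1
    fi
    # PDFTK is a hypothetical override; by default the real pdftk is called
    # with the same "cat output" syntax shown above.
    ${PDFTK:-pdftk} "$@" cat output "$out"
}
```

After sourcing the function, running merge_pages . merged_document.pdf in the directory that holds the page files should be equivalent to typing the full command by hand.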

Method B: Convert PDF File to SVG Before Opening

Sometimes a PDF document opened directly with Inkscape will not be rendered correctly. In such a case, you should try converting the document to SVG before opening it with Inkscape. To do this, you will need to install pdf2svg using Synaptic. Pdf2svg is a small command line utility that can split up a PDF document into individual SVG page files. To do this, you enter the following command sequence into the console command line interface...

   pdf2svg input.pdf output_page%d.svg all

...where input.pdf is the PDF document and output_page is the base name for the page files. The %d is replaced by the page number in each file name. Once converted, the files can be edited with Inkscape, saved as PDF, and re-assembled with pdftk.
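If only a few pages need editing, pdf2svg also accepts a single page number in place of all. The sketch below wraps that form in a loop. The function name convert_pages and the PDF2SVG variable (an override for dry runs, e.g. PDF2SVG=echo) are hypothetical, and the output names simply follow the output_page pattern used above.

```shell
#!/bin/sh
# Sketch: convert only selected pages of a PDF document to SVG files.
# Usage: convert_pages input.pdf 2 5   (writes output_page2.svg, output_page5.svg)
convert_pages() {
    input=$1
    shift    # the remaining arguments are page numbers
    for page in "$@"; do
        # PDF2SVG is a hypothetical override; by default pdf2svg is called
        # as: pdf2svg <input.pdf> <output.svg> <page number>
        ${PDF2SVG:-pdf2svg} "$input" "output_page$page.svg" "$page"
    done
}
```

This avoids generating SVG files for pages that will not be changed, at the cost of having to know the page numbers in advance.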

Reducing PDF file size

Method 1

Open the PDF with Okular, choose Print (CTRL-P), change the printer Name to Print to File (PDF), and specify the desired output file name and path.

Method 2

Run the gs (Ghostscript) command below in Konsole, replacing output.pdf and input.pdf with your own file names. The -dPDFSETTINGS= value can be set to one of the following (from lowest to highest resolution):

  • /screen
  • /ebook
  • /printer
  • /prepress

  gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -sOutputFile=output.pdf input.pdf
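Because the best setting depends on the document, it can help to produce one output file per setting and compare the results. The following is a minimal sketch that reuses the gs options shown above. The function name compress_all_presets, the output naming scheme, and the GS variable (an override for dry runs, e.g. GS=echo) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run Ghostscript once per -dPDFSETTINGS preset so that the
# resulting file sizes and rendering quality can be compared.
# For input.pdf this writes input_screen.pdf, input_ebook.pdf, and so on.
compress_all_presets() {
    input=$1
    base=${input%.pdf}    # strip the .pdf suffix to build output names
    for preset in screen ebook printer prepress; do
        # GS is a hypothetical override; by default the real gs is called.
        ${GS:-gs} -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
            "-dPDFSETTINGS=/$preset" "-sOutputFile=${base}_$preset.pdf" "$input"
    done
}
```

After running compress_all_presets input.pdf, a command such as ls -l input_*.pdf lists the sizes side by side, and each file can be opened in Okular to check whether the quality is still acceptable.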

Other Tools

Linux supports a range of conversion, annotation, and printing tools for PDF files. A search in Synaptic turns up an extensive list of such utilities. Some of the more widely used tools are listed in the PDF section of the Linux software link below.

Create a PDF form

You can use LibreOffice's Form Controls to create a PDF form that users can fill out and then print or save. Click View > Toolbars > Form Controls to pull up the controls (text box, list, etc.), then click on the control you want to use and draw a box in your document. When done, click File > Export as PDF, where the Create PDF form option will already be checked. See this extensive discussion for details.

Links

Debian Wiki
PDF section in Linux software
Wikipedia list of PDF software
Linuxquestions.org PDF wiki
