Converting PDF to HTML

pdftohtml is a program that converts PDF documents to HTML or XML, with images saved as PNG files. It also supports encrypted PDF files. In Ubuntu Gutsy it is bundled with the poppler-utils package, so we need to install that package first.

Install poppler-utils in Ubuntu

sudo aptitude install poppler-utils

This completes the installation.
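As a quick sanity check, something like the following can confirm that the pdftohtml binary is now on your PATH. This is only a sketch: the have_tool helper is a made-up name for illustration and is not part of poppler-utils.

```shell
# Report whether a command is available on PATH.
# (have_tool is a hypothetical helper name, not part of poppler-utils.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool pdftohtml; then
    # -v prints copyright and version info (see the options list).
    pdftohtml -v
else
    echo "pdftohtml not found; install poppler-utils first" >&2
fi
```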

Using pdftohtml

pdftohtml Syntax

pdftohtml [options] <PDF-file> [<HTML-file> | <XML-file>]

Available options:
A summary of the options is included below.

-h, -help – Show summary of options.

-f <int> – first page to print

-l <int> – last page to print

-q – don’t print any messages or errors

-v – print copyright and version info

-p – exchange .pdf links with .html

-c – generate complex output

-i – ignore images

-noframes – generate no frames. Not supported in complex output mode.

-stdout – use standard output

-zoom <fp> – zoom the pdf document (default 1.5)

-xml – output for XML post-processing

-enc <string> – output text encoding name

-opw <string> – owner password (for encrypted files)

-upw <string> – user password (for encrypted files)

-hidden – force hidden text extraction

-dev <string> – output device name for Ghostscript (png16m, jpeg, etc.)

-nomerge – do not merge paragraphs

-nodrm – override document DRM settings
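To illustrate how a few of these options combine, here is a rough sketch of two small shell wrappers. The function names convert_pages and convert_encrypted, and the file names in the usage note, are hypothetical. Only the pdftohtml options themselves come from the list above.

```shell
# Convert pages FIRST..LAST of a PDF to HTML without frames, skipping images.
# Usage: convert_pages INPUT.pdf FIRST LAST OUTPUT.html
# (hypothetical helper; the options are taken from the list above)
convert_pages() {
    # -q: no messages, -f/-l: page range, -i: ignore images,
    # -noframes: do not generate frames
    pdftohtml -q -f "$2" -l "$3" -i -noframes "$1" "$4"
}

# Convert a PDF that needs a user password to open.
# Usage: convert_encrypted INPUT.pdf PASSWORD OUTPUT.html
# (hypothetical helper)
convert_encrypted() {
    pdftohtml -upw "$2" "$1" "$3"
}
```

For example, convert_pages test.pdf 2 5 pages.html would ask pdftohtml for pages 2 through 5 only. Keep in mind that a password passed on the command line may be visible to other users in the process list.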

pdftohtml Examples

pdftohtml test.pdf test.html

This command gives you a simple HTML file suitable for reading or copying the textual content of the PDF file. You can select the text in your browser and paste it into other applications. It doesn’t produce any PNG files, so you won’t be able to see any embedded graphics. It’s a great utility if you just want to extract the text from a PDF file.

If you want to see graphics, you’ll need to use the -c (as in “complex”) option:

pdftohtml -c test.pdf test.html

This option produces individual HTML files, one for each page of the PDF file, with the PNG references mixed in. The graphics in the original PDF file show up in a browser and the text can be cut and pasted. The total size of the HTML and PNG files generated with the -c option tends to be roughly equivalent to that of the original PDF.
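If you have a whole directory of PDF files, the same command can be wrapped in a loop. The following is a minimal sketch, assuming a POSIX shell. convert_dir is a hypothetical helper name, and each output name is simply the input name with .pdf replaced by .html.

```shell
# Convert every *.pdf file in a directory using complex (-c) mode.
# Usage: convert_dir DIRECTORY
# (convert_dir is a hypothetical helper name)
convert_dir() {
    for pdf in "$1"/*.pdf; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$pdf" ] || continue
        # ${pdf%.pdf} strips the trailing .pdf from the input name.
        pdftohtml -q -c "$pdf" "${pdf%.pdf}.html"
    done
}
```

Calling convert_dir . would process the PDF files in the current directory. Dropping -c from the command gives the simpler text-oriented output instead.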

Source: Ubuntu Geek

6 thoughts on “Converting PDF to HTML”

Sudhir

March 25, 2009 at 11:33 am

Hi, we are running this command directly in Linux, but it has a problem converting the hyperlinks in the PDF into HTML. Is there any solution?

Pingback: How to convert pdf to html on Ubuntu 9.04 « Computer Borders

vl

April 26, 2011 at 2:56 pm

You can also try Okdo Pdf to Html Converter.

http://bstdownload.com/reviews/okdo-pdf-html-converter-4/

Gokhan

April 3, 2012 at 12:49 pm

Hi, I used this under Ubuntu and it worked quite well. But it converts pages to PNG and adds some color-scheme image at the top of the page. How can I disable that?

Inayat

October 23, 2013 at 1:05 pm

You can open a PDF file and save it as HTML in just two lines of code in Java by using the latest version of Aspose.PDF for Java.

Jeniffer

November 11, 2013 at 3:07 pm

Found this blog about converting PDF to HTML in just two lines of code by using Aspose’s Java library for PDF. I found it very helpful; hopefully you will like it too.

http://www.quora.com/Programming-Experts/Java-dotNET-PHP-Android-Development-Tips/Convert-PDF-Files-to-HTML-Pdf-to-Image-Conversion-in-Java-Applications?share=1
