Extract Text to HTML Format

The PdfExtractor class lets you extract text from a PDF file and save it in HTML format through its extractTextAsHTML method. You can extract text from every page or restrict extraction to a page range by setting the start and end pages before extracting.

The following example extracts text from the first two pages of a PDF file and saves it as HTML.

[Java]
//create PdfExtractor object
PdfExtractor extractor = new PdfExtractor();

//bind input PDF file
extractor.bindPdf("input.pdf");

//set start and end pages
extractor.setStartPage(1);
extractor.setEndPage(2);

//extract text
extractor.extractText();

//save extracted text as HTML
extractor.extractTextAsHTML("output.html");

//close PdfExtractor object
extractor.close();
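
When the start and end pages come from user input, it can help to validate them before passing them to setStartPage and setEndPage. The sketch below is plain Java and does not call the library. The PageRange class and its method names are hypothetical, and it assumes pages are numbered from 1 (as the example above suggests) and that the total page count is already known.

```java
// Hypothetical helper, not part of PdfExtractor: validates a page range
// before it is handed to setStartPage/setEndPage.
// Assumption: pages are numbered starting at 1.
final class PageRange {
    final int start;
    final int end;

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Returns a validated range, or throws if the range does not fit the document.
    static PageRange of(int start, int end, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("document has no pages");
        }
        if (start < 1 || end < start || end > pageCount) {
            throw new IllegalArgumentException(
                "invalid page range " + start + "-" + end + " for " + pageCount + " pages");
        }
        return new PageRange(start, end);
    }

    // Number of pages covered by the range, both ends included.
    int length() {
        return end - start + 1;
    }

    public static void main(String[] args) {
        PageRange range = PageRange.of(1, 2, 5);
        System.out.println(range.start + "-" + range.end + " (" + range.length() + " pages)");
    }
}
```

With a validated range in hand, range.start and range.end could be passed to setStartPage and setEndPage as in the example above.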
 
