Extract Text Formatting Information

To extract text together with detailed formatting information, use the getFormattedText method of the PdfExtractor class. This method returns an array of TextSegment objects. Loop through the array to read the properties of each segment: its text, font name, font size, font color, and X and Y coordinates.

The following example extracts text formatting information from pages 1 and 2 of a PDF file.

[Java]
//create PdfExtractor object
PdfExtractor extractor = new PdfExtractor();

//bind input PDF file
extractor.bindPdf("input.pdf");

//set start and end pages
extractor.setStartPage(1);
extractor.setEndPage(2);

//extract text
extractor.extractText();

//extract text segments
TextSegment[] segments = extractor.getFormattedText();

//print the text and formatting information of each segment
for(int index = 0; index < segments.length; index++)
{
      TextSegment text = segments[index];
      System.out.println("Segment #"+index);
      System.out.println(text.getText());
      System.out.println(text.getFontName());
      System.out.println(text.getFontSize());
      System.out.println(text.getTextColor().toString());
      System.out.println(text.getX());
      System.out.println(text.getY());
}

//close PdfExtractor object
extractor.close();
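Once the segments are extracted, their formatting properties can be post-processed in plain Java, for example to see which fonts a page uses or to pick out large text that is probably a heading. The sketch below is illustrative only: Segment is a hypothetical stand-in record holding the same values the example above prints (it is not the library's TextSegment class), charsPerFont and textAtLeast are hypothetical helper names, and the float type for font size and coordinates is an assumption. In real code you would build each Segment from text.getText(), text.getFontName(), text.getFontSize(), text.getX() and text.getY(). Records and Stream.toList() require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentSummary {

    // Hypothetical stand-in for the values printed in the example above;
    // this is NOT the library's TextSegment class. Field types are assumed.
    record Segment(String text, String fontName, float fontSize, float x, float y) {}

    // Total number of characters set in each font, keyed by font name
    // (TreeMap keeps the font names in sorted order).
    static Map<String, Integer> charsPerFont(List<Segment> segments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Segment s : segments) {
            counts.merge(s.fontName(), s.text().length(), Integer::sum);
        }
        return counts;
    }

    // Text of every segment whose font size is at least minSize,
    // e.g. to collect candidate headings.
    static List<String> textAtLeast(List<Segment> segments, float minSize) {
        return segments.stream()
                .filter(s -> s.fontSize() >= minSize)
                .map(Segment::text)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up sample values, used only to exercise the helpers.
        List<Segment> sample = List.of(
                new Segment("Report", "Helvetica-Bold", 18f, 72f, 720f),
                new Segment("Body text", "Helvetica", 10f, 72f, 690f),
                new Segment("More", "Helvetica", 10f, 72f, 676f));
        System.out.println(charsPerFont(sample)); // prints {Helvetica=13, Helvetica-Bold=6}
        System.out.println(textAtLeast(sample, 14f)); // prints [Report]
    }
}
```

The size threshold used to detect headings is a judgment call; comparing each segment's font size to the most common size on the page is one alternative.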