Extract Text Formatting Information
To extract text together with its formatting details, use the getFormattedText method of the PdfExtractor class. The method returns an array of text segments. Loop through the array to read each segment's details: font name, font size, font color, and X and Y coordinates.
This example shows how to extract text formatting information from a PDF file.
[Java]
//create PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
//bind input PDF file
extractor.bindPdf("input.pdf");
//set start and end pages
extractor.setStartPage(1);
extractor.setEndPage(2);
//extract text
extractor.extractText();
//extract text segments
TextSegment[] segments = extractor.getFormattedText();
//get text information for each segment
for (int index = 0; index < segments.length; index++)
{
    TextSegment text = segments[index];
    System.out.println("Segment #" + index);
    System.out.println(text.getText());
    System.out.println(text.getFontName());
    System.out.println(text.getFontSize());
    System.out.println(text.getTextColor().toString());
    System.out.println(text.getX());
    System.out.println(text.getY());
}
//close PdfExtractor object
extractor.close();
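Once the segments have been read, a common next step is to group them by formatting, for example to separate headings from body text. The following is a minimal sketch of that idea, not part of the library: the Segment class is a hypothetical stand-in holding the values you would copy from each TextSegment (getText, getFontName, getFontSize), and groupByFont is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SegmentGrouping {

    // Hypothetical stand-in for the values read from each TextSegment.
    static final class Segment {
        final String text;
        final String fontName;
        final float fontSize;

        Segment(String text, String fontName, float fontSize) {
            this.text = text;
            this.fontName = fontName;
            this.fontSize = fontSize;
        }
    }

    // Groups segment text under a "fontName fontSize" key,
    // keeping groups in the order their font first appears.
    static Map<String, List<String>> groupByFont(List<Segment> segments) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Segment s : segments) {
            String key = s.fontName + " " + s.fontSize;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s.text);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample values for illustration only; in practice, fill this list
        // inside the loop over the array returned by getFormattedText.
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("Quarterly Report", "Helvetica-Bold", 18f));
        segments.add(new Segment("Sales rose in the third quarter.", "Times-Roman", 11f));
        segments.add(new Segment("Costs were unchanged.", "Times-Roman", 11f));

        groupByFont(segments).forEach(
            (font, texts) -> System.out.println(font + " -> " + texts));
    }
}
```

Grouping on font name and size together is a design choice for this sketch; if color or position also matters, those values can be added to the key in the same way.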
