Users often need to extract the text inside a PDF document. This can be achieved with a PDF utility or software such as Adobe Acrobat Reader/Writer. For developers who want to extract text from PDF documents programmatically, Aspose provides a straightforward approach through Aspose.Pdf.Kit (Java Version).
Aspose.Pdf.Kit provides the PdfExtractor class, which is used to extract content from any PDF document. To extract text from a PDF document, follow these steps:
- Create a PdfExtractor object by calling its empty constructor.
- Provide the password for the PDF document through the Password property of the PdfExtractor class.
- Bind the PDF document to the extractor by calling the BindPdf method, which takes the input PDF file path or a stream as its argument.
- Call the ExtractText method to extract the text from the PDF document.
- Finally, call the GetText method to store the extracted text in a plain text file or a stream.
Code Snippet
[C#]
//Instantiate PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
//Set Password for input PDF file
extractor.Password = "";
//Bind the input PDF document to extractor
extractor.BindPdf(".\\text.pdf");
//Extract text from the input PDF document
extractor.ExtractText();
//Save the extracted text to a text file
extractor.GetText(".\\text.txt");
[VB.NET]
'Instantiate PdfExtractor object
Dim extractor As PdfExtractor = New PdfExtractor()
'Set Password for input PDF file
extractor.Password = ""
'Bind the input PDF document to extractor
extractor.BindPdf(".\text.pdf")
'Extract text from the input PDF document
extractor.ExtractText()
'Save the extracted text to a text file
extractor.GetText(".\text.txt")
[Java]
//Instantiate PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
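//Bind the input PDF document to extractor
//(path is assumed to be a String holding the directory of the input file, ending with a separator)
extractor.bindPdf(path + "Text.pdf");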
//Extract text from the input PDF document
extractor.extractText();
//Save the extracted text to a text file
extractor.getText(path + "text.txt");
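Once the text file has been written, it can be read back with the standard Java library to check the result. The following is only a minimal sketch and does not use any Aspose API: the class name ExtractedTextReader and its two methods are hypothetical helpers introduced for illustration, and the character set of the output file is an assumption that you must match to whatever the extractor actually wrote.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Aspose.Pdf.Kit): reads back the text file
// produced by the extraction step so that the result can be inspected.
final class ExtractedTextReader {

    private ExtractedTextReader() {
        // Utility class, not meant to be instantiated
    }

    // Read the whole file into a String using the given charset.
    // The charset is an assumption supplied by the caller; it has to match
    // the encoding the extractor used when writing the file.
    static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    // Count whitespace-separated words, as a quick check that text was extracted.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

For example, ExtractedTextReader.read(Paths.get(path + "text.txt"), StandardCharsets.UTF_8) would return the file contents if the file is UTF-8 encoded (an assumption). A word count of zero suggests that no text was extracted, for instance because the PDF contains only scanned images.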