Aspose.Pdf.Kit (Java Version) lets developers extract images from a PDF document.
Aspose.Pdf.Kit provides the PdfExtractor class, which is used to extract content from a PDF document. The steps below use the .NET member names; the Java snippet uses the camelCase equivalents bindPdf, extractImage, hasNextImage, and getNextImage. To extract images from a PDF document, follow these steps:
- Create a PdfExtractor object by calling its empty constructor.
- Bind the PDF document to the extractor by calling the BindPdf method, which takes the input PDF file path or a stream as its argument.
- Set the StartPage and EndPage properties of the PdfExtractor class to define the range of pages from which you want to extract images.
- Call the ExtractImage method to extract the images from the PDF document.
- Call the HasNextImage method in a loop to iterate through the extracted images. HasNextImage returns true while another image is available and false once none remain.
- Finally, in the body of the loop, call the GetNextImage method to store each image as a file (in JPG format) or write it to a stream.
Code Snippet
[C#]
// Instantiate a PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
// Bind the input PDF document to the extractor
extractor.BindPdf(".\\Images.pdf");
// Set the page number at which image extraction starts
extractor.StartPage = 1;
// Set the page number at which image extraction ends
extractor.EndPage = 2;
// Extract images from the input PDF document
extractor.ExtractImage();
// Prefix (first part, usually the file name) of each image file name
String prefix = ".\\Image";
// Suffix (last part, usually the file extension) of each image file name
String suffix = ".jpg";
// Counter used to number the extracted images
int imageCount = 1;
// Loop while HasNextImage reports another image; the loop exits when none remain
while (extractor.HasNextImage())
{
    // Call GetNextImage to store the image as a file
    extractor.GetNextImage(prefix + imageCount + suffix);
    // Increment the image counter
    imageCount++;
}
[VB.NET]
' Instantiate a PdfExtractor object
Dim extractor As PdfExtractor = New PdfExtractor()
' Bind the input PDF document to the extractor
extractor.BindPdf(".\Images.pdf")
' Set the page number at which image extraction starts
extractor.StartPage = 1
' Set the page number at which image extraction ends
extractor.EndPage = 2
' Extract images from the input PDF document
extractor.ExtractImage()
' Prefix (first part, usually the file name) of each image file name
Dim prefix As String = ".\Image"
' Suffix (last part, usually the file extension) of each image file name
Dim suffix As String = ".jpg"
' Counter used to number the extracted images
Dim imageCount As Integer = 1
' Loop while HasNextImage reports another image; the loop exits when none remain
While extractor.HasNextImage()
    ' Call GetNextImage to store the image as a file
    extractor.GetNextImage(prefix & imageCount & suffix)
    ' Increment the image counter
    imageCount = imageCount + 1
End While
[Java]
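// Instantiate a PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
// Bind the input PDF document to the extractor
// (path is assumed to be a String holding the working directory, defined elsewhere)
extractor.bindPdf(path + "Image.pdf");
// Extract images from the input PDF document
extractor.extractImage();
// Suffix (last part, usually the file extension) of each image file name
String suffix = ".jpg";
// Counter used to number the extracted images
int imageCount = 1;
// Loop while hasNextImage reports another image; the loop exits when none remain
while (extractor.hasNextImage()) {
    // Call getNextImage to store the image as a file
    extractor.getNextImage(path + imageCount + suffix);
    // Increment the image counter
    imageCount++;
}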
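The hasNextImage/getNextImage loop and its file naming can be tried out without a PDF or the library. The following is a minimal sketch under stated assumptions: ImageSource and FakeImageSource are hypothetical stand-ins written for this example, not Aspose.Pdf.Kit types, and they only mirror the two calls used in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

class ImageLoopSketch {
    // Hypothetical stand-in for the two PdfExtractor calls used in the loop.
    // It is NOT part of Aspose.Pdf.Kit.
    interface ImageSource {
        boolean hasNextImage();
        void getNextImage(String outputFile);
    }

    // Fake source that pretends to hold a fixed number of images and
    // records the file name each one would be written to.
    static class FakeImageSource implements ImageSource {
        private int remaining;
        final List<String> written = new ArrayList<>();

        FakeImageSource(int imageCount) {
            this.remaining = imageCount;
        }

        @Override
        public boolean hasNextImage() {
            return remaining > 0;
        }

        @Override
        public void getNextImage(String outputFile) {
            if (remaining <= 0) {
                throw new IllegalStateException("no more images");
            }
            remaining--;
            written.add(outputFile);
        }
    }

    // Drains the source, naming each file prefix + counter + suffix,
    // and returns how many images were stored.
    static int saveAll(ImageSource source, String prefix, String suffix) {
        int imageCount = 0;
        while (source.hasNextImage()) {
            imageCount++;
            source.getNextImage(prefix + imageCount + suffix);
        }
        return imageCount;
    }

    public static void main(String[] args) {
        FakeImageSource fake = new FakeImageSource(3);
        int saved = saveAll(fake, "Image", ".jpg");
        // prints "3 images: [Image1.jpg, Image2.jpg, Image3.jpg]"
        System.out.println(saved + " images: " + fake.written);
    }
}
```

With a real extractor, the same loop shape applies once the extractor has been bound and extractImage has been called; the counter keeps each output file name unique.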