Generate Thumbnail Images from PDF Documents

The Adobe Acrobat SDK is a set of tools that help you develop software that interacts with Acrobat technology. The SDK contains header files, type libraries, simple utilities, sample code, and documentation.

Using the Acrobat SDK, you can develop software that integrates with Acrobat and Adobe Reader in several ways:

  • JavaScript — Write scripts, either in an individual PDF document or externally, to extend the functionality of Acrobat or Adobe Reader.
  • Plug-ins — Create plug-ins that are dynamically linked to and extend the functionality of Acrobat or Adobe Reader.
  • Interapplication communication — Write a separate application process that uses interapplication communication (IAC) to control Acrobat functionality. DDE and OLE are supported on Microsoft® Windows®, and Apple events/AppleScript on Mac OS®. IAC is not available on UNIX®.

Aspose.Pdf for .NET provides much of the same functionality without any dependence on Adobe Acrobat automation. This article shows how to generate thumbnail images from PDF documents, first using the Acrobat SDK and then using Aspose.Pdf.

Developing an Application Using the Acrobat Interapplication Communication API

Think of the Acrobat API as having two distinct layers that use Acrobat Interapplication Communication (IAC) objects:

  • The Acrobat application (AV) layer. The AV layer lets you control how the document is viewed. For example, the view of a document object resides in the layer associated with Acrobat.
  • The portable document (PD) layer. The PD layer provides access to the information within a document, such as a page. From the PD layer you can perform basic manipulations of PDF documents, such as deleting, moving, or replacing pages, as well as changing annotation attributes. You can also print PDF pages, select text, access manipulated text, and create or delete thumbnails.

Because our goal is to convert PDF pages into thumbnail images, we focus on IAC. The IAC API contains objects such as PDDoc, PDPage, and PDAnnot, which let you work with the portable document (PD) layer. The following code sample scans a folder and converts the first page of each PDF document into a thumbnail image. Using the Acrobat SDK, we could also read the PDF metadata and retrieve the number of pages in the document.

Acrobat Approach

In order to generate the thumbnail images for each document, we have used the Adobe Acrobat 7.0 SDK and the Microsoft .NET 2.0 Framework.

The Acrobat SDK combined with the full version of Adobe Acrobat exposes a COM library of objects (sadly the free Adobe Reader does not expose the COM interfaces) that can be used to manipulate and access PDF information. Using these COM objects via COM Interop, load the PDF document, get the first page and render that page to the clipboard. Then, with the .NET Framework, copy this to a bitmap, scale and combine the image and save the result as a GIF or PNG file.

Once Adobe Acrobat is installed, open regedit.exe and look under HKEY_CLASSES_ROOT for an entry called AcroExch.PDDoc.

The registry showing the AcroExch.PDDoc entry
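
The same check can also be made from code rather than by browsing the registry. The following is a minimal sketch, not part of the Acrobat SDK: the class name ComRegistration and the method IsProgIdRegistered are hypothetical names chosen for illustration. It relies only on the standard Type.GetTypeFromProgID call, which returns null when a ProgID is not registered.

```csharp
using System;

static class ComRegistration
{
    // Hypothetical helper: returns true when the given ProgID resolves
    // to a registered COM class on this machine.
    public static bool IsProgIdRegistered(string progId)
    {
        try
        {
            // Returns null (rather than throwing) for an unknown ProgID.
            return Type.GetTypeFromProgID(progId) != null;
        }
        catch (PlatformNotSupportedException)
        {
            // Runtimes without COM support may throw instead;
            // this sketch treats that case as "not registered".
            return false;
        }
    }
}
```

Calling ComRegistration.IsProgIdRegistered("AcroExch.PDDoc") before the conversion loop would let the application report a missing Acrobat installation up front instead of failing on the first CreateObject call.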

C#
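// Requires: using System; using System.Configuration; using System.Drawing; using System.IO;
// using System.Runtime.InteropServices; using System.Windows.Forms;
// plus a COM reference to the Acrobat type library.
// Clipboard access requires an STA thread (mark Main with [STAThread]).

// Acrobat objects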
Acrobat.CAcroPDDoc pdfDoc;
Acrobat.CAcroPDPage pdfPage;
Acrobat.CAcroRect pdfRect;
Acrobat.CAcroPoint pdfPoint;

AppSettingsReader appSettings = new AppSettingsReader();

string pdfInputPath = appSettings.GetValue("pdfInputPath", typeof(string)).ToString();
string pngOutputPath = appSettings.GetValue("pngOutputPath", typeof(string)).ToString();

string templatePortraitFile = Application.StartupPath + @"\pdftemplate_portrait.gif";
// Assumes a landscape template named pdftemplate_landscape.gif sits next to the portrait one
string templateLandscapeFile = Application.StartupPath + @"\pdftemplate_landscape.gif";

try
{
    // Get list of files to process from the input path
    // Could change to read list from database instead
    string[] files = Directory.GetFiles(pdfInputPath, "*.pdf");

    for(int n = 0; n < files.Length; n++)
    {
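        string inputFile = files[n];
        // Build the output name with Path helpers: Replace(".pdf", ".png") is case-sensitive and
        // would miss "FILE.PDF", and Path.Combine works with or without a trailing backslash
        string outputFile = Path.Combine(pngOutputPath, Path.GetFileNameWithoutExtension(inputFile) + ".png");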

        /* Could skip if the thumbnail already exists in the output path
        if (File.Exists(outputFile)) { continue; } */

        // Create the document (the AcroExch.PDDoc object can only be created using late binding)
        // Note: this uses the Visual Basic helper functions, so add a reference to
        // Microsoft.VisualBasic.dll (for example
        // C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\Microsoft.VisualBasic.dll),
        // which is always available because it ships with the .NET Framework
        pdfDoc = (Acrobat.CAcroPDDoc)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.PDDoc", "");

        int ret = pdfDoc.Open(inputFile);

        if (ret == 0)
        {
            throw new FileNotFoundException("Acrobat could not open the PDF document.", inputFile);
        }

       // Get the number of pages (to be used later if you wanted to store that information)
       int pageCount = pdfDoc.GetNumPages();

       // Get the first page
       pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(0);
       pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();

       pdfRect = (Acrobat.CAcroRect)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.Rect", "");

       pdfRect.Left = 0;
       pdfRect.right = pdfPoint.x;
       pdfRect.Top = 0;
       pdfRect.bottom = pdfPoint.y;

       // Render to clipboard, scaled by 100 percent (ie. original size)
       // Even though we want a smaller image, better for us to scale in .NET
       // than Acrobat as it would greek out small text
       // see http://www.adobe.com/support/techdocs/1dd72.htm
       pdfPage.CopyToClipboard(pdfRect, 0, 0, 100);

       IDataObject clipboardData = Clipboard.GetDataObject();

       if (clipboardData.GetDataPresent(DataFormats.Bitmap))
       {
           Bitmap pdfBitmap = (Bitmap)clipboardData.GetData(DataFormats.Bitmap);

           // Size of generated thumbnail in pixels
           int thumbnailWidth = 45;
           int thumbnailHeight = 59;

           string templateFile;

           // Switch between portrait and landscape
           if (pdfPoint.x < pdfPoint.y)
           {
               templateFile = templatePortraitFile;
           }
           else
           {
               templateFile = templateLandscapeFile;
               // Swap width and height (little trick not using third temp variable)
               thumbnailWidth = thumbnailWidth ^ thumbnailHeight;
               thumbnailHeight = thumbnailWidth ^ thumbnailHeight;
               thumbnailWidth = thumbnailWidth ^ thumbnailHeight;
           }

          // Load the template graphic
          Bitmap templateBitmap = new Bitmap(templateFile);

          // Render to small image using the bitmap class
          Image pdfImage = pdfBitmap.GetThumbnailImage(thumbnailWidth, thumbnailHeight, null, IntPtr.Zero);

          // Create new blank bitmap (+ 7 for template border)
          Bitmap thumbnailBitmap = new Bitmap(thumbnailWidth + 7, thumbnailHeight + 7,
          System.Drawing.Imaging.PixelFormat.Format32bppArgb);

          // To overlay the template on the image, we need to set the transparency
          // http://www.sellsbrothers.com/writing/default.aspx?content=dotnetimagerecoloring.htm
          templateBitmap.MakeTransparent();

          using (Graphics thumbnailGraphics = Graphics.FromImage(thumbnailBitmap))
          {
              // Draw rendered pdf image to new blank bitmap
              thumbnailGraphics.DrawImage(pdfImage, 2, 2, thumbnailWidth, thumbnailHeight);

              // Draw template outline over the bitmap (pdf will show through the transparent area)
              thumbnailGraphics.DrawImage(templateBitmap, 0, 0);

             // Save as .png file
             thumbnailBitmap.Save(outputFile, System.Drawing.Imaging.ImageFormat.Png);

             Console.WriteLine("Generated thumbnail... {0}", outputFile);
          }

       }

       // Close the document and release the COM objects for every file, whether or not
       // a bitmap was found on the clipboard; Acrobat is not the best-behaved COM server
       // see http://blogs.msdn.com/yvesdolc/archive/2004/04/17/115379.aspx
       pdfDoc.Close();
       Marshal.ReleaseComObject(pdfPage);
       Marshal.ReleaseComObject(pdfRect);
       Marshal.ReleaseComObject(pdfDoc);
   }
}
catch (System.Exception ex)
{
    Console.Write(ex.ToString());
}
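
The listing above always scales the page to a fixed 45 x 59 pixel box (or 59 x 45 for landscape), which distorts pages whose proportions differ from that box. As an alternative, and only as a sketch rather than the article's method, the thumbnail size could be computed so that the page's aspect ratio is preserved. The class ThumbnailSizing and the method FitWithin below are hypothetical names introduced for illustration; they are not part of the Acrobat SDK or the .NET Framework.

```csharp
using System;

static class ThumbnailSizing
{
    // Hypothetical helper: scales (width, height) so that the result fits
    // inside maxWidth x maxHeight while keeping the original aspect ratio.
    public static void FitWithin(double width, double height,
                                 int maxWidth, int maxHeight,
                                 out int fittedWidth, out int fittedHeight)
    {
        if (width <= 0 || height <= 0)
            throw new ArgumentOutOfRangeException("width", "Page dimensions must be positive.");

        // The smaller of the two ratios is the largest scale that fits both dimensions.
        double scale = Math.Min(maxWidth / width, maxHeight / height);

        // Round to whole pixels, never going below one pixel.
        fittedWidth = Math.Max(1, (int)Math.Round(width * scale));
        fittedHeight = Math.Max(1, (int)Math.Round(height * scale));
    }
}
```

For a US Letter page (612 x 792 points) and a 45 x 59 box, FitWithin yields 45 x 58 pixels; for an A4 page (595 x 842 points) it yields 42 x 59. The resulting values could be passed to GetThumbnailImage in place of the fixed thumbnailWidth and thumbnailHeight.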

Aspose.Pdf for .NET Approach

Aspose.Pdf for .NET provides extensive support for dealing with PDF documents. It also supports the capability to convert the pages of PDF documents to a variety of image formats. The functionality described above can easily be achieved using Aspose.Pdf for .NET.

Aspose.Pdf has distinct benefits:

  • You don't need to have Adobe Acrobat installed on your system to work with PDF files.
  • Aspose.Pdf for .NET is simpler to use and easier to understand than Acrobat Automation.

To convert PDF pages into JPEGs, the Aspose.Pdf.Devices namespace provides a class named JpegDevice that renders PDF pages into JPEG images. The following code snippets convert every page of every PDF file in a folder.

C#
// Requires: using System.IO; using Aspose.Pdf; using Aspose.Pdf.Devices;
// Retrieve names of all the PDF files in a particular directory
string[] fileEntries = Directory.GetFiles(@"D:\pdftest\", "*.pdf");

// Iterate through all the file entries in the array
for (int counter = 0; counter < fileEntries.Length; counter++)
{
    // Open document
    Document pdfDocument = new Document(fileEntries[counter]);

    for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
    {
        using (FileStream imageStream = new FileStream(@"D:\pdftest\image_"+counter.ToString() +"_"+ pageCount + ".jpg", FileMode.Create))
        {
            // Create Resolution object (300 DPI)
            Aspose.Pdf.Devices.Resolution resolution = new Aspose.Pdf.Devices.Resolution(300);
            // Create a JPEG device: 45 x 59 pixel output, given resolution, quality 100
            // (for larger previews, pass bigger dimensions, e.g. 500 x 700)
            JpegDevice jpegDevice = new JpegDevice(45, 59, resolution, 100);
            // Convert a particular page and save the image to the stream;
            // the using block closes the stream, so no explicit Close() is needed
            jpegDevice.Process(pdfDocument.Pages[pageCount], imageStream);
        }
    }
}
VB.NET
' Requires: Imports System.IO, Imports Aspose.Pdf
' Retrieve names of all the PDF files in a particular directory
Dim fileEntries() As String = Directory.GetFiles("D:\pdftest\", "*.pdf")
' Iterate through all the file entries in the array
For counter As Integer = 0 To fileEntries.Length - 1
    ' Open document
    Dim pdfDocument As New Document(fileEntries(counter))
    For pageCount As Integer = 1 To pdfDocument.Pages.Count
        Using imageStream As New FileStream("D:\pdftest\image_" & counter.ToString() & "_" & pageCount & ".jpg", FileMode.Create)
            ' Create Resolution object (300 DPI)
            Dim resolution As New Aspose.Pdf.Devices.Resolution(300)
            ' Create a JPEG device: 45 x 59 pixel output, given resolution, quality 100
            Dim jpegDevice As New Aspose.Pdf.Devices.JpegDevice(45, 59, resolution, 100)
            ' Convert a particular page and save the image to the stream
            jpegDevice.Process(pdfDocument.Pages(pageCount), imageStream)
        End Using ' End Using closes the stream; no explicit Close() is needed
    Next pageCount
Next counter