
PDF to TIF Conversion Issue

Last post 08-06-2010, 2:13 PM by aspose.notifier. 6 replies.
  •  06-25-2009, 12:45 PM 185492

    PDF to TIF Conversion Issue .NET

    Attachment: Present (inaccessible)
    Hi,
    I have a requirement to convert an image PDF to a text PDF (with searchable text).
    I am thinking of using Aspose.Pdf.Kit to convert the PDF to a TIF image and then using OCR software to convert it to a searchable PDF.

    I am using the following code to save a PDF document as TIF image.
    ---------------------Code------------------
    Aspose.Pdf.Kit.PdfConverter converter = new Aspose.Pdf.Kit.PdfConverter();
    converter.BindPdf(@"1448.pdf");
    converter.Resolution = 300;
    converter.DoConvert();
    //This is the typical way
    converter.SavaAsTIFF(@"1448.tif");
    ---------------------------------------------------------
    The SavaAsTIFF() method takes a long time to execute (around 6 minutes),
    and the output was an empty TIF file (0 KB).

    I am attaching the PDF that caused this problem.

    Please let me know your thoughts.

    Thanks!



     
  •  06-26-2009, 7:46 AM 185615 in reply to 185492

    Re: PDF to TIF Conversion Issue

    Hi Sudheer,

    I have reproduced and logged this issue as PDFKITNET-9469 in our issue tracking system. Our team will be looking into the matter and you'll be updated via this forum as the issue is resolved.

    I would also like to add two more points for future reference:

    1. The default value of the Resolution property is 150. The higher the resolution, the slower the conversion will be.

    2. Your code uses the SavaAsTiff method, which is now obsolete. Please use the SaveAsTiff method from now on. The problem occurs with both methods, but SaveAsTiff is the one we'll support in future.

    We're sorry for the inconvenience.

    Regards,

     


    Shahzad Latif - [Follow me on Twitter!]
    Support Developer/Developer Evangelist
    Aspose Sialkot Team
    Aspose - Your File Format Experts

    Keep in touch! We're on Twitter and Facebook
     
  •  06-26-2009, 9:20 AM 185633 in reply to 185615

    Re: PDF to TIF Conversion Issue

    Thanks.
    I found that converting the attached PDF to TIF using SaveAsTiff() works with Resolution = 150.
    I am OK with 150.

    Is there a way to check programmatically whether a PDF is a text PDF or an image PDF?
    I found this example:
    http://www.aspose.com/documentation/file-format-components/aspose.pdf.kit-for-.net-and-java/find-whether-pdf-file-contains-images-or-text-only.html

    But extractor.GetText() returns a few characters even for an Image PDF.
    Is there a more reliable method for identifying Text PDFs?





     
  •  06-26-2009, 12:48 PM 185667 in reply to 185633

    Re: PDF to TIF Conversion Issue

    Hi Sudheer,

    The article you mentioned was written as a workaround for finding out whether a PDF file contains text or only images. I'm afraid there is no direct method for this. However, with this approach, a file that contains no text shouldn't return any text in the output. Can you please share the sample PDF with us, so we can test it at our end and help you out?

    Regards,

     


    Shahzad Latif - [Follow me on Twitter!]
     
  •  07-03-2009, 3:31 PM 186848 in reply to 185667

    Re: PDF to TIF Conversion Issue

    Attachment: Present (inaccessible)
    I am attaching a PDF that just contains an image.
    So in this case, I shouldn't get any text at all after the extraction.

    extractor.ExtractText();
    //Save the extracted text to a text file
    extractor.GetText(ms);
    // Check if the MemoryStream length is greater than or equal to 1
    if (ms.Length >= 1)   

    Here I am getting ms.Length = 1.
    So I can't reliably distinguish between image and text PDFs, which I need in order to convert an image PDF to a text PDF only when required.

    Please let me know your thoughts.

    Thanks!
     
  •  07-06-2009, 8:48 AM 187057 in reply to 186848

    Re: PDF to TIF Conversion Issue

      

    Hi Sudheer,

    I have checked the code and the file as well. The file doesn't contain any visible text, but it does contain line-break characters, i.e. \n\r\n. So there are two empty lines in the PDF, which are treated as text when extracted. You can check this using the following code:

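    // Rewind the stream so it can be read from the beginning
    ms.Flush();
    ms.Position = 0;
    // Read the extracted content back as a string for inspection
    System.IO.StreamReader sr = new System.IO.StreamReader(ms);
    string s = sr.ReadToEnd();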

    I would suggest checking the output in your own code to see whether the PDF contains any meaningful text. I'm afraid Aspose.Pdf.Kit doesn't provide any method to check this.
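
    As a rough sketch of such a check: the helper below treats the extracted content as "text" only if it contains at least one non-whitespace character. This is plain .NET code, not part of Aspose.Pdf.Kit; the class and method names are hypothetical, and it assumes the extracted text is UTF-8 unless the stream starts with a byte-order mark. It also needs .NET 4.5 or later for the StreamReader overload that leaves the stream open.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not an Aspose.Pdf.Kit API.
public static class ExtractedTextCheck
{
    // Returns true when the stream holds at least one non-whitespace character.
    public static bool HasMeaningfulText(MemoryStream ms)
    {
        // Rewind so the whole stream is read, whatever its current position.
        ms.Position = 0;

        // Assumption: UTF-8 unless a byte-order mark says otherwise.
        // leaveOpen = true so the caller can keep using the stream.
        using (var reader = new StreamReader(ms, Encoding.UTF8, true, 1024, true))
        {
            string text = reader.ReadToEnd();
            // Line breaks, spaces and tabs alone do not count as text.
            return !string.IsNullOrWhiteSpace(text);
        }
    }
}
```

    In your code, you would call ExtractedTextCheck.HasMeaningfulText(ms) after extractor.GetText(ms), in place of the ms.Length >= 1 test, and run OCR only when it returns false.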

    We're sorry for the inconvenience. If you have any further questions, please do let us know.

    Regards,

     


    Shahzad Latif - [Follow me on Twitter!]
     
  •  08-06-2010, 2:13 PM 252774 in reply to 185492

    Re: PDF to TIF Conversion Issue

    The issues you have found earlier (filed as 9469) have been fixed in this update.


    This message was posted using Notification2Forum from Downloads module by aspose.notifier.
     