Aspose.Pdf.Kit hangs indefinitely during text extraction

Last post 05-16-2012, 3:38 AM by nausherwan.aslam. 13 replies.
  •  01-19-2012, 5:52 PM 356363

    Aspose.Pdf.Kit hangs indefinitely during text extraction .NET

    Attachment: Present (inaccessible)

    Hello,

    I'm currently test-driving the PDF kit for my company and I've run into the problematic behaviour described in the subject of this post.

    For example:

    Aspose.Pdf.Document doc = new Aspose.Pdf.Document(@"blabla.pdf");
    Aspose.Pdf.Text.TextOptions.TextExtractionOptions options = new Aspose.Pdf.Text.TextOptions.TextExtractionOptions(Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Pure);
    Aspose.Pdf.Text.TextAbsorber absorber = new Aspose.Pdf.Text.TextAbsorber(options);
    Aspose.Pdf.DocumentInfo info = doc.Info; // works
    string version = doc.Version; // works
    doc.Pages.Accept(absorber); // hangs

     

    Sample files attached. They all seem to have one thing in common: they are version 1.2 PDFs.

    I should add that Acrobat 9 Pro can extract the text from some of them but it fails for others.

     
  •  01-20-2012, 4:18 AM 356454 in reply to 356363

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hi Cosmin,

    Thank you for sharing the template files and sample code.

    I was able to reproduce the issue you mentioned in an initial test. It has been registered in our issue tracking system with issue id: PDFNEWNET-33165. We will notify you via this forum thread of any update on your reported issue.

    Sorry for the inconvenience,


    Nausherwan Aslam
    Support Developer,
    Aspose Sialkot Team
     
  •  02-01-2012, 4:23 PM 358788 in reply to 356454

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction .NET

    Hi guys.

    My company is prepared to acquire a document conversion library in order to automate some tasks related to our 5,000,000+ documents. My role is to programmatically evaluate different technologies and suggest the best one for our needs.

    The finalists are Aspose Total and the equivalent document conversion technologies from activePDF. Personally, I was inclined to choose Aspose, but the bug described above has really pushed me to reconsider.

    I write processes that are supposed to run continuously for months and, as you can imagine, I cannot afford to come back to work after a weekend to find my process stalled and overall productivity down 20% just because the library's methods hang instead of throwing an exception.

    Spawning secondary threads with hard-coded timeouts to determine whether a PDF file hangs the conversion is an absolute NO-NO for what we need to do (think of the performance impact, and of convertible files being skipped because of their size and/or temporary dips in network speed, for example).
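
For readers unfamiliar with the workaround being rejected here, a minimal sketch of such a timeout wrapper might look like the following. This is an illustration only: `Watchdog` and `RunWithTimeout` are hypothetical names, not part of any Aspose API, and the lambdas are stand-ins for the real extraction call (for example, a delegate wrapping `doc.Pages.Accept(absorber)`).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Watchdog
{
    // Hypothetical helper: runs 'work' on a thread-pool thread and waits at
    // most 'timeout' for it to finish. Throws TimeoutException if it does not.
    // The abandoned work is NOT cancelled; it keeps running in the background.
    // If 'work' itself throws, Task.Wait surfaces that as an AggregateException.
    public static T RunWithTimeout<T>(Func<T> work, TimeSpan timeout)
    {
        Task<T> task = Task.Run(work);
        if (!task.Wait(timeout))
        {
            throw new TimeoutException("Operation did not finish within the allotted time.");
        }
        return task.Result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for an extraction call that completes normally.
        string text = Watchdog.RunWithTimeout(() => "extracted text", TimeSpan.FromSeconds(5));
        Console.WriteLine(text);

        try
        {
            // Stand-in for a call that never returns on a problematic file.
            Watchdog.RunWithTimeout<string>(
                () => { Thread.Sleep(Timeout.Infinite); return ""; },
                TimeSpan.FromMilliseconds(200));
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine("Timed out: " + ex.Message);
        }
    }
}
```

The sketch also shows why the approach is unattractive: the hung call keeps occupying a thread after the timeout fires, and a hard-coded limit cannot tell a genuinely hung file from a large or slow one. Reliably stopping the stuck work would likely require isolating each conversion in a separate process that can be killed.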

    I guess my question is: WHEN will this be fixed? If the issue can be patched within the next few days, I'm prepared to recommend purchasing your libraries. But if this bug will take months to fix, unfortunately I will need to recommend the competition. I'd be very sad to do so, because I LOVE the way Aspose works, when it works, but I really have no choice.

    What would you guys advise?

    Thanks.
     
  •  02-02-2012, 1:53 AM 358867 in reply to 358788

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hi Cosmin,

    Thank you for your interest in Aspose.Pdf for .NET.

    I have requested our development team to share an ETA regarding your reported issue. As soon as I get a response, I will update you.

    Also, we believe in providing the best support to our customers, and we are continuously improving our products to provide more and better features. We are sorry for the inconvenience this issue has caused you; we will definitely look into it and fix it soon. However, our development schedule also includes many new feature requests, updates, and bug fixes for other customers, so we will update you on the ETA of the fix as soon as the development team shares it.

    Sorry for the inconvenience,


    Nausherwan Aslam
    Support Developer,
    Aspose Sialkot Team
     
  •  02-03-2012, 3:40 AM 359138 in reply to 358788

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hi Cosmin,

    I have received a response from the development team, and they have fixed the issue you reported. The fix will be part of our upcoming official release of Aspose.Pdf for .NET v6.7. The release is currently in testing and will be available for download soon. You will be notified via this forum thread once the version is available for download.

    Thank you for being patient,


    Nausherwan Aslam
    Support Developer,
    Aspose Sialkot Team
     
  •  02-10-2012, 5:12 AM 360661 in reply to 356363

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    The issues you have found earlier (filed as PDFNEWNET-33165) have been fixed in this update.


    This message was posted using Notification2Forum from Downloads module by aspose.notifier.
     
  •  03-14-2012, 9:58 PM 368323 in reply to 360661

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction .NET

    Attachment: Present (inaccessible)

    Hi. Unfortunately I must reopen this issue, simply because I'm encountering it again. I was using an earlier version of the Pdf Kit, part of the Aspose.Total package, but after downloading the latest Aspose.Total release (the one from the 12th of March, including Aspose.Pdf for .NET 6.8.0) I'm able to reproduce it with that version as well.

    Consider this code:

    TextAbsorber tA = null;
    Document doc = null;
    try
    {
        tA = new TextAbsorber(o);  // extraction options set to Pure
        doc = new Document(fPath); // hangs here
        doc.Pages.Accept(tA);      // I never get here
    }
    catch { }
    finally
    {
        if (doc != null) doc.Dispose();
        tA = null;
    }

     

    The archive contains the problematic files. All of them are PDF version 1.3, as in the previous case.

     
  •  03-15-2012, 2:39 AM 368384 in reply to 368323

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hello Cosmin,

    Thanks for contacting support.

    I have tested the scenario again with Aspose.Pdf for .NET 6.8.0 using the sample PDF documents you shared, and I was able to reproduce the same problem. I have reopened the previously logged issue PDFNEWNET-33165 so that it can be corrected. We will investigate further using these PDF documents and keep you updated on the status of the fix. Please be patient and allow us a little time. We are really sorry for the inconvenience.

    Nayyer Shahbaz
    Support Developer, Aspose Sialkot Team
     
  •  03-21-2012, 9:35 PM 369970 in reply to 368384

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    codewarior,

    I really hope you guys will manage to fix this for good this time. I'm guessing it is entirely possible to "convince" your code to throw a meaningful exception instead of simply hanging.

    Thanks.
     
  •  03-22-2012, 2:29 AM 370015 in reply to 369970

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hello Cosmin,

    Thanks for sharing your thoughts.

    Our first priority is to make the text extraction algorithm smart enough to deal with most types of PDF files. However, as you suggest, if the process takes too long or hangs in the middle of text extraction, a meaningful error message should be displayed. We will definitely consider this suggestion while resolving this problem. Please be patient and allow us a little time. We are sorry for the inconvenience.

    Nayyer Shahbaz
    Support Developer, Aspose Sialkot Team
     
  •  04-14-2012, 7:05 PM 375651 in reply to 370015

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    I see that Aspose.Pdf for .NET 6.9.0 has been released, with no word on this issue. When should we expect this bug to be fixed? It's becoming really annoying. :/

    Thanks
     
  •  04-15-2012, 6:14 AM 375674 in reply to 375651

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hello Cosmin,

    Thanks for your patience.

    I am pleased to share that the issue of the application hanging during text extraction has been resolved in the latest release, Aspose.Pdf for .NET 6.9.0. Please try it, and if the problem still persists or you have any further questions, please feel free to contact us. We are sorry for the inconvenience.

    Moreover, I have observed that the source PDF files you shared contain only images. You may therefore consider extracting the images from the PDF file using the following code snippet, and then using Aspose.OCR for .NET to recognize the text inside them.

    [C#]
    // Requires: using System.IO; using Aspose.Pdf;
    // Open the document
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"D:\pdftest\hangs\hang8.pdf");
    for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
    {
        // Skip pages without images; otherwise Images[1] would fail
        if (pdfDocument.Pages[pageCount].Resources.Images.Count == 0) continue;
        // Extract the first image on the page (the collections are 1-based)
        XImage xImage = pdfDocument.Pages[pageCount].Resources.Images[1];
        // Save the image; the using block closes the stream even if Save throws
        using (FileStream outputImage = new FileStream(@"D:\pdftest\hangs\hang8-output" + pageCount.ToString() + ".jpg", FileMode.Create))
        {
            xImage.Save(outputImage, System.Drawing.Imaging.ImageFormat.Jpeg);
        }
    }
    // Save the PDF file
    pdfDocument.Save("output.pdf");

    Nayyer Shahbaz
    Support Developer, Aspose Sialkot Team
     
  •  05-15-2012, 5:57 PM 383027 in reply to 375674

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction .NET

    Attachment: Present (inaccessible)
    No, it's not fixed. I've bumped into it AGAIN. As you can see, I reported this in JANUARY, and 5 months later nothing of consequence has been achieved. Your methods still hang instead of timing out or throwing meaningful exceptions.

    This is becoming embarrassing for me since I'm the one who recommended this library to my company.

    Good job Aspose.

    PS: sample file attached
     
  •  05-16-2012, 3:38 AM 383126 in reply to 383027

    Re: Aspose.Pdf.Kit hangs indefinitely during text extraction

    Hi Cosmin,

     

    First of all, please accept our apologies for the inconvenience, and thank you for sharing the template file with us. I was able to reproduce the issue using your template file. I have registered a high-priority issue in our issue tracking system with issue id: PDFNEWNET-33694. I have also asked the development team to look into this issue and share feedback and details regarding its cause. I will update you once I get feedback from the development side.

     

    We are sorry for the inconvenience,


    Nausherwan Aslam
    Support Developer,
    Aspose Sialkot Team
     