
Some texts are missed while using PdfExtractor to extract text

Last post 03-24-2009, 11:30 PM by forever. 19 replies.
Page 1 of 2 (20 items)   1 2 Next >
  •  10-01-2008, 4:39 PM 146293

    Some texts are missed while using PdfExtractor to extract text

    I am using Aspose.Pdf.Kit, assembly version 2.6.4 of Aspose.Pdf.Kit.dll. The product version of this file is "2007.10.28".

    I wrote very simple code like the following to extract text from PDF files.

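    // Requires: using Aspose.Pdf.Kit;
    PdfExtractor extractor = new PdfExtractor();
    extractor.BindPdf(pdfFile);          // open the source PDF (path in pdfFile)
    extractor.ExtractTextMode = 1;       // select the text extraction mode
    extractor.ExtractText();             // run the text extraction
    extractor.GetText(textFileName);     // save the extracted text to textFileName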

    The above code gets what I want for some PDF files. However, for certain PDF files, the text output skips some text.

    Can you shed some light here?  I will post the PDF file I was using in my next post.

    Thanks

    Roger

     

     
  •  10-01-2008, 4:47 PM 146296 in reply to 146293

    Re: Some texts are missed while using PdfExtractor to extract text

    Attachment: Present (inaccessible)

    Here is the PDF file I was using and the text file that was extracted from it. As you can see, the extracted text file is missing "Average Delta" and "Expected Close Ratio".

     
  •  10-01-2008, 5:43 PM 146305 in reply to 146296

    Re: Some texts are missed while using PdfExtractor to extract text

    Hello Roger,

    I have tested the issue and I’m able to reproduce the same problem. I have logged it in our issue tracking system as PDFKITNET-6091. We will investigate this issue in detail and will keep you updated on the status of a correction. We apologize for your inconvenience.

     

    FYI: PdfExtractor is in beta. Some features may not be fully supported, and we may not be able to fix them in a short time.


    Nayyer Shahbaz
    Support Developer, Aspose Sialkot Team
     
  •  10-02-2008, 11:43 AM 146408 in reply to 146305

    Re: Some texts are missed while using PdfExtractor to extract text

    Thanks for the information and for working on a fix. I want to stress the urgency here. We are trying to get production ready and this is the last show-stopper.

    Please also help clear up some confusion. We are using version 2.6.4 of Aspose.Pdf.Kit, and PdfExtractor is part of this package. How can it be in beta? Are you sure you are looking at the right software package?

    Thanks

    Roger

     

     
  •  10-02-2008, 3:58 PM 146432 in reply to 146408

    Re: Some texts are missed while using PdfExtractor to extract text

    Hello Roger,

    The current version of Aspose.Pdf.Kit is 3.2.0.0 and PdfExtractor has been marked as Beta in this version. Our development team is working hard to get this feature working properly. As soon as we have made some progress, we will keep you updated with the status.


    Nayyer Shahbaz
    Support Developer, Aspose Sialkot Team
     
  •  10-03-2008, 10:00 AM 146536 in reply to 146432

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Nayyer,

    Thanks for the explanation.  We really need this issue to be addressed soon since it is holding our production release.

    I found a bit more information about this problem and want to share it here. I hope it helps your developers work on a fix or suggest a workaround in the meantime.

    Here is some background information. We programmatically generate an RDL file, deploy it to Reporting Services 2005, and then ask the RS server to render the report as PDF. Here is the RDL snippet that matters:

    <TableCell>
      <ReportItems>
        <Textbox Name="Name for Average Delta">
          <Value>=CStr("Average" + Environment.NewLine + "Delta")</Value>

    As you can see, we want a line break (NewLine) between "Average" and "Delta" so that the rendered PDF shows the header on two lines. However, PdfExtractor cannot handle the line break. If I change the last line of the above RDL snippet to:

                        <Value>=CStr("Average Delta")</Value>

    PdfExtractor can then extract "Average Delta" from the PDF. However, the PDF is then not in the correct format, since it renders "Average Delta" on one line and the requirement is two lines.
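    To tell a label that was dropped entirely apart from one that was merely split across lines, one could check the extracted text file with whitespace normalized. The sketch below is a hypothetical diagnostic helper, not part of Aspose.Pdf.Kit: the class and method names (ExtractedTextCheck, MissingLabels) are illustrative assumptions. It collapses every run of whitespace, including CR/LF, to a single space before searching, so a two-line header such as "Average" / "Delta" still counts as found.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): reports which expected labels
// do not occur in extracted text, ignoring differences in whitespace.
public static class ExtractedTextCheck
{
    // Collapse every run of whitespace (spaces, tabs, CR, LF) into one space.
    static string Normalize(string s) => Regex.Replace(s, @"\s+", " ").Trim();

    // Returns the labels whose normalized form is not a substring of the
    // normalized extracted text.
    public static List<string> MissingLabels(string extractedText, IEnumerable<string> labels)
    {
        string haystack = Normalize(extractedText);
        return labels
            .Where(label => haystack.IndexOf(Normalize(label), StringComparison.Ordinal) < 0)
            .ToList();
    }

    public static void Main()
    {
        // Sample text standing in for the contents of textFileName;
        // "Average" and "Delta" are split by a line break here.
        string text = "Region  Average\r\nDelta  Total";
        List<string> missing = MissingLabels(text, new[] { "Average Delta", "Expected Close Ratio" });
        Console.WriteLine(string.Join(", ", missing)); // prints "Expected Close Ratio"
    }
}
```

    In practice the sample string would be replaced with File.ReadAllText(textFileName) (System.IO). Because this is a plain substring search, it only confirms that the words are present in order; it says nothing about their position on the page.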

    So I suggest that you relay the above information to your experts in both PdfExtractor and Reporting Services. They may be able to figure out the problem quickly or suggest a workaround.

    Can you give me your phone number so I can contact you and work with your developer? We really need to address this issue since it is holding up our production release. Whatever it takes, we need a fix.

    Thanks

    Roger

     

     
  •  10-05-2008, 11:21 AM 146611 in reply to 146536

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Roger,

    Thanks for the important information. We will try our best to resolve the issue and will let you know this week whether we can resolve it.

    Thanks,

    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  10-15-2008, 7:58 PM 148128 in reply to 146611

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Roger,

    We are still working on the issue but have encountered some technical difficulties. We hope to resolve it next month. Sorry for the inconvenience.

    Thanks,


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  10-16-2008, 5:27 PM 148276 in reply to 148128

    Re: Some texts are missed while using PdfExtractor to extract text

    Is there a way to speed this up so we can get the fix by the end of this month (October)?

    Looking forward to the fix! Thanks.

     
  •  10-16-2008, 8:56 PM 148289 in reply to 148276

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi,

    The issue has been raised to a higher priority, and we will try to speed it up.

    Thanks,

    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  01-07-2009, 9:26 AM 159051 in reply to 148289

    Re: Some texts are missed while using PdfExtractor to extract text

    Any update?  Our clients are still waiting for the fix.

     

    Thanks

    Roger

     
  •  01-07-2009, 11:59 PM 159134 in reply to 159051

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Roger,

    Sorry, we have not yet made any progress on issue PDFKITNET-6091, and it has been moved to our library of difficult issues, which now holds 37 unresolved issues. We will concentrate our efforts on those difficult issues from April to June this year, with the goal of resolving 80% of them. We hope this issue will be resolved during that time.

    We are really sorry for the inconvenience you have experienced, and your understanding and patience will be greatly appreciated.

    Thanks,


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  01-09-2009, 2:30 AM 159393 in reply to 159134

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Roger,

    I discussed this issue again with the developers. We found that the text cannot be extracted correctly even by Adobe Acrobat, so we think this file may be unusual. Since this is urgent for you, we have decided to spend more time on this issue. We hope to have a good result late this month or early next month. Sorry for the inconvenience.


    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  01-12-2009, 9:30 AM 159726 in reply to 159393

    Re: Some texts are missed while using PdfExtractor to extract text

    I appreciate your effort! Looking forward to a solution.
     
  •  02-04-2009, 3:23 PM 163495 in reply to 159726

    Re: Some texts are missed while using PdfExtractor to extract text

    Hi Tommy,

    I wanted to follow up and see how you are coming along on a solution for Roger. You mentioned you would reply by either the end of January or early this month and have not yet done so. Please let us know your progress and anticipated release date. I appreciate your time and help very much.


    Danny Cooper
    Team Leader, Aspose Texas Team
     