Support for getting information from existing pdf document.

Last post 06-28-2007, 7:14 PM by GeorgieYuan. 45 replies.
  •  12-01-2006, 1:51 PM 62610

    Support for getting information from existing pdf document.

    Hi,

    I am not yet very familiar with Aspose.Pdf. My impression is that it mainly handles creating new PDF documents.

    Does it support manipulating existing documents? For example, is it possible to do a word count on the entire document (or on each page)? As another example, can I extract all the text content of a PDF, and extract the comments/notes from an existing PDF?

    Thanks !

     
  •  12-01-2006, 5:45 PM 62616 in reply to 62610

    Re: Support for getting information from existing pdf document.

    The features you need are supported by Aspose.Pdf.Kit, so I have moved this post to this forum.

    Please refer to PdfExtractor. Word counting and text extraction are supported, but comment/note extraction is not supported yet.


    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  04-13-2007, 12:27 PM 73114 in reply to 62616

    Re: Support for getting information from existing pdf document.

    Is there an estimate of when notes/comments extraction will be supported?

    Thanks!

     
  •  04-13-2007, 9:52 PM 73150 in reply to 73114

    Re: Support for getting information from existing pdf document.

    Hi becky_bai,

    Thank you for considering Aspose products. Notes/comments extraction will be supported in the next hotfix version. I would also like to know which information about the comments you need, for example rectangle, contents, creation date, popup flag, etc.

     


    Allen Wen
    Developer
    Aspose Changsha Team

     
  •  04-18-2007, 9:10 AM 73592 in reply to 73150

    Re: Support for getting information from existing pdf document.

    I will need to retrieve the following annotations:

    1. Notes created by Adobe Acrobat's note tool (the balloon).

    2. Free text created with Adobe's Textbox tool.

    Also, is it possible to retrieve text elements by page?

    Thanks!

     

     
  •  04-20-2007, 1:37 PM 73882 in reply to 73150

    Re: Support for getting information from existing pdf document.

    Any estimate on the features I described in my last post?
    Thanks!
     
  •  04-20-2007, 5:39 PM 73899 in reply to 73882

    Re: Support for getting information from existing pdf document.

    Hi, you can download the new DLL of Aspose.Pdf.Kit 2.4.1. In PdfContentEditor.cs, ExtractAnnotations() extracts the content of annotations of the specified types from an existing PDF document. The supported annotation types currently include "Text", "Highlight", "Squiggly", "Strikeout" and "Underline". Please try it, and if you have any other questions, don't hesitate to let me know.

    Allen Wen
    Developer
    Aspose Changsha Team

     
  •  04-25-2007, 9:08 AM 74361 in reply to 73899

    Re: Support for getting information from existing pdf document.

    Thanks, I will try it shortly.

    One other question: we need the ability to do a word count per page, or to retrieve text elements per page. Is this going to be implemented in the near future?

     

     

     
  •  04-25-2007, 9:44 AM 74370 in reply to 74361

    Re: Support for getting information from existing pdf document.

    Hi,

    It is difficult to get a word count for some languages (such as Chinese), so we have no plans to support this feature in the near term. We only support extracting text from the whole PDF file. We don't recommend it, but if you want to use word counting, please refer to:

    http://www.aspose.com/wiki/default.aspx/Aspose.Pdf.Kit/CountingWords.html

    If you need a workaround, split the PDF into multiple PDFs of one page each, then extract the text from each PDF file, so you can count the text extracted from each page.
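The counting step of this workaround can be sketched as follows. This is only a minimal sketch, assuming the text of each single-page PDF has already been extracted into a string by whatever extraction call you use. `PageWordCounter` and its methods are hypothetical names for illustration, not part of Aspose.Pdf.Kit, and splitting on whitespace is not meaningful for languages such as Chinese.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Aspose.Pdf.Kit.
public static class PageWordCounter
{
    // Characters treated as word separators.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '\f' };

    // Counts whitespace-delimited tokens in the text extracted from one page.
    public static int CountWords(string pageText)
    {
        if (string.IsNullOrEmpty(pageText))
            return 0;
        return pageText.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Returns one count per page, in the same order as the input texts.
    public static int[] CountWordsPerPage(IList<string> pageTexts)
    {
        int[] counts = new int[pageTexts.Count];
        for (int i = 0; i < pageTexts.Count; i++)
            counts[i] = CountWords(pageTexts[i]);
        return counts;
    }
}
```

For example, passing the two page texts "Hello PDF world" and "" would give the counts 3 and 0.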

    Thanks.

    Adeel Ahmad
    Support Developer
    Aspose Changsha Team
    http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html

     
  •  05-15-2007, 4:38 PM 76746 in reply to 74370

    Re: Support for getting information from existing pdf document.

    We actually don't have to do a word count per page; we understand the problem with counting Asian characters. However, it is important for us to provide the extracted text per page, or just a boolean indicating whether any text element exists on a page. Can this feature be implemented?

    Thanks!

     

     
  •  05-16-2007, 3:40 AM 76828 in reply to 76746

    Re: Support for getting information from existing pdf document.

    We will investigate this issue to see if we can support extracting text per page.
    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  05-16-2007, 8:01 AM 76867 in reply to 76828

    Re: Support for getting information from existing pdf document.

    Could you let me know the estimated date of this feature if you are planning to support it?
     
  •  05-16-2007, 8:26 AM 76873 in reply to 76867

    Re: Support for getting information from existing pdf document.

    I have discussed this with the developers and we think this feature is not difficult to support. We will give you an ETA for the feature soon.
    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  05-16-2007, 10:30 AM 76907 in reply to 73899

    Re: Support for getting information from existing pdf document.

    Attachment: Present (inaccessible)

    I tried the annotation extraction function, but I can't extract popup balloon notes or free text notes. Please see the two attached PDF files. These two types of annotations are the ones we want to extract.

    Thanks!

     
  •  05-16-2007, 11:30 AM 76922 in reply to 76907

    Re: Support for getting information from existing pdf document.

    Hi,

    1. I have checked the file named "File4_TextNotes.pdf" and found that its annotations are extracted with the following code:

    // Namespaces needed by this snippet.
    using System;
    using System.Collections;
    using Aspose.Pdf.Kit;

    PdfContentEditor editor = new PdfContentEditor();
    string TestPath = @"D:\AsposeTest\TestData\";
    editor.BindPdf(TestPath + "File4_TextNotes.pdf");

    // Extract "Text" and "Highlight" annotations from pages 1 to 2.
    string[] annotType = { "Text", "Highlight" };
    ArrayList annotList = editor.ExtractAnnotations(1, 2, annotType);

    for (int i = 0; i < annotList.Count; i++)
    {
        // Each annotation is returned as a Hashtable of named parts.
        Hashtable currentNode = (Hashtable)annotList[i];
        object partValue = null;

        // First pass: print the parts whose values are plain strings.
        foreach (string partName in currentNode.Keys)
        {
            partValue = currentNode[partName];
            if (partValue is string)
            {
                Console.WriteLine(partName + ":" + partValue.ToString());
            }
        }

        // Second pass: print the parts whose values are nested Hashtables.
        foreach (string partName in currentNode.Keys)
        {
            partValue = currentNode[partName];
            if (partValue is Hashtable)
            {
                Console.WriteLine(partName);
                Hashtable hashTable = (Hashtable)partValue;
                if (partName.Equals("contents-richtext"))
                {
                    // Print the "Rc" entry, skipping its first 21 characters.
                    Console.WriteLine(hashTable["Rc"].ToString().Substring(21));
                }
                else
                {
                    foreach (string name in hashTable.Keys)
                    {
                        Console.WriteLine(name + ":" + hashTable[name].ToString());
                    }
                }
            }
        }
    }

    // Wait for a key press before exiting.
    Console.ReadKey(false);

    2. I have checked the file named "File4_FreeTextNotes.pdf" and found that its annotations are not a note type we support; they are Text Box annotations. I will discuss this issue with the developers and will let you know as soon as a solution is found.

    Thanks.

    Adeel Ahmad
    Support Developer
    Aspose Changsha Team
    http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html

     