Duplicate content extracted from PPT table

Last post 07-01-2009, 12:21 PM by alcrus. 1 replies.
  •  07-01-2009, 11:56 AM 186442

    Duplicate content extracted from PPT table .NET

    Attachment: Present (inaccessible)
    Hello,

    In the attached PPT document, we have two slides with one table each. When we extract content from these tables, we get a lot of duplicate text.

    The first table seems to be a matrix of 6 columns x 17 rows. Visually it has only 5 columns (we suspect that there is a merged/hidden column at index 2), so we get duplicate content from this hidden column. More strikingly, whenever a row contains empty/merged cells, we get duplicate text from these "empty" cells.

    Our program simply reads each item in the table by its row/column coordinates, as shown in the accompanying .DOC file.

    Is this a bug, or is there a way to better navigate a table by considering individual cell properties to skip irrelevant ones?
     
  •  07-01-2009, 12:21 PM 186445 in reply to 186442

    Re: Duplicate content extracted from PPT table

    Hello,

    If you only need to extract text without changing any of the table's properties, it is better to cast the Table to a GroupShape and iterate over all the shapes inside it: find every Rectangle and extract its text. This should also be much quicker. Cells are usually saved in the PPT format in reverse order, so you should start iterating from the last shape.
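
    As a rough illustration of that traversal, here is a minimal sketch in Python. It does not use the Aspose.Slides API: GroupShape, Rectangle, Line and extract_cell_texts below are hypothetical stand-ins, and the sample data simply assumes the cells were stored in reverse order. The point is only that walking the group's shapes from last to first and keeping the rectangles visits each stored cell exactly once, whereas row/column addressing can reach the same merged cell several times.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shape:
    """Hypothetical stand-in for any shape stored inside a group."""
    text: str = ""


@dataclass
class Rectangle(Shape):
    """Hypothetical stand-in for a table cell stored as a rectangle."""


@dataclass
class Line(Shape):
    """Hypothetical stand-in for a border line; it carries no cell text."""


@dataclass
class GroupShape:
    """Hypothetical stand-in for a table viewed as a plain group of shapes."""
    shapes: List[Shape] = field(default_factory=list)


def extract_cell_texts(group: GroupShape) -> List[str]:
    """Collect the text of every rectangle, starting from the last shape."""
    texts = []
    for shape in reversed(group.shapes):
        # Only rectangles represent cells; other shapes are skipped.
        if isinstance(shape, Rectangle):
            texts.append(shape.text)
    return texts


# Assumed sample data: a 2 x 2 table whose cells are stored in reverse order.
table_as_group = GroupShape(shapes=[
    Line(),
    Rectangle("B2"), Rectangle("A2"),
    Line(),
    Rectangle("B1"), Rectangle("A1"),
])

print(extract_cell_texts(table_as_group))
```

    With the real library, the type check and the text accessor would be whatever your version exposes for rectangles and their text, so treat the names above as placeholders.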

    Alexey Zhilin
    Team Leader
    Aspose Tyumen Team
     