Extract Content between Pages

Hi Alicia,

I suppose this is happening for the reason stated in my last post: the table on that page spans multiple pages.

I think I have a nice solution to this; I will provide you with some code within a day.

Thanks,

Thanks! I am using the version that I downloaded with the link provided:
https://releases.aspose.com/words/net/

Is this the correct version?

Hello,

Thank you for the additional information.
Yes, the link is right. The latest version of our product is 9.8.0.0.
Please wait a little longer, Adam will give you the code.

Any luck with this?

Thanks for your help!

Hi there,

Thanks for your inquiry.

I’m afraid this is still a work in progress. I haven’t had time to finish coding it yet. I will look into doing this over the weekend. I appreciate your patience.

Thanks,

Hi Alicia,

Thanks for waiting.

Please find attached an updated version of the PageNumberFinder class. This update includes a new method, SplitNodesAcrossPages, that you can use to extract pages into separate documents properly.

You can use code like the example below to extract pages to an external document. The SplitNodesAcrossPages method splits the sections of the document whose content spans multiple pages into separate sections, one per page. You can then extract each page by extracting its section and inserting it into a new document.

Document doc = new Document("Document.docx");
// Set up the document which pages will be copied to. Remove the empty section.
Document dstDoc = new Document();
dstDoc.RemoveAllChildren();

PageNumberFinder finder = new PageNumberFinder(doc);

// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);

// Copy all content including headers and footers from the specified pages into the destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPage(3, 5, NodeType.Section);

foreach (Section section in pageSections)
    dstDoc.AppendChild(section);

dstDoc.Save(dataDir + "Document Out.docx");

If you have any issues, please attach your document here for testing.

Thanks,

Thank you, Adam.

I copied your code and replaced the old PageFinder.cs with the one you supplied. When I run the code, however, I am getting the following error: “The newChild was created from a different document than the one that created this node.” on this line:

dstDoc.AppendChild(section)

Have I forgotten a step?

Hi

Thanks for your request. I think you should just use NodeImporter in this case to import each section into the destination document:

Document doc = new Document("Document.docx");
// Set up the document which pages will be copied to. Remove the empty section.
Document dstDoc = new Document();
dstDoc.RemoveAllChildren();

PageNumberFinder finder = new PageNumberFinder(doc);

// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);

// Copy all content including headers and footers from the specified pages into the destination document.
NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.UseDestinationStyles);
for (int page = 3; page <= 5; page++)
{
    List<Node> pageSections = finder.RetrieveAllNodesOnPage(page, true, NodeType.Section);
    foreach (Section section in pageSections)
    {
        dstDoc.AppendChild(importer.ImportNode(section, true));
    }
}

dstDoc.Save(dataDir + "Document Out.docx");

Best regards,

Can you provide me with any information as to when this feature will be supported? Is there a scheduled release date?

AndreyN:
Hi

Thanks for your request. A Word document is a flow document and does not contain any information about its layout into lines and pages. Therefore, technically there is no “Page” concept in a Word document.

Aspose.Words uses our own Rendering Engine to lay out documents into pages, and we have plans to expose layout information. Your request has been linked to the appropriate issue. You will be notified as soon as this feature is supported.

Also, I think, as a workaround you can try using PageNumberFinder class suggested by Adam in this thread:

https://forum.aspose.com/t/58199

Best regards,

I was speaking in reference to the above post…

Hi

Thanks for your request. Unfortunately, the issue is not planned yet, so I cannot provide you with a reliable estimate for this feature. We will consider exposing the layout information of nodes in the future, but no timeframe is available yet.

Best regards,

The code provided seems to be working to my specifications, with just one flaw. The page numbers are being reset in the cloned documents. I need to retain the original page numbers. Is this possible?

Thank you so much for all of the assistance.

Hi there,

Thanks for your inquiry.

Could you please attach your input document and the code that reproduces the issue? I will take a closer look into this for you.

Thanks,

Here is the code and a sample:

public static Document ExtractContentBetweenPages(Document srcDoc, int fromPage, int toPage)
{
    // Set up the document which pages will be copied to. Remove the empty section.
    Document dstDoc = new Document();
    dstDoc.RemoveAllChildren();
    PageNumberFinder finder = new PageNumberFinder(srcDoc);
    // Split nodes which are found across pages.
    finder.SplitNodesAcrossPages(true);
    // Copy all content including headers and footers from the specified pages into the destination document.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.UseDestinationStyles);
    for (int page = fromPage; page <= toPage; page++)
    {
        List<Node> pageSections = finder.RetrieveAllNodesOnPage(page, true, NodeType.Section);
        foreach (Section section in pageSections)
        {
            //dstDoc.AppendChild(section);
            dstDoc.AppendChild(importer.ImportNode(section, true));
        }
    }
    return dstDoc;
}

Thanks for this additional information.

This was a minor bug, which I have fixed. Please try downloading the class again.

Thanks,

Thank you for this, it seems to have fixed the problem!

I am seeing one other problem now, however. The code (as a whole) does not seem to work with DOCX files. I get the attached error when debugging.

Hi

Thanks for your request. Could you please attach a sample document that causes this problem?

Best regards,

A simple test document is attached. It produces the error that I included in the above post.

Hi there,

Thanks for your inquiry.

I can’t reproduce any problem on my side. Make sure that the values you pass to your method are within the valid page range (1 to 4 in the case of your document).

Thanks,

I am definitely using a valid page range. May I ask, are you running this code in a Windows Form or a Web form? Because it actually does work in a Windows form…however, I need it to work in a Web form. It doesn’t make any sense to me why the DOCX files don’t work in both…the code is identical. I must be missing SOMETHING.