
Splitting Word document into different files

Last post 05-14-2008, 4:55 AM by alexey.noskov. 9 replies.
  •  05-08-2008, 6:27 AM 126000

    Splitting Word document into different files

    Hello.

    We need to split a Word document into different files. For each entry of the TOC there must be a separate file. Can this be done with your framework?
    We are using Java as our programming language.

    Thanks a lot.

    This message was posted using Page2Forum from Work with Ranges - Aspose.Words for .NET and Java
     
  •  05-08-2008, 8:23 AM 126023 in reply to 126000

    Re: Splitting Word document into different files

    Attachment: Present (inaccessible)

    Hi

     

    Thanks for your request. Yes, I think you can achieve this using Aspose.Words. For example, see the attached document and the following code:

     

    // Requires: import com.aspose.words.*;
    // Open the document
    Document doc = new Document("in.doc");
    // Get the collection of all paragraphs
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    Paragraph par = null;
    int docIndex = 0;
    // Loop through all paragraphs in the document
    for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
    {
        par = (Paragraph)paragraphs.get(parIndex);
        // If the paragraph style is HEADING_1, copy content to a new document
        if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
        {
            // Create the new document
            Document outDoc = new Document();
            Node currentNode = par;
            while (currentNode != null)
            {
                // Import the node into the new document
                Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                // Append the imported node to the body of the new document
                outDoc.getFirstSection().getBody().appendChild(importedNode);
                // If there is no next sibling, move on to the next section
                if (currentNode.getNextSibling() == null)
                {
                    // Get the next section
                    Section nextSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                    // If there is a next section, continue with its first child
                    if (nextSection != null)
                        currentNode = nextSection.getBody().getFirstChild();
                    else
                        break; // No more sections: exit the while loop
                }
                else
                {
                    // Get the next node
                    currentNode = currentNode.getNextSibling();
                }
                // Check whether the current node is a paragraph (it is null if the next section's body is empty)
                if (currentNode != null && currentNode.getNodeType() == NodeType.PARAGRAPH)
                {
                    // Check whether its style is HEADING_1
                    if (((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
                    {
                        // If so, resume the outer loop at this heading and exit the while loop
                        parIndex = paragraphs.indexOf(currentNode) - 1;
                        break;
                    }
                }
            }
            // Save the output document
            outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
            // Increase docIndex
            docIndex++;
        }
    }
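
The control flow of the loop above can be looked at in isolation, without Aspose.Words. The following is a minimal sketch using only the Java standard library: `Para` and `splitAtHeadings` are hypothetical names introduced for illustration and are not part of the Aspose.Words API. It assumes a document is just a flat list of paragraphs, each flagged as Heading 1 or not.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of splitting at Heading 1 paragraphs; NOT the Aspose.Words API. */
final class HeadingSplitSketch {

    /** Hypothetical stand-in for a paragraph: its text and whether it is styled Heading 1. */
    static final class Para {
        final String text;
        final boolean heading1;

        Para(String text, boolean heading1) {
            this.text = text;
            this.heading1 = heading1;
        }
    }

    /**
     * Groups paragraphs into chunks, starting a new chunk at every Heading 1.
     * Paragraphs that appear before the first heading are not copied anywhere,
     * which mirrors the behaviour of the loop in the post.
     */
    static List<List<Para>> splitAtHeadings(List<Para> paragraphs) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paragraphs) {
            if (p.heading1) {
                // A heading opens a new output "document"
                current = new ArrayList<>();
                chunks.add(current);
            }
            if (current != null) {
                current.add(p);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
            new Para("Cover text", false),
            new Para("Chapter 1", true),
            new Para("Body of chapter 1", false),
            new Para("Chapter 2", true),
            new Para("Body of chapter 2", false));
        List<List<Para>> chunks = splitAtHeadings(doc);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("Section_" + i + ".doc starts with: " + chunks.get(i).get(0).text);
        }
    }
}
```

One thing the model makes visible: anything that precedes the first Heading 1 (a cover page, for instance) ends up in no output file, so it needs separate handling if it must be kept.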

     

    I hope this helps.

     

    Best regards.


    Alexey Noskov
    Developer/Technical Support
    Aspose Auckland Team
     
  •  05-09-2008, 2:38 AM 126169 in reply to 126023

    Re: Splitting Word document into different files

    Hi.

    Thanks for the fast reply. I tried the above code and it works fine for small documents. We have a very large document (about 130 MB) that has to be split into several documents. When I load that document, it seems that only the first part of it is parsed. When I copy a part of the document into a new Word document and parse that one, it works fine again.

    Is this a restriction of the evaluation version?

    Thanks.

    Thomas
     
  •  05-09-2008, 4:32 AM 126191 in reply to 126169

    Re: Splitting Word document into different files

    Hi

     

    Thanks for your request. In evaluation mode, Aspose.Words limits the maximum document size to several hundred paragraphs. Please see the following link to learn more about the limitations.

    http://www.aspose.com/documentation/file-format-components/aspose.words-for-.net-and-java/evaluate-aspose-words.html

     

    If you want to test Aspose.Words without evaluation version limitations, you can also request a 30-Day Temporary License. See the following link.

    http://www.aspose.com/corporate/temporary-license.aspx

     

    Best regards.


    Alexey Noskov
    Developer/Technical Support
    Aspose Auckland Team
     
  •  05-09-2008, 7:35 AM 126221 in reply to 126191

    Re: Splitting Word document into different files

    Ok. With the temporary license it works fine.

    Now there is one more open point. How can I copy the page layout (e.g. landscape), the page style and the background image to the split documents?

    Thanks.

    Thomas
     
  •  05-09-2008, 9:16 AM 126247 in reply to 126221

    Re: Splitting Word document into different files

    Hi

     

    Thanks for your request. Yes, of course you can achieve this. Please try using the following code:

     

    // Requires: import com.aspose.words.*;
    // Open the document
    Document doc = new Document("in.doc");
    // Get the collection of all paragraphs
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    Paragraph par = null;
    int docIndex = 0;
    // Loop through all paragraphs in the document
    for (int parIndex = 0; parIndex < paragraphs.getCount(); parIndex++)
    {
        par = (Paragraph)paragraphs.get(parIndex);
        // If the paragraph style is HEADING_1, copy content to a new document
        if (par.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
        {
            // Create the new document
            Document outDoc = new Document();
            // Remove the default section from the new document
            outDoc.removeAllChildren();
            Node currentNode = par;
            // Import the whole source section (this keeps its page setup and headers/footers),
            // then clear its body so that only the section shell remains
            Section srcSect = (Section)outDoc.importNode(currentNode.getAncestor(NodeType.SECTION), true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            outDoc.appendChild(srcSect);
            srcSect.getBody().removeAllChildren();
            while (currentNode != null)
            {
                // Import the node into the new document
                Node importedNode = outDoc.importNode(currentNode, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                // Append the imported node to the body of the last section
                outDoc.getLastSection().getBody().appendChild(importedNode);
                // If there is no next sibling, move on to the next section
                if (currentNode.getNextSibling() == null)
                {
                    // Get the next section
                    Section nextSection = (Section)currentNode.getAncestor(NodeType.SECTION).getNextSibling();
                    // If there is a next section, add its emptied shell and continue with its first child
                    if (nextSection != null)
                    {
                        Section newSect = (Section)outDoc.importNode(nextSection, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                        outDoc.appendChild(newSect);
                        newSect.getBody().removeAllChildren();
                        currentNode = nextSection.getBody().getFirstChild();
                    }
                    else
                    {
                        break; // No more sections: exit the while loop
                    }
                }
                else
                {
                    // Get the next node
                    currentNode = currentNode.getNextSibling();
                }
                // Check whether the current node is a paragraph (it is null if the next section's body is empty)
                if (currentNode != null && currentNode.getNodeType() == NodeType.PARAGRAPH)
                {
                    // Check whether its style is HEADING_1
                    if (((Paragraph)currentNode).getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1)
                    {
                        // If so, resume the outer loop at this heading and exit the while loop
                        parIndex = paragraphs.indexOf(currentNode) - 1;
                        break;
                    }
                }
            }
            // Save the output document
            outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
            // Increase docIndex
            docIndex++;
        }
    }
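
The idea behind this version is that each output document receives an emptied copy of every source section it draws content from, so page setup travels with the content. Below is a minimal stdlib-only sketch of that idea, not Aspose.Words code: `Section`, `Para`, `pageSetup` and `split` are hypothetical names, and a section's layout is reduced to a single string for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of section-aware splitting; NOT the Aspose.Words API. */
final class SectionAwareSplitSketch {

    /** Hypothetical paragraph: text plus a Heading 1 flag. */
    static final class Para {
        final String text;
        final boolean heading1;

        Para(String text, boolean heading1) {
            this.text = text;
            this.heading1 = heading1;
        }
    }

    /** Hypothetical section: a page-setup label (e.g. "landscape") and its body paragraphs. */
    static final class Section {
        final String pageSetup;
        final List<Para> body = new ArrayList<>();

        Section(String pageSetup) {
            this.pageSetup = pageSetup;
        }
    }

    /** Splits at every Heading 1; each output is a list of section shells refilled with content. */
    static List<List<Section>> split(List<Section> source) {
        List<List<Section>> outputs = new ArrayList<>();
        List<Section> currentDoc = null;
        for (Section src : source) {
            Section shell = null; // a new source section always needs a new shell
            for (Para p : src.body) {
                if (p.heading1) {
                    currentDoc = new ArrayList<>();
                    outputs.add(currentDoc);
                    shell = null; // the new document needs its own shell
                }
                if (currentDoc == null) {
                    continue; // content before the first heading is skipped
                }
                if (shell == null) {
                    // Copy only the layout of the source section, not its content
                    shell = new Section(src.pageSetup);
                    currentDoc.add(shell);
                }
                shell.body.add(p);
            }
        }
        return outputs;
    }
}
```

In this sketch a shell is created only when there is content to put into it, so an output never ends with an empty section. That is a design choice of the sketch rather than a description of the code in the post.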

     

    I hope this helps.

     

    Best regards.


    Alexey Noskov
    Developer/Technical Support
    Aspose Auckland Team
     
  •  05-13-2008, 2:43 AM 126617 in reply to 126247

    Re: Splitting Word document into different files

    Thanks! That did it.

    But there is another problem. At the beginning of each split document there is a page break. How can I remove it when creating the splits? Should I just remove the first Node/Paragraph node?

    //Thomas
     
  •  05-13-2008, 4:34 AM 126631 in reply to 126617

    Re: Splitting Word document into different files

    Hi

     

    Thanks for your inquiry. Maybe this occurs because there are page breaks between sections in your document. Could you attach your document or part of the document for testing? I will investigate this issue and try to help you.

     

    Best regards.


    Alexey Noskov
    Developer/Technical Support
    Aspose Auckland Team
     
  •  05-14-2008, 2:07 AM 126831 in reply to 126631

    Re: Splitting Word document into different files

    Attachment: Present (inaccessible)
    Ok. I created a document that shows the problem.

    When splitting the document using the above code, the first split has a carriage return at the beginning and the second document has a page break.

    Can this be solved?

    Thanks.

    Thomas Ospelt
     
  •  05-14-2008, 4:55 AM 126867 in reply to 126831

    Re: Splitting Word document into different files

    Hi

     

    Thank you for the additional information. I think you can try the following code to resolve this problem. It replaces the save step at the end of the previous code.

     

     

    // Get the first paragraph of the output document
    Paragraph firstPar = outDoc.getFirstSection().getBody().getFirstParagraph();
    // Remove page breaks ("\f") from every run in the first paragraph
    for (int runIndex = 0; runIndex < firstPar.getRuns().getCount(); runIndex++)
    {
        firstPar.getRuns().get(runIndex).setText(firstPar.getRuns().get(runIndex).getText().replace("\f", ""));
    }
    // Save the output document
    outDoc.save("Section_" + String.valueOf(docIndex) + ".doc");
    // Increase docIndex
    docIndex++;
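
The string handling in the loop above can be tried on its own. This is a minimal stdlib-only sketch, assuming the runs of the first paragraph are represented as plain strings; `stripPageBreaks` is a hypothetical helper name, not an Aspose.Words method.

```java
import java.util.ArrayList;
import java.util.List;

/** Stdlib-only model of removing page-break characters from run texts; NOT the Aspose.Words API. */
final class PageBreakStripSketch {

    /** A page break appears in run text as the form feed character. */
    static final String PAGE_BREAK = "\f";

    /** Returns a copy of the run texts with every form feed character removed. */
    static List<String> stripPageBreaks(List<String> runTexts) {
        List<String> cleaned = new ArrayList<>(runTexts.size());
        for (String text : runTexts) {
            cleaned.add(text.replace(PAGE_BREAK, ""));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("\fChapter 2", " continued");
        System.out.println(stripPageBreaks(runs));
    }
}
```

Note that this only removes form feed characters. It does not address the extra carriage return reported for the first split document.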

     

    Hope this helps.

     

    Best regards.


    Alexey Noskov
    Developer/Technical Support
    Aspose Auckland Team
     