Performance issue in Splitting PDF files

Last post 05-26-2008, 7:23 AM by Felix.Liu. 12 replies.
  •  01-03-2008, 12:21 AM 107570

    Performance issue in Splitting PDF files

    Hi,

    We have a requirement to split PDF files into smaller files using bookmarks. While splitting, each part must start from the page immediately before the page on which the bookmark is located. We are using Aspose.Pdf.Kit to do that, but splitting a file based on its bookmarks takes too long (between 10 and 30 seconds per file). We have to process around 40,000 files, so if each file takes that much time we will be doomed.

    The following code snippet is what we use for that. Can anyone from the Aspose team please help us out urgently?

    Thanks & Regards,

    Partha

    ------------------------------------------------------------------------------------------------------------------------

    // Assumes: using System; using System.IO; using System.Xml; using Aspose.Pdf.Kit;
    string InputFile = "Somefile.pdf";
    string RootPath = @"C:\"; // verbatim string: "C:\" would not compile
    string bookmarkXMLPath = Path.Combine(RootPath, "bookmark.xml");

    PdfContentEditor _editorSourceFile = new PdfContentEditor();

    _editorSourceFile.BindPdf(InputFile);

    // Generate an XML file that holds the bookmark details of the input PDF.
    _editorSourceFile.ExportBookmarksToXML(bookmarkXMLPath);
    _editorSourceFile = null;


    XmlDocument doc = new XmlDocument();
    doc.Load(bookmarkXMLPath);

    string FileNameWithoutExtn = Path.GetFileNameWithoutExtension(InputFile);
    // Each selected node is a bookmark's Page attribute; its first token is the page number.
    XmlNodeList nl = doc.SelectNodes("//Title//@Page");
    PdfFileEditor pdfEditor = new PdfFileEditor();

    int FileNumber = 0;
    int intTotalFilesSplitted = 0; // declared here; the loop below increments it

    for (int j = 0; j < nl.Count; j++)
    {
        int start = -1, end = -1;
        string[] attValues1 = nl[j].Value.Split(new char[] { ' ' });
        start = Convert.ToInt32(attValues1[0]) - 1; // page before the bookmark page
        DateTime startTime = DateTime.Now;
        FileNumber++;
        string FileName = String.Format("{0}{1}.pdf", FileNameWithoutExtn, FileNumber.ToString());
        FileName = System.IO.Path.Combine(RootPath, @"SplitResult/" + FileName);
        if (j == nl.Count - 1)
        {
            // Last bookmark: take everything up to the end of the document.
            pdfEditor.SplitToEnd(InputFile, start, FileName);
        }
        else
        {
            string[] attValues2 = nl[j + 1].Value.Split(new char[] { ' ' });
            end = Convert.ToInt32(attValues2[0]) - 2; // last page before the next part starts
            pdfEditor.Extract(InputFile, start, end, FileName);
            intTotalFilesSplitted++;
        }
        TimeSpan creationTime = DateTime.Now - startTime; // time taken for this part
    }
    doc = null;
    File.Delete(bookmarkXMLPath);
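
    The page-range arithmetic in the loop above can be looked at separately from the Aspose calls. What follows is only a minimal sketch, not Aspose code: BookmarkRanges and Compute are hypothetical names, the bookmark pages are assumed to be 1-based and sorted in ascending order, and the total page count is passed in explicitly. Unlike the snippet above, it clamps the start to page 1 and skips empty ranges; both are assumptions about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class BookmarkRanges
{
    // Hypothetical helper (not part of Aspose.Pdf.Kit).
    // Returns {start, end} pairs: each part starts one page before its
    // bookmark and ends one page before the next part starts. The last
    // part runs to the final page of the document.
    public static List<int[]> Compute(IList<int> bookmarkPages, int totalPages)
    {
        var ranges = new List<int[]>();
        for (int i = 0; i < bookmarkPages.Count; i++)
        {
            // Assumption: never start before page 1 (a bookmark on page 1
            // would otherwise give a start page of 0).
            int start = Math.Max(1, bookmarkPages[i] - 1);
            int end = (i == bookmarkPages.Count - 1)
                ? totalPages
                : bookmarkPages[i + 1] - 2;

            // Assumption: skip ranges that would be empty, for example when
            // two bookmarks point at the same page.
            if (end >= start)
            {
                ranges.Add(new[] { start, end });
            }
        }
        return ranges;
    }
}
```

    With bookmarks on pages 3, 10 and 20 of a 30-page file, this gives the ranges 2-8, 9-18 and 19-30, which could then be passed to whichever split call is used.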

     
  •  01-03-2008, 1:46 AM 107575 in reply to 107570

    Re: Performance issue in Splitting PDF files

    Dear Partha,

    Thank you for considering Aspose.

    We are working on the performance issue. I hope we can improve the performance greatly in the next release. We will notify you in this thread when we have made progress. Sorry for the inconvenience.

    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  01-07-2008, 6:07 AM 107898 in reply to 107575

    Re: Performance issue in Splitting PDF files

    Hello Tommy,

    Any idea when you are planning to have the next release? If possible, can you send us a hotfix so that we can meet our project timelines?

    Thanks & Regards,

    Partha

     
  •  01-07-2008, 7:21 AM 107904 in reply to 107898

    Re: Performance issue in Splitting PDF files

    Dear Partha,

    Sorry, I can't provide the exact date of the new release. I will discuss it with the developer and try to provide an ETA ASAP. We will try our best to provide a beta version to you before the new official version.

    Tommy Wang
    Lead Developer
    Aspose Changsha Team
     
  •  01-18-2008, 12:02 AM 109513 in reply to 107904

    Re: Performance issue in Splitting PDF files

    Hi Partha,

    We have found a way to split files more efficiently, and a new method named PdfFileEditor.SplitToBulks will be added to our product. If you still need this urgently, please let us know and we can offer you a beta.

    Thanks,


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  01-21-2008, 10:07 PM 109884 in reply to 109513

    Re: Performance issue in Splitting PDF files

    Hello Felix,

    This is great news. Can you please send us a beta version?

    Thanks & Regards,

    Partha

     
  •  01-22-2008, 3:44 AM 109916 in reply to 109884

    Re: Performance issue in Splitting PDF files

    Attachment: Present (inaccessible)

    Dear Partha,

    It's my pleasure. Please download the attachment.

    We also tested the new method based on your sample, as follows:

    // SourcePath and OutPath are not defined in this snippet; they are assumed to be set elsewhere.
    DateTime startTime = DateTime.Now;
    string InputFile = SourcePath + "Temp\\test.pdf";
    string RootPath = OutPath + "Temp\\";
    string bookmarkXMLPath = Path.Combine(RootPath, "bookmark.xml");
    PdfContentEditor _editorSourceFile = new PdfContentEditor();
    _editorSourceFile.BindPdf(InputFile);
    _editorSourceFile.ExportBookmarksToXML(bookmarkXMLPath);
    _editorSourceFile = null;
    XmlDocument doc = new XmlDocument();
    doc.Load(bookmarkXMLPath);
    XmlNodeList nl = doc.SelectNodes("//Title//@Page");
    PdfFileEditor pdfEditor = new PdfFileEditor();
    int length1 = nl.Count - 1;
    // The start and end pages to split with, like: int[][] { new int[] { 10, 20 }, new int[] { 15, 25 } }
    int[][] pagenumber = new int[length1][];

    for (int i = 0; i < length1; i++)
    {
        string[] attValues1 = nl[i].Value.Split(new char[] { ' ' });
        pagenumber[i] = new int[2];
        pagenumber[i][0] = Convert.ToInt32(attValues1[0]) - 1;
        string[] attValues2 = nl[i + 1].Value.Split(new char[] { ' ' });
        pagenumber[i][1] = Convert.ToInt32(attValues2[0]) - 2;
    }

    MemoryStream[] outBuffer = pdfEditor.SplitToBulks(InputFile, pagenumber); // the new method
    int fileNum = 0;

    string FileWithoutExtn = Path.Combine(RootPath, @"SplitResult/" + Path.GetFileNameWithoutExtension(InputFile));
    foreach (MemoryStream aStream in outBuffer)
    {
        fileNum++;
        // "using" closes the file even if the write fails.
        using (FileStream outStream = new FileStream(FileWithoutExtn + "-" + fileNum.ToString() + ".pdf", FileMode.Create))
        {
            aStream.WriteTo(outStream);
        }
    }

    // The last bookmark has no following bookmark, so split from it to the end of the file.
    string[] attValues3 = nl[length1].Value.Split(new char[] { ' ' });
    int location = Convert.ToInt32(attValues3[0]) - 1;
    fileNum++;
    pdfEditor.SplitToEnd(InputFile, location, FileWithoutExtn + "-" + fileNum.ToString() + ".pdf");

    TimeSpan creationTime = DateTime.Now - startTime;
    Console.WriteLine("(split new)Time=" + creationTime);
    File.Delete(bookmarkXMLPath);
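
    The step that writes the returned buffers to disk can also be pulled out on its own. The following is a minimal sketch that uses only System.IO: SplitOutput and WriteAll are hypothetical names, not part of Aspose.Pdf.Kit, and the MemoryStream array simply stands in for whatever SplitToBulks returns.

```csharp
using System.IO;

public static class SplitOutput
{
    // Hypothetical helper: writes each buffer to "<basePath>-<n>.pdf",
    // with n starting at 1, and returns the number of files written.
    public static int WriteAll(MemoryStream[] buffers, string basePath)
    {
        int fileNum = 0;
        foreach (MemoryStream buffer in buffers)
        {
            fileNum++;
            string path = basePath + "-" + fileNum + ".pdf";

            // The using block releases the file handle even if WriteTo throws.
            using (var outStream = new FileStream(path, FileMode.Create))
            {
                buffer.WriteTo(outStream);
            }
        }
        return fileNum;
    }
}
```

    Returning the count lets the caller continue the numbering for the final part that is produced separately with SplitToEnd.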


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  04-28-2008, 6:01 AM 124349 in reply to 109916

    Re: Performance issue in Splitting PDF files

    Hello Felix,

    The hotfix that you provided was working pretty well for us until a week ago. The issue is that the ExportBookmarksToXML method throws a System.OutOfMemoryException when the PDF file size is in excess of 300 MB. This is causing quite a lot of concern, as the PDF file processing in the project has come to a grinding halt.

    I have checked similar methods available in other tools and all of them work properly. Can you please look into this urgently and let us know when we can expect a solution?

    Also, the hotfix that we got (v2.7.0.5) is based on the .NET 1.1 framework. Is there a corresponding version for the .NET 2.0 framework?

    Thanks & Regards,

    Partha

     
  •  04-28-2008, 8:17 AM 124372 in reply to 124349

    Re: Performance issue in Splitting PDF files

    Hi Partha,

    I will check the issue and discuss it with our other developers. If we can support such a large file with this method, we will provide an ETA as soon as possible.

    We provide DLLs for both .NET 1.1 and 2.0 in the official release. Please download the latest version (3.0.0.0) from our download pages.

    Best regards,


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  05-07-2008, 8:11 AM 125813 in reply to 124372

    Re: Performance issue in Splitting PDF files

    Hello Felix,

    Any update on the issue?

    Thanks & Regards,

    Partha

     
  •  05-07-2008, 8:38 PM 125916 in reply to 125813

    Re: Performance issue in Splitting PDF files

    Dear Partha,

    We have encountered some technical difficulties with the performance issues of ExportBookmarksToXML, and I am afraid we cannot improve it in the short term.

    Anyway, we will keep working on this issue, and we will inform you in this thread when any improvements are made.

    Thanks,


    Felix Liu
    Developer
    Aspose Changsha Team
     
  •  05-26-2008, 6:02 AM 128624 in reply to 125916

    Re: Performance issue in Splitting PDF files

    Hello Felix,

    Any luck on this?

    Thanks & Regards,

    Partha

     
  •  05-26-2008, 7:23 AM 128638 in reply to 128624

    Re: Performance issue in Splitting PDF files

    Hi Partha,

    We have made some improvements to some of the bookmark-handling features, but not to the performance of that function. I'm sorry to tell you that we cannot improve its performance in the short term.

    Anyway, we are still working on this issue (I must have forgotten to tell you that we created an issue, PDFKITNET-5030, for this enhancement).

    Thanks, and sorry for the inconvenience,


    Felix Liu
    Developer
    Aspose Changsha Team
     