Memory consumption when running large cell operations

Last post 03-12-2010, 4:06 AM by nausherwan.aslam. 15 replies.
Page 1 of 2 (16 items)   1 2 Next >
Sort Posts: Previous Next
  •  02-16-2010, 11:23 PM 222581

    Memory consumption when running large cell operations .NET

    Hi,

    We have an application that uses Apose.Cells to both clear and update cell vaues in speadsheets. When we run large loops of these instructions - updating 4-6 million cells, we find the memory consumption grows to a working set of multiple GB ;

    My question is, is this normal? Is there a way to work with Aspose components so that memory usage is a little more modest?

    The code we have looks like

    sheet.Cells[iRow, iCol].PutValue(value)

    where there are two for loops around this statement controllingthe iRow and iCol variables.

    Any thoughts or advice would be much appreciated - or general hints on how to get the best execution performance with Aspose when working with large amounts of data.

    Thanks,

    James

    Filed under: Memory;Performance
     
  •  02-17-2010, 1:33 AM 222594 in reply to 222581

    Re: Memory consumption when running large cell operations

    Attachment: Present (inaccessible)
    Hi James,

    Well, when you update or fill millions of cells in a worksheet, you need to have sufficient amount of memory for the operation as it is a big task to deal with. Aspose.Cells will handle it but one thing you need to make sure for performance considerations:

    Populate data first by row then by column will certainly improve the overall performance to certain extent. This would also make your program run faster. See the document for reference:
    http://www.aspose.com/documentation/.net-components/aspose.cells-for-.net/populate-data-first-by-row-then-by-column.html


    In the latest versions/fixes, we did enhance the performance of the product to handle big operations or work with larger files. I have attached the latest fix v4.8.2.6 for you, kindly try it.

    Thank you.

    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  02-17-2010, 2:47 AM 222612 in reply to 222594

    Re: Memory consumption when running large cell operations

    Thanks - we will try the new library now - and report back shortly. Is there any advantage or benefit to saving a file part way through a multi-million cells update ? (regarding memory consumption)

    The reason for asking, is that in some other libraries I have used, the equivelant of PutValue() was caching the updates in an in-memory list until flushed to disk. Once flushed to disk - this list was emptied, and additional updates started building up this list again. I can see this is trading memory pressure for IO pressure - but if you have more time than memory - would this offer any benefit?

    ( The example I am thinking of is during a 4 million cell update, write to disk every 1 million cells for example )

    Thanks,

    James

     
  •  02-18-2010, 5:39 AM 222833 in reply to 222594

    Re: Memory consumption when running large cell operations .NET

    Attachment: Present (inaccessible)

    Hi,

    We have tried new library.

    Still huge memory consumption is issue for us.

    I have attached sample code that imports 74MB CSV file to excel file, please use test files supplied in "Test-Files" folder.

    Now, is there anything that can be done to reduce the memory demand downwards?

     

    And yes, this supplied code runs in the "Windows service" on our server, once the memory exceeds 1.6Gb - we get a .NET CLR exception on a 64 bit machine

     

    I hope this helps for your investigation,

    Cheers,

     
  •  02-18-2010, 8:07 AM 222863 in reply to 222833

    Re: Memory consumption when running large cell operations

    Hi,

    Thanks for providing us further details with sample project.

    We will look into your issue and get back to you soon.

    Thank you.

    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  02-22-2010, 12:37 AM 223238 in reply to 222863

    Re: Memory consumption when running large cell operations


    Hi,

    After analyzing your issue with your project, there are 124060 * 68 cells in the CSV file and it would be 124060 * 68 * 2 cells in the memory after copying the range. A workbook which contains 124060 * 68 cells would certainly consume about 400M memory in this test project. Moreover, the two workbooks will consume about 800M memory.

    I think you may try:

    1) Could you import CSV file to workbook and apply some formatting to the cells.

    2) Please clear the source workbook after copying. Please check following sample code.

    //Copy cell data from a source ranges
    targetRange.Copy(sourceRange);
    sheet = null;
    sourceRange = null;
    book = null;
    GC.Collect();


    Thank you.


    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  02-22-2010, 11:00 PM 223455 in reply to 223238

    Re: Memory consumption when running large cell operations

    Attachment: Present (inaccessible)

    Hi Amjad,

    Thanks for your reply and suggestion.

    I have tried assigning range, sheet, book to null. It frees memory immediately, after a huge spike of memory consumption in "performance" graph.

    However, our issue here is to keep the load on memory low.

    I have attached updated Code zip here.

    Any improvement on reducing memory demand would be very helpful here.

    Cheers,

     
  •  02-22-2010, 11:06 PM 223456 in reply to 223455

    Re: Memory consumption when running large cell operations .NET

    An additional thought ;

    Would it be better to construct two for loops ( row and column  ) and do a series of PutValue() statements - and save the target file every [say] 10000 updates ? Would this cause the in-memory buffer that holds cell changes to be flushed - thereby reducing the peak load on memory ? I have seen this technique used in other systems - wondering if this might be an approach we can use here?

    Thanks,

    James

     

     
  •  02-23-2010, 5:08 AM 223521 in reply to 223456

    Re: Memory consumption when running large cell operations

    Attachment: Present (inaccessible)
    Hi,

    Please try the new fix 4.8.2.7. We have added a new method Range.GetCellOrNull();
    1)Test1-ImportCSV
    Please remove all reference to the source file after copying data from the source file.

    2)Test02-SaveLargeFile
    Please check if the source cell is null before copying the data.
    If you call Range[row,column] to get the cell, we have to create a Cell instance in the memory.

    3)Test03-Update
    Sorry we have no good idea.


    4)Test04-CopyCells
    Please check if the source cell is null before copying the data.

    Please check the attached file.

    Thank you.


    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  02-25-2010, 10:57 PM 224237 in reply to 223521

    Re: Memory consumption when running large cell operations

    Hi ,

    Thanks for the reply. When importing large text (csv) files, we are doing this by:

    1. opening both the source and target files

    2. creating a range in the source and target files

    3. using the copy function on the range object - targetrange.Copy(sourcerange)

    Is this the best approach when copy a large block of data from src to target? Given that we are not working with these ranges on a cell by cell basis - I am not sure how we should exploit the GetCellOrNull() method.

    Are you suggesting we should move to a different approach - which is two loops nested - copying cell by cell through the ranges - and then use this GetCellOrNull() method?

    Thanks in advance for your help.

    Regards,

    James

     

     
  •  03-02-2010, 12:58 AM 224787 in reply to 224237

    Re: Memory consumption when running large cell operations

    Hi,

    If you call targetrange.Copy(sourcerange), there are source and target files together in the memory. I am afraid, we have no good idea for this.

     

    Could you import CSV file to workbook and apply some formatting to the cells?

    For example:

      private void Test01_ImportCSV()

            {

                try

                {

                    StartTest("Test01_ImportCSV");

     

                    var book = new Workbook();

                    Range targetRange = null;

     

                    //Open text file

                    book.Open(txtSourceFile.Text, ',');

     

                    WriteOutput("CSV file opened");

     

                    //If file couldn't be parsed, exception is thrown)

                    if (book.Worksheets[0] == null) throw new Exception("Couldn't parse text file");

     

                    //Create range from all text file data

                    var sourceRange = book.Worksheets[0].Cells.CreateRange(0, 0, book.Worksheets[0].Cells.MaxDataRow + 1, book.Worksheets[0].Cells.MaxDataColumn + 1);

     

                    WriteOutput("Opened CSV file's worksheet");

     

                    if (sourceRange == null) throw new Exception("Couldn't create source range in text file");

     

                    //Get the target workbook

                    var targetworkbook = new Workbook();

     

                    targetworkbook.Open(txtTargetFile.Text);

     

                    WriteOutput("Opened target workbook");

     

                    //Getting target sheet

                    var sheet = targetworkbook.Worksheets[0];

     

                    WriteOutput("Extracted target worksheet");

     

                    targetRange = sheet.Cells.CreateRange(0, 0, book.Worksheets[0].Cells.MaxDataRow + 1, book.Worksheets[0].Cells.MaxDataColumn + 1);

     

                    WriteOutput("Created target range");

     

                    ////Copy cell data from a source ranges

                    //targetRange.Copy(sourceRange);

                    sourceRange.CopyStyle(targetRange);

     

     

                    WriteOutput("Copied CSV data to target");

     

                    book.Save(txtTargetFile.Text);

     

                    WriteOutput("Saved workbook");

                 

                    targetRange = null; targetworkbook = null;

                    FinishTest();

                }

                catch (Exception ex)

                {

                    HandleException("Importing a CSV file", ex);

                }

            }


    Thank you.


    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  03-02-2010, 1:59 PM 224950 in reply to 224787

    Re: Memory consumption when running large cell operations

    Thanks for your response.

    Can you please tell me if the ASPOSE.Cells library relases memory when you do a "Save file" operation on a target workbook?

    The reason for asking is that I am thinking that an alternative approach here might be to do a whole series of GetValue(row,col) -> PutValue(row,col) calls instead of using the range copy  - and save the file every [say] 10000 calls to PutValue. If the "save file" operation releases memory ( as the library no longer needs to cache the changes made by PutValue ) then that might be a way for us to process large files?

    We are still stuck with the problem that importing a 76Mb CSV file into a XLS file is getting an 'outofmemory' exception. The process is taking about 2GB of RAM when this occurs, even though it is running on an 8GB 64 bit machine.

    Thanks,

    James

     
  •  03-03-2010, 12:26 AM 225004 in reply to 224950

    Re: Memory consumption when running large cell operations

    Hi,

    Well, it takes about 500M memory when we run the following code:

    Workbook workbook = new Workbook();
    workbook.Open(@"F:\FileTemp\Very Big CSV File-73M.csv", ',');
    // workbook.SaveOptions.ClearData = true;
    workbook.Save(@"F:\FileTemp\dest.xls");

    Could you share a sample project to show how to import csv and export it to xls file?

    Thank you.


    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
  •  03-04-2010, 8:19 PM 225533 in reply to 225004

    Re: Memory consumption when running large cell operations .NET

    Attachment: Present (inaccessible)

    Hi,

    Thanks again for your reply.

    I think the primary difference between the source code in your prior post, and what we need to do, is that we use ASPOSE components to open both a source CSV and target XL files, and then copy a large number of cells from the CSV to the XL file. We find that when the CSV file is 76Mb , and the target file is an empty workbook, the overall memory demand from the running process rises to 1.8GB - and then at some point an 'out-of-memory' exception is thrown ( even though there is still free physical memory on the machine ). We suspect that the process is suffering from heap fragmentation - so although there is free memory - although we would need to run the CLR viewer to confirm this.

    The attached zip file contains a test application and test data that we hope can be used to help replicate this issue in your environment. In the testdata directory there is a 76Mb CSV file you can use as a source file. Create an empty workbook for the target file. Select the first tab "Test 1 - Importing large CSV files"; Here you will see three buttons with different test variations - and you can click on these buttons to view the code behind. When you run these tests - you will see the memory and time taken to complete the tests.

    The first variation uses ASPOSE components to open both source and target files - and then a single call to Copy to tranfer cell data from source to target.

    The second variation uses ASPOSE components to open both source and target files - and then enters into a row/col loop to do a series of Get/Put statements, saving the target file intermittently to reduce the demand on memory of "unsaved changes".

    The third variation an ASPOSE component to open the target file, but not the source file. Instead we open the source CSV data with a custom stream reader - and then enter into a row/col loop to do a series of Get/Put statements, saving the target file intermittently to reduce the demand on memory of "unsaved changes".

    Our problem is that currently our production solution uses the technique exemplified in variation #1 - and encounters the 'out of memory' error after 2 or 3 runs using the clients larger files. This is on a 64bit 8Gb machine with nothing else running on it.

    Understandably the third variation offers the best performance and scalability, and does not crash with the 76Mb file, even after re-running 50 times. We are in the process of applying this approach to our production solution this weekend so our clients can continue processing.

    Our question to Aspose is to what extent can these components be scaled? Are we simply processing file sizes that these components were not designed for ? If this is the case, can you offer any advice on alternative ways of processing large XL files programatically - as most of our clients in the finanical services sector work with XL and CSV files that are routinely in the 100-300Mb space - so we need to find a solution at some point. Our preference is to work with Aspose to get the components working well at these scales, because we like your API, documentation and support (grovel grovel :)) - but it's really up to you guys as to how far you want to take these components. I think we need certainty more than anything else right now.

    If you are intending to increase the scalability of these components so that we can use them to process commercial file sizes - we have a couple of questions/thoughts:

    - Is there anything that can be done to reduce the "inflation factor" when loading a file? i.e. where a 75Mb file takes 500Mb in memory?
    - Can you consider adding methods for importing/exporting CSV data from ranges that uses raw high performance C# code on onside of the transfer ? ( instead of Asponse components on both sides) maybe a better version of the test code we have in variation-3 on test 1 in the testing application attached.
    - Would you be able to look at whether underload , these components are fragmenting the heap - consequently triggering the 'out of mem' error ?
    - Would it be possible to flush "unsaved changes" to a workbook to a temp co-located file once this list consumes an amount of memory ? i.e. once this list takes up [say] 500Mb - flush it to a temp file and free the memory etc... I understand that this will slow down the overall process, as it introduces file io, however the alternative is for the process to suffer from memory starvation and heap stress.

    Thanks again for all your help,

    James Simpson

     
  •  03-08-2010, 12:32 AM 225886 in reply to 225533

    Re: Memory consumption when running large cell operations

    Hi,

    Thanks for providing us information.
    We are looking into if we can import the CSV file to a worksheet as you mentioned in ImportCSV_Variant3 in the project.

    Thank you.


    Amjad Sahi
    Support Developer,
    Aspose Sialkot Team
    Contact Us
     
Page 1 of 2 (16 items)   1 2 Next >
View as RSS news feed in XML