Obfuscated PutValue is slow as hell
Last post 09-10-2009, 3:37 AM by Amjad Sahi. 21 replies.

03-19-2008, 7:57 PM
zerk666
Joined on 02-16-2008
Posts 10
I benchmarked our Excel export, and we call PutValue(string) a lot. Your obfuscation tool seems to add a lot of decryption logic to what looks like a really simple switch statement in your PutValue logic.
I suggest that you disable obfuscation for performance-critical code such as the PutValue method. I don't think it adds much security for you, but it surely has a great impact on performance for your users...
Thanks
Danny

03-21-2008, 1:46 AM
Laurence
Joined on 05-28-2003
Posts 7,779
Hi Danny,
Are you using Aspose.Cells for .NET or Aspose.Cells for Java? And which version are you using?
Actually, the PutValue(string) method is much more complex than the other PutValue overloads. Each Excel file has a string pool. When a string is input, we check the string pool to avoid storing duplicate strings. That saves a lot of memory if there are many identical strings in a file.
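As a rough illustration of the string pool idea described above, here is a minimal sketch. It is not Aspose.Cells code: StringPool, Intern and Get are hypothetical names, and the real implementation may differ. It only shows why each PutValue(string) call costs a dictionary lookup and why duplicate strings are stored once.

```csharp
using System.Collections.Generic;

// Hypothetical shared-string pool (an illustration, not Aspose.Cells code).
public sealed class StringPool
{
    private readonly Dictionary<string, int> indexByValue = new Dictionary<string, int>();
    private readonly List<string> values = new List<string>();

    // Number of distinct strings stored.
    public int Count { get { return values.Count; } }

    // Returns the pool index of a non-null string, adding it only if it is new.
    public int Intern(string value)
    {
        int index;
        if (indexByValue.TryGetValue(value, out index))
            return index;              // duplicate: reuse the stored copy

        index = values.Count;
        values.Add(value);
        indexByValue.Add(value, index);
        return index;
    }

    // Looks a string up by its pool index.
    public string Get(int index)
    {
        return values[index];
    }
}
```

With a pool like this, a cell would hold only the integer index, so a million cells containing the same text would share one string.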
Can you give us a simple test case to demonstrate this performance issue?
I use the following code to put 20,000 rows x 20 columns of data into an Excel file. All of the data are strings. It takes about 4.5 seconds to put the data and 10.5 seconds to create a 12 MB file. It runs on my laptop: Pentium M CPU, 512 MB RAM, WinXP, .NET 1.0.
// Requires: using System; using Aspose.Cells;
Workbook excel = new Workbook();
Cells cells;
DateTime start = DateTime.Now;
cells = excel.Worksheets[0].Cells;

// Fill 20,000 rows x 20 columns with strings, row by row.
for (int i = 0; i < 20000; i++)
{
    for (int j = 0; j < 20; j++)
    {
        Cell cell = cells[i, j];
        cell.PutValue(i.ToString() + "Test" + j.ToString());
    }
}

// Time spent putting values.
DateTime mid = DateTime.Now;
TimeSpan span = mid - start;
Console.WriteLine(span.TotalSeconds);

// Time spent saving the file.
excel.Save("D:\\test\\abc.xls");
DateTime end = DateTime.Now;
span = end - mid;
Console.WriteLine(span.TotalSeconds);
Laurence Chen, Chief Architect, Aspose Nanjing Team

03-23-2008, 9:23 PM
zerk666
Hi Laurence,
Thank you for your quick answer. I'm using the 4.4.1.0 .NET version right now because 4.4.1.10 doesn't work for us: when opening our Excel templates, some rows are missing in the document...
I can confirm that what seems to be slow is a dictionary Contains call followed by a set_Item call, which matches your explanation of the string pool.
I added a check before calling PutValue because many of our strings were either null or empty, so I dropped from about 1.5 million PutValue calls to about 500k. This greatly reduced the time consumed by your string dictionary and reduced the file size considerably. I had a 25 MB xls file and I now have a 13 MB one.
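The check described above could look something like the following sketch. SparseWriter and WriteNonEmpty are hypothetical names, and the put delegate stands in for whatever actually writes the cell, so the example runs without Aspose.Cells.

```csharp
using System;

// Hypothetical helper (not part of Aspose.Cells): skips null or empty strings.
public static class SparseWriter
{
    // Calls put(row, column, text) for every non-empty entry, row by row,
    // and returns the number of calls made.
    public static int WriteNonEmpty(string[,] data, Action<int, int, string> put)
    {
        int written = 0;
        for (int row = 0; row < data.GetLength(0); row++)
        {
            for (int column = 0; column < data.GetLength(1); column++)
            {
                string text = data[row, column];
                if (string.IsNullOrEmpty(text))
                    continue;          // nothing to store, so skip the call

                put(row, column, text);
                written++;
            }
        }
        return written;
    }
}
```

With the calls already shown in this thread, the delegate could be (r, c, s) => cells[r, c].PutValue(s).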
The save that took about 5 or 6 minutes is now down to 2, so it's a lot better too.
I think we can consider the performance case closed ;) but the regression in 4.4.1.10 would be worth looking at. If you can't find it by looking at the code changes since 4.4.1.0, maybe I could give you a sample of our document after stripping the copyrighted data from it...
Thanks
Danny

03-23-2008, 10:58 PM
Laurence
Attachment: Present (inaccessible)
Hi Danny,
For the missing-row problem, please try the attached fix. If the problem still exists, please give us your template file and sample code. We will check and fix it ASAP.
I think 2 minutes is still a little slow for a 13 MB file. Could you also post your sample code and file? We will check how to optimize it.
Laurence Chen, Chief Architect, Aspose Nanjing Team

03-24-2008, 10:25 AM
zerk666
Hi Laurence,
I ran the code you posted earlier and I am able to dump a 68 MB xls file in under 10 seconds on my machine. I just changed the string length to something a little more representative of what we have.
In our application, though, it takes about 30 seconds just for the PutValue calls; we then call AutoFitRows on all our sheets and then save. It's the 30 seconds for the PutValue calls that bothers me, since I haven't been able to reproduce it with your sample application. I'll try to and get back to you on this.
I'm not really worried, since 30 seconds is not THAT slow, but it is definitely slower than your sample...
What I found out is that if I iterate through the columns before the rows (I just swapped the for loops in your sample), it takes forever to execute... I would appreciate a little explanation for that ;) but I verified and we don't do this in our application anyway...
As for the regression, the fix you sent me didn't work, so I'll try to put something together that I can send to you.
Thanks
Danny

03-25-2008, 1:58 AM
Laurence
Hi Danny,
1. AutoFitRows is a time-consuming function, so please call it just once, before saving the file.
2. 30 seconds for how many cells? I think other code may be consuming the time.
3. Internally, we keep all cells in a list, ordered first by row and then by column. If you iterate row first, newly created cells are appended to the end of the list, which is very fast. However, if you iterate column first, we have to find the appropriate position to insert each cell and shift the rest of the list to make room. That degrades performance heavily. So please always iterate row first.
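To make point 3 concrete, here is a minimal model of a list kept sorted by row and then by column. It is only a sketch of the behavior described above, not Aspose.Cells code: SortedCellList, Touch and ShiftedEntries are hypothetical names, and rows and columns are assumed to be non-negative.

```csharp
using System.Collections.Generic;

// Hypothetical model of a cell list sorted by (row, column); not Aspose.Cells code.
public sealed class SortedCellList
{
    private const long ColumnSpan = 1L << 32;
    private readonly List<long> keys = new List<long>();

    // Total number of existing entries that had to move to make room for inserts.
    public long ShiftedEntries { get; private set; }

    public int Count { get { return keys.Count; } }

    // Ensures a cell exists at (row, column); returns true if it was created.
    public bool Touch(int row, int column)
    {
        // Row-major key: sorting by key equals sorting by row, then column.
        long key = (long)row * ColumnSpan + column;

        int index = keys.BinarySearch(key);
        if (index >= 0)
            return false;                      // cell already exists

        index = ~index;                        // position that keeps the list sorted
        ShiftedEntries += keys.Count - index;  // entries behind it must move
        keys.Insert(index, key);
        return true;
    }
}
```

Filling 3 rows x 2 columns row by row leaves ShiftedEntries at 0 because every new cell is appended. Filling the same cells column by column shifts 3 entries, and the number of shifts grows roughly with the square of the cell count on larger sheets.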
Laurence Chen, Chief Architect, Aspose Nanjing Team

08-19-2008, 8:41 PM
zerk666
Hi Laurence,
I just resumed working on improving Excel file save performance in our application because it was really slow and took a lot of memory. I used the GetStyle/SetStyle methods and got rid of the huge memory consumption.
Now I think I have found out why the save performance is really slow for us. We use an Excel template that we open and write over. This way we can put the Excel file in our embedded resources and use it when we want to export.
When loading an arbitrary Excel file, I suppose that the pattern of iterating rows and then columns no longer helps, since cells may already exist anywhere in the data structure. When we need to insert a new cell and there are cells allocated farther down the array, they have to be moved, with O(n) complexity.
In the worst-case scenario, where some columns at the rightmost part of our document aren't empty, each cell insertion has to go through the O(n) process above, resulting in a dramatic slowdown of the save performance...
Am I right? Do you have any advice on how to create a template document that will yield the best results with your insertion algorithm?
Thanks

08-19-2008, 9:19 PM
simon.zhao
Joined on 10-24-2005
Posts 301
Attachment: Present (inaccessible)
Hi,
Please try this fix. We have improved the save performance in it.
Please post or mail us a simple project that shows this performance problem. We will check it soon.
Simon Zhao, Developer, Aspose Nanjing Team

08-19-2008, 11:35 PM
zerk666
Hi Simon,
I tried the fix without any significant improvement.
However, I finally found the bottleneck. We made two passes over the cells: the first pass set values, and a final pass added some formatting.
When setting values, we skipped any cell that didn't have any data in it, which left holes in the Aspose data structure. The second pass added strikethrough to ranges of cells, so whenever a range encountered holes in the data structure, it filled them. Filling a hole requires an O(n) insert, and since we have more than a million active cells in our datasheet, it could easily take up to 3 minutes to apply the data and the formatting before even reaching the save phase.
Now, even if we don't have anything to put in a cell, we call the Cells[i, j] accessor anyway so the cell gets initialized. It takes more memory, but at least performance is a lot better (under 1 minute) for a 22 MB Excel file.
In a future release, it would probably be useful to provide more flexibility in the Cells structure: maybe allow us to override the structure or replace it entirely with our own depending on our needs, or at least provide an API to monitor when the cells array is being resized like crazy, to help us diagnose performance problems.
Thanks
Danny

08-20-2008, 12:51 AM
simon.zhao
Hi Danny,
Could you post a simple project? We will check it soon.
Simon Zhao, Developer, Aspose Nanjing Team

08-20-2008, 5:49 PM
zerk666
Hi Simon,
Here's a sample.
We start by creating a checkerboard pattern through the cells on the first pass, and on the second pass we fill the missing cells with values...
Workbook excel = new Workbook();
Cells cells = excel.Worksheets[0].Cells;

// First pass: even rows and even columns only.
DateTime start = DateTime.Now;
for (int i = 0; i < 1000; i += 2)
    for (int j = 0; j < 1000; j += 2)
        cells[i, j].PutValue(string.Format("Cell {0}, {1}", i, j));
TimeSpan span = DateTime.Now - start;
Console.WriteLine(span.TotalSeconds);

// Second pass: odd rows and odd columns, which fall between existing cells.
start = DateTime.Now;
for (int i = 1; i < 1000; i += 2)
    for (int j = 1; j < 1000; j += 2)
        cells[i, j].PutValue(string.Format("Cell {0}, {1}", i, j));
span = DateTime.Now - start;
Console.WriteLine(span.TotalSeconds);

1st pass: 0.859386 seconds
2nd pass: 50.0943912 seconds
Have you tried storing the cells in a Dictionary keyed by a Pair<int, int>, where the pair represents the row and column indexes? It would probably be a lot faster in many cases... maybe only the memory usage would go up a bit...
Currently, it's really easy to shoot yourself in the foot by using the wrong insertion pattern with Aspose.Cells. For us it's a major drawback, because we have a lot of legacy code to support through an interface, and since Office Interop didn't have this issue, most accesses are not in the order Aspose is most comfortable with.
Thanks
Danny

08-20-2008, 6:08 PM
zerk666
I decided to try my Dictionary keyed by Pair<int, int> idea and show you the results :)
Here is the sample code (I used Point because it already exists in the framework):
using System;
using System.Collections.Generic;
using System.Drawing;

// Compares Points by value so they can be used as dictionary keys.
class PointEqualityComparer : IEqualityComparer<Point>
{
    public static readonly PointEqualityComparer Default = new PointEqualityComparer();

    public bool Equals(Point x, Point y) { return x.X == y.X && x.Y == y.Y; }

    // X (the row) shifted into the upper bits, Y (the column) added on top.
    public int GetHashCode(Point obj) { return (obj.X << 16) + obj.Y; }
}

// Sparse cell store keyed by (row, column).
class CustomCells : Dictionary<Point, string>
{
    public CustomCells() : base(PointEqualityComparer.Default) { }

    public string this[int row, int column]
    {
        get
        {
            string value;
            if (base.TryGetValue(new Point(row, column), out value))
                return value;
            return null;
        }
        set { base[new Point(row, column)] = value; }
    }
}

class Program
{
    static void Main(string[] args)
    {
        CustomCells customCells = new CustomCells();

        // First pass: even rows and even columns only.
        DateTime start = DateTime.Now;
        for (int i = 0; i < 1000; i += 2)
            for (int j = 0; j < 1000; j += 2)
                customCells[i, j] = string.Format("Cell {0}, {1}", i, j);
        TimeSpan span = DateTime.Now - start;
        Console.WriteLine(span.TotalSeconds);

        // Second pass: odd rows and odd columns.
        start = DateTime.Now;
        for (int i = 1; i < 1000; i += 2)
            for (int j = 1; j < 1000; j += 2)
                customCells[i, j] = string.Format("Cell {0}, {1}", i, j);
        span = DateTime.Now - start;
        Console.WriteLine(span.TotalSeconds);
    }
}

With this sample, I get these numbers:
1st pass: 0.4062552 seconds
2nd pass: 0.4531308 seconds
What do you think?
Thanks
Danny

08-20-2008, 8:15 PM
Laurence
Hi Danny,
Thanks for the information. We will look into optimizing the performance. But since it touches a lot of our current code, it will take a few weeks.
Laurence Chen, Chief Architect, Aspose Nanjing Team

08-22-2008, 4:50 AM
DomZ
Joined on 09-25-2007
Posts 120
Hi Laurence,
When profiling, we noticed the same thing, including in other methods such as InsertRows.
Could you run some tests before and after obfuscation and publish the results, so we know how much the obfuscation costs?
Thanks

08-27-2008, 5:33 PM
zerk666
Hi Laurence,
Have you made any progress? Will it be feasible at all to remove the array structure and replace it with a dictionary without breaking the API that is already in place?
I would be glad to help if you need any assistance with this...
Thanks
Danny