Multithreading issue PNGDevice

Hi,

We are using the Aspose.PDF library to convert PDF pages from multiple documents to PNG images in multiple threads (up to 10), using the following code:

// Load the input PDF document
using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filePath))
{
    // Render the page into a raster image
    using (MemoryStream imageStream = new MemoryStream())
    {
        // Create Resolution object
        Resolution res = new Resolution(resolution);
        if (format == ImageSaveFormat.PNG)
        {
            // Create PNG device with specified attributes
            PngDevice pngDevice = new PngDevice(res);
            // Convert a particular page and save the image to stream
            pngDevice.Process(pdfDocument.Pages[pageNo], imageStream);
            ...

After processing a large number of PDF files (about 1 GB), the process gets stuck inside the pngDevice.Process method at 100% CPU load and stays in that state forever (see the attached image from the stack trace analyser).

Please advise on the correct usage of the library in a multithreaded environment. Please also take a look at the following link:
http://improve.dk/debugging-in-production-part-1-analyzing-100-cpu-usage-using-windbg/
It seems that the static dictionaries used here are not thread-safe. Unfortunately, we cannot analyse the problem further on our end.

Thanks & Best Regards

Hi Shai,

Thanks for your inquiry. Are you converting specific pages of the PDF document to PNG, or all pages? We would appreciate it if you could share your working sample code so that we can test the scenario and advise you accordingly.

Moreover, please note that processing time depends on the file size and contents and on system resources. Aspose.Pdf does not use temporary files for processing; it works in memory, so performance can suffer when processing large files.

Furthermore, regarding multithreading support, please note that Aspose.Pdf is thread-safe as long as only one thread works on a document at a time. The typical scenario is one thread working on one document; different threads can safely work on different documents at the same time.

We are sorry for the inconvenience caused.

Best Regards,

Hi Ahmad,


Thank you for the quick response.

Every PDF document has only one page, so only one page is extracted.
Only one thread has access to each document.
Preparing a sample should not be a problem: just run the code I provided continuously in 10 worker threads.

The file size of a one-page PDF document always remains low, so this is not a performance issue.

Please investigate the source code of PngDevice and its use of static Dictionary members. They are not thread-safe and could be causing the hang.




Hi Shai,


Thanks for your feedback. I tested the scenario with a sample PDF document using the following code and noticed that CPU usage rises to 100%; it hangs for a while but then returns to normal. I have therefore logged ticket PDFNEWNET-39309 for further investigation and rectification. We will notify you as soon as it is resolved.

try
{
    for (int i = 0; i < 10; i++)
    {
        Thread t = new Thread(convertpdftopng);
        t.Start();
    }
    // Note: this line runs as soon as the threads are started, not when they finish
    Console.WriteLine("conversion is done...");
}
catch (Exception error)
{
    Console.WriteLine(error.ToString());
}

public static void convertpdftopng()
{
    try
    {
        using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("E:/data/A1-28 - PRECAST STADIA - UPPER BOWL.pdf"))
        {
            // Render the page into a raster image
            using (MemoryStream imageStream = new MemoryStream())
            {
                // Create Resolution object
                Resolution res = new Resolution(300);
                // Create PNG device with specified attributes
                PngDevice pngDevice = new PngDevice(res);
                // Convert a particular page and save the image to stream
                pngDevice.Process(pdfDocument.Pages[1], imageStream);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.StackTrace);
    }
}

We are sorry for the inconvenience caused.


Best Regards,

Hi,
With version 21.10.0 we still see hangs in PngDevice.Process(). It appears to be the same problem as PDFNEWNET-39309. As Docit has already described, the problem lies in access to a System.Collections.Generic.Dictionary that is not used in a thread-safe way. Tracking down the error seems to be the harder part. The documents themselves play no role here; what matters is only that several threads read from and write to the same dictionary at nearly the same time. I can’t rule out that methods other than ‘Process’ are also involved in triggering this error. Please check all static collections of the type ‘System.Collections.Generic…’ and make them thread-safe.

Stacktrace:
C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll!System.Collections.Generic.Dictionary`2.FindEntry+0xaa
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zjy61W7nMqkr7P3pON0KiUlTScck6ue0deA==.#=zQ9HgtlpZrgyX+0xf7
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=ziCn8xkhpEvz81VUi9_dYA2ipqzKqqmmpgYmxlLTWdR$N.#=zEWQaAv4=+0x8b
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=z9cZxLEiSM$m4+0x63
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=zhmMyXlamo1Di+0x68b
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=zhmMyXlamo1Di+0x15
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=zieixP4E=+0x21
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zvTBSW6Zaed0wowLrUj9HKmHxXquTNkCN48aB5Spy3ZHfPWx1Nw==.#=zERAke7l_Mq5V+0x89e
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=ziCn8xkhpEvz81VUi9_dYA2ipqzKqqmmpgYmxlLTWdR$N.#=zTF_Ihb4gxzYv+0x25
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=ziCn8xkhpEvz81VUi9_dYA2ipqzKqqmmpgYmxlLTWdR$N.#=zzbIY2mc=+0x24
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=zurYkq3cZfvny+0x208
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=zigVcVy0kF4TRalKjQfSZwe$6RQBq15y9G$BisMX6iz0$.#=zieixP4E=+0x20a
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=z4fCDDtM8J1XRJuuQUjMZljojLfnB.#=zV1190C8=+0x914
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!#=z4fCDDtM8J1XRJuuQUjMZljojLfnB.#=zV1190C8=+0x15
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!Aspose.Pdf.Devices.ImageDevice.#=zV1190C8=+0x9a
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\bau\b5043016\58100ddd\assembly\dl3\141cbbdd\002b5099_7f60d701\Aspose.PDF.dll!Aspose.Pdf.Devices.PngDevice.Process+0x18
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET

Thanks & best regards


@daniel.wolf

Could you please try version 21.12 of the API? If the issue still persists, please share your sample PDF document along with the code snippet and detailed environment information so that we can try to replicate the issue in our environment and address it accordingly.

As already described, the document does not matter. This is only about non-thread-safe access to the dictionary. FindEntry in System.Collections.Generic.Dictionary never returns. See also https://stackoverflow.com/questions/33153485/is-it-possible-for-a-dictionary-in-net-to-cause-dead-lock-when-reading-and-writ/33153868. It is very difficult to reproduce because it depends on several threads accessing a static dictionary at the same time.
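
To illustrate the failure mode described above: Dictionary<TKey, TValue> is safe for concurrent reads only while no thread modifies it, and concurrent writes can corrupt its internal state so that a later lookup spins without returning. The sketch below is not Aspose code (the library is obfuscated and its internals are unknown here); `SharedCacheSketch`, `Lookup`, and `RunParallel` are hypothetical names that show one common way to make a shared static cache safe, by using ConcurrentDictionary instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example: a static cache shared by all worker threads.
public static class SharedCacheSketch
{
    // ConcurrentDictionary tolerates simultaneous readers and writers.
    // A plain static Dictionary<int, string> in this position would not.
    private static readonly ConcurrentDictionary<int, string> Cache =
        new ConcurrentDictionary<int, string>();

    // Returns the cached value for the key, creating it on first use.
    public static string Lookup(int key)
    {
        return Cache.GetOrAdd(key, k => "value-" + k);
    }

    // Hits the cache from several parallel workers and returns its final size.
    public static int RunParallel(int workers, int keysPerWorker)
    {
        Parallel.For(0, workers, w =>
        {
            for (int k = 0; k < keysPerWorker; k++)
            {
                Lookup(k);
            }
        });
        return Cache.Count;
    }
}
```

With a plain Dictionary and an unsynchronized check-then-add, the same access pattern could fail only occasionally under load, which would be consistent with the difficulty of reproducing the hang on demand.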


@daniel.wolf

We have opened another investigation, PDFNET-51200, in our issue tracking system to analyze this scenario with the latest version of the API. We will look further into the details and keep you posted on the status of its rectification. Please be patient and allow us some time.

We are sorry for the inconvenience.

It’s been a month now, what’s the status? Have you found anything?


@daniel.wolf

We regret to share that the earlier logged ticket has not yet been resolved. We will investigate it after resolving the issues that are already under investigation, and we will let you know as soon as we have a definite update regarding the fix. We highly appreciate your patience and understanding in this regard.

We apologize for the inconvenience.

@daniel.wolf

We would like to share that we performed an initial investigation of ticket PDFNET-39309 and found that the problem is due to accessing one PDF document from multiple threads. Please note that Aspose.PDF for .NET is a thread-safe API as long as each PDF document is accessed by only one thread at a time. Accessing the same document simultaneously from multiple threads is not supported.

Also, could you please share the sample code snippet that you are using on your end? We will investigate from that perspective as well and share our feedback with you.

@asad.ali
Our system runs in IIS with several worker processes. We have no control over when a given document needs a preview. The more users are working, the greater the probability of parallel processing. However, we do not pass a document, only a stream with the document content. From my point of view, this stream should always be new and therefore unique for you. Here are the methods used for a Word document and a PDF:

private static void CreateImageFromWordDocument(Document document, Stream stream)
{
   using (Stream fileStream = document.ReadOnly())
   {
      Aspose.Words.Document doc = new Aspose.Words.Document(fileStream);
      Aspose.Words.Saving.ImageSaveOptions options = new Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Png);
      options.PageSet = new PageSet(0);
      options.Resolution = 160;
      doc.Save(stream, options);
      stream.Seek(0, 0);
   }
}

public static void CreateImageFromPdf(Stream inputstream, Stream stream)
{
   var pdfDoc = inputstream.Length > 0 ? new AsposePdf.Document(inputstream) : new AsposePdf.Document();
   AsposePdf.Devices.PngDevice device = new AsposePdf.Devices.PngDevice();
   if (pdfDoc.Pages.Count > 0)
   {
      device.Process(pdfDoc.Pages[1], stream);
      stream.Seek(0, 0);
   }
}
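
Until the library itself is fixed, one possible application-side mitigation is to serialize the rendering calls inside each worker process. This is a minimal sketch under that assumption, not a fix confirmed by Aspose: `RenderGate` is a hypothetical helper, it trades throughput for safety, and it only helps if every call that touches the shared state goes through the same gate.

```csharp
using System;

// Hypothetical helper: lets only one thread per process render at a time.
public static class RenderGate
{
    // Static lock object, shared by all threads in the current process.
    private static readonly object Gate = new object();

    // Runs the given rendering action while holding the process-wide lock.
    public static void Run(Action render)
    {
        if (render == null) throw new ArgumentNullException(nameof(render));
        lock (Gate)
        {
            render();
        }
    }
}
```

A call site could then look like `RenderGate.Run(() => CreateImageFromPdf(inputstream, stream));`. Static state is per process, so each IIS worker process would have its own gate.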

@daniel.wolf

Thanks for sharing more details. We have updated the ticket information accordingly and will let you know once we complete investigation from this perspective.

@asad.ali Has the additional information made troubleshooting easier? What is the status?


@daniel.wolf

Regretfully, no definite progress has been made towards resolving the issue. However, your concerns have been recorded and we will update this forum thread as soon as the ticket is resolved. We highly appreciate your patience in this regard.

We apologize for the inconvenience.

@asad.ali It’s been another two months now and I’m still waiting. What’s the status? Can I see the source of CreateImageFromPdf and all of its sub-functions? Maybe I can help.


@daniel.wolf

We are afraid that the investigation of the ticket could not be completed due to other pending issues in the queue. Please note that the issue appears to be complex, and it will take a certain amount of time to rectify it after investigation. We will continue our investigation and share updates with you as soon as we have them. We highly appreciate your patience and understanding in this regard.

We apologize for the inconvenience.

The observation that your implementation is not thread-safe was first made 7 years ago. I raised it with you again 7 months ago. My offer to help find the solution was also not taken up. Unfortunately, I now fear that you do not want to make your software thread-safe. Am I wrong, and you simply forgot to update the status here?


@daniel.wolf

We have not set this issue aside and we will surely resolve it. However, it is a performance-related issue and complex in nature. Many API components need to be investigated and reworked in order to resolve it completely. Nevertheless, your concerns have been recorded and the issue priority has been raised to the next level. We will inform you as soon as we have further updates about its resolution. We humbly apologize for the inconvenience.

@asad.ali Another year has passed. What is the status here? Has still nothing been done, or are we simply no longer being kept informed here? If nothing has been corrected yet, I’m afraid we’ll have to look for another provider. The current situation is no longer acceptable for our customers.