Issue with Generating PDF from HTML containing Images

Pedro_Vitoretti · February 23, 2011, 6:57am

Hi,

I have a WebPart developed in C# that generates a PDF from HTML containing some images, which are stored in a document library in SharePoint 2007.

After generating the PDF and seeing that the images were missing, I debugged and checked the logs. The web application log showed the requests for the images, but each one returned HTTP status 401.2.

The question is the following: How can I tell Aspose to use the credentials of the current user (instead of the ones it’s using right now, which I don’t know)?

I imagined that it would use the NETWORK SERVICE account, which is the identity/account of the web application (IIS process), so I granted all sorts of permissions on the images to this specific user, but had no success.

Enabling anonymous access on the site is not possible, given the customer's restrictions on the server.

Thanks

alexey.noskov · February 23, 2011, 8:28am

Hi

Thanks for your request. I think the problem occurs because network credentials must be specified to access the images. Currently, there is no way to set network credentials on a Document, and Aspose.Words uses the default credentials. I have linked your request to the appropriate issue.

Maybe, as a workaround, you can parse your HTML, create local copies of each image in the HTML document, and use the local path instead of the URI.
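
To make the workaround above a little more concrete, here is a minimal sketch of the path-rewriting step only. It assumes the images have already been downloaded and saved by some other code; the class name `ImageSrcRewriter`, the method `UseLocalCopies`, and the URL-to-path dictionary are hypothetical names for illustration and are not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageSrcRewriter
{
    // Matches src="..." or src='...' and captures the prefix, the value, and the closing quote.
    private static readonly Regex SrcRegex =
        new Regex("(src\\s*=\\s*[\"'])([^\"']+)([\"'])", RegexOptions.IgnoreCase);

    // Replaces every src value that has an entry in localCopies (remote URL -> local path).
    // Values without an entry are left untouched.
    public static string UseLocalCopies(string html, IDictionary<string, string> localCopies)
    {
        return SrcRegex.Replace(html, m =>
        {
            string localPath;
            if (localCopies.TryGetValue(m.Groups[2].Value, out localPath))
            {
                return m.Groups[1].Value + localPath + m.Groups[3].Value;
            }
            return m.Value;
        });
    }
}
```

The rewritten HTML string can then be passed on for conversion in place of the original one.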

Best regards.

Pedro_Vitoretti · February 23, 2011, 9:51am

Unfortunately, we’re not able to save the images anywhere else.

Is there another way to force it in .NET? I’ve tried impersonating…

Another possibility would be to change this behaviour by reflection, but I’d need some more info from Aspose to do so.

Thanks for your response!

alexey.noskov · February 23, 2011, 10:49am

Hi

Thanks for your inquiry. Since Aspose.Words supports base64 as an image source, you can create your own method to get the images and replace the image path in the src attribute with a base64 representation of the image. Here is some very simple code that demonstrates the technique:

// Requires: using System.IO; using System.Text.RegularExpressions; using Aspose.Words; using NUnit.Framework;
[Test]
public void Test001()
{
    // Get the HTML string.
    string html = File.ReadAllText(@"Test001\in.html");
    // Create a regular expression that finds absolute image URLs in src attributes.
    Regex urlRegex = new Regex("src\\s*=\\s*[\"']+(http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?)[\"']+");
    // Search for the src values.
    MatchCollection matches = urlRegex.Matches(html);
    foreach (Match match in matches)
    {
        // Replace each URL with an embedded base64 image.
        // If the download failed (empty result), leave the original URL in place.
        string url = match.Groups[1].Value;
        string base64 = GetBase64(url);
        if (base64.Length > 0)
            html = html.Replace(url, base64);
    }
    // Now you can insert HTML into the document. All images are embedded into the HTML string.
    DocumentBuilder builder = new DocumentBuilder();
    builder.InsertHtml(html);
    builder.Document.Save(@"Test001\out.doc");
}

// Requires: using System; using System.IO; using System.Net;
private string GetBase64(string imageUrl)
{
    string base64Data = "";
    try
    {
        // Prepare the request for the image.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(imageUrl);
        request.Method = "GET";
        request.UserAgent = "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)";
        // Execute the request and dispose of the response when done.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        using (MemoryStream buffer = new MemoryStream())
        {
            // Copy the response in chunks: ContentLength can be -1,
            // and a single read may return fewer bytes than requested.
            byte[] chunk = new byte[8192];
            int read;
            while ((read = resStream.Read(chunk, 0, chunk.Length)) > 0)
                buffer.Write(chunk, 0, read);
            // Build the base64 string. The MIME type is assumed to be JPEG.
            base64Data = string.Format("data:image/jpeg;base64,{0}",
                Convert.ToBase64String(buffer.ToArray()));
        }
    }
    catch (Exception)
    {
        // The download failed (for example, a 401 response); return an empty string.
    }
    return base64Data;
}
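
Since the original problem was a 401 response, here is a rough sketch of how the download request could carry credentials. It relies on `CredentialCache.DefaultCredentials`, which is the standard .NET way to pass the credentials of the current security context; in ASP.NET that is the logged-in or impersonated user only when impersonation is enabled, so whether it resolves the 401 on a given SharePoint setup is an assumption to verify. The class `AuthenticatedImageLoader` and its method names are hypothetical, and the extension-based MIME type guess is a simplification that falls back to JPEG as in the sample above.

```csharp
using System;
using System.IO;
using System.Net;

public static class AuthenticatedImageLoader
{
    // Creates a GET request that authenticates with the credentials of the
    // security context the code is currently running under.
    public static HttpWebRequest CreateRequest(string imageUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(imageUrl);
        request.Method = "GET";
        request.Credentials = CredentialCache.DefaultCredentials;
        return request;
    }

    // Guesses the MIME type from the file extension in the URL path (query string ignored).
    // Assumption: anything unrecognised is treated as JPEG.
    public static string GuessMimeType(string imageUrl)
    {
        string extension = Path.GetExtension(new Uri(imageUrl).AbsolutePath).ToLowerInvariant();
        switch (extension)
        {
            case ".png": return "image/png";
            case ".gif": return "image/gif";
            case ".bmp": return "image/bmp";
            default: return "image/jpeg";
        }
    }

    // Builds a data URI from raw image bytes.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        return string.Format("data:{0};base64,{1}", mimeType, Convert.ToBase64String(imageBytes));
    }
}
```

In GetBase64, `CreateRequest(imageUrl)` would take the place of the `WebRequest.Create` call, and `ToDataUri(bytes, GuessMimeType(imageUrl))` would take the place of the hard-coded `data:image/jpeg` format string.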

I hope this approach helps you achieve what you need.

Best regards,

Tanveer.khan · September 21, 2011, 11:36pm

Hi Alexey,

We have an HTML page which has some images and text, and the images for this HTML page are in an associated folder. This HTML page and the images folder have been placed in a SharePoint web application document library.

Could you please provide a sample of how we can convert an HTML file with images (located in a web app) into a PDF?

We have tried the code sample on your site, which reads an .htm file from a local drive folder and converts it to PDF. If we place the HTML file and the images folder on the drive, the images are not converted: the PDF shows only the HTML text, without the images.

Is it a limitation of the trial version?

We tried converting single images to PDF and this worked without any issue.

We have been using the trial version of Aspose.pdf (ver 6.02) to check whether it fulfills our requirements (C#.NET).

alexey.noskov · September 22, 2011, 5:39am

Hi

Thanks for your request. You can try using code like the following to open the HTML document from a URL:

// Requires: using System.IO; using System.Net; using Aspose.Words;
/// <summary>
/// Opens document from web.
/// </summary>
private Document OpenDocumentFromUrl(string url)
{
    // Prepare the request for the web page.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    request.UserAgent = "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)";
    // Execute the request and dispose of the response when done.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream resStream = response.GetResponseStream())
    {
        // Copy the response into a MemoryStream in chunks: ContentLength can be -1,
        // and a single read may return fewer bytes than requested.
        MemoryStream docStream = new MemoryStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = resStream.Read(chunk, 0, chunk.Length)) > 0)
            docStream.Write(chunk, 0, read);
        docStream.Position = 0;
        // Open the document from the stream.
        return new Document(docStream);
    }
}
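
One thing to watch with the stream approach: if the HTML refers to its images with relative paths (for example, an associated images folder), a document opened from a stream may have no location to resolve them against. A possible way around this, sketched below under that assumption, is to turn relative src values into absolute URLs before loading. The class `RelativeImageResolver` and method `MakeImageUrlsAbsolute` are hypothetical names, and the sample URLs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelativeImageResolver
{
    // Matches src="..." or src='...' and captures the prefix, the value, and the closing quote.
    private static readonly Regex SrcRegex =
        new Regex("(src\\s*=\\s*[\"'])([^\"']+)([\"'])", RegexOptions.IgnoreCase);

    // Detects values that already start with a scheme such as http:, https:, or data:.
    private static readonly Regex SchemeRegex = new Regex("^[A-Za-z][A-Za-z0-9+.\\-]*:");

    // Resolves relative src values against the URL the HTML document was downloaded from.
    public static string MakeImageUrlsAbsolute(string html, string documentUrl)
    {
        Uri baseUri = new Uri(documentUrl);
        return SrcRegex.Replace(html, m =>
        {
            string src = m.Groups[2].Value;
            if (SchemeRegex.IsMatch(src))
            {
                // Already absolute (or an embedded data URI): leave unchanged.
                return m.Value;
            }
            string absolute = new Uri(baseUri, src).AbsoluteUri;
            return m.Groups[1].Value + absolute + m.Groups[3].Value;
        });
    }
}
```

For example, with a document URL of `http://server/docs/Report.htm`, a src of `Report_files/image001.jpg` would become `http://server/docs/Report_files/image001.jpg`.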

Hope this helps.

Best regards,

aspose.notifier · October 6, 2012, 9:53pm

The issues you have found earlier (filed as WORDSNET-927) have been fixed in this .NET update and this Java update.
