Converting PDF to HTML

Last post 01-19-2012, 12:34 AM by rashid.ali. 3 replies.
  •  01-13-2012, 12:30 AM 354836

    Converting PDF to HTML .NET

    Hi Aspose,

    What would be the simplest way to convert a PDF into (text only) basic HTML?

    The simplest approach I've found so far is to retrieve all the text and replace every new line with an HTML line break element, using one of the following three methods to extract the content:
    • the PdfExtractor.ExtractText() method
    • the TextAbsorber class
    • the TextDevice class
    Is there another way to accomplish this?
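
    For reference, the newline-to-line-break step described above can be sketched as follows. This is a minimal illustration, assuming the text has already been extracted into a string by one of the three methods; the class and method names (PlainTextToHtml, Convert) are hypothetical and not part of Aspose.Pdf. It also HTML-encodes the text first, an extra step so that characters such as < or & in the PDF text do not break the markup.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of Aspose.Pdf.
public static class PlainTextToHtml
{
    // Converts extracted plain text to minimal HTML: escapes markup
    // characters, then turns every line break into a <br /> element.
    public static string Convert(string extractedText)
    {
        if (extractedText == null)
            throw new ArgumentNullException("extractedText");

        // Escape &, <, > and quotes first so the PDF text cannot inject tags.
        string encoded = WebUtility.HtmlEncode(extractedText);

        // Normalize Windows (\r\n) and bare \r line endings to \n.
        string normalized = encoded.Replace("\r\n", "\n").Replace("\r", "\n");

        // Emit one <br /> per line break, keeping the \n for readability.
        return normalized.Replace("\n", "<br />\n");
    }
}
```

    For example, PlainTextToHtml.Convert("a < b\r\nc") yields "a &lt; b<br />" followed by a new line and "c".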

    Best Regards,

    Miki

    Filed under: .NET; convert; html; PDF
     
  •  01-13-2012, 12:48 AM 354841 in reply to 354836

    Re: Converting PDF to HTML

    Hi Miki,

    Thank you for considering Aspose.Pdf.

    I am pleased to inform you that Aspose.Pdf for .NET v6.4 and above can save a PDF file directly to HTML. If you are using an older version, please download the latest version, Aspose.Pdf for .NET v6.5, and see whether it fits your needs. The following sample code converts a PDF to HTML.

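    // Load the source PDF from disk
    Aspose.Pdf.Document document = new Aspose.Pdf.Document("myFile.pdf");

    // Save as HTML using the SaveFormat enumeration
    document.Save("myFile.html", SaveFormat.Html);

    // or, equivalently, pass an HtmlSaveOptions instance
    document.Save("myFile.html", new HtmlSaveOptions());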

     

    Please feel free to contact support if you run into any issues.

    Thank You & Best Regards,


    Nausherwan Aslam
    Support Developer,
    Aspose Sialkot Team
     
  •  01-17-2012, 3:14 AM 355618 in reply to 354841

    Re: Converting PDF to HTML .NET

    Thanks for the quick response. We've upgraded to Aspose.Pdf for .NET v6.6.

    We are opening a PDF document from a stream, then saving into a new stream as HTML, i.e.

    ...
    Aspose.Pdf.Document pdfDocument = new Document(sourceStream);
    ...
    pdfDocument.Save(htmlStream, SaveFormat.Html);
    ...

    However, the last line throws an exception with the message: "Save a document to a html stream is not supported. Please use method save to the file."

    Besides the three methods I've described in my initial post, is there a better way to convert a PDF to (text only) basic HTML without saving any files?

    Best Regards,

    Miki

     
  •  01-19-2012, 12:34 AM 356148 in reply to 355618

    Re: Converting PDF to HTML

    Hi Miki,

    I am sorry to say that saving to HTML through a stream is not supported in Aspose.Pdf at the moment. As a workaround, you can write the stream out as a PDF file and then convert that file to HTML using the code provided above.
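
    If the HTML is ultimately needed in a stream, one possible way to combine the file-based save with stream-based code is to save to a temporary file and read it back into memory. The sketch below is only an illustration under that assumption: the class and method names (HtmlViaTempFile, SaveToStream) are hypothetical, and the actual save is passed in as a callback so the helper itself does not depend on Aspose.Pdf. Any auxiliary files a converter might write next to the HTML file are not captured.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Pdf.
public static class HtmlViaTempFile
{
    // saveAsHtml is a caller-supplied callback that writes HTML to the
    // given file path. The result is returned as an in-memory stream.
    public static MemoryStream SaveToStream(Action<string> saveAsHtml)
    {
        if (saveAsHtml == null)
            throw new ArgumentNullException("saveAsHtml");

        // Unique file name in the temp directory to avoid collisions.
        string tempPath = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        try
        {
            saveAsHtml(tempPath);

            // Copy the file into memory so the temp file can be deleted.
            return new MemoryStream(File.ReadAllBytes(tempPath));
        }
        finally
        {
            // Clean up even if the save or the read throws.
            if (File.Exists(tempPath))
                File.Delete(tempPath);
        }
    }
}
```

    With the document from the earlier post, a call might look like HtmlViaTempFile.SaveToStream(path => pdfDocument.Save(path, SaveFormat.Html)), which uses the file-based Save overload shown above.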

    Please feel free to contact support in case you need any further assistance.

    Thanks & Regards,


    Rashid Ali
    Support Developer
    Aspose Sialkot Team
    http://www.aspose.com/
    Aspose – Your File Format Experts
     