Can I extract tables from Word documents and convert them to HTML strings?

Hi all,

I would like to extract tables from Word documents and convert them to HTML. Is there a way to convert, for example, a Table object into an HTML string?

Thanks in advance,

Alan

Hello

Thanks for your request. Yes, you can do this with Aspose.Words. Please try the following code:

import com.aspose.words.*;
import java.util.Arrays;
import java.util.List;

Document doc = new Document("C:\\Temp\\in.doc");
// Work on a clone so the original document stays untouched
Document tempDoc = doc.deepClone();
// Snapshot the tables into a list before the clone is emptied
// (isDeep = true also returns nested tables)
List<Node> tables = Arrays.asList(tempDoc.getChildNodes(NodeType.TABLE, true).toArray());
// Empty tempDoc, then recreate the minimal section/body/paragraph structure
tempDoc.getSections().clear();
tempDoc.ensureMinimum();
Body body = tempDoc.getFirstSection().getBody();
for (int i = 0; i < tables.size(); i++)
{
    Table docTable = (Table) tables.get(i);
    Table docTableCopy = (Table) docTable.deepClone(true);
    // Append the table so the tables keep their original order
    body.appendChild(docTableCopy);
    // Append an empty paragraph to separate consecutive tables
    body.appendChild(new Paragraph(tempDoc));
}
tempDoc.save("C:\\Temp\\outTables.html");

Best regards,

Thank you very much! I’ve tried the code and it works.

However, this means I always have to call the “save” method, so the table first has to be written to the hard drive. Is there a way to get an HTML string representation directly?

Thanks in advance,

Alan

Hello

Thank you for the additional information. You can save the document to a stream and then convert the stream to a string. Please see the following code:

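import java.io.ByteArrayOutputStream;

// Save the document as HTML into an in-memory stream instead of a file
ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
tempDoc.save(htmlStream, SaveFormat.HTML);
// Decode explicitly as UTF-8 (assuming the default output encoding), not the platform charset
String strHtml = htmlStream.toString("UTF-8");
// Drop anything before the first tag (for example a byte order mark)
strHtml = strHtml.substring(strHtml.indexOf("<"));
System.out.print(strHtml);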
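
In case it helps, here is a small standalone sketch of just the stream-to-string step, using only the JDK. It assumes the stream holds UTF-8 bytes, possibly preceded by a byte order mark, which is my guess at why the substring call in the snippet above is needed. The class name HtmlStreamDecoder and the method toHtmlString are made up for illustration and are not part of Aspose.Words, and the bytes written in main are only a stand-in for what tempDoc.save would produce.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Aspose.Words.
final class HtmlStreamDecoder {

    // Decodes the bytes as UTF-8 and drops everything before the first '<'.
    // If there is no '<' at all, the decoded text is returned unchanged.
    static String toHtmlString(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        int start = text.indexOf('<');
        return start >= 0 ? text.substring(start) : text;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
        // Stand-in for tempDoc.save(htmlStream, SaveFormat.HTML):
        // a UTF-8 byte order mark followed by some markup (assumed layout).
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] markup = "<html><body>caf\u00e9</body></html>".getBytes(StandardCharsets.UTF_8);
        htmlStream.write(bom, 0, bom.length);
        htmlStream.write(markup, 0, markup.length);

        System.out.println(toHtmlString(htmlStream.toByteArray()));
    }
}
```

Checking for a missing '<' avoids the exception that substring would throw for an index of -1, and decoding with an explicit charset keeps non-ASCII characters in table cells intact regardless of the machine's default encoding.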

Best regards,

Thank you very much!