Hi all,
I would like to extract tables from Word documents and convert them to HTML. Is there any way to convert, for example, a Table object into an HTML string?
Thanks in advance,
Alan
Hello
Thanks for your request. Yes, of course, you can do this using Aspose.Words. Please try the following code:
// Imports needed: com.aspose.words.*, java.util.Arrays, java.util.List
Document doc = new Document("C:\\Temp\\in.doc");
// Create a clone of the document so the original stays untouched
Document tempDoc = doc.deepClone();
// Collect all tables before clearing the clone (the deep search also returns nested tables)
List<Node> tables = Arrays.asList(tempDoc.getChildNodes(NodeType.TABLE, true).toArray());
// Clear tempDoc, then recreate the minimal section/body/paragraph structure
tempDoc.getSections().clear();
tempDoc.ensureMinimum();
Body body = tempDoc.getFirstSection().getBody();
// Every table is inserted right after the first paragraph,
// so iterate backwards to keep the tables in their original order
for (int i = tables.size() - 1; i >= 0; i--)
{
    Table docTable = (Table) tables.get(i);
    Table docTableCopy = (Table) docTable.deepClone(true);
    // Insert table into document
    body.insertAfter(docTableCopy, body.getFirstParagraph());
    // Insert paragraph between tables
    body.insertAfter(new Paragraph(tempDoc), body.getFirstParagraph());
}
tempDoc.save("C:\\Temp\\outTables.html");
Best regards,
Thank you very much! I’ve tried the code and it works.
However, this means that I always have to call the “save” method, so the table first has to be written to the hard drive. Is there any way to get an HTML string representation directly?
Thanks in advance,
Alan
Hello
Thank you for the additional information. You can save your document to a stream and then convert the stream to a string. Please see the following code:
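// Imports needed: java.io.ByteArrayOutputStream
ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
tempDoc.save(htmlStream, SaveFormat.HTML);
// Decode explicitly instead of relying on the platform default charset
// (this assumes the HTML was written in UTF-8)
String strHtml = htmlStream.toString("UTF-8");
// Drop anything before the first tag, such as a byte order mark
strHtml = strHtml.substring(strHtml.indexOf("<"));
System.out.print(strHtml);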
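If it helps, here is a small standalone sketch of just the stream-to-string step, so it can be tried without Aspose.Words on the classpath. The class name `HtmlStreamToString` and the helper `toHtmlString` are made up for illustration, and the bytes written in `main` only stand in for what `tempDoc.save(htmlStream, SaveFormat.HTML)` might produce. It assumes the HTML is UTF-8 and may start with a byte order mark.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HtmlStreamToString {

    /** Decodes the buffered bytes as UTF-8 and drops a leading byte order mark, if present. */
    static String toHtmlString(ByteArrayOutputStream stream) {
        String html = new String(stream.toByteArray(), StandardCharsets.UTF_8);
        if (!html.isEmpty() && html.charAt(0) == '\uFEFF') {
            html = html.substring(1);
        }
        return html;
    }

    public static void main(String[] args) {
        // Stand-in for the saved document: a UTF-8 byte order mark followed by a tiny table.
        ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "<table><tr><td>caf\u00e9</td></tr></table>".getBytes(StandardCharsets.UTF_8);
        htmlStream.write(bom, 0, bom.length);
        htmlStream.write(body, 0, body.length);

        System.out.print(toHtmlString(htmlStream));
    }
}
```

Checking for the byte order mark explicitly, rather than cutting at the first "<", keeps the string intact even when the output does not start with a tag.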
Best regards,
Thank you very much!