java.nio.charset.UnsupportedCharsetException: UTF-7

I’m receiving the following error when trying to open a Word document - this does not happen all of the time:
java.nio.charset.UnsupportedCharsetException: UTF-7
When I open the document in Word itself, it works fine. I’ve noticed that if I perform a Save As from Word, it offers Single File Web Page as the default file type. If I switch the type to Word Document and save it, I can then process the document with Aspose without any problem.
I’m using Aspose in an integration with Documentum, so I receive the document as a byte stream. Here is the code:

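// dfObject is the Documentum object; getContent() returns its content as a stream
ByteArrayInputStream oBAInput = dfObject.getContent();
// Fails here with UnsupportedCharsetException: UTF-7
Document doc = new Document(oBAInput);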

The above code fails when creating the Document object. What can I do to handle these Word documents? Before upgrading to the latest version of Aspose (2.6.1), I received an unknown format error instead.
Thanks

I’ve uploaded a document that is failing.

The following code returns an unknown file format error when creating the Document object.

For example:

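String sFile = "C:/testdoc.doc";
// sLicenseFile holds the path to your Aspose license file
License oLicense = new License();
oLicense.setLicense(sLicenseFile);
// Fails here with an unknown file format error
Document oadoc = new Document(sFile);
BookmarkCollection oList = oadoc.getRange().getBookmarks();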

etc…

I noticed that when I open TESTDOC.doc in a text editor, it says it’s a single file web page. Excerpt:

MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_01C98CED.DA16E0C0"

This document is a Single File Web Page, also known as a Web Archive file. If you are seeing this message, your browser or editor doesn’t support Web Archive files. Please download a browser that supports Web Archive, such as Microsoft Internet Explorer.

Is there a way to open these files?

Hi,
Your document has the wrong extension. It is actually in MHTML format (and should be named .mht or .mhtml instead of .doc). Aspose.Words for Java doesn’t support MHTML yet, so it throws an exception.
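Since the extension can’t be trusted in this case, one option is to check the content before handing it to Aspose.Words and report MHTML files with a clear message. Below is a minimal sketch, assuming MHTML files begin with MIME headers like the `MIME-Version: 1.0` / `multipart/related` excerpt above; `MhtmlSniffer` and `looksLikeMhtml` are hypothetical names, and this is a heuristic, not how Aspose.Words itself detects formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class MhtmlSniffer {
    // How many leading bytes to inspect; an assumption, since MIME headers sit at the top
    private static final int PEEK_SIZE = 4096;

    /** Heuristic: true if the bytes start with MIME headers declaring a multipart/related body. */
    public static boolean looksLikeMhtml(byte[] data, int length) {
        // MIME headers are ASCII; ISO-8859-1 decodes any byte without errors
        String head = new String(data, 0, Math.min(length, data.length), StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return head.contains("mime-version:") && head.contains("multipart/related");
    }

    /** Peeks at the start of a mark/reset-capable stream and rewinds it afterwards. */
    public static boolean looksLikeMhtml(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_SIZE);
        byte[] buf = new byte[PEEK_SIZE];
        int total = 0;
        int n;
        // read() may return fewer bytes than requested, so keep reading until full or EOF
        while (total < PEEK_SIZE && (n = in.read(buf, total, PEEK_SIZE - total)) != -1) {
            total += n;
        }
        in.reset();
        return looksLikeMhtml(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = ("MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/related; boundary=\"----=_NextPart_01C98CED.DA16E0C0\"\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(sample);
        System.out.println(looksLikeMhtml(in)); // true
        System.out.println(in.read() == 'M');   // true: the stream was rewound
    }
}
```

The `ByteArrayInputStream` returned by Documentum’s `getContent()` supports mark/reset, so the same stream could be checked first and then passed on to `new Document(...)` when it is not MHTML.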
But I don’t understand the ‘UnsupportedCharsetException: UTF-7’ part: in which version of Aspose.Words do you get it?
One more thing: it would be better to download the fixed v2.6.1, since the original build was broken for some unclear reason (Friday the 13th?). You can get it from the old address: https://releases.aspose.com/words/net .
Regards,

I receive the same error, “java.nio.charset.UnsupportedCharsetException: UTF-7”, when attempting to open an .htm or .html file. This only occurs with Aspose.Words for Java 2.7.0; with Aspose.Words for Java 2.6.1 the error does not occur. The file is actually an HTML file, not MHTML.

// Backslashes must be escaped in Java string literals ("\t" would otherwise be a tab)
Document doc = new Document("C:\\Dev\\testing.html");

Regards

Hi

Thanks for your request. Please attach your HTML document for testing. We will investigate the issue and provide you with more information.
Best regards.

Hi,

I got the latest Aspose jars today, May 14.

Our workflow is DOC -> HTML -> DOC.

With the previous Aspose release, my export (HTML to DOC) was working fine.

Now all the documents report java.nio.charset.UnsupportedCharsetException: UTF-7, and the same happens with DOCX.

Here is the stack trace:

java.lang.Exception: java.nio.charset.UnsupportedCharsetException: UTF-7
at asposewobfuscated.lz.tk(Unknown Source)
at asposewobfuscated.lz.tg(Unknown Source)
at asposewobfuscated.lz.R(Unknown Source)
at com.aspose.words.FileFormatUtil.i(Unknown Source)
at com.aspose.words.FileFormatUtil.a(Unknown Source)
at com.aspose.words.Document.a(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.hylighter.document.DocxExportHandler.export(DocxExportHandl

Here is my conversion code for HTML to DOC(X):

Document document = new Document(extractedFile);
extractedFile += ".docx";
document.setShadeFormData(false);
document.save(extractedFile, SaveFormat.DOCX);

Am I missing something?

thanks,
karthik

Hi

Thanks for your request. Please attach sample documents for testing. I will investigate the issue and provide you with more information.
Best regards.

Hi,

I have attached the original document, Anti_phishing_White_Paper.doc, and its converted HTML as Anti_phishing_White_Paper.zip.

thanks,
karthik

Hi

Thank you for the additional information. I cannot reproduce the problem on my side. Did you do anything to Aspose.Words.jar? For example, did you obfuscate or otherwise modify it?
Please tell me how you use Aspose.Words in your application.
Best regards.

Hi,

Thanks for your reply.

My standalone conversion seems to be working.

I just moved the jars to my application’s lib folder, nothing else.

We have a web application in which we import documents (DOC -> HTML), do some processing and customization, and export them back to DOC or DOCX.
We run a Tidy cleanup (before or after) wherever HTML is involved in the conversion.

thanks,
karthik

Hi,

Did you try the HTML I attached in my previous post? It reports the same exception even in my standalone conversion (HTML to DOC).

If your HTML to DOC(X) conversion works, could you paste the sample code?

thanks,
karthik

Hi

Thanks for your request. Yes, I tried to convert your HTML to DOC(X) format and all works fine on my side. Here is my code:

// Open HTML document
Document doc = new Document("Anti_phishing_White_Paper.doc.html");
// Save document in DOC(X) format
doc.save("out.doc");

Best regards.

Hi,

Finally I got it working with the other Document constructor.

Document(file, loadformat, "");
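If switching constructors everywhere isn’t practical, one option is to keep automatic detection and fall back to the explicit-format constructor only when the charset error shows up. Below is a minimal sketch. It assumes the `UnsupportedCharsetException` is attached as the cause of the `java.lang.Exception` in the stack trace above. If it is only embedded in the message, the check would need to inspect `getMessage()` instead. `LoadWithFallback` is a hypothetical helper, not part of Aspose.Words.

```java
import java.nio.charset.UnsupportedCharsetException;
import java.util.concurrent.Callable;

public class LoadWithFallback {
    // Upper bound on cause-chain depth, guarding against cyclic causes
    private static final int MAX_DEPTH = 50;

    /** Returns true if t, or any exception in its cause chain, is an instance of type. */
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (type.isInstance(current)) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    /**
     * Runs the auto-detecting load; if it fails because a charset is unsupported,
     * runs the explicit-format load instead. Any other failure is rethrown unchanged.
     */
    static <T> T load(Callable<T> autoDetect, Callable<T> explicitFormat) throws Exception {
        try {
            return autoDetect.call();
        } catch (Exception e) {
            if (hasCause(e, UnsupportedCharsetException.class)) {
                return explicitFormat.call();
            }
            throw e;
        }
    }
}
```

With Aspose.Words, the two callables would wrap `new Document(file)` and the `new Document(file, loadformat, "")` call above. The exact load-format constant depends on your version, so check it against the API reference rather than relying on this sketch.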

I still don’t know why the same file that worked in my standalone application failed in my web application.
Could it be something to do with the automatic format detection?

I hope this helps someone with the same problem.

thanks,
karthik

It is great that you have already found the solution. Thank you for sharing it here.
Best regards.

Hi,
You are right: your deobfuscated stack trace points to the automatic format detection code. However:

  1. This code branch is used by every application that relies on automatic format detection, i.e. the new Document(filename) overload. Probably more than half of our users use this overload, yet you are the only one who has reported such an issue.
  2. Your code reports that it can’t find the UTF-7 charset. Since core Java does not support a UTF-7 charset, we include one in the Aspose.Words jar. The code itself is obfuscated within the jar and is referenced by the META-INF\services\java.nio.charset.spi.CharsetProvider file. Perhaps you changed the jar somehow, for example by obfuscating it or modifying the META-INF\services folder, so that Java’s CharsetProvider service can’t find our UTF-7 implementation. Another possibility is a conflict between CharsetProviders in your environment (somewhat like JAR hell).
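One way to narrow this down is to run a small check once inside the affected web application and once standalone, then compare the output. Below is a minimal sketch that uses only the standard `java.nio.charset.Charset` API (the class name `Utf7Check` is hypothetical). It reports whether the JVM can resolve "UTF-7", which class implements it, and which class loaders are in play. There is also an assumption worth verifying for your JDK: the JDK has historically looked up `CharsetProvider` implementations through the system class loader. If so, a provider in a jar that only the web application’s class loader can see might not be found, which would match the "works standalone, fails in the web app" symptom.

```java
import java.nio.charset.Charset;

public class Utf7Check {
    /** True if the running JVM can resolve the named charset through any installed provider. */
    public static boolean isCharsetAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal charset names
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "UTF-7";
        boolean ok = isCharsetAvailable(name);
        System.out.println(name + " supported: " + ok);
        if (ok) {
            // The implementing class hints at which provider supplied the charset
            System.out.println("Implemented by: " + Charset.forName(name).getClass().getName());
        }
        // A standalone app usually has one loader; web containers add per-application loaders
        System.out.println("System class loader:  " + ClassLoader.getSystemClassLoader());
        System.out.println("Context class loader: " + Thread.currentThread().getContextClassLoader());
    }
}
```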

Please let me know: 1) Have you modified the Aspose.Words jar? 2) Are there other applications in your web environment that could conflict with our UTF-7 CharsetProvider?
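To help answer question 2, it can be useful to list every `CharsetProvider` registration visible to the application’s class loaders. Two jars registering providers, or no Aspose.Words entry at all, would both be telling. Below is a sketch that uses only JDK APIs and assumes it runs inside the web application (for example, from a test servlet). `CharsetProviderScan` and its methods are hypothetical names. It reads the service files directly instead of instantiating providers, so a broken provider can’t abort the scan.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class CharsetProviderScan {
    // Standard service-registration file for charset providers
    static final String SERVICE_FILE = "META-INF/services/java.nio.charset.spi.CharsetProvider";

    /** Removes a '#' comment and surrounding whitespace from a service-file line. */
    static String stripComment(String line) {
        int hash = line.indexOf('#');
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }

    /** Returns "source -> provider class" entries for every registration the loader can see. */
    public static List<String> findRegistrations(ClassLoader loader) {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(SERVICE_FILE);
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                // Service files are UTF-8 encoded, one class name per line
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String name = stripComment(line);
                        if (!name.isEmpty()) {
                            result.add(url + " -> " + name);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader[] loaders = {
            Thread.currentThread().getContextClassLoader(),
            ClassLoader.getSystemClassLoader()
        };
        for (ClassLoader loader : loaders) {
            if (loader == null) {
                continue; // the context class loader can be unset in some environments
            }
            System.out.println("Loader: " + loader);
            for (String entry : findRegistrations(loader)) {
                System.out.println("  " + entry);
            }
        }
    }
}
```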
Best Regards,