Hello Alexander!
Thank you for experimenting with our new release.
I can explain what happens with these documents. You will probably be able to find workarounds.
a5.eml has a very long primary header. When Aspose.Words reads a document, it first briefly detects the file format and only then instantiates the appropriate heavyweight importer objects. Detection is a very lightweight algorithm that analyzes the first 512 bytes of the file. During file format detection, signatures, MIME types, etc. might be checked. HTML and MHTML don’t have any signatures, so we have to accept anything that looks like a MIME header and search it for some mandatory records. For instance, MHTML detection requires that "Content-Type" occurs within the first 512 bytes. This doesn’t happen in your document, so format detection fails. How long the detection buffer should be is a philosophical question: RFC 2557 and RFC 822 don’t restrict header length, but we don’t want to read every candidate file potentially up to the end. I’ll try to improve detection to cover your cases, but right now you can force MHTML import by specifying the format explicitly:
Document doc = new Document("a5.eml", LoadFormat.Mhtml, string.Empty);
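If you process many .eml files, you could apply the same idea in your own code. The sketch below is only an approximation of the check described above, not the actual Aspose.Words detection code: EmlLoader, HasEarlyContentType and Load are hypothetical names, and the case-insensitive match is my assumption. It looks for "Content-Type" in the first 512 bytes and falls back to the explicit LoadFormat.Mhtml overload when the header is too long.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;

static class EmlLoader
{
    // Assumption: same window size as the 512-byte detection buffer described above.
    const int DetectionWindow = 512;

    // Hypothetical helper: true if "Content-Type" occurs within the first 512 bytes.
    public static bool HasEarlyContentType(string path)
    {
        byte[] buffer = new byte[DetectionWindow];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            // Read may return fewer bytes than requested, so loop until the
            // window is full or the file ends.
            while (total < buffer.Length &&
                   (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
        }

        // MIME header field names are ASCII; matching case-insensitively is an assumption.
        string head = Encoding.ASCII.GetString(buffer, 0, total);
        return head.IndexOf("Content-Type", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Use this only for files you already know are MIME messages (.eml, .mht).
    public static Document Load(string path)
    {
        if (HasEarlyContentType(path))
        {
            // Header is short enough for automatic format detection.
            return new Document(path);
        }

        // Long primary header: skip detection and force the MHTML importer.
        return new Document(path, LoadFormat.Mhtml, string.Empty);
    }
}
```

With this in place, Document doc = EmlLoader.Load("a5.eml"); would take the explicit MHTML path for your file. Since you already know the files are e-mail messages, always passing LoadFormat.Mhtml is an equally valid and simpler choice.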
When I did this, I found another issue: the main HTML document is not read. Format detection is involved here as well. In the general case it is needed to ensure that the source has a proper encoding. Since the encoding is already specified in a subsidiary header, we can soften the conditions on our side. As a workaround, you can enclose the whole HTML document in <html> … </html> tags.
a6.eml is not read because the multipart/mixed content type is not supported. Other multipart messages (multipart/alternative and multipart/related) are imported correctly.
I have linked your requests to appropriate issues. You’ll be notified when they are fixed.
You can share any materials with us by sending them via the Aspose website:
http://www.aspose.com/corporate/purchase/faqs/send-license-to-aspose-staff.aspx
Regards,
Viktor Sazhaev
Software Engineer,
Aspose Auckland Team