
Converting .doc to mhtml email

Last post 10-10-2008, 2:42 PM by alexey.noskov. 22 replies.
Page 1 of 2 (23 items)   1 2 Next >
  •  06-16-2008, 12:57 AM 131373

    Converting .doc to mhtml email

    Attachment: Present (inaccessible)

    We are building an application that will convert an uploaded Word document into an email in MHTML format. We are planning to use your product to handle the server-side conversion.

    However, in our testing, converting a .doc to email gives a significantly different result than MS Word does on the same task.

    I will upload the original doc and two screen captures to show the difference. We are wondering:

    1. Are we doing something wrong?

    2. Can we tweak the application to give a better result?

    3. Can anything be done at your end to improve the conversion to MHTML?

    The attached files are zipped and show the difference.

    If you think your product will not be capable of giving the conversion performance we seek, could you be so kind as to indicate this so we can move on to another solution.

     


    Regards,
    http://www.interactivewebs.com
     
  •  06-17-2008, 1:34 AM 131539 in reply to 131373

    Re: Converting .doc to mhtml email

    Hello!

     

    Thank you for your inquiry.

     

    I’ll take a look and provide you more information on this case shortly.

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  06-17-2008, 3:10 PM 131709 in reply to 131539

    Re: Converting .doc to mhtml email

    Hello!

     

    Please ensure you are using the latest version of Aspose.Words. I’m getting different results. MHTML conversion involves HTML, which has some restrictions. We can help you get better results with some document refactoring, but it is difficult to promise high fidelity for arbitrary input documents. Let’s look at particular issues to go further.

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  06-19-2008, 7:46 AM 132097 in reply to 131709

    Re: Converting .doc to mhtml email

    Attachment: Present (inaccessible)

    Thanks for the reply. I will give you a good example with the attached document.

    We find that when we use this one, the text font and its spacing come out different in Aspose than in the original, yet Word keeps the font style in its conversion.

    We find Aspose changes it to a Times New Roman-looking font. Any suggestions?


    Regards,
    http://www.interactivewebs.com


     
  •  06-19-2008, 10:09 AM 132130 in reply to 132097

    Re: Converting .doc to mhtml email

    Hello!

     

    Thank you for the new materials. I was unable to reproduce the font problem you describe. I can send you my conversion results. The output is very nice. You can either give me an e-mail address or allow me to attach it here.

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  06-20-2008, 8:13 PM 132372 in reply to 132130

    Re: Converting .doc to mhtml email

    My name is David.

    Would you mind sending the result to us AT interactivewebs.com.au (please interpolate the address)?


    Regards,
    http://www.interactivewebs.com
     
  •  06-21-2008, 4:33 AM 132403 in reply to 132372

    Re: Converting .doc to mhtml email

    Hello David.

     

    Thank you for your cooperation.

     

    I'm attaching (to an e-mail) an MHTML file and two raster pictures useful for visual comparison. I see that the Verdana font looks slightly different, but I got the same results with MS Word when I saved the DOC as HTML and opened it in the browser. I tried with Internet Explorer 7.0 and Avant Browser 11.5 (they use the same engine). We cannot force fonts to be absolutely identical in MS Word and other applications. That’s one of the major reasons why we cannot guarantee 100% fidelity between MS Word documents and what we get after conversion to HTML, MHTML and PDF. These formats are not “native” for MS Word and live by other rules.

     

    I found that the bulleted list in the MHTML has less line spacing and less spacing after bullets. The other difference is that our MHTML is shown at the left of the browser window, but the one produced by MS Word is centered. These differences can be addressed, but they need some MS Word-specific “magic” in the HTML code. Our design principle is to avoid it wherever possible because of complaints about non-standard techniques in the HTML produced by MS Word. It exports almost everything that can be in documents, but you can see how that is achieved if you open its HTML as text. In particular, there are no attributes or style modifiers in HTML to control the indent after list bullets.

     

    Please let me know if we can help you further.

     

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  06-21-2008, 8:14 AM 132414 in reply to 132403

    Re: Converting .doc to mhtml email

    Thanks for the email and the information. Our goal is to build a process for Word mail merge that runs server side without needing Word installed on the server. Naturally, we want the server to produce a result as close as possible to what MS Word would give locally.

    I understand that your goal is a little more refined, in that correct end code is more important than blind replication.

    So I would like to check a few things.

    1. Is there any way to tweak the product to reproduce Word's end result more exactly?

    2. If not, can you suggest any other items that are likely to cause issues, perhaps fonts and formatting that you know have a differing result?

    3. Any other tips or suggestions you have?


    Regards,
    http://www.interactivewebs.com
     
  •  06-22-2008, 8:10 AM 132449 in reply to 132414

    Re: Converting .doc to mhtml email

    Hello David!

     

    This is one of the most typical tasks for Aspose.Words: mail merge and saving to various formats on the server. We try to produce output close to what we get from MS Word. But some MS Word features don’t map directly to “non-native” formats. I already gave examples related to HTML/MHTML. For some features the HTML standard does not even define mandatory behavior and leaves degrees of freedom to user agents (browsers). Of course you know that different browsers can render the same pages differently.

     

    I can take a look at what happens with line spacing. But we cannot control spacing after bullets without using the MS Word approach to outputting lists, which is what we don’t like. For these issues I cannot recommend any parameter tweaks. Perhaps you can get closer fidelity by simulating that list with non-list paragraphs and tabs. But I think that is unnecessarily complex and inconvenient.

     

    We don’t have full information on which items potentially cause issues. Here is a spreadsheet showing the level of import and export support for HTML and PDF (link below). But it states only that something is supported or partially supported, without notes on rendering fidelity. We plan to improve this part of the documentation and will think about fidelity. The better way is to experiment with real documents and discuss particular differences here in the forum.

     

    http://www.aspose.com/community/files/51/file-format-components/aspose.words/entry108980.aspx

     

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  07-01-2008, 11:54 AM 133851 in reply to 132449

    Re: Converting .doc to mhtml email

    Attachment: Present (inaccessible)

    Thanks for that info. It is helpful. We are now a developer license client, so the support has taken us that far.

    We can see that most of the failures we get also fail in Word, or look other than expected in MS Word's MHTML.

    Attached is test 5: a standard MS Word 2007 template saved by Word to MHTML (good result), and the attached msg file is what we get out of Aspose. Not what we expected on this one.

    I also notice that messages converted from .docx files are including some strange text at the beginning of the message "

    

     

     

    Any comment on that?


    Regards,
    http://www.interactivewebs.com
     
  •  07-02-2008, 4:09 AM 133911 in reply to 133851

    Re: Converting .doc to mhtml email

    Hello!

     

    Thank you for your inquiry.

     

    I see that the msg file is not MHTML at all. You should probably specify SaveFormat.Mhtml. To be absolutely sure, you can share your code here for inspection.
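
    A rough way to confirm what kind of file was actually produced before it is sent is to look at its first bytes. The sketch below is an illustration only: `sniff_format` is a hypothetical helper, not part of Aspose.Words. It relies on the standard signatures of OLE compound files (used by .doc and Outlook .msg) and ZIP archives (used by .docx), and on MHTML being a plain-text MIME message.

```python
# Hypothetical helper, not an Aspose.Words API: guess a file's container
# format from its leading bytes.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file: .doc, Outlook .msg
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP container: .docx


def sniff_format(data: bytes) -> str:
    """Return a rough guess at the container format of a saved file."""
    if data.startswith(OLE_MAGIC):
        return "ole-compound (.doc or .msg)"
    if data.startswith(ZIP_MAGIC):
        return "zip (.docx)"
    # MHTML is a text MIME message, so a MIME-Version header should
    # appear near the top of the file.
    head = data[:1024].decode("ascii", errors="replace").lower()
    if "mime-version:" in head:
        return "mime (possibly MHTML)"
    return "unknown"
```

    Reading the output file with `open(path, "rb").read()` and passing the bytes to `sniff_format` shows whether the save step really produced a MIME document or something else.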

     

    Floating objects are not supported in HTML/MHTML export. But your source document is designed this way: everything is put into floating boxes. This issue is known in our defect database as #1001. Currently it is considered by-design behavior. As a workaround you can switch to using flow content.

     

    Please clarify the other points. What do you mean by “also fail in Word”? We cannot promise better functional coverage than MS Word provides. It would be great if you could provide some samples.

     

    I also cannot guess exactly what happens to documents when you convert from DOCX. You wrote that some unexpected characters appear at the beginning. Could you explain how to reproduce this?

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  07-07-2008, 9:04 AM 134541 in reply to 133911

    Re: Converting .doc to mhtml email

    Attachment: Present (inaccessible)

    Thanks for your reply. We are still testing.

    Here is a strange one. This test was from an MS Word document. When sent via our email module application it worked fine.

    One email address received it fine in Outlook 2007.

    The other, as you will see, has STRIKE THROUGH on about 50% of the email.

    Any idea on the cause of that?


    Regards,
    http://www.interactivewebs.com
     
  •  07-07-2008, 9:56 AM 134555 in reply to 134541

    Re: Converting .doc to mhtml email

    Hello!

     

    Thank you for these materials. But I cannot tell what format it is: it is neither MHTML nor DOC. Please share the code you use to produce these documents and explain what happens next.

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     
  •  07-19-2008, 12:26 AM 136167 in reply to 134555

    Re: Converting .doc to mhtml email

    Attachment: Present (inaccessible)

    The attachments were in MS email format. Any recent MS email application should open them.

    In any case, here is a strange one.

    The attached Word documents, in both .doc and .docx, add a strange

    

     

    to the beginning of the converted mhtml messages. Can you suggest a fix for this?

     

    Incidentally, we notice that the MS Word conversion to MHTML for the same documents does not produce this.


    Regards,
    http://www.interactivewebs.com
     
  •  07-20-2008, 5:47 AM 136202 in reply to 136167

    Re: Converting .doc to mhtml email

    Hello!

     

    Thank you for additional materials.

     

    Here is what I get after conversion (beginning):

     

    MIME-Version: 1.0
    Content-Type: text/html;
                charset="utf-8"
    Content-Transfer-Encoding: quoted-printable
    Content-Location: document.html

    =EF=BB=BF<html>
                <head>
                            <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8" =
    />
                            <meta name=3D"generator" content=3D"Aspose.Words for .NET 5.2.1.0" />
                            <title>
                            </title>
                </head>
                <body>
                            <div class=3D"Section1">
                                       <p style=3D"margin-left:0pt; margin-right:0pt; margin-top:0pt; margin-bo=
    ttom:10pt; line-height:115%; font-size:11pt; "><span style=3D"font-family:'=
    Calibri'; font-size:11pt; ">Here is your profile information that might be =
    useful</span></p>

     

    The file itself has no Byte Order Mark (BOM) since it is always in ASCII. But the encoded HTML is in UTF-8 by default and is preceded by a UTF-8 BOM. That is the =EF=BB=BF sequence right before <html> in the snippet (shown in bold in my original post). Do you mean this? It’s part of the standard and should be handled properly by consumer applications. Perhaps your application doesn’t treat the BOM as a BOM? If so, you can remove this sequence after conversion.
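
    Removing the sequence after conversion could look roughly like the Python sketch below. It is an illustration under stated assumptions, not Aspose code: `strip_qp_bom` is a hypothetical helper, and it assumes a single-part message like the snippet above, where the quoted-printable body starts right after the first blank line. A multipart MHTML file would need the same treatment for each HTML part.

```python
import codecs
import quopri

# UTF-8 BOM (bytes EF BB BF) as it appears in quoted-printable text.
QP_BOM = b"=EF=BB=BF"

# Sanity check: decoding the quoted-printable sequence yields the UTF-8 BOM.
assert quopri.decodestring(QP_BOM) == codecs.BOM_UTF8


def strip_qp_bom(part: bytes) -> bytes:
    """Remove a quoted-printable UTF-8 BOM that opens the body of a MIME part.

    `part` is the headers, a blank line, then the body (hypothetical helper;
    assumes a single-part message). Input without a leading BOM in the body
    is returned unchanged.
    """
    # Locate the first blank line, whichever line-ending style is used.
    hits = [(part.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        return part  # no header/body separator found
    pos, sep = min(hits)
    body_start = pos + len(sep)
    if part.startswith(QP_BOM, body_start):
        return part[:body_start] + part[body_start + len(QP_BOM):]
    return part
```

    Working on the encoded bytes avoids decoding and re-encoding the whole body, so the rest of the message stays byte-for-byte as it was saved.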

     

    Please let me know if I misunderstand your question.

     

    Regards,
    Viktor Sazhaev
    Software Engineer, Aspose Auckland Team
     