DOC to HTML conversion: the converted HTML does not parse

Dear sir,

My scenario is as follows:

  1. First we upload a DOC file; the server then sends a response, which we display in the client browser.

  2. When I try to parse the returned HTML, parsing fails if the upload response was generated with HtmlFixedSaveOptions. With HtmlSaveOptions the HTML parses correctly when I examine it.

With HtmlFixedSaveOptions the resulting HTML is very large.

Please help me resolve this.

Hi there,

Thanks for your inquiry. The HtmlFixed file is different from Html. Could you please share your input and output documents here for testing? Please also share the issue detail that you are facing while parsing. We will investigate the issue on our side and provide you more information.

Dear sir,

I have attached the DOC file that I want to convert to HTML and then parse, but parsing the result with jsoup fails.

I am using the following JavaScript function to upload the DOC file and show the result in a div:

    function fun_uploadFile() {
        var file = "#file";
        var fd = new FormData();
        fd.append('file', $(file)[0].files[0]);
        document.getElementById('dataupload').style.display = "block";
        $.ajax({
            url : 'UploadFile',
            data : fd,
            method : 'POST',
            processData : false,
            contentType : false,
            success : function(data) {
                var receive_Json = JSON.parse(data);
                columnNumber = receive_Json['patternValue'];
                columnNumberH = receive_Json['patternValue1'];
                var html = receive_Json['html'];
                /* var xyz=document.getElementById('dataupload');
                var children=xyz.childNodes;
                console.log(children[0]); */
                /* var nodesArray = [].slice.call(document.querySelectorAll("children"));

                */
                console.log(columnNumber);
                document.getElementById('dataupload').innerHTML = html;
                document.getElementById('createtemp').style.display = "none";
                document.getElementById('createQuery').style.display = "block";
            }
        });
    }

On the upload servlet I am using the following code:

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        System.out.println("upload Servlet file");
        response.setCharacterEncoding("UTF-8");
        // JsContemtWriter jsContemtWriter=new JsContemtWriter();
        String serverPath = this.getServletContext().getRealPath(File.separator) + "\\WebContent" + "\\temp";
        try {
            if (!new File(serverPath).exists())
                new File(serverPath).mkdirs();
            // For uploading the file we require MultiPartRequest
            MultipartRequest mp = new MultipartRequest(request, serverPath, 1024 * 1024 * 1024);
            // for getting file name
            Enumeration<?> itr = mp.getFileNames();
            while (itr.hasMoreElements()) {
                filename = mp.getFilesystemName((String) itr.nextElement());
                System.out.println("filename=" + filename);
            }
            System.out.println(serverPath);
            Document doc = new Document(serverPath + "\\" + filename);
            doc.updatePageLayout();
            doc.updateFields();
            while (doc.getSections().getCount()>1)
            {
                doc.getFirstSection().appendContent((Section)doc.getFirstSection().getNextSibling());
                doc.getFirstSection().getNextSibling().remove();
            }
            HtmlFixedSaveOptions fixedOption = new HtmlFixedSaveOptions();
            fixedOption.setPageIndex(0);
            fixedOption.setEncoding(java.nio.charset.Charset.forName("UTF-8"));
            fixedOption.setPrettyFormat(true);
            fixedOption.setExportEmbeddedCss(true);
            fixedOption.setExportEmbeddedFonts(true);
            fixedOption.setPrettyFormat(true);
            fixedOption.setUseHighQualityRendering(true);
            fixedOption.setExportFormFields(true);
            fixedOption.setUseAntiAliasing(true);
            fixedOption.setExportEmbeddedImages(true);
            // NOTE: SaveFormat.HTML is a format constant, not a quality value; setJpegQuality expects 0-100.
            fixedOption.setJpegQuality(SaveFormat.HTML);
            HtmlSaveOptions options = new HtmlSaveOptions();
            // options.setExportPageSetup(true);
            options.setExportImagesAsBase64(true);
            options.setExportTextInputFormFieldAsText(false);
            options.setEncoding(java.nio.charset.Charset.forName("UTF-8"));
            options.setPrettyFormat(true);
            options.setExportImagesAsBase64(true);
            options.setExportPageSetup(true);
            options.setExportTextInputFormFieldAsText(false);
            options.setExportTocPageNumbers(true);
            options.setExportPageSetup(true);
            options.setExportDocumentProperties(true);
            options.setExportRelativeFontSize(false);
            options.setUseHighQualityRendering(true);
            options.setExportDocumentProperties(true);
            options.setScaleImageToShapeSize(true);
            options.setUseAntiAliasing(true);
            options.setExportRoundtripInformation(true);
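            // NOTE: SaveFormat.HTML_FIXED is a format constant, not a resolution; setImageResolution expects a DPI value.
            options.setImageResolution(SaveFormat.HTML_FIXED);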
            int pageCount = doc.getPageCount();
            doc.save(serverPath + "\\temp.html", fixedOption);
            // response.getWriter().write(Demo.replaceBlank(serverPath+"\\temp.html"));
            JSONObject obj=new JSONObject();
            String result = Demo.replaceBlank(serverPath + "\\temp.html");
            int numberofpattern = Demo.matchedPattern;
            Demo.matchedPattern = 0;
            String result1 = Demo.replaceBlank1(result);
            int numberofpattern1 = Demo.matchPattern1;
            Demo.matchPattern1 = 0;
            String result2=Demo.replaceBlankStar(result1);
            String afterCheckBox=Demo.replaceCheck(result2);
            obj.put("patternValue", numberofpattern);
            obj.put("patternValue1", numberofpattern1);
            obj.put("html", afterCheckBox);
            // System.out.println(result1);
            response.getWriter().write(obj.toString());
        } catch (Exception e) {
            // Log the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}
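
A side note on the paths in the servlet above: backslash separators inside Java string literals are easy to get wrong, because each one has to be written as `\\`. The following is a minimal sketch of building the same `WebContent\temp` location with `java.nio.file.Paths` instead. The class name `UploadPaths` and both method names are hypothetical and not part of the original code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadPaths {
    // Builds <webRoot>/WebContent/temp using the platform separator,
    // so no backslashes have to be escaped by hand.
    static Path tempDir(String webRoot) {
        return Paths.get(webRoot, "WebContent", "temp");
    }

    // Resolves a file inside the temp directory. getFileName() keeps only the
    // last path element, dropping any directory part sent with the upload.
    static Path inTempDir(String webRoot, String fileName) {
        Path nameOnly = Paths.get(fileName).getFileName();
        return tempDir(webRoot).resolve(nameOnly);
    }

    public static void main(String[] args) {
        System.out.println(inTempDir("webroot", "temp.html"));
    }
}
```

In the servlet, `inTempDir(...).toString()` could then be passed wherever a path string is expected, for example to `doc.save(...)`.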

When I save with HtmlFixedSaveOptions, the image positions are correct (in most cases) and the rendered page looks faithful to the original, but parsing or processing is a problem because the output is converted into a very large string.

When I use HtmlSaveOptions, the page also looks like the uploaded DOC file, but the image positions are wrong. How can I resolve this? Please provide a proper example.

Thank you.

Hi there,

Thanks for sharing the detail. Please note that MS Word and HTML formats are quite different so sometimes it’s hard to achieve 100% fidelity. Upon processing HTML, some features of HTML might be lost. You can find a list of limitations upon HTML exporting here:

Save in the HTML (.HTML, .XHTML, .MHTML) Format

The HtmlFixed file is also different from Html. In your case, we suggest that you save the document to Html format instead of HtmlFixed. Moreover, note that Aspose.Words mimics the behavior of MS Word. If you convert your document to Html using MS Word, you will get the same output.
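
If the HtmlFixed output still has to be parsed, one thing that may help is shrinking the string first. The sketch below assumes (this is not confirmed in the thread) that most of the size comes from fonts and images embedded as base64 data URIs, which is what options such as `setExportEmbeddedFonts(true)` and `setExportEmbeddedImages(true)` produce. It uses only the JDK; the class name `FixedHtmlShrinker` and the method `stripEmbeddedData` are hypothetical. The stripped copy is meant only for text analysis; the original HTML should still be the one sent to the browser.

```java
import java.util.regex.Pattern;

public class FixedHtmlShrinker {
    // Matches data URIs such as data:image/png;base64,iVBORw0KGgo=
    // Possessive quantifiers (*+ and ++) avoid backtracking over long payloads.
    private static final Pattern DATA_URI =
            Pattern.compile("\\bdata:[^,\"'()\\s]*+,[A-Za-z0-9+/=]++");

    // Returns a copy of the HTML with every embedded payload
    // replaced by a short placeholder URL.
    static String stripEmbeddedData(String html) {
        return DATA_URI.matcher(html).replaceAll("about:blank");
    }

    public static void main(String[] args) {
        String html = "<img src=\"data:image/png;base64,iVBORw0KGgo=\"/>";
        String slim = stripEmbeddedData(html);
        System.out.println(html.length() + " -> " + slim.length() + " characters");
    }
}
```

The slimmed string can then be handed to the parser or to the pattern-replacement step, while element structure and text content stay untouched.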