EPUB creation - problem files (page breaks, TOC, footer images, partial bold text in paragraph)

I have a number of problem files and would like to ask whether Aspose.Words can address them:

  1. 14428134-VSR-an-Alternative-to-Thermal-Treatment.doc
  • Is there an option to enforce the two-column layout? (The generated EPUB has only one column.)
  • At the beginning of the abstract and the introduction, several lines of text are bold. Why is that, and can it be fixed?
  2. 22464657-The-Graduate-Job-Search-Guide-IE.doc
  • Why are the images at the bottom of page 2, page 3, etc. not converted?
  3. 22582535-The-Mountain-of-Miracles(2).doc
  • The table of contents is a combination of background images with text on top; however, when the conversion is made the entire thing falls apart. Can the TOC be preserved?
  4. 5608572-39-Lessons-From-A-Child-Experiencing-Divorce.doc
  • Is there an option to have the EPUB observe the page breaks in the original document? (The generated EPUB does not honor them.)

Hi

Thank you for your interest in Aspose.Words.
1.1. No, there is no option to output an EPUB document in two columns. As you may know, the EPUB format is based on HTML, and HTML does not natively support text columns.

1.2. I cannot reproduce this problem on my side. I use the latest version of Aspose.Words for testing. You can download it from here:
https://releases.aspose.com/words/net

2.1. The cause of this problem is the same as in 1.1. The image at the bottom of each page in your document is a document footer. HTML does not natively support footers, so only the primary footer is output, at the end of the document.

3.1. The problem occurs because there is a floating image behind the TOC. Unfortunately, Aspose.Words does not support positioning of floating images when converting to HTML, MHTML and EPUB.

4.1. Aspose.Words outputs page breaks as follows:

<br style="page-break-before:always; clear:both; mso-break-type:section-break" />

But it seems EPUB readers do not recognize such breaks. We are currently working on splitting the HTML into parts when converting to EPUB. This feature might help you achieve what you need. We will let you know once it is available.
Best regards.

Thank you very much for the responses.

I upgraded to the latest version - 9.3.0

I am still getting the partial bold lines with 14428134-VSR-an-Alternative-to-Thermal-Treatment.doc

I notice that if I just do a straight conversion, as in the code below, the bold goes away:

Document doc = new Document(_sourceFilePath);
doc.Save(_destinationFilePath, SaveFormat.Epub);

(https://forum.aspose.com/t/63225)
However my actual code is…

Document doc = new Document(_sourceFilePath);
ParagraphResolver resolver = new ParagraphResolver();
doc.Accept(resolver);
doc.Range.Replace("\v", string.Empty, false, false);
doc.Save(_destinationFilePath, SaveFormat.Epub);

With my actual code the partial bold lines are present…
Is there something that can address this?

Thanks in advance…

Hi Brian,
Thanks for your inquiry.
I was unable to reproduce the issue using the latest version of Aspose.Words (9.3) and Adobe Digital Editions. Could you please post your output EPUB file here, along with a screenshot of the issue?
Thanks,

I am using this code to convert:

Document doc = new Document(_sourceFilePath);
ParagraphResolver resolver = new ParagraphResolver();
doc.Accept(resolver);
doc.Range.Replace("\v", string.Empty, false, false);
doc.Save(_destinationFilePath, SaveFormat.Epub);

Attached are the source doc and the converted EPUB (I had to add .zip to the file name to upload it).

Also attached is a screenshot of the issue: screenshot - partial bold lines.JPG

Hi Brian,
Thanks for attaching your documents. I have managed to reproduce the issue on my side.
This occurs because the ParagraphResolver class merges consecutive paragraphs into one. The heading, which is bold, has the style “Heading 3”. This is a paragraph style, which means that when the paragraphs are combined into one, the entire block of text is formatted with this style, causing it to appear bold.
To fix this I have simply removed the part of Alexey’s code that combines paragraphs and left in only the part that removes empty paragraphs. This removes the empty space while retaining the look of the original formatting and avoiding the issue with styles. Hopefully the output still appears the way you wanted. If it does not, could you please clarify exactly how you would like the paragraphs removed and how you would like the output to appear? The code is below; the edited class is called ParagraphResolverEmptyParagraphs.

private class ParagraphResolverEmptyParagraphs : DocumentVisitor
{
    // Called after all child nodes of a paragraph have been visited.
    public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
    {
        // Get the node after the paragraph; null if it is not a paragraph.
        Paragraph nextParagraph = paragraph.NextSibling as Paragraph;
        // If the paragraph is empty and the next node is also an empty paragraph, remove the paragraph.
        if (!paragraph.HasChildNodes && nextParagraph != null && !nextParagraph.HasChildNodes)
        {
            paragraph.Remove();
        }
        return VisitorAction.Continue;
    }
}
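
For completeness, here is a minimal sketch of how the edited class could be plugged into your conversion code. The EpubConverter class and the Convert method name are hypothetical, chosen only for illustration, and the sketch assumes ParagraphResolverEmptyParagraphs is declared as a nested class inside the same class. The Range.Replace overload is the one from your own code (Aspose.Words 9.x); other versions may expose a different signature.

```csharp
using Aspose.Words;

// Hypothetical wrapper class; the name is an assumption for illustration.
public static partial class EpubConverter
{
    // Assumes ParagraphResolverEmptyParagraphs (shown above) is declared
    // as a nested class in another part of this partial class.
    public static void Convert(string sourceFilePath, string destinationFilePath)
    {
        Document doc = new Document(sourceFilePath);

        // Drop empty paragraphs that are followed by another empty paragraph.
        // No paragraphs are merged, so a paragraph style such as "Heading 3"
        // stays confined to its own paragraph.
        doc.Accept(new ParagraphResolverEmptyParagraphs());

        // Strip manual line breaks (vertical tab), as in the original code.
        doc.Range.Replace("\v", string.Empty, false, false);

        doc.Save(destinationFilePath, SaveFormat.Epub);
    }
}
```

The only change from your code is the visitor passed to doc.Accept; the rest of the pipeline is untouched.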

Thanks,

referring to the page break issue (4.1 above)

Is there a release date schedule or rough estimate on when this feature will be released?

Hello

Thanks for your request. The fix for this issue is scheduled for the next hotfix. Please expect a reply before the next hotfix (within 3-4 weeks). We might either fix the problem or provide you with more information.
Best regards,

The issues you have found earlier (filed as 11126) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.
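
For readers who need the page-break behavior asked about in 4.1: the notification above does not say which issue was fixed, so the following is only a minimal sketch, assuming an Aspose.Words for .NET release that exposes the HtmlSaveOptions.DocumentSplitCriteria property (it was not yet available in the 9.3 release discussed earlier in this thread). The class and method names are hypothetical.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical example class; the name is an assumption for illustration.
public static class EpubPageBreakExample
{
    public static void Convert(string sourceFilePath, string destinationFilePath)
    {
        Document doc = new Document(sourceFilePath);

        // Save options for EPUB output.
        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Epub);

        // Assumption: the installed version supports splitting the output
        // into separate parts at explicit page breaks.
        options.DocumentSplitCriteria = DocumentSplitCriteria.PageBreak;

        doc.Save(destinationFilePath, options);
    }
}
```

Because each part becomes a separate file inside the EPUB, readers typically start it on a new page, which sidesteps the ignored page-break-before style.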

The issues you have found earlier (filed as 1144) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.