Tweak HTML Export using Visitor pattern

Hello,
I am using Aspose.Words.NET v9.1. In our software, one of the functions saves a built-up Aspose.Words document to XHTML.
I need to be able to tweak this process to meet our needs, as we inject metadata into certain parts of the document. We are working with XML, but I find that simply injecting string “tags” (i.e., …) into the content of a Run object leads to malformed XHTML (note that our end tag can occur several nodes away). I am trying to find a better way of doing this, and I think the DocumentVisitor approach would be useful.
Using the Visitor pattern, I have a couple of options available to me. I can output custom HTML either for BookmarkStart and BookmarkEnd nodes or for Run nodes. The documentation examples only show Visitors that handle their own export. Someone on the forum said that Aspose themselves use this pattern when saving to XHTML. Instead of writing my own Visitor to do this export, I’d like to inherit from Aspose’s Visitor and make only the modifications that I need, so as not to lose proper XHTML export capability. I’ve poked around the object model but I can’t find a suitable Visitor to inherit from.
Might anybody know of a suitable base Visitor to inherit from or if what I want to do can be done without effectively rewriting the XHTML export function? If Aspose does not allow us to extend their base Visitors, can I request this feature (or something like it to tweak how certain nodes will be rendered)?

Hi there,
Thanks for your inquiry.
I’m afraid there is no way to hook into the visitors used during export to other formats. You can only create your own visitors that enumerate the nodes of the document object model (the same as walking it with NextSibling/PreviousSibling, etc.).
I think you may still be able to achieve this by inserting a placeholder and post-processing your HTML to replace it.
I have logged your request to add custom markup to HTML export. May I ask how you want to insert custom markup into the middle of the HTML without invalidating the HTML content? Maybe there is another way to achieve what you are looking for.
Thanks,

Hi Adam,
We are presently using a placeholder and post-processing. However, because of this, our markup appears inside a run’s markup when I want it to appear outside (i.e., … vs …). This becomes problematic when the markup needs to cover multiple runs, paragraphs, lists, and tables. At present, when one of the runs has a header, we get:


and the tagging is malformed.
I would like it to output like this:


The and tags (which have custom attributes as well) would be associated with BookmarkStart and BookmarkEnd nodes accordingly. That is how we determine where to insert our markup. What I am doing presently is going to the first/last Run I find within that range and prepending/appending our custom markup. But this invalidates the markup and I have to fix it up after-the-fact, which is inelegant. I think that if we can do custom HTML at the Bookmark node level, then that will work okay.
Thank you for your assistance and for logging the request.

Hi there,
Thanks for this additional information.
I think the way you are going about it right now is probably the easiest way to achieve this at present. As you probably know, bookmarks are exported as anchor tags, so you can simply replace these tags with your custom markup during post-processing.
Rather than inserting the custom data within the run, I suggest you insert it at the end of the document using a special character style. You can then extract these runs, match them with the anchor names, and replace the data. Once they have been processed, the markup spans can easily be deleted. The code below demonstrates this:

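// Example call: register the custom attributes for one bookmark.
AddMarkupNodes(doc, bookmarkName, "myAttribute=\"5\"");

// Appends a run carrying the markup for the given bookmark to the end of the document.
// The run uses a dedicated character style so it can be found and removed after export.
public static void AddMarkupNodes(Document doc, string bookmarkName, string markupAttributes)
{
    string styleName = "MarkupStyle";
    // Create the marker style on first use.
    if (doc.Styles[styleName] == null)
        doc.Styles.Add(StyleType.Character, styleName);
    Run markupRun = new Run(doc);
    markupRun.Font.Style = doc.Styles[styleName];
    // The run text holds the bookmark name followed by the custom attributes.
    markupRun.Text = string.Format("name=\"{0}\" {1}>", bookmarkName, markupAttributes);
    doc.LastSection.Body.LastParagraph.AppendChild(markupRun);
}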
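
As a rough illustration of the post-processing step itself, the sketch below swaps exported bookmark anchors for custom markup using plain string processing on the saved HTML. It is only a sketch under assumptions: it assumes bookmarks come out as empty anchors of the form `<a name="..."></a>`, and the `AnchorPostProcessor` class, the `ReplaceAnchors` method, and the `<marker>` element are hypothetical names chosen for the example, not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorPostProcessor
{
    // Assumption: each bookmark is exported as an empty anchor, <a name="..."></a>.
    private static readonly Regex AnchorPattern =
        new Regex("<a\\s+name=\"(?<name>[^\"]+)\"\\s*>\\s*</a>", RegexOptions.Compiled);

    // Replaces every anchor whose name has an entry in markupByBookmark with a
    // hypothetical empty <marker .../> element. Anchors without an entry are kept.
    public static string ReplaceAnchors(string html, IDictionary<string, string> markupByBookmark)
    {
        return AnchorPattern.Replace(html, match =>
        {
            string name = match.Groups["name"].Value;
            string attributes;
            if (!markupByBookmark.TryGetValue(name, out attributes))
                return match.Value;
            return string.Format("<marker name=\"{0}\" {1} />", name, attributes);
        });
    }
}
```

Called with the HTML `<p><a name="Intro"></a>Hello</p>` and a dictionary mapping `Intro` to `myAttribute="5"`, this would return `<p><marker name="Intro" myAttribute="5" />Hello</p>`. Because the replacement is an empty element, it cannot unbalance the surrounding tags.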

The only limitation right now is that bookmarks in the Aspose.Words DOM can only appear as inline nodes, so trying to achieve tags that wrap around paragraphs is harder. It’s still possible with some extra coding though.
Thanks,

Thank you for this tip. I will see what I can do with this technique.
If bookmark tags can’t be wrapped around their content, would it be possible in a future version to have inline tags that mark the start and end of a specific bookmark?
ie…

Hi there,
Thanks for your request.
I’m afraid that, from memory, anchor tags are not allowed to span block-level nodes. We will most likely add support for your own custom tags, which can be placed at any level, in a future version of Aspose.Words. We will keep you informed.
Regarding making a bookmark have start and end anchor, you can achieve this easily already using a bit of code. Please see the implementation below.

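// Example call.
SplitBookmarkRangeIntoSeparateBookmarks(doc);

// Replaces every bookmark with two empty bookmarks, "<name>_Start" and "<name>_End",
// placed where the original bookmark started and ended.
public static void SplitBookmarkRangeIntoSeparateBookmarks(Document doc)
{
    DocumentBuilder builder = new DocumentBuilder(doc);
    // Copy the bookmarks into a separate list first, because the
    // collection is modified inside the loop below.
    ArrayList bookmarkList = new ArrayList();
    foreach (Bookmark bookmark in doc.Range.Bookmarks)
        bookmarkList.Add(bookmark);
    foreach (Bookmark bookmark in bookmarkList)
    {
        BookmarkStart start = bookmark.BookmarkStart;
        BookmarkEnd end = bookmark.BookmarkEnd;
        // Empty bookmark marking the start of the original range.
        builder.MoveTo(start);
        builder.StartBookmark(bookmark.Name + "_Start");
        builder.EndBookmark(bookmark.Name + "_Start");
        // Empty bookmark marking the end of the original range.
        builder.MoveTo(end);
        builder.StartBookmark(bookmark.Name + "_End");
        builder.EndBookmark(bookmark.Name + "_End");
        // Remove the original bookmark that spanned the whole range.
        bookmark.Remove();
    }
}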

For example, a single bookmark named MyBookmark that is wrapped around a lot of content will be output as MyBookmark_Start at the start of the content and MyBookmark_End at the end of the content.
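
Once the split bookmarks have been exported, the paired anchors could be turned into custom start and end markers in a post-processing pass. The following is a minimal sketch, again assuming the anchors are exported as `<a name="..."></a>`; the `BookmarkMilestones` class and the `<range-start>` and `<range-end>` element names are hypothetical and only serve the example.

```csharp
using System.Text.RegularExpressions;

public static class BookmarkMilestones
{
    // Assumption: split bookmarks are exported as empty anchors whose names
    // end in "_Start" or "_End", e.g. <a name="MyBookmark_Start"></a>.
    private static readonly Regex SplitAnchorPattern = new Regex(
        "<a\\s+name=\"(?<name>[^\"]+?)_(?<kind>Start|End)\"\\s*>\\s*</a>",
        RegexOptions.Compiled);

    // Rewrites each matching anchor as an empty, hypothetical
    // <range-start .../> or <range-end .../> element named after the original bookmark.
    public static string ToMilestones(string html)
    {
        return SplitAnchorPattern.Replace(html, match =>
        {
            string element = match.Groups["kind"].Value == "Start" ? "range-start" : "range-end";
            return string.Format("<{0} name=\"{1}\" />", element, match.Groups["name"].Value);
        });
    }
}
```

Since both markers are empty elements rather than one element wrapped around the content, the start and end can sit in different paragraphs, lists, or table cells and the output stays well-formed.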
Thanks,

Thank you for a good idea regarding exporting HTML with custom data, but we decided to postpone work on this issue until we implement some of the earlier HTML features from our backlog.