Aspose.Words changes the way Word processes style information?!

Hello,

Using Aspose.Words, I encountered some really strange behaviour.

I have a customer-generated .doc file (e.g. the attached Testabsatz.doc).
For the purpose of further automated processing of the Word file (using VBA on the .doc and XSLT on the uncompressed .docx), the goal is to remove hard formatting (i.e. formatting not created by styles) throughout the document.

Before the Word file is manipulated with Aspose, a set bold flag in a style means ‘toggle bold’: setting bold in both the paragraph style and the character style results in non-bold text.

After the Word file has been manipulated (see the attached Testabsatz.doc.doc), this changes: using both a paragraph style and a character style with the bold flag set results in bold text.
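
The two behaviours described above can be pictured with a small toy model. This is plain Java, not Aspose.Words code, and it does not claim to reproduce Word's internals; the names `BoldModel`, `toggled` and `additive` are made up for this illustration.

```java
// Toy model only: compares two ways the bold flags of a paragraph style
// and a character style could be combined.
final class BoldModel {

    // "Toggle" reading: every style that sets bold flips the current state,
    // so two set flags cancel each other out.
    static boolean toggled(boolean paragraphStyleBold, boolean characterStyleBold) {
        return paragraphStyleBold ^ characterStyleBold;
    }

    // "Additive" reading: the text is bold as soon as any style sets bold.
    static boolean additive(boolean paragraphStyleBold, boolean characterStyleBold) {
        return paragraphStyleBold || characterStyleBold;
    }

    public static void main(String[] args) {
        // Both styles set bold, as in the first paragraph of the test document.
        System.out.println("toggle reading:   " + toggled(true, true));   // false
        System.out.println("additive reading: " + additive(true, true));  // true
    }
}
```

In this model, Testabsatz.doc corresponds to the toggle reading and Testabsatz.doc.doc to the additive one.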

I use the source code from the attached word_style.rb.txt.
(@doc here denotes the document node)

Best Regards,

Martin Hillert

Hello
Thanks for your request. Maybe in your case you could simply use the clearFormatting method, which resets the paragraph formatting to its defaults:
https://reference.aspose.com/words/java/com.aspose.words/paragraphformat/#clearFormatting
If that does not help, could you please attach the expected document here for testing?
Best regards,

Hello, Alexey,

thanks for the fast reply.

There seems to be a misunderstanding.
The intention of the code I posted was to create a document with the same visual appearance as Testabsatz.doc, but using style formatting only, without any direct formatting of runs (run.getFont.getBold == true etc.). Each run is cleared of its direct formatting and then given a different, generated character style that should produce the same appearance as before, without direct formatting such as run.getFont.setBold.
The resulting document is Testabsatz.doc.doc, which looks different even though the runs are formatted as intended. If you create a new Word document and create styles with the same formatting as in Testabsatz.doc.doc, the paragraphs will look as they do in Testabsatz.doc.
If the same is done using Aspose, you get the result seen in Testabsatz.doc.doc.

So Word behaves differently with styles generated by Aspose than with styles generated by Word itself.

The change seems to toggle some internal switch in Word, since new styles created with Word in Testabsatz.doc.doc behave the same as those generated by Aspose.Words, but still differently from those in Testabsatz.doc, which has never been manipulated with Aspose.Words.

I hope this clarifies my issue.

Best regards,

Martin Hillert

Hello
Thanks for your inquiry. Could you please provide the full code that will allow me to reproduce the problem on my side? Also, which version of Aspose.Words are you using?
Best regards,

Hello, Alexey,

I used Aspose.Words for Java V10.4 and V10.6 with the same results.

I already sent you the source code I use to reproduce the problem (word_style.rb). Unfortunately it is Ruby source code that accesses the Aspose.Words library via a library called Ruby Java Bridge. I am not proficient enough in Java to ‘translate’ this to Java on short notice.

I think the source code should still be readable, if somewhat unusual, since the methods I use have the same names as in Java.

The method I used is called ‘reformat_runs’.
The only code outside the class I sent you is file handling and the assignment of the document to the @doc variable.

I hope this works for you.

Best regards,

Martin Hillert

Hi
Thank you for the additional information. However, I am still curious why you need all the text formatting to be applied using character styles. It is normal practice in MS Word documents to apply text formatting directly to runs.
I suppose your code does not work as expected because you do not take the inheritance of formatting into account. Formatting of text in MS Word documents can be defined on a few different levels:

  1. Paragraph style defined for the particular paragraph;
  2. Character style defined for the particular run;
  3. Explicit formatting specified for a particular run.
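
As a rough illustration of the three levels listed above, the sketch below resolves a simple value property such as font size by taking the most specific level that sets it. This is plain Java rather than Aspose.Words code, the class name `FormattingLevels` is hypothetical, and the "most specific level wins" rule is an assumption of the sketch; flags such as bold may combine differently, which is what this thread is about.

```java
// Sketch under stated assumptions: null means "not set at this level".
final class FormattingLevels {

    // Returns the value from the most specific level that sets it:
    // explicit run formatting, then character style, then paragraph style.
    static <T> T resolve(T explicitRun, T characterStyle, T paragraphStyle) {
        if (explicitRun != null) {
            return explicitRun;
        }
        if (characterStyle != null) {
            return characterStyle;
        }
        return paragraphStyle; // may be null if no level sets the property
    }

    public static void main(String[] args) {
        // Font size in points: the run sets nothing, the character style sets 14,
        // the paragraph style sets 11.
        Double size = resolve(null, 14.0, 11.0);
        System.out.println(size); // 14.0
    }
}
```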

For more information, please see the following link:
https://docs.aspose.com/words/net/aspose-words-document-object-model/
Best regards,

Hi,

I really do understand the way Word renders text.

What I am trying to communicate is that files manipulated by Aspose.Words are rendered differently by Word than ‘Word only’ files.

Consider the ‘boldness’ of the first paragraph of Testabsatz.doc.
This paragraph is formatted with a paragraph-style which sets bold to true.
The entire paragraph is also formatted with a character style which sets bold to true.
There is no explicit formatting regarding ‘bold’ or ‘non-bold’.
The paragraph is rendered ‘non-bold’ in Word.

Now consider the second paragraph of Testabsatz.doc.doc.
There the same formatting applies except the styles are generated with Aspose.Words using the code I sent you.
Still it is rendered bold.

What is the difference?

Best regards,

Martin Hillert

Hi
Thank you for the additional information. I think the problem occurs not because the style is created by Aspose.Words. I think there are simply some inheritance rules that are not handled by your code, and that is why the formatting is not the same. If I understand correctly, your code copies the formatting of a run to the character style, i.e. it copies the value of each font property of the run to the corresponding property of the style’s font. However, there can be default values that are overridden by values in styles. When you overwrite them with values from the run, you get default values in the style, and that is why the formatting changes.
Best regards,

Hi,

Could you please explain in detail why the behaviour in the example above is different?
Please explain why two runs with exactly the same formatting are rendered differently by Word.
Please explain why, in Testabsatz.doc.doc, I can create a new paragraph (using Word), apply a paragraph style that sets the paragraph bold, apply (to a part of the paragraph) a character style that sets bold, and get bold text, while doing the same in Testabsatz.doc results in plain (non-bold) text.

How can I get information about the ‘inheritance rules’ you mention?

Your replies give the impression that you simply do not believe there is a problem, while not being able to explain why the behaviour is the way it is.
I feel I am not being taken seriously.

Best regards,

Martin Hillert

Hi
Thanks for your request. I believe there is no problem. Let’s see what you are doing in your code: you loop through all runs and copy the inline formatting to the style applied to each run. However, the same style can be applied to multiple runs, so the inline formatting of each subsequent run overrides the formatting copied from the previous run. Am I right?
The simplest way to work around this is to create a new style for each run and use the style currently applied to the run as the base style of the newly created style. The code will look like the following:

import com.aspose.words.*;
import java.lang.reflect.Method;
import java.util.UUID;

// @Test is provided by your test framework (e.g. JUnit or TestNG).
@Test
public void Test001() throws Exception
{
    Document doc = new Document("C:\\Temp\\Testabsatz.doc");
    reformatRuns(doc);
    doc.save("C:\\Temp\\out.doc");
}

/**
 * Copies the formatting of each run into a new character style and applies that style to the run.
 */
private void reformatRuns(Document doc) throws Exception
{
    // Get all runs in the document.
    NodeCollection runs = doc.getChildNodes(NodeType.RUN, true, false);

    // Loop through all runs.
    for (Run run : (Iterable<Run>) runs)
    {
        System.out.println(run.getFont().getBold());
        // Create a new, uniquely named character style for the current run.
        Style runStyle = doc.getStyles().add(StyleType.CHARACTER, UUID.randomUUID().toString());
        // Use the style applied to the run as a base style of the newly created style.
        if (run.getFont().getStyle() != null)
            runStyle.setBaseStyleName(run.getFont().getStyleName());
        // Copy the inline formatting of the run into the new style.
        copyFormatting(run.getFont(), runStyle.getFont());
        // Clear the inline formatting of the run and apply the new style.
        run.getFont().clearFormatting();
        run.getFont().setStyle(runStyle);
    }
}

/**
 * Uses reflection to copy properties from one object to another of the same type.
 * Not recommended for use on non-formatting related classes.
 *
 * @param source the source formatting object
 * @param dest   the destination formatting object
 */
public static void copyFormatting(Object source, Object dest) throws Exception
{
    if (source.getClass() != dest.getClass())
        throw new Exception("All objects must be of the same type");
    Method[] methods = source.getClass().getDeclaredMethods();
    // Iterate through each getter of the source object.
    for (Method prop : methods)
    {
        String name = prop.getName();
        // Continue processing only if the method starts with 'get'.
        if (!name.startsWith("get"))
            continue;
        // Skip indexed access items. Skip setting the internals of a style as these should not be changed.
        // Strings must be compared with equals(); == only compares references.
        if (name.equals("getItem") || name.equals("getStyle") ||
                name.equals("getStyleName") || name.equals("getStyleIdentifier"))
            continue;
        Object value;

        // Wrap this call as it can throw an exception. Skip the property if it does.
        try
        {
            // Get value by invoking getter method.
            value = prop.invoke(source);
        }
        catch (Exception e)
        {
            continue;
        }
        // Skip if the value cannot be retrieved.
        if (value == null)
            continue;
        // Get the corresponding setter method, if there is one.
        Method setter = null;
        try
        {
            // Swap only the leading "get" for "set"; String.replace would also change a "get" inside the name.
            setter = source.getClass().getDeclaredMethod("set" + name.substring(3), prop.getReturnType());
        }
        catch (Exception e)
        {
            // No matching setter: the property is read-only.
        }
        // If this property returns a class which belongs to the com.aspose.words package
        Package returnPackage = prop.getReturnType().getPackage();
        if (returnPackage != null && returnPackage.getName().equals("com.aspose.words"))
        {
            // Recurse into this class.
            copyFormatting(value, prop.invoke(dest));
        }
        else if (setter != null)
        {
            // If we can write to this property then copy the value across.
            setter.invoke(dest, value);
        }
    }
}

Sorry, I am not strong in Ruby, so the code is in Java.
However, this approach is not ideal, because there is a limit on the number of styles in an MS Word document. So if your input document contains a lot of runs, you will hit the limit and will not be able to save the document. Of course, you can search for runs with the same formatting and create a common style for them, but in that case your logic will become very complicated.
I hope my code and explanation help you achieve what you need.
Best regards,

Hi,

you wrote:
“Let’s see what you are doing in your code: you loop through all runs and copy inline formatting to the style applied to each run. However, the same style can be applied to multiple runs, so inline formatting of each next run overrides formatting copied from the previous run. Am I right?”
What happens is somewhat more complicated.
I cycle through the runs.

For each run:

  1. I check whether the bold flag is set in the inline formatting of the run.
  2. If it is set, I generate a ‘signature’, i.e. information about the corresponding paragraph style: does this flag result in bold or non-bold text? (A set bold flag only means that the text is formatted differently from the combined character/paragraph style formatting.)
  3. I generate a new character style which has the bold flag set and whose name contains ‘bold’ or ‘non-bold’ according to the signature, unless exactly such a style has already been created (this way I keep the number of styles to a minimum).
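
The "create a style only if no such style exists yet" step can be sketched as a small registry keyed by the signature. This is plain Java, not the Ruby script from the attachment; `StyleRegistry`, `styleFor` and the `generated-` name prefix are hypothetical, and a real script would create the character style in the document at the point where the name is first registered.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: maps a signature ("bold" / "non-bold") to one generated style name,
// so that runs with the same signature share a single style.
final class StyleRegistry {

    private final Map<String, String> styleNameBySignature = new LinkedHashMap<>();

    // Returns the style name for the signature, registering it on first use only.
    String styleFor(boolean rendersBold) {
        String signature = rendersBold ? "bold" : "non-bold";
        return styleNameBySignature.computeIfAbsent(signature, s -> "generated-" + s);
    }

    // Number of distinct styles generated so far.
    int size() {
        return styleNameBySignature.size();
    }
}
```

However many runs are processed, this registry never holds more than two entries, which keeps the number of generated styles small.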

But the styles I created seem to work differently from the styles which are created by Word itself.

I concur that the problem could stem from the missing (i.e. not set) baseStyleName property, but so far nobody has been able to tell me what exactly the effect of setting this property is.

I can imagine the solution is one of the following:
a) setting the BaseStyleName to the ‘standard’ style
or
b) setting the BaseStyleName to the paragraph-style of the parent paragraph of the run.

I’ll report the results as soon as I find the time to test both cases.

Best regards,

Martin Hillert

Hi
Thank you for the additional information. Have you tried the approach I suggested in my previous answer? I tested the code on my side, and the output document looks exactly the same as the input document but does not contain any inline formatting, just as you needed.
When you set a base style, the created style simply inherits its formatting from the base style. In the code I suggested, I used the style applied to the current run as the base style for the newly created style. This was done in order not to lose the formatting applied by the old style when the run is switched to the newly created style. The newly created style stands for the formatting that was inline.
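
The inheritance described above can be pictured with the following minimal sketch. It is plain Java and only models the idea of a base-style chain; `SketchStyle` and its methods are hypothetical and are not part of the Aspose.Words API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a style stores only the properties it sets itself and asks its
// base style for everything else.
final class SketchStyle {

    private final SketchStyle base; // null when there is no base style
    private final Map<String, Object> own = new HashMap<>();

    SketchStyle(SketchStyle base) {
        this.base = base;
    }

    // Sets a property on this style and returns the style for chaining.
    SketchStyle set(String property, Object value) {
        own.put(property, value);
        return this;
    }

    // Looks the property up on this style first, then walks up the base chain.
    Object get(String property) {
        for (SketchStyle style = this; style != null; style = style.base) {
            if (style.own.containsKey(property)) {
                return style.own.get(property);
            }
        }
        return null; // not set anywhere in the chain
    }
}
```

For example, if the old style sets `fontName` to Arial and the newly created style (based on the old one) sets only `size`, a lookup of `fontName` on the new style still yields Arial, so nothing from the old style is lost.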
Best regards,

Hi,

I ran the tests I mentioned above.

The results seem to confirm my suspicion that Aspose-generated styles behave somewhat strangely.

I tested variant (a) (see my last post).
The result looks exactly the same as Testabsatz.doc.doc.
Then I tried something new:
I changed the aspose-created styles using Word in the following manner:

  • Format Inspector, right-click on the style, change style (I tried this in a German version of Word and translated the actions I performed, so the names of the actions might differ)
  • In the dialog that opens, I first unchecked and then rechecked the bold button.
    Afterwards the rendering looks different. (!)
    That the rendering still does not look exactly as in Testabsatz.doc is due to a bug in my script.

If you can reproduce this behaviour, it proves that Aspose-created character styles lead to different rendering behaviour in Word than Word-created character styles.

Please confirm that this bug is reproducible.

Hi
Thank you for the additional information. But I still do not think there is a bug. Have you tested with the code I provided? That code produces output that looks exactly the same as your input document. I think there is something wrong in your code, but because of the differences in translating between Java and Ruby, I cannot find out what exactly is wrong.
In the code I provided, I create a character style for each run in the document; the style that was applied to the run before my changes is used as the base style for the newly created style. I believe this approach does just what you need.
Best regards,