Highlight word in PDF is not working with our sample file

kannank22 · December 30, 2013, 8:57am

Hi Support,

We tried to highlight a search keyword (for example, “commission”) in the PDF using your sample code below, but it hides the text and shows only the highlight color in that area.

Please refer input file “sample_INPUT” and output file “sample_OUTPUT”.

We would be happy to proceed with this product if you could resolve this issue.

Thanks & Regards,
Selvam.R

Code:

public static void Main(string[] args)
{
    // The path to the documents directory.
    string dataDir = Path.GetFullPath("…/…/…/Data/");

    //open document
    Document pdfDocument = new Document(dataDir + "sample_page_4.pdf");
    //create TextAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("commission");
    //accept the absorber for all the pages
    for (int i = 1; i <= pdfDocument.Pages.Count; i++)
    {
        pdfDocument.Pages[i].Accept(textFragmentAbsorber);

        //get the extracted text fragments
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

        //PdfContentEditor editor = new PdfContentEditor();
        //editor.BindPdf(dataDir + "input.pdf");
        //editor.CreateMarkup(new System.Drawing.Rectangle(87, 513, 90, 10), "", 0, 1, System.Drawing.Color.Yellow);
        //editor.Save(dataDir + "output1.pdf");

        //loop through the fragments
        foreach (TextFragment textFragment in textFragmentCollection)
        {
            //update text and other properties
            //textFragment.
            //textFragment.Text = "TEXT";
            //textFragment.TextState.Font = FontRepository.FindFont("Verdana");
            //textFragment.TextState.FontSize = 22;
            //textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
            textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
            //textFragment.TextState.Underline = true;
        }
    }

    // Save resulting PDF document.
    pdfDocument.Save(dataDir + "sampleoutput2.pdf");

    // Let user know about the outcome of the processing.
    System.Console.WriteLine("Text replaced successfully!");
}

tilal.ahmad · December 30, 2013, 9:47am

Hi Selvam,

We are sorry for the inconvenience caused. While testing the scenario with the latest version, Aspose.Pdf for .NET 8.7.0, we managed to reproduce the reported issue and have logged it in our bug tracking system as PDFNEWNET-36215 for further investigation and resolution. We will notify you via this thread as soon as it is resolved.

Please feel free to contact us for any further assistance.

Best Regards,

kannank22 · December 30, 2013, 9:51am

Fine. I need the fix today itself. Is that possible?

codewarior · December 30, 2013, 12:16pm

Hi Selvam,

As we have only just noticed this problem, the development team needs some time to investigate and identify its actual cause. As soon as we make significant progress toward its resolution, we will be more than happy to update you on the status. Please be patient and allow us a little time.

krselvam2000 · December 30, 2013, 10:11pm

Dear Support,

This is an urgent requirement, and our client cannot give us more time.
We have only 2 days to release the application.

We would be happy if you could provide a solution as early as possible.

"Happy New Year"

Regards,
Selvam.R

codewarior · December 31, 2013, 12:26am

Hi Selvam,

As a workaround, you may consider using the CreateMarkup(…) method of the PdfContentEditor class. To get the coordinates of the text, you can use the TextFragmentAbsorber class, and to highlight the text, you can use the CreateMarkup(…) method. Please take a look at the resultant PDF generated with the following code snippet.

[C#]

double X_Cord = 0;
double Y_Cord = 0;

//open document
Document pdfDocument = new Document("c:/pdftest/sample_INPUT.pdf");

//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("commission");

//accept the absorber for all the pages
for (int i = 1; i <= pdfDocument.Pages.Count; i++)
{
    pdfDocument.Pages[i].Accept(textFragmentAbsorber);

    //get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

    //loop through the fragments
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        //remember the position of the matched text
        X_Cord = textFragment.Position.XIndent;
        Y_Cord = textFragment.Position.YIndent;
        // textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
    }
}

//mark a 90 x 10 area at the last position found (type 0 = highlight, page 1)
PdfContentEditor editor = new PdfContentEditor();
editor.BindPdf("c:/pdftest/sample_INPUT.pdf");
editor.CreateMarkup(new System.Drawing.Rectangle((int)X_Cord, (int)Y_Cord, 90, 10), "", 0, 1, System.Drawing.Color.Yellow);
editor.Save("c:/pdftest/Highlighted_sample_INPUT.pdf");
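
The snippet above passes a fixed 90 x 10 rectangle to CreateMarkup(…). A minimal sketch of how the rectangle could instead be sized from each match's bounding box is given here. `MarkupGeometry` and `ToMarkupRectangle` are hypothetical helper names, not part of Aspose.Pdf. The sketch assumes that the box corners are taken from the fragment's Rectangle property (LLX, LLY, URX, URY) and that CreateMarkup(…) expects x, y, width and height measured from the lower-left corner.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of Aspose.Pdf.
public static class MarkupGeometry
{
    // Converts a bounding box given by its lower-left (llx, lly) and
    // upper-right (urx, ury) corners into an integer rectangle that
    // fully covers the box.
    public static Rectangle ToMarkupRectangle(double llx, double lly, double urx, double ury)
    {
        // Round the lower-left corner down and the upper-right corner up
        // so that no part of the text falls outside the rectangle.
        int x = (int)Math.Floor(llx);
        int y = (int)Math.Floor(lly);
        int width = (int)Math.Ceiling(urx) - x;
        int height = (int)Math.Ceiling(ury) - y;
        return new Rectangle(x, y, width, height);
    }
}
```

With such a helper, the CreateMarkup(…) call could receive `MarkupGeometry.ToMarkupRectangle(r.LLX, r.LLY, r.URX, r.URY)`, where `r` is `textFragment.Rectangle`, instead of a fixed size.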

kannank22 · December 31, 2013, 1:02am

Hi Nayyer Shahbaz,

The code does not highlight the text when I run it on my machine. :-(

krselvam2000 · December 31, 2013, 2:15am

Hi Support,

It is not working for us.
Could you please try highlighting the text using our input file?

Regards,
Selvam.R

kannank22 · December 31, 2013, 8:35am

Dear Support,

We used your code with small changes, but the highlighting still does not work.

Please look at our code below and correct us if we have done something wrong.

We would be happy if you could provide a solution by today.

"Happy New Year"

Regards,

Selvam.R

public static void Main(string[] args)
{
    double X_Cord = 0;
    double Y_Cord = 0;

    //open document
    Document pdfDocument = new Document(dataDir + "sample_INPUT.pdf");

    //create TextAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("commission");

    PdfContentEditor editor = new PdfContentEditor();
    editor.BindPdf(dataDir + "sample_INPUT.pdf");

    //accept the absorber for all the pages
    for (int i = 1; i <= pdfDocument.Pages.Count; i++)
    {
        pdfDocument.Pages[i].Accept(textFragmentAbsorber);

        //get the extracted text fragments
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

        //loop through the fragments
        foreach (TextFragment textFragment in textFragmentCollection)
        {
            X_Cord = textFragment.Position.XIndent;
            Y_Cord = textFragment.Position.YIndent;
            // textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
            editor.CreateMarkup(new System.Drawing.Rectangle((int)X_Cord, (int)Y_Cord, 90, 10), "", 0, 1, System.Drawing.Color.Yellow);
        }
    }

    editor.Save(dataDir + "sampleoutput3.pdf");
    System.Console.WriteLine("Text highlighted successfully!");
}

codewarior · December 31, 2013, 12:52pm

Hi Selvam,

I have tested the scenario again using Aspose.Pdf for .NET 8.7.0 with the file sample_INPUT.pdf shared in your first post (516716), and in my testing the text is highlighted properly. Please note that I used Visual Studio 2010 on Windows 7 (x64), with the application targeting .NET Framework 4.0. For testing purposes, I used the exact code shared in your post above.

We are sorry for the inconvenience.

kannank22 · January 2, 2014, 1:02am

Hi Nayyar,

I installed Aspose.Pdf for .NET 8.7.0, targeted .NET Framework 4.0 in Visual Studio, and referenced the Aspose.Pdf.dll that resides in the net4.0_ClientProfile folder.

I used the code below, but it does not highlight the text.

Code:

public static void Main(string[] args)
{
    string dataDir = Path.GetFullPath("…/…/…/Data/");

    double X_Cord = 0;
    double Y_Cord = 0;

    //open document
    Document pdfDocument = new Document(dataDir + "sample_INPUT.pdf");

    //create TextAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("commission");

    //accept the absorber for all the pages
    for (int i = 1; i <= pdfDocument.Pages.Count; i++)
    {
        pdfDocument.Pages[i].Accept(textFragmentAbsorber);

        //get the extracted text fragments
        TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

        //loop through the fragments
        foreach (TextFragment textFragment in textFragmentCollection)
        {
            X_Cord = textFragment.Position.XIndent;
            Y_Cord = textFragment.Position.YIndent;
            // textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
        }
    }

    PdfContentEditor editor = new PdfContentEditor();
    editor.BindPdf(dataDir + "sample_INPUT.pdf");
    editor.CreateMarkup(new System.Drawing.Rectangle((int)X_Cord, (int)Y_Cord, 90, 10), "", 0, 1, System.Drawing.Color.Yellow);
    editor.Save(dataDir + "sampleoutput4.pdf");

    System.Console.WriteLine("Text highlighted successfully!");
}

tilal.ahmad · January 2, 2014, 1:51am

Hi Selvam,

Thanks for your feedback. We investigated the issue further and found that the text is not highlighted if the license is not applied. Could you please double-check the license implementation in your code? Applying a valid license should hopefully fix the issue. Please find the enclosed sample output. If the issue persists with your license file, please share the license file as suggested here for further investigation.
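
For reference, a license is usually applied once at application start-up, before any document is opened. A minimal sketch, assuming a license file named Aspose.Pdf.lic (a placeholder name, not taken from your project) sits next to the executable:

```csharp
using System;

public static class Program
{
    public static void Main(string[] args)
    {
        // Assumption: "Aspose.Pdf.lic" is a placeholder for your own license file.
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Pdf.lic");

        // Open the document and highlight the text only after the license is set.
        Console.WriteLine("License applied.");
    }
}
```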

We are sorry for the inconvenience caused.

Best Regards,

krselvam2000 · January 2, 2014, 3:55am

Dear Support,

Thanks.

I hope you are aware that we have not yet purchased the “Aspose PDF” product.
Only once we have tested this functionality and confirmed that it works can we ask our client to purchase the product and use it in our application.

Please let us know how to get a license for the trial version.

This is the fourth day we have been chatting and trying to get a solution so that we can proceed.

We would be happy if you could provide a complete solution in a single reply.

If you need any information, please let me know at krselvam2000@gmail.com or 09962082051.

Thanks,
Selvam.R

codewarior · January 2, 2014, 7:30am

Hi Selvam,

Please visit the following link for the required information: Get a temporary license. If you have any further queries, please feel free to contact us.

asad.ali · December 17, 2017, 4:21pm

@kannank22

Thanks for your patience.

We have investigated the issue PDFNET-36215 and found that the source document was produced by scanning and optical character recognition (OCR). It contains the original image and invisible text.
Even though the text is invisible, you can use TextFragmentAbsorber to find it. However, changes to the appearance of invisible text (such as the foreground color) are not rendered until the text is made visible. Previously, making text visible required complicated work with operators, but the TextState.Invisible property is now available with the DOM approach.

Please consider the following code snippet:

//open document
Document pdfDocument11 = new Document(myDir + "sample_INPUT.pdf");

//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("commission");

//accept the absorber for all the pages
for (int i = 1; i <= pdfDocument11.Pages.Count; i++)
{
    pdfDocument11.Pages[i].Accept(textFragmentAbsorber);

    //get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

    //loop through the fragments
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        //update text and other properties
        textFragment.TextState.Invisible = false;

        //textFragment.Text = "TEXT";
        textFragment.TextState.Font = FontRepository.FindFont("Verdana");
        textFragment.TextState.FontSize = 9;
        textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
        textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
        //textFragment.TextState.Underline = true;
    }
}

// Save resulting PDF document.
pdfDocument11.Save(myDir + "36215_out.pdf");

Please try the latest release, Aspose.Pdf for .NET 17.12, and if you face any issues, please feel free to contact us.

swapnilmax2000 · October 7, 2018, 6:11pm

I am using the code below to search for text and highlight it.
It works, but it takes a very long time.
My document has 25 pages and 25 instances of the search text, one on each page.
It takes 2 minutes, which is unacceptable.

//open document
Document pdfDocument11 = new Document(@"C:\TestArea\Destination\SUP000011\ATM-1B4L2KQ0ZE0-0001\OpenAML.pdf");

//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("GAITS");

//accept the absorber for all the pages
for (int i = 1; i <= pdfDocument11.Pages.Count; i++)
{
    pdfDocument11.Pages[i].Accept(textFragmentAbsorber);

    //get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

    //loop through the fragments
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        //update text and other properties
        // textFragment.TextState.Invisible = false;

        //textFragment.Text = "TEXT";
        textFragment.TextState.Font = FontRepository.FindFont("Verdana");
        textFragment.TextState.FontSize = 9;
        textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
        textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
        //textFragment.TextState.Underline = true;
    }
}

// Save resulting PDF document.
pdfDocument11.Save(@"C:\TestArea\Destination\SUP000011\ATM-1B4L2KQ0ZE0-0001\Highlightdoc.pdf");

asad.ali · October 7, 2018, 9:56pm

@swapnilmax2000

Thanks for contacting support.

Would you please provide us with your sample PDF document so that we can test the scenario in our environment and address it accordingly?
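
In the meantime, one variation that may be worth trying is sketched here. It is untested against your document and is not a confirmed fix. It rests on the assumption (not verified here) that reusing one absorber across per-page Accept calls can make the inner loop revisit matches from earlier pages. The sketch therefore visits all pages in a single call and looks up the font and colors once. `HighlightSketch`, `Highlight` and the parameter names are hypothetical.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical wrapper class for illustration only.
public static class HighlightSketch
{
    public static void Highlight(string inputPath, string outputPath, string phrase)
    {
        Document pdfDocument = new Document(inputPath);
        TextFragmentAbsorber absorber = new TextFragmentAbsorber(phrase);

        // Visit every page in one call instead of once per page.
        pdfDocument.Pages.Accept(absorber);

        // Look up the font and build the colors once, outside the loop.
        Aspose.Pdf.Text.Font verdana = FontRepository.FindFont("Verdana");
        Aspose.Pdf.Color blue = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
        Aspose.Pdf.Color yellow = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);

        // Each match is now processed exactly once.
        foreach (TextFragment textFragment in absorber.TextFragments)
        {
            textFragment.TextState.Font = verdana;
            textFragment.TextState.FontSize = 9;
            textFragment.TextState.ForegroundColor = blue;
            textFragment.TextState.BackgroundColor = yellow;
        }

        pdfDocument.Save(outputPath);
    }
}
```

If the timing does not change with this variation, the cause is more likely to lie in the document itself, which is why the sample file is still needed.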

swapnilmax2000 · October 8, 2018, 2:38am

I cannot upload that PDF, as it contains confidential data. Also, could you please let me know whether the same functionality (search and highlight) is available for PPT, DOC, XLS, and other document types, as it is for PDF?

asad.ali · October 8, 2018, 12:10pm

@swapnilmax2000

Thanks for getting back to us.

Please note that we need a sample PDF document in order to reproduce the issue you are facing in our environment. If your document is private, you may share it in a private message, as this way it will be accessible only to Aspose staff. We will definitely test the scenario in our environment and address it accordingly.

You can certainly achieve the required functionality for other file formats using Aspose APIs. For DOC/DOCX files, you may use Aspose.Words to search and highlight text. To search and highlight text in XLS/XLSX documents, you can find the cell containing the particular text/data and, once it is found, apply a background color to that cell using the Aspose.Cells API.
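
For the XLS/XLSX case, the find-then-color approach could look roughly like the following. This is a sketch under assumptions rather than a tested sample: `CellHighlightSketch`, `Highlight` and the parameter names are hypothetical, and it assumes that a "contains" match on cell values is what is wanted.

```csharp
using System.Drawing;
using Aspose.Cells;

// Hypothetical wrapper class for illustration only.
public static class CellHighlightSketch
{
    public static void Highlight(string inputPath, string outputPath, string searchText)
    {
        Workbook workbook = new Workbook(inputPath);

        // Assumption: match cells whose value contains the search text.
        FindOptions options = new FindOptions();
        options.LookInType = LookInType.Values;
        options.LookAtType = LookAtType.Contains;

        foreach (Worksheet sheet in workbook.Worksheets)
        {
            Cell cell = null;
            // Find returns the next match after the previous one, or null when none is left.
            while ((cell = sheet.Cells.Find(searchText, cell, options)) != null)
            {
                // A solid pattern with a foreground color gives the cell its fill color.
                Style style = cell.GetStyle();
                style.Pattern = BackgroundType.Solid;
                style.ForegroundColor = Color.Yellow;
                cell.SetStyle(style);
            }
        }

        workbook.Save(outputPath);
    }
}
```

Note that this colors the whole cell rather than only the matching characters inside it.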

We are gathering information regarding similar functionality for PPT/PPTX files and will share it with you shortly.

asad.ali · October 8, 2018, 8:16pm

@swapnilmax2000

I am afraid this feature is not currently available for PPT/PPTX files. However, we have logged a feature request as SLIDESNET-40600 in our issue tracking system. We will investigate the feasibility of the feature and let you know as soon as we have a definite update. Please be patient and allow us a little time.

We are sorry for the inconvenience.