Extract All Text from a Presentation

You can extract text from a presentation using Aspose.Slides for .NET. The text inside a presentation is held in two types of objects: TextHolder and TextFrame. A TextHolder is a special placeholder that holds text, and a TextFrame is a text box of the kind you create in Microsoft PowerPoint.

Aspose.Slides for .NET gives TextHolder and TextFrame an almost identical layout so that programmers can access and manipulate text in the same way. Both objects expose a Paragraphs collection, i.e. TextHolder.Paragraphs and TextFrame.Paragraphs.
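To illustrate why a shared layout helps, here is a minimal sketch in plain C#. It does not use Aspose.Slides at all: `IParagraphContainer`, `FakeTextHolder`, `FakeTextFrame` and `TextCollector` are hypothetical stand-ins invented for this example, and each paragraph is reduced to a plain string. The point is only that when both containers expose paragraphs the same way, a single reading routine can handle either one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: models the idea that both containers expose
// paragraphs in the same shape. NOT part of Aspose.Slides.
public interface IParagraphContainer
{
    IList<string> Paragraphs { get; }
}

// Hypothetical stand-in for a TextHolder (placeholder text).
public sealed class FakeTextHolder : IParagraphContainer
{
    public IList<string> Paragraphs { get; } = new List<string>();
}

// Hypothetical stand-in for a TextFrame (text box).
public sealed class FakeTextFrame : IParagraphContainer
{
    public IList<string> Paragraphs { get; } = new List<string>();
}

public static class TextCollector
{
    // One method reads paragraphs from either kind of container,
    // in the order the containers are supplied.
    public static List<string> CollectText(IEnumerable<IParagraphContainer> containers)
    {
        var lines = new List<string>();
        foreach (var container in containers)
        {
            foreach (var paragraph in container.Paragraphs)
            {
                lines.Add(paragraph);
            }
        }
        return lines;
    }
}
```

In the real library the two classes may not share an interface; the snippets further down this page instead branch on the shape type and then read the same Paragraphs collection from whichever object is present.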

A paragraph is a line of text that ends with a carriage return and line feed. This means that whenever you want to add a new line of text, you need a new paragraph.
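The "one line, one paragraph" rule can be sketched with ordinary strings. The hypothetical `ParagraphSplitter` helper below is not an Aspose.Slides API; it only assumes that paragraph boundaries are CR LF pairs, as described above, and shows how a block of text maps to a list of paragraphs and back.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.Slides: models paragraphs
// as the pieces of text between CR LF pairs.
public static class ParagraphSplitter
{
    // Splits raw text at every CR LF, so each line becomes one paragraph.
    public static string[] ToParagraphs(string text) =>
        text.Split(new[] { "\r\n" }, StringSplitOptions.None);

    // Joins paragraphs with CR LF to rebuild the original block of text.
    public static string FromParagraphs(IEnumerable<string> paragraphs) =>
        string.Join("\r\n", paragraphs);
}
```

For example, `"First line\r\nSecond line\r\nThird line"` splits into three paragraphs, and joining them again reproduces the original string.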

A paragraph is subdivided into a number of portions, which can be accessed through the Portions collection, i.e. Paragraph.Portions. A portion is a run of characters inside a paragraph that all share the same formatting. So if you want to make text bold, underlined or italic, or change its height or color, you need a Portion object.
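The relationship between a paragraph and its portions is essentially run-length grouping by format. The sketch below models it in plain C#; `CharFormat`, `Run` and `PortionSplitter` are hypothetical types invented for this illustration (the real Portion object carries many more formatting attributes than bold and italic). Consecutive characters with equal formatting end up in the same run, and a change in formatting starts a new one.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical formatting: only two flags, for illustration.
public readonly struct CharFormat : IEquatable<CharFormat>
{
    public readonly bool Bold;
    public readonly bool Italic;

    public CharFormat(bool bold, bool italic) { Bold = bold; Italic = italic; }

    public bool Equals(CharFormat other) => Bold == other.Bold && Italic == other.Italic;
    public override bool Equals(object obj) => obj is CharFormat f && Equals(f);
    public override int GetHashCode() => (Bold ? 1 : 0) | (Italic ? 2 : 0);
}

// Hypothetical stand-in for a portion: text that shares one format.
public sealed class Run
{
    public string Text { get; }
    public CharFormat Format { get; }

    public Run(string text, CharFormat format) { Text = text; Format = format; }
}

public static class PortionSplitter
{
    // Groups consecutive characters with the same format into one run.
    public static List<Run> ToRuns(string text, IReadOnlyList<CharFormat> formats)
    {
        if (text.Length != formats.Count)
            throw new ArgumentException("Need exactly one format per character.");

        var runs = new List<Run>();
        var current = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            // A formatting change closes the current run and starts a new one.
            if (i > 0 && !formats[i].Equals(formats[i - 1]))
            {
                runs.Add(new Run(current.ToString(), formats[i - 1]));
                current.Clear();
            }
            current.Append(text[i]);
        }
        if (current.Length > 0)
            runs.Add(new Run(current.ToString(), formats[text.Length - 1]));
        return runs;
    }
}
```

With "Hello world" where the first five characters are bold, the model yields two runs, "Hello" (bold) and " world" (plain). In this model, concatenating the runs' Text values gives back the full paragraph text.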

Extracting presentation text

The following code examples iterate through all the slides and print their text to the console.

[C#]
using System;
using Aspose.Slides;

Presentation pres = new Presentation("c:\\source.ppt");

//iterate all slides
int lastSlidePosition = pres.Slides.LastSlidePosition;

for (int pos = 1; pos <= lastSlidePosition; pos++)
{
    Slide sld = pres.GetSlideByPosition(pos);

    //iterate all shapes
    int shapesCount = sld.Shapes.Count;

    for (int shpIdx = 0; shpIdx < shapesCount; shpIdx++)
    {
        Shape shp = sld.Shapes[shpIdx];

        //Get the paragraphs from textholder or textframe
        ParagraphCollection paras = null;

        //Check if shape holds a textholder
        if (shp.Placeholder != null && shp.IsTextHolder)
        {
            TextHolder thld = (TextHolder)shp.Placeholder;
            paras = thld.Paragraphs;
        }
        else
        {
            if (shp.TextFrame != null)
            {
                paras = shp.TextFrame.Paragraphs;
            }//if
        }//else

        //Print the text on Console
        if (paras != null)
        {
            int parasCount = paras.Count;

            for (int paraIdx = 0; paraIdx < parasCount; paraIdx++)
            {
                Paragraph para = paras[paraIdx];

                //print the text on console
                Console.WriteLine(para.Text);
            }
        }//end if
    }//end for
}//end for

 

[Visual Basic]
Imports Aspose.Slides

Dim pres As Presentation
pres = New Presentation("c:\source.ppt")

'iterate all slides
Dim lastSlidePosition As Integer
lastSlidePosition = pres.Slides.LastSlidePosition

For pos As Integer = 1 To lastSlidePosition

    Dim sld As Slide
    sld = pres.GetSlideByPosition(pos)

    'iterate all shapes
    Dim shapesCount As Integer
    shapesCount = sld.Shapes.Count

    For shpIdx As Integer = 0 To shapesCount - 1
        Dim shp As Shape
        shp = sld.Shapes(shpIdx)

        'Get the paragraphs from textholder or textframe
        Dim paras As ParagraphCollection
        paras = Nothing

        'Check if shape holds a textholder
        If shp.Placeholder IsNot Nothing AndAlso shp.IsTextHolder Then
            Dim thld As TextHolder
            thld = CType(shp.Placeholder, TextHolder)
            paras = thld.Paragraphs
        Else
            If shp.TextFrame IsNot Nothing Then
                paras = shp.TextFrame.Paragraphs
            End If
        End If

        'Print the text on the console
        If paras IsNot Nothing Then

            Dim parasCount As Integer
            parasCount = paras.Count

            For paraIdx As Integer = 0 To parasCount - 1
                Dim para As Paragraph

                para = paras(paraIdx)
                Console.WriteLine(para.Text)
            Next
        End If ' if not para
    Next
Next

 
