You can extract text from a presentation using Aspose.Slides for .NET. The text inside a presentation is held in two types of objects: TextHolder and TextFrame. A TextHolder is a special Placeholder that holds text, and a TextFrame is a TextBox of the kind you create in MS PowerPoint.
Aspose.Slides for .NET gives TextHolder and TextFrame an almost identical layout so that programmers can access and manipulate text in the same way. Both objects have a Paragraphs collection, i.e. TextHolder.Paragraphs and TextFrame.Paragraphs. A Paragraph is a line of text that ends with a carriage return and line feed, so every new line of text needs a new paragraph. A Paragraph is subdivided into portions, which can be accessed through the Portions collection, i.e. Paragraph.Portions. A Portion is a run of characters inside a paragraph that share the same formatting. To make text bold, underlined, or italic, or to change its height or color, you work with the Portion object.
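The Paragraph and Portion hierarchy described above can be walked with two nested loops. The following is a minimal sketch, not a definitive implementation: the class PortionDump and the method PrintPortions are hypothetical names introduced here for illustration, and the sketch assumes that the Portions collection exposes Count and an integer indexer like the Paragraphs collection does, and that Portion exposes a Text property. Check these against the API reference for your version of the library.

```csharp
using System;
using Aspose.Slides;

// Hypothetical helper class, not part of Aspose.Slides.
static class PortionDump
{
    // Prints the text of every portion in every paragraph of the collection.
    public static void PrintPortions(ParagraphCollection paras)
    {
        for (int paraIdx = 0; paraIdx < paras.Count; paraIdx++)
        {
            Paragraph para = paras[paraIdx];

            // Assumption: Portions supports Count and an integer indexer.
            for (int portIdx = 0; portIdx < para.Portions.Count; portIdx++)
            {
                // Assumption: Portion exposes its characters through Text.
                Portion port = para.Portions[portIdx];
                Console.WriteLine("paragraph {0}, portion {1}: {2}",
                    paraIdx, portIdx, port.Text);
            }
        }
    }
}
```

Because TextHolder.Paragraphs and TextFrame.Paragraphs are the same kind of collection, a helper like this could be called with either one.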
Extracting presentation text
The following code examples, first in C# and then in Visual Basic, iterate over all the slides in a presentation and print their text to the console.
    using System;
    using Aspose.Slides;

    Presentation pres = new Presentation("c:\\source.ppt");

    // Iterate over all slides; positions start at 1
    int lastSlidePosition = pres.Slides.LastSlidePosition;
    for (int pos = 1; pos <= lastSlidePosition; pos++)
    {
        Slide sld = pres.GetSlideByPosition(pos);

        // Iterate over all shapes on the slide
        int shapesCount = sld.Shapes.Count;
        for (int shpIdx = 0; shpIdx < shapesCount; shpIdx++)
        {
            Shape shp = sld.Shapes[shpIdx];

            // Get the paragraphs from the TextHolder or the TextFrame
            ParagraphCollection paras = null;

            if (shp.Placeholder != null && shp.IsTextHolder)
            {
                // The shape holds its text in a TextHolder
                TextHolder thld = (TextHolder)shp.Placeholder;
                paras = thld.Paragraphs;
            }
            else if (shp.TextFrame != null)
            {
                // The shape holds its text in a TextFrame
                paras = shp.TextFrame.Paragraphs;
            }

            // Print the text of each paragraph; shapes without text are skipped
            if (paras != null)
            {
                int parasCount = paras.Count;
                for (int paraIdx = 0; paraIdx < parasCount; paraIdx++)
                {
                    Paragraph para = paras[paraIdx];
                    Console.WriteLine(para.Text);
                }
            }
        }
    }
    Imports System
    Imports Aspose.Slides

    Dim pres As New Presentation("c:\source.ppt")

    ' Iterate over all slides; positions start at 1
    Dim lastSlidePosition As Integer = pres.Slides.LastSlidePosition
    For pos As Integer = 1 To lastSlidePosition
        Dim sld As Slide = pres.GetSlideByPosition(pos)

        ' Iterate over all shapes on the slide
        Dim shapesCount As Integer = sld.Shapes.Count
        For shpIdx As Integer = 0 To shapesCount - 1
            Dim shp As Shape = sld.Shapes(shpIdx)

            ' Get the paragraphs from the TextHolder or the TextFrame
            Dim paras As ParagraphCollection = Nothing

            If shp.Placeholder IsNot Nothing AndAlso shp.IsTextHolder Then
                ' The shape holds its text in a TextHolder
                Dim thld As TextHolder = CType(shp.Placeholder, TextHolder)
                paras = thld.Paragraphs
            ElseIf shp.TextFrame IsNot Nothing Then
                ' The shape holds its text in a TextFrame
                paras = shp.TextFrame.Paragraphs
            End If

            ' Print the text of each paragraph; shapes without text are skipped
            If paras IsNot Nothing Then
                Dim parasCount As Integer = paras.Count
                For paraIdx As Integer = 0 To parasCount - 1
                    Dim para As Paragraph = paras(paraIdx)
                    Console.WriteLine(para.Text)
                Next
            End If
        Next
    Next
