Content Streams And Resources
A PDF content stream contains a sequence of instructions, composed of objects (operands) and keywords (operators), that describe the appearance of a page or other graphical entity.
GemBox.Pdf compiles content stream operations (and the associated resources) into a group of content elements, such as text, paths, and external objects (images and forms), for easy inspection and manipulation.
These content elements and other types related to PDF content are implemented in the GemBox.Pdf.Content namespace.
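To make the operand/operator structure concrete, the following sketch holds a tiny hand-written content stream in a C# string and groups its tokens into instructions. It does not use GemBox.Pdf and is not how GemBox.Pdf parses content. The sample stream, the IsOperand and GroupInstructions helpers, and the whitespace-based tokenizing are assumptions made purely for illustration, and the tokenizer would not cope with real-world streams (for example, strings containing spaces).

```csharp
using System;
using System.Collections.Generic;

// A tiny, hand-written content stream: select font F1 at 12 pt, move to (72, 720), show "Hello".
// Operands come first; the operator that consumes them comes last (postfix notation).
const string contentStream = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET";

// Naive classification, good enough for this sample only: names start with '/',
// literal strings with '(', and numbers with a digit, sign, or decimal point.
static bool IsOperand(string token) =>
    token[0] == '/' || token[0] == '(' || token[0] == '-' || token[0] == '+' ||
    token[0] == '.' || char.IsDigit(token[0]);

static List<string> GroupInstructions(string stream)
{
    var instructions = new List<string>();
    var operands = new List<string>();

    foreach (var token in stream.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (IsOperand(token))
        {
            operands.Add(token);
        }
        else
        {
            // An operator closes the instruction and consumes the pending operands.
            instructions.Add($"{token} [{string.Join(", ", operands)}]");
            operands.Clear();
        }
    }

    return instructions;
}

foreach (var instruction in GroupInstructions(contentStream))
    Console.WriteLine(instruction);
```

For the sample stream, this prints the operators BT, Tf, Td, Tj, and ET, each followed by the operands it consumed: none for BT and ET, /F1 and 12 for Tf, 72 and 720 for Td, and (Hello) for Tj.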
Content Elements
The GemBox.Pdf.Content namespace includes seven types of PDF content elements:
- PdfTextContent: a sequence of glyphs from a single face of a single font.
- PdfPathContent: a geometrical item composed of lines and curves.
- PdfImageContent: a rectangular array of sample values, each representing a color.
- PdfFormContent: a self-contained description of any sequence of content elements.
- PdfShadingContent: a smooth transition between colors.
- PdfContentGroup: a group of content elements that are independent of the rest of the surrounding elements.
- PdfContentMark: a mark used to distinguish a section of the PDF content.
The base class for all content elements is PdfContentElement.
All content element types except PdfContentMark inherit PdfVisualContentElement, which enables them to transform the coordinate system and to apply formatting properties (graphics state).
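As a rough illustration of how these element types show up in practice, the sketch below counts the content elements of each type in a document. The file name input.pdf is hypothetical. The Elements.All() call, the ElementType property, and the PdfContentElementType enumeration are not mentioned in the text above; they reflect my understanding of the GemBox.Pdf API and should be checked against the API reference for the version in use.

```csharp
using System;
using System.Collections.Generic;
using GemBox.Pdf;
using GemBox.Pdf.Content;

// Free-mode key; replace it with your own license key if you have one.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// "input.pdf" is a hypothetical file name used only for illustration.
using (var document = PdfDocument.Load("input.pdf"))
{
    var counts = new Dictionary<PdfContentElementType, int>();

    foreach (var page in document.Pages)
    {
        // Assumption: All() enumerates every element on the page,
        // including elements nested inside groups.
        foreach (var element in page.Content.Elements.All())
        {
            counts.TryGetValue(element.ElementType, out var count);
            counts[element.ElementType] = count + 1;
        }
    }

    // Print one line per element type that occurs in the document.
    foreach (var pair in counts)
        Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```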
Formatting (Graphics State)
GemBox.Pdf abstracts PDF graphics state by grouping its entries into the following types:
- PdfFillFormat: formatting properties that affect the filling of PDF textual or geometrical content.
- PdfStrokeFormat: formatting properties that affect the stroking of PDF textual or geometrical content.
- PdfClipFormat: formatting properties that affect the clipping area.
- PdfTextFormat: formatting properties that affect only the PDF textual content.
- PdfContentFormat: groups all previously defined formatting properties.
Formatting properties are accessible for any content element except PdfContentMark via the Format property.
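The following sketch shows one way the Format property might be used to fill and stroke a path. The triangle coordinates, the colors, and the output file name are arbitrary choices. Member names such as BeginSubpath, LineTo, CloseSubpath, IsApplied, Width, Color, and PdfColor.FromRgb do not appear in the text above; they are based on my recollection of the GemBox.Pdf API, so treat them as assumptions to verify.

```csharp
using GemBox.Pdf;
using GemBox.Pdf.Content;

// Free-mode key; replace it with your own license key if you have one.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

using (var document = new PdfDocument())
{
    var page = document.Pages.Add();

    // Build a closed triangle. Coordinates are in points (1/72 inch),
    // measured from the bottom-left corner of the page by default.
    var triangle = page.Content.Elements.AddPath();
    triangle.BeginSubpath(new PdfPoint(100, 500));
    triangle.LineTo(new PdfPoint(300, 500));
    triangle.LineTo(new PdfPoint(200, 650));
    triangle.CloseSubpath();

    // Fill formatting (PdfFillFormat): paint the interior yellow.
    triangle.Format.Fill.IsApplied = true;
    triangle.Format.Fill.Color = PdfColor.FromRgb(1, 0.8, 0);

    // Stroke formatting (PdfStrokeFormat): draw a 3-point dark blue outline.
    triangle.Format.Stroke.IsApplied = true;
    triangle.Format.Stroke.Width = 3;
    triangle.Format.Stroke.Color = PdfColor.FromRgb(0, 0, 0.6);

    // Hypothetical output file name.
    document.Save("formatted-path.pdf");
}
```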
Usage in GemBox.Pdf
Content elements contained in a PDF page can be obtained via the Content property.
Text is usually the most important content in a PDF document, and GemBox.Pdf provides the following text-related features:
- Extract the Unicode representation of text (if it is provided in the input PDF document) for the entire PdfPage via the Content.ToString() method and for a single PdfTextContent element via its ToString method. See the Reading example.
- Get the bounds of each PdfTextContent element via the Bounds property. See the Reading text info example.
- Get the font (family, style, weight, stretch, and size) of each PdfTextContent element via the PdfTextContent.Format.Text.Font property. See the Reading text info example.
- Format text via the PdfFormattedText class and draw it via the DrawText method. See the Writing example.
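The reading features above can be sketched roughly as follows. The file name input.pdf is hypothetical. The Elements.All() call, the ElementType check, and the font.Face.Family.Name, font.Size, bounds.Left, and bounds.Bottom members are not named in the text above; they come from my recollection of the GemBox.Pdf API and may differ between versions.

```csharp
using System;
using GemBox.Pdf;
using GemBox.Pdf.Content;

// Free-mode key; replace it with your own license key if you have one.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// "input.pdf" is a hypothetical file name used only for illustration.
using (var document = PdfDocument.Load("input.pdf"))
{
    foreach (var page in document.Pages)
    {
        // Unicode text of the whole page.
        Console.WriteLine(page.Content.ToString());

        // Assumption: All() enumerates every element on the page, including nested ones.
        foreach (var element in page.Content.Elements.All())
        {
            if (element.ElementType != PdfContentElementType.Text)
                continue;

            var textElement = (PdfTextContent)element;
            var font = textElement.Format.Text.Font;

            // Bounds are read as-is here; no page or group transform is applied.
            var bounds = textElement.Bounds;

            Console.WriteLine($"Text: {textElement}");
            Console.WriteLine($"Font: {font.Face.Family.Name}, size {font.Size}");
            Console.WriteLine($"Bounds: left {bounds.Left:0.00}, bottom {bounds.Bottom:0.00}");
        }
    }
}
```

Because the bounds are not transformed in this sketch, they may not match the position on the rendered page when an element sits inside a transformed group or form.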
Additionally, GemBox.Pdf provides the following functionalities related to PDF content:
- Export images from PDF pages via the PdfImageContent.Save(String) method. See the Export images example.
- Import images to PDF pages via the DrawImage method. See the Import images example.
- Add shapes (paths) to PDF pages via the AddPath method. See the Shapes (Paths) example.
- Add form XObjects to PDF pages via the AddForm method. See the Form XObjects example.
- Add shadings to PDF pages via the AddShading method. See the Shadings example.
- Add content groups to PDF pages via the AddGroup method. See the Content Groups example.
- Add marked content to PDF pages via the AddMarkStart and AddMarkEnd methods. See the Marked Content example.
- Fill, stroke, and clip content using various colors, patterns, and shadings. See the Content Formatting example.
Editing Content
When editing PdfForm.Content or PdfTilingPattern.Content, you must call PdfContent.BeginEdit before editing and PdfContent.EndEdit after it. Calling PdfContent.BeginEdit and PdfContent.EndEdit is optional when editing PdfPage.Content, but it might improve performance in some situations. When PdfContent.EndEdit is called, GemBox.Pdf converts the PdfContent and all PdfContentElement objects underneath it back into a content stream and the accompanying resource dictionary.
The following example shows how to improve performance when writing text to multiple pages using the same font, so that the embedded font subset is calculated just once rather than after each page is edited.
using GemBox.Pdf;
using GemBox.Pdf.Content;

// Free-mode key; replace it with your own license key if you have one.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

using (var document = new PdfDocument())
{
    using (var formattedText = new PdfFormattedText())
    {
        // Set the font to a TrueType font that will be subset and embedded in the document.
        formattedText.Font = new PdfFont("Calibri", 96);

        // Draw a single letter on each page.
        for (int i = 0; i < 2; ++i)
        {
            formattedText.Append(((char)('A' + i)).ToString());

            var page = document.Pages.Add();

            // Begin editing the page content, but don't end it until all pages are edited.
            page.Content.BeginEdit();

            page.Content.DrawText(formattedText, new PdfPoint(100, 500));

            formattedText.Clear();
        }
    }

    // End editing of all pages.
    // This converts the content of each page back to the underlying content stream
    // and the accompanying resource dictionary.
    // The subset of the 'Calibri' font, which contains only the glyphs for the characters
    // 'A' and 'B', is calculated just once before being embedded in the document.
    foreach (var page in document.Pages)
        page.Content.EndEdit();

    document.Save("Content Streams And Resources.pdf");
}
The example creates just two pages because of the free version's limitations. The performance gain from explicitly beginning and ending the editing of PDF pages is more noticeable when writing different text to many pages that each use one or more TrueType/OpenType fonts.
See Also
PDF Specification ISO 32000-1:2008, section '7.8 Content Streams and Resources'
PDF Specification ISO 32000-1:2008, section '8 Graphics'
PDF Specification ISO 32000-1:2008, section '9 Text'