Objects
A PDF document is a data structure composed from a small set of basic types of data objects.
These basic types of objects are implemented in GemBox.Pdf.Objects namespace.
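A PDF includes eight basic types of objects (ordered by complexity):
- The null object
- Boolean values
- Integer and Real numbers
- Names
- Strings
- Arrays
- Dictionaries
- Streams
Additionally, GemBox.Pdf defines one more basic object type:
- Indirect Objects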
The null object
The null object represents the absence of a value. It is a singleton whose only instance can be obtained with the Null property.
The null object is usually used in PdfArray to specify that the array's element has no value (depending on the context, it usually means that the default value of the array's element should be used). The null object is rarely used in PdfDictionary because specifying the null object as the value of a dictionary entry is equivalent to omitting the entry entirely.
Boolean values
Boolean values represent the logical values of true and false and are singletons whose only two instances can be obtained with False and True properties.
Their value can be obtained with Value property.
Integer and Real numbers
Integer numbers represent mathematical integers. They can be created with the PdfInteger.Create(Int32) method, and their value can be obtained with the Value property. To minimize memory usage, creating an integer from an implementation-defined interval around zero always returns the same PdfInteger instance for a given value.
Real numbers represent mathematical real numbers. They can be created with the PdfNumber.Create(Double) method, and their value can be obtained with the Value property. If the input value is actually an integer (the Double cast to Int32 is equal to the original Double), then the created number will be a PdfInteger instance.
PdfInteger extends PdfNumber so it can be used in any place where PdfNumber can be used.
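The integer test described above can be sketched in plain C#. This is only an illustration of the rule, not GemBox.Pdf's actual implementation; the NumberSketch class and IsInt32 method are hypothetical names, and the explicit range guard is an assumption added so that the cast is well defined.

```csharp
using System;

public static class NumberSketch
{
    // Hypothetical helper: returns true when the double holds an exact Int32 value,
    // that is, when casting it to int and comparing with the original gives equality.
    public static bool IsInt32(double value)
    {
        // Assumption: reject NaN and out-of-range values first, because casting
        // them to int does not produce a meaningful result.
        if (double.IsNaN(value) || value < int.MinValue || value > int.MaxValue)
            return false;

        return (int)value == value;
    }
}
```

Under this rule, a value such as 3.0 would qualify as an integer, while 3.5 would stay a real number.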
Names
A name is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0).
A name can be created with the PdfName.Create(String) method, and its value can be obtained with the ToString() method.
UTF-8 encoding is used to encode the input value to PdfName and to decode the PdfName to output value.
Names are predominantly used as keys in PdfDictionary.
Strings
A string consists of a series of zero or more bytes.
A string can be created with:
- PdfString.Create(String) method - in which case PDFDoc encoding is used to encode the input value to PdfString if all characters of the input value are supported in PDFDoc encoding; otherwise, UTF16BE encoding is used,
- PdfString.Create(DateTimeOffset) method - in which case ASCII encoding is used to encode the input value, or
- PdfString.Create(String, IPdfEncoding, PdfStringForm) method - in which case the specified encoding is used to encode the input value to PdfString.
Its value can be obtained with:
- ToString() method - in which case the encoding is chosen from the bytes of the PdfString: UTF16BE encoding is used if the first two bytes are 254 followed by 255 (the Unicode byte order mark U+FEFF); otherwise, PDFDoc encoding is used if all bytes of the PdfString are supported in PDFDoc encoding; otherwise, Byte encoding is used, or
- PdfString.ToString(IPdfEncoding) method - in which case the specified encoding is used to decode the PdfString to output value.
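The decoding order used by ToString() can be sketched in plain C# as follows. This is an illustration under stated assumptions, not GemBox.Pdf's code: PdfStringDecodeSketch and Decode are hypothetical names, and because .NET has no built-in PDFDoc encoding, Latin-1 (available as Encoding.Latin1 in .NET 5 and later) stands in for both the PDFDoc and Byte fallbacks.

```csharp
using System;
using System.Text;

public static class PdfStringDecodeSketch
{
    // Hypothetical helper that mirrors the described decision order.
    public static string Decode(byte[] bytes)
    {
        // 254 followed by 255 (0xFE 0xFF) is the UTF-16BE byte order mark U+FEFF.
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(bytes, 2, bytes.Length - 2);

        // Assumption: Latin-1 is only a stand-in here. A real implementation would
        // first try PDFDoc encoding and fall back to a byte-per-character encoding.
        return Encoding.Latin1.GetString(bytes);
    }
}
```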
String can be written in two forms:
- Literal - as a sequence of literal characters enclosed in parentheses, and
- Hexadecimal - as hexadecimal data enclosed in angle brackets < >.
Use PdfString.Create(String, IPdfEncoding, PdfStringForm) method to create a string in the specified form and use Form property to get the form of the string.
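To make the two written forms concrete, here is a minimal plain C# sketch that serializes the same bytes both ways. PdfStringFormSketch and its methods are hypothetical names, not part of GemBox.Pdf, and the literal writer assumes printable ASCII input; it escapes only the backslash and parentheses.

```csharp
using System;
using System.Text;

public static class PdfStringFormSketch
{
    // Literal form: characters enclosed in parentheses.
    // Assumption: input bytes are printable ASCII; other bytes would need octal escapes.
    public static string ToLiteral(byte[] bytes)
    {
        var sb = new StringBuilder("(");
        foreach (byte b in bytes)
        {
            char c = (char)b;
            if (c == '(' || c == ')' || c == '\\')
                sb.Append('\\'); // escape characters that have syntactic meaning
            sb.Append(c);
        }
        return sb.Append(')').ToString();
    }

    // Hexadecimal form: two hex digits per byte enclosed in angle brackets.
    public static string ToHexadecimal(byte[] bytes)
    {
        var sb = new StringBuilder("<");
        foreach (byte b in bytes)
            sb.Append(b.ToString("X2"));
        return sb.Append('>').ToString();
    }
}
```

For the two bytes of the ASCII text Hi, the literal form is (Hi) and the hexadecimal form is <4869>.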
Arrays
An array is a one-dimensional collection of objects arranged sequentially. Arrays may be heterogeneous; that is, an array's elements may be any combination of PdfNumbers, PdfStrings, PdfDictionaries or any other PdfBasicObjects, including other PdfArrays.
An array may have zero elements. Only one-dimensional arrays are directly supported. Arrays of higher dimensions can be constructed by using arrays as elements of arrays, nested to any depth.
An array can be created with any of the Create() methods.
PdfArray implements list interfaces, and all of their descendant interfaces, that you can use to work with an array.
Dictionaries
A dictionary is an associative table containing pairs of objects, known as the dictionary's entries. The first element of each entry is the key and the second element is the value. The key is always of type PdfName. The value may be any kind of PdfBasicObject, including another PdfDictionary.
A PdfDictionaryEntry whose value is Null is treated as if the entry did not exist. A dictionary may have zero entries. The entries in a dictionary are unordered. Multiple entries in the same dictionary cannot have the same key.
The dictionary can be created with any of the Create() methods.
PdfDictionary implements dictionary interfaces, and all of their descendant interfaces, that you can use to work with a dictionary.
Streams
A stream is a (potentially large) sequence of bytes. A stream can specify filters that indicate whether and how the data in the stream should be transformed (decoded) before it is used. All streams are indirect objects, meaning that a stream can be contained only as the value of the Value property of a PdfIndirectObject.
A stream can be created with Create() method.
A stream's data (either decoded or encoded) can be read or written with the PdfStream.Open(PdfStreamDataMode, PdfStreamDataState) method. A stream's extent (the number of bytes of its encoded data) can be obtained with the Length property. A stream's data can be encoded (usually compressed) by specifying various PdfFilters in the Filters property. Other stream properties can be obtained from the Dictionary property.
GemBox.Pdf takes care to maintain integrity between a stream's data and the Filters required for decoding and encoding the data. If the Filters with which the data was encoded are changed before the decoded data is read, an exception will be thrown. Additionally, Length is automatically updated when the stream's data is written.
Note
GemBox.Pdf currently does not directly support streams whose data is contained in an external file. Still, these streams can be created with GemBox.Pdf by using Dictionary property and specifying F and, optionally, FFilter and FDecodeParms entries.
Indirect Objects
Any PdfBasicObject, except PdfIndirectObject, can be contained in a PdfIndirectObject. This gives the object the ability to be contained in multiple places (for example, as an element of a PdfArray and as the value of a PdfDictionary entry) because its container (PdfIndirectObject) has the ability to be contained in multiple places.
If a PdfIndirectObject is read from a PDF file or is written to a PDF file, then it will have a unique object identifier that can be obtained with Id property, and that consists of:
- A positive integer ObjectNumber.
- A non-negative integer GenerationNumber. In a newly created file, all indirect objects shall have generation numbers of 0. Nonzero generation numbers may be introduced when the file is later updated.
Together, an object number and a generation number uniquely identify an indirect object in a PDF file. If a PdfIndirectObject is not associated with any PDF file, then its Id will be Undefined.
Note
The Id property should be used only as a diagnostics and debugging aid because its value might change if the PDF file is closed or when the PDF document is saved to a new PDF file.
Unlike PDF Specification ISO 32000-1:2008, which also defines an indirect reference with which an indirect object may be referred to from elsewhere in the file, GemBox.Pdf defines only PdfIndirectObject, which serves as both the definition of an indirect object and an indirect reference. GemBox.Pdf maps each indirect object and indirect reference with the same object identifier in a PDF file to the same PdfIndirectObject instance, and vice versa.
If a PDF file contains an invalid indirect reference (one whose object identifier cannot be located in the PDF file), then the reference is mapped to the Null instance if it is located in a PdfArray, or the PdfDictionaryEntry is not added if it is the value of a PdfDictionary entry.
This GemBox.Pdf feature makes indirect objects and indirect references easier to work with, and it minimizes memory usage.
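The idea of mapping every occurrence of the same object identifier to a single instance can be sketched with a small lookup table in plain C#. This is only a conceptual illustration; IndirectObjectSketch and IndirectObjectTable are hypothetical types and do not reflect how GemBox.Pdf is implemented internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an indirect object with its identifier.
public sealed class IndirectObjectSketch
{
    public int ObjectNumber { get; }
    public int GenerationNumber { get; }

    public IndirectObjectSketch(int objectNumber, int generationNumber)
    {
        ObjectNumber = objectNumber;
        GenerationNumber = generationNumber;
    }
}

// Hypothetical table that hands out one shared instance per identifier.
public sealed class IndirectObjectTable
{
    private readonly Dictionary<(int, int), IndirectObjectSketch> map =
        new Dictionary<(int, int), IndirectObjectSketch>();

    public IndirectObjectSketch Resolve(int objectNumber, int generationNumber)
    {
        var key = (objectNumber, generationNumber);
        if (!map.TryGetValue(key, out var obj))
        {
            // First time this identifier is seen: create and remember the instance.
            obj = new IndirectObjectSketch(objectNumber, generationNumber);
            map.Add(key, obj);
        }
        return obj;
    }
}
```

With such a table, resolving the identifier (12, 0) twice yields the same instance, so a definition and all references to it share one object in memory.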
An indirect object can be created with any of the Create() methods. Its value can be obtained or set with Value property.
Caution
Value is obtained lazily if the PdfIndirectObject is associated with a PDF file. This means that the value is parsed from the PDF file only when it is requested for the first time. This feature enables GemBox.Pdf to read and update the PDF file quickly.
Set a Value only if you are sure that the PdfIndirectObject is not referenced from any other place. GemBox.Pdf never sets the Value of an existing PdfIndirectObject because the PdfIndirectObject might be referenced from several places. Instead, a new PdfIndirectObject is created, and its Value is set.
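Lazy retrieval of a value can be illustrated with the standard Lazy<T> type in plain C#. This sketch only demonstrates the general pattern; LazyValueSketch is a hypothetical class, the parse delegate stands in for reading from a PDF file, and GemBox.Pdf's actual mechanism may differ.

```csharp
using System;

public sealed class LazyValueSketch
{
    private readonly Lazy<string> value;

    // Counts how many times the (hypothetical) parse step actually ran.
    public int ParseCount { get; private set; }

    public LazyValueSketch(Func<string> parse)
    {
        // The delegate is not invoked here; it runs on the first access of Value.
        value = new Lazy<string>(() =>
        {
            ParseCount++;
            return parse();
        });
    }

    // The first access triggers parsing; later accesses reuse the cached result.
    public string Value => value.Value;
}
```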
Class Hierarchy
Basic object types are leaf nodes of GemBox.Pdf basic object type class hierarchy that is shown in the following picture:
The following subsections describe the base classes from the class hierarchy above:
PdfBasicObject
PdfBasicObject is a base class for all basic PDF objects.
It provides the ObjectType property, which tests whether a basic object is of the specified type faster than the as or is cast operators, and the ToString() method, which returns a representation of the basic object used primarily for debugging purposes.
PdfBasicValue
PdfBasicValue is a base class for all immutable basic PDF objects.
PdfBasicValue instance is immutable and therefore can be shared (contained in multiple PdfDictionary or PdfArray objects), thus reducing the memory usage.
PdfBasicValue instance is thread-safe.
PdfBasicValue instance implements value equality by requiring all derived types to implement PdfBasicValue.Equals(Object) and GetHashCode() methods.
PdfBasicContainer
PdfBasicContainer is a base class for all mutable basic PDF objects.
PdfBasicContainer instance is mutable and therefore cannot be shared (contained in multiple PdfDictionary or PdfArray objects). The only exception is PdfIndirectObject instance that is also mutable, but can be shared.
PdfBasicContainer instance is not thread-safe.
PdfBasicContainer instance implements reference equality by making its implementations of PdfBasicContainer.Equals(Object) and GetHashCode() methods sealed.
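The difference between the value equality of PdfBasicValue and the reference equality of PdfBasicContainer can be sketched in plain C#. The types below (ValueSketch, ContainerSketch, ArraySketch) are hypothetical and only mimic the described equality contracts; they are not the GemBox.Pdf classes.

```csharp
using System;
using System.Runtime.CompilerServices;

// Immutable type with value equality: two instances with the same content are equal.
public sealed class ValueSketch : IEquatable<ValueSketch>
{
    public int Value { get; }

    public ValueSketch(int value) { Value = value; }

    public bool Equals(ValueSketch other) => other != null && other.Value == Value;
    public override bool Equals(object obj) => Equals(obj as ValueSketch);
    public override int GetHashCode() => Value.GetHashCode();
}

// Mutable base type with reference equality: sealing the overrides prevents
// derived types from switching to content-based comparison.
public abstract class ContainerSketch
{
    public sealed override bool Equals(object obj) => ReferenceEquals(this, obj);
    public sealed override int GetHashCode() => RuntimeHelpers.GetHashCode(this);
}

public sealed class ArraySketch : ContainerSketch { }
```

Two ValueSketch instances holding the same number compare as equal, while two separate ArraySketch instances never do, even when both are empty.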
Additionally, PdfBasicContainer instance might be in a read-only state, meaning that it and all of its descendant PdfBasicContainers cannot be changed anymore. This enables faster implementations of other features, such as maintenance of integrity between PdfStream's data and Filters required for decoding and encoding the data. Use IsReadOnly property to test if an object is in a read-only state.
If you want to change the PdfBasicContainer instance that is in a read-only state or you want to use the similar instance in another location, then use one of the PdfBasicContainer.Clone(Boolean) methods, and use the returned instance for the requested operations.
PdfBasicCollection
PdfBasicCollection is a base class for all basic PDF objects that are collections (PdfArray and PdfDictionary).
It implements interface and is a data source for PDF Document Structure components.
Usage in GemBox.Pdf
To use basic PDF objects in GemBox.Pdf, you must first get the underlying PdfDictionary of a PdfDocument or any other PDF Document Structure component.
This is accomplished by importing the GemBox.Pdf.Objects namespace with the statement using GemBox.Pdf.Objects; for C# or Imports GemBox.Pdf.Objects for VB.NET. This namespace import exposes the extension methods defined in PdfObjectExtensions, which can be used on any PDF Document Structure component to get the underlying PdfDictionary or PdfArray.
Here is an example of how to use PdfObjectExtensions to set a conforming product's private data in a page-piece dictionary associated with the document. The example also shows how to use various basic PDF objects, such as PdfName, PdfString, PdfDictionary and PdfIndirectObject. For more information about page-piece dictionaries in PDF, see PDF Specification ISO 32000-1:2008, section '14.5 Page-Piece Dictionaries'.
using System;
using System.Globalization;
using GemBox.Pdf.Objects;
using GemBox.Pdf.Text;

namespace GemBox.Pdf.Examples
{
    class ObjectsExample
    {
        void SetPrivateDataOnDocument(PdfDocument document)
        {
            // Get document's trailer dictionary.
            var trailer = document.GetDictionary();

            // Get document catalog dictionary from the trailer.
            var catalog = (PdfDictionary)((PdfIndirectObject)trailer[PdfName.Create("Root")]).Value;

            // Either retrieve 'PieceInfo' entry value from document catalog or create page-piece
            // dictionary and set it to document catalog under 'PieceInfo' entry.
            PdfDictionary pieceInfo;
            var pieceInfoKey = PdfName.Create("PieceInfo");
            var pieceInfoValue = catalog[pieceInfoKey];
            switch (pieceInfoValue.ObjectType)
            {
                case PdfBasicObjectType.Dictionary:
                    pieceInfo = (PdfDictionary)pieceInfoValue;
                    break;
                case PdfBasicObjectType.IndirectObject:
                    pieceInfo = (PdfDictionary)((PdfIndirectObject)pieceInfoValue).Value;
                    break;
                case PdfBasicObjectType.Null:
                    pieceInfo = PdfDictionary.Create();
                    catalog[pieceInfoKey] = PdfIndirectObject.Create(pieceInfo);
                    break;
                default:
                    throw new InvalidOperationException("PieceInfo entry must be dictionary.");
            }

            // Create page-piece data dictionary for 'GemBox.Pdf' conforming product and set it to page-piece dictionary.
            var data = PdfDictionary.Create();
            pieceInfo[PdfName.Create("GemBox.Pdf")] = data;

            // Create private data dictionary that will hold private data that 'GemBox.Pdf' conforming product understands.
            var privateData = PdfDictionary.Create();
            data[PdfName.Create("Data")] = privateData;

            // Set 'Title' and 'Version' entries to private data.
            privateData[PdfName.Create("Title")] = PdfString.Create(ComponentInfo.Title);
            privateData[PdfName.Create("Version")] = PdfString.Create(ComponentInfo.Version);

            // Specify date of the last modification of 'GemBox.Pdf' private data (required by PDF specification).
            data[PdfName.Create("LastModified")] = PdfString.Create(DateTimeOffset.Now);
        }
    }
}