Performance metrics with large Word files in C# and VB.NET
GemBox.Document is a Word component that follows .NET design guidelines and best practices. It represents Word files in memory through a rich content model of sections, blocks, inlines, drawings, and other elements. It is optimized to keep memory consumption and allocations low without compromising execution speed.
The following example shows how you can use BenchmarkDotNet to measure how fast GemBox.Document reads, writes, and iterates through the provided input Word file, which has 15 sections of varied content. The file is meant to cover typical Word requirements: it includes different kinds of elements (like images, shapes, and tables) and Word features (like bookmarks, comments, and footnotes).
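using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using GemBox.Document;
using System.Collections.Generic;
using System.IO;

// Run every benchmark on both .NET 8 and .NET Framework 4.8 (the latter requires Windows).
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net48)]
public class Program
{
    private DocumentModel document;
    private readonly Consumer consumer = new Consumer();

    public static void Main()
    {
        BenchmarkRunner.Run<Program>();
    }

    // Runs once per benchmark case, before the measured iterations, so its cost is excluded.
    [GlobalSetup]
    public void SetLicense()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        // https://www.gemboxsoftware.com/document/examples/free-trial-professional/1301

        // Preload the document used by the Writing and Iterating benchmarks.
        this.document = DocumentModel.Load("%#RandomSections.docx%");
    }

    // Measures loading (parsing) the DOCX file into a DocumentModel.
    // Returning the result prevents the JIT from discarding the work.
    [Benchmark]
    public DocumentModel Reading()
    {
        return DocumentModel.Load("%#RandomSections.docx%");
    }

    // Measures saving the preloaded document as DOCX into memory (no disk I/O).
    [Benchmark]
    public void Writing()
    {
        using (var stream = new MemoryStream())
        {
            this.document.Save(stream, new DocxSaveOptions());
        }
    }

    // Measures a full traversal of the document's element tree.
    [Benchmark]
    public void Iterating()
    {
        // Consume enumerates the lazy sequence so that every element is actually visited.
        this.LoopThroughAllElements().Consume(this.consumer);
    }

    public IEnumerable<Element> LoopThroughAllElements()
    {
        // true = recurse into all nested elements, not only the direct children.
        return this.document.GetChildElements(true);
    }
}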
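Imports BenchmarkDotNet.Attributes
Imports BenchmarkDotNet.Engines
Imports BenchmarkDotNet.Jobs
Imports BenchmarkDotNet.Running
Imports GemBox.Document
Imports System.Collections.Generic
Imports System.IO

' Run every benchmark on both .NET 8 and .NET Framework 4.8 (the latter requires Windows).
<SimpleJob(RuntimeMoniker.Net80)>
<SimpleJob(RuntimeMoniker.Net48)>
Public Class Program

    Private document As DocumentModel
    Private ReadOnly consumer As Consumer = New Consumer()

    Public Shared Sub Main()
        BenchmarkRunner.Run(Of Program)()
    End Sub

    ' Runs once per benchmark case, before the measured iterations, so its cost is excluded.
    <GlobalSetup>
    Public Sub SetLicense()
        ' If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")

        ' If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        ' https://www.gemboxsoftware.com/document/examples/free-trial-professional/1301

        ' Preload the document used by the Writing and Iterating benchmarks.
        Me.document = DocumentModel.Load("%#RandomSections.docx%")
    End Sub

    ' Measures loading (parsing) the DOCX file into a DocumentModel.
    ' Returning the result prevents the JIT from discarding the work.
    <Benchmark>
    Public Function Reading() As DocumentModel
        Return DocumentModel.Load("%#RandomSections.docx%")
    End Function

    ' Measures saving the preloaded document as DOCX into memory (no disk I/O).
    <Benchmark>
    Public Sub Writing()
        Using stream = New MemoryStream()
            Me.document.Save(stream, New DocxSaveOptions())
        End Using
    End Sub

    ' Measures a full traversal of the document's element tree.
    <Benchmark>
    Public Sub Iterating()
        ' Consume enumerates the lazy sequence so that every element is actually visited.
        Me.LoopThroughAllElements().Consume(Me.consumer)
    End Sub

    Public Function LoopThroughAllElements() As IEnumerable(Of Element)
        ' True = recurse into all nested elements, not only the direct children.
        Return Me.document.GetChildElements(True)
    End Function

End Class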
Benchmarks for 10,000 Word pages
The more content you have, the more memory you'll need. How much content you can handle depends on several factors, such as the machine's available memory, the application's architecture (32-bit or 64-bit; a 32-bit process can address far less memory than a 64-bit one), the targeted .NET platform (.NET Core or .NET Framework), etc.
The following benchmark charts show the results of working with Word files of up to 10,000 pages. Both time and memory grow steadily and linearly as the number of pages increases. For the detailed measurements, see the 10_Thousand_Pages_Performance.xlsx file.
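Because these factors differ between machines and builds, it can help to record them next to any benchmark results you collect. The following is a minimal sketch that uses only standard .NET APIs (nothing from GemBox.Document); the class name EnvironmentReport is hypothetical, and the available-memory figure is only reported on .NET 5 or later, where GC.GetGCMemoryInfo is available.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: summarizes the runtime factors that affect how much content a process can handle.
static class EnvironmentReport
{
    public static string Describe()
    {
        // A 32-bit process has a much smaller address space than a 64-bit one.
        string bitness = Environment.Is64BitProcess ? "64-bit" : "32-bit";

        string report = $"{RuntimeInformation.FrameworkDescription}, {bitness} process " +
                        $"({RuntimeInformation.ProcessArchitecture})";

#if NET5_0_OR_GREATER
        // Memory the garbage collector considers available to this process.
        long availableMb = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes / (1024 * 1024);
        report += $", about {availableMb} MB available to the GC";
#endif
        return report;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running the same program under a .NET Framework 4.8 build and a .NET 8 build (or as x86 and x64) shows which of these factors changed between two sets of measurements.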
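To observe how time and memory scale with size on your own machine, one option is to parameterize a benchmark by document size and enable BenchmarkDotNet's memory diagnoser, which adds allocation columns to the results table. The sketch below is an illustration, not the setup behind the published 10,000-page results: it builds documents in memory from plain paragraphs instead of loading RandomSections.docx, and the class name ScalingBenchmark, the paragraph counts, and the generated text are all assumptions. These sizes exceed the Free version's limits, so a Professional or Trial key is assumed.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using GemBox.Document;
using System.IO;

// Hypothetical scaling benchmark; sizes and content are arbitrary assumptions.
[MemoryDiagnoser] // reports allocated bytes and GC collections per operation
public class ScalingBenchmark
{
    private DocumentModel document;

    // Each value becomes a separate benchmark case.
    [Params(1_000, 10_000, 100_000)]
    public int ParagraphCount;

    static ScalingBenchmark()
    {
        // Assumption: replace with a Professional serial key; these sizes exceed the free limits.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");
    }

    public static void Main()
    {
        BenchmarkRunner.Run<ScalingBenchmark>();
    }

    // Builds a single-section document with the requested number of paragraphs.
    public static DocumentModel CreateDocument(int paragraphCount)
    {
        var document = new DocumentModel();
        var section = new Section(document);
        document.Sections.Add(section);

        for (int i = 0; i < paragraphCount; i++)
            section.Blocks.Add(new Paragraph(document, $"Paragraph {i + 1} of generated content."));

        return document;
    }

    // BenchmarkDotNet sets ParagraphCount before calling this.
    [GlobalSetup]
    public void Setup()
    {
        this.document = CreateDocument(this.ParagraphCount);
    }

    [Benchmark]
    public long Writing()
    {
        using (var stream = new MemoryStream())
        {
            this.document.Save(stream, new DocxSaveOptions());
            // Returning the size keeps the result observable.
            return stream.Length;
        }
    }
}
```

If scaling is roughly linear, the Mean and Allocated columns should grow by a similar factor as ParagraphCount.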
Tips for improving performance
The following are some recommendations for improving performance while developing with GemBox.Document:
- When saving the same document to multiple files of fixed document types (PDF, XPS, and image formats), use the DocumentModelPaginator.Save methods rather than the DocumentModel.Save methods, so the document's layout is calculated once and reused for every output.
- When styling the content, consider using the document's default formatting (DocumentModel.DefaultCharacterFormat and DocumentModel.DefaultParagraphFormat). Also, prefer reusing the named styles from the DocumentModel.Styles collection over setting direct formatting on individual document elements.
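Both tips can be combined in one short program. The sketch below is an illustration under some assumptions: it builds a small document from scratch (kept within the Free version's limits), sets document-wide defaults, shares one built-in Heading 1 style instead of formatting each heading directly, and then paginates once with GetPaginator and saves that layout to two fixed formats. It assumes DocumentModelPaginator.Save(string) picks the output format from the file extension, as DocumentModel.Save does; the XPS line is guarded because XPS output may have platform requirements (check the GemBox.Document documentation for your target).

```csharp
using GemBox.Document;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        var document = new DocumentModel();

        // Tip 2: set formatting once at the document level instead of on every element.
        document.DefaultCharacterFormat.FontName = "Calibri";
        document.DefaultCharacterFormat.Size = 11;

        // Tip 2: one named style shared by all headings, instead of direct formatting.
        var headingStyle = (ParagraphStyle)Style.CreateStyle(StyleTemplateType.Heading1, document);
        document.Styles.Add(headingStyle);

        var section = new Section(document);
        document.Sections.Add(section);
        for (int i = 1; i <= 3; i++)
        {
            var heading = new Paragraph(document, $"Chapter {i}");
            heading.ParagraphFormat.Style = headingStyle;
            section.Blocks.Add(heading);
            section.Blocks.Add(new Paragraph(document, "Body text that relies on the document defaults."));
        }

        // Tip 1: paginate once, after all edits, and reuse the layout for each fixed-format file.
        DocumentModelPaginator paginator = document.GetPaginator();
        paginator.Save("Output.pdf");

        // Assumption: XPS export may require Windows (and possibly WPF) support.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            paginator.Save("Output.xps");
    }
}
```

Note that the paginator reflects the document as it was when GetPaginator was called, which is why all content and styling changes come first.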