Performance metrics with large Excel files
GemBox.Spreadsheet is an Excel component that follows .NET design guidelines and best practices. It represents Excel files in memory through its rich content model that contains sheets, rows, cells, styles, etc. It has optimized memory consumption and allocation while not jeopardizing the efficiency and speed of the execution.
The following example shows how you can use BenchmarkDotNet to track the performance of GemBox.Spreadsheet using the provided input Excel file with 25 sheets of various cell data. The file should cover any typical Excel requirements; it includes different elements (like images, charts, and pivot tables) and Excel features (like conditional formatting, data validation, and filtering).
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using GemBox.Spreadsheet;
using System.Collections.Generic;
using System.IO;
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net48)]
public class Program
{
private ExcelFile workbook;
private readonly Consumer consumer = new Consumer();
public static void Main()
{
BenchmarkRunner.Run<Program>();
}
[GlobalSetup]
public void SetLicense()
{
// If using the Professional version, put your serial key below.
SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");
// If using Free version and example exceeds its limitations, use Trial or Time Limited version:
// https://www.gemboxsoftware.com/spreadsheet/examples/free-trial-professional/1001
this.workbook = ExcelFile.Load("%#RandomSheets.xlsx%");
}
[Benchmark]
public ExcelFile Reading()
{
return ExcelFile.Load("%#RandomSheets.xlsx%");
}
[Benchmark]
public void Writing()
{
using (var stream = new MemoryStream())
this.workbook.Save(stream, new XlsxSaveOptions());
}
[Benchmark]
public void Iterating()
{
this.LoopThroughAllCells().Consume(this.consumer);
}
public IEnumerable<object> LoopThroughAllCells()
{
foreach (var worksheet in this.workbook.Worksheets)
foreach (var row in worksheet.Rows)
foreach (var cell in row.AllocatedCells)
yield return cell.Value;
}
}
Imports BenchmarkDotNet.Attributes
Imports BenchmarkDotNet.Engines
Imports BenchmarkDotNet.Jobs
Imports BenchmarkDotNet.Running
Imports GemBox.Spreadsheet
Imports System.Collections.Generic
Imports System.IO
<SimpleJob(RuntimeMoniker.Net80)>
<SimpleJob(RuntimeMoniker.Net48)>
Public Class Program
Private workbook As ExcelFile
Private ReadOnly consumer As Consumer = New Consumer()
Public Shared Sub Main()
BenchmarkRunner.Run(Of Program)()
End Sub
<GlobalSetup>
Public Sub SetLicense()
' If using the Professional version, put your serial key below.
SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY")
' If using Free version and example exceeds its limitations, use Trial or Time Limited version:
' https://www.gemboxsoftware.com/spreadsheet/examples/free-trial-professional/1001
Me.workbook = ExcelFile.Load("%#RandomSheets.xlsx%")
End Sub
<Benchmark>
Public Function Reading() As ExcelFile
Return ExcelFile.Load("%#RandomSheets.xlsx%")
End Function
<Benchmark>
Public Sub Writing()
Using stream = New MemoryStream()
Me.workbook.Save(stream, New XlsxSaveOptions())
End Using
End Sub
<Benchmark>
Public Sub Iterating()
Me.LoopThroughAllCells().Consume(Me.consumer)
End Sub
Public Iterator Function LoopThroughAllCells() As IEnumerable(Of Object)
For Each worksheet In Me.workbook.Worksheets
For Each row In worksheet.Rows
For Each cell In row.AllocatedCells
Yield cell.Value
Next
Next
Next
End Function
End Class
Benchmarks for 1,000,000 Excel rows
The more cells you have, the more memory you'll need. The amount of data you can handle depends on a few factors, like the machine's available memory, the application's architecture (32-bit or 64-bit), the targeted .NET platform (.NET Core or .NET Framework), etc.
The following benchmark charts provide the results of working with Excel files with up to 1 million rows. They show a steady and linear increase in both time and memory with an increased number of rows. For more information, see the resulting performance measurements in the 1_Million_Rows_Performance.xlsx file.
Streamlined loading
When loading a file, GemBox.Spreadsheet by default loads everything into a content model. When the loaded file is huge, the whole model can occupy a lot of memory, or it might not even fit.
To overcome this problem, you can use streamlined loading, which doesn't load the whole spreadsheet into the model but loads a set of rows only when requested. After the new set of rows is requested, the old one is cleared from the memory. This approach significantly decreases memory consumption, even when iterating through the whole file.
The following example shows how you can load the file in streamlined mode and obtain the values using the same API as you would normally.
using GemBox.Spreadsheet;
class Program
{
static void Main()
{
// If using the Professional version, put your serial key below.
SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");
// Load the workbook using the streaming mode.
var workbook = ExcelFile.Load("huge.xlsx", new XlsxLoadOptions()
{
StreamingMode = LoadStreamingMode.Read
});
// Get values from the workbook as you normally would.
var worksheet = workbook.Worksheets[0];
foreach (var row in worksheet.Rows)
foreach (var cell in row.AllocatedCells)
{
var value = cell.Value;
// Process the value...
}
}
}
Imports GemBox.Spreadsheet
Module Program
Sub Main()
' If using the Professional version, put your serial key below.
SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY")
' Load the workbook using the streaming mode.
Dim workbook = ExcelFile.Load("huge.xlsx", New XlsxLoadOptions() With {
.StreamingMode = LoadStreamingMode.Read
})
' Get values from the workbook as you normally would.
Dim worksheet = workbook.Worksheets(0)
For Each row In worksheet.Rows
For Each cell In row.AllocatedCells
Dim value = cell.Value
' Process the value...
Next
Next
End Sub
End Module
The following chart shows the timeline of memory consumption when loading a very large (1 million rows, each having 50 cells, total size 50MB) and comparing the file in the default and the streaming mode.
Even though the streaming mode benefits from lower memory consumption, the limitation is that the file will load in a read-only mode. Modifying rows, column collection, and cells (values, styles, formula) is not generally supported and might have unpredictable behavior.
Tips for improving performance
Besides using the streaming mode, here are other recommendations for improving performance while developing with GemBox.Spreadsheet:
- When reading large Excel files, the most efficient way to retrieve the data is by accessing it in sequence, from the first
ExcelRow
in a sheet to the last, and from the firstExcelCell
in a row to the last. Avoid accessing cells randomly in workbooks with large data sets. - When writing large Excel files with a huge amount of numbers and dates, use one of the
ExcelCell.SetValue
overload methods rather than theExcelCell.Value
property to avoid boxing and unboxing value types. - Typically, sheets have the same number of columns in each row. In that case, to retrieve the columns count, use the
Count
value of the firstExcelRow.AllocatedCells
rather than theExcelWorksheet.CalculateMaxUsedColumns
method. - When auto fitting columns, you can speed up the operation by reducing the number of cells measured with the
ExcelColumn.AutoFit(Double, ExcelRow, ExcelRow)
method. You can also use the first parameter, the scaling factor, to adjust the calculated column width. - When inserting or removing rows or columns, you can speed up the operation by disabling the
ExcelFile.AutomaticFormulaUpdate
option.