Performance metrics with large Excel files

GemBox.Spreadsheet is an Excel component that follows .NET design guidelines and best practices. It represents Excel files in memory through its rich content model that contains sheets, rows, cells, styles, etc. It has optimized memory consumption and allocation while not jeopardizing the efficiency and speed of the execution.

The following example shows how you can use BenchmarkDotNet to track the performance of GemBox.Spreadsheet using the provided input Excel file with 25 sheets of various cell data. The file should cover any typical Excel requirements; it includes different elements (like images, charts, and pivot tables) and Excel features (like conditional formatting, data validation, and filtering).

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using GemBox.Spreadsheet;
using System.Collections.Generic;
using System.IO;

[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net48)]
public class Program
{
    private ExcelFile workbook;
    private readonly Consumer consumer = new Consumer();

    public static void Main()
    {
        BenchmarkRunner.Run<Program>();
    }

    [GlobalSetup]
    public void SetLicense()
    {
        // If using the Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");

        // If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        // https://www.gemboxsoftware.com/spreadsheet/examples/free-trial-professional/1001

        this.workbook = ExcelFile.Load("%#RandomSheets.xlsx%");
    }

    [Benchmark]
    public ExcelFile Reading()
    {
        return ExcelFile.Load("%#RandomSheets.xlsx%");
    }

    [Benchmark]
    public void Writing()
    {
        using (var stream = new MemoryStream())
            this.workbook.Save(stream, new XlsxSaveOptions());
    }

    [Benchmark]
    public void Iterating()
    {
        this.LoopThroughAllCells().Consume(this.consumer);
    }

    public IEnumerable<object> LoopThroughAllCells()
    {
        foreach (var worksheet in this.workbook.Worksheets)
            foreach (var row in worksheet.Rows)
                foreach (var cell in row.AllocatedCells)
                    yield return cell.Value;
    }
}
Imports BenchmarkDotNet.Attributes
Imports BenchmarkDotNet.Engines
Imports BenchmarkDotNet.Jobs
Imports BenchmarkDotNet.Running
Imports GemBox.Spreadsheet
Imports System.Collections.Generic
Imports System.IO

<SimpleJob(RuntimeMoniker.Net80)>
<SimpleJob(RuntimeMoniker.Net48)>
Public Class Program

    Private workbook As ExcelFile
    Private ReadOnly consumer As Consumer = New Consumer()

    Public Shared Sub Main()
        BenchmarkRunner.Run(Of Program)()
    End Sub

    <GlobalSetup>
    Public Sub SetLicense()
        ' If using the Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY")

        ' If using Free version and example exceeds its limitations, use Trial or Time Limited version:
        ' https://www.gemboxsoftware.com/spreadsheet/examples/free-trial-professional/1001

        Me.workbook = ExcelFile.Load("%#RandomSheets.xlsx%")
    End Sub

    <Benchmark>
    Public Function Reading() As ExcelFile
        Return ExcelFile.Load("%#RandomSheets.xlsx%")
    End Function

    <Benchmark>
    Public Sub Writing()
        Using stream = New MemoryStream()
            Me.workbook.Save(stream, New XlsxSaveOptions())
        End Using
    End Sub

    <Benchmark>
    Public Sub Iterating()
        Me.LoopThroughAllCells().Consume(Me.consumer)
    End Sub

    Public Iterator Function LoopThroughAllCells() As IEnumerable(Of Object)
        For Each worksheet In Me.workbook.Worksheets
            For Each row In worksheet.Rows
                For Each cell In row.AllocatedCells
                    Yield cell.Value
                Next
            Next
        Next
    End Function

End Class
Measuring performance of reading, writing, and iterating through Excel files in C# and VB.NET
Screenshot of GemBox.Spreadsheet performance measurements

Benchmarks for 1,000,000 Excel rows

The more cells you have, the more memory you'll need. The amount of data you can handle depends on a few factors, like the machine's available memory, the application's architecture (32-bit or 64-bit), the targeted .NET platform (.NET Core or .NET Framework), etc.

The following benchmark charts provide the results of working with Excel files with up to 1 million rows. They show a steady and linear increase in both time and memory with an increased number of rows. For more information, see the resulting performance measurements in the 1_Million_Rows_Performance.xlsx file.

Benchmark chart of time that's required for reading and writing Excel files with up to 1 million rows
Benchmark chart of elapsed time for 1 million rows
Benchmark chart of memory that's required for creating Excel files with up to 1 million rows
Benchmark chart of allocated memory for 1 million rows

Streamlined loading

When loading a file, GemBox.Spreadsheet by default loads everything into a content model. When the loaded file is huge, the whole model can occupy a lot of memory, or it might not even fit.

To overcome this problem, you can use streamlined loading, which doesn't load the whole spreadsheet into the model but loads a set of rows only when requested. After the new set of rows is requested, the old one is cleared from the memory. This approach significantly decreases memory consumption, even when iterating through the whole file.

The following example shows how you can load the file in streamlined mode and obtain the values using the same API as you would normally.

using GemBox.Spreadsheet;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY");

        // Load the workbook using the streaming mode.
        var workbook = ExcelFile.Load("huge.xlsx", new XlsxLoadOptions()
        {
            StreamingMode = LoadStreamingMode.Read
        });

        // Get values from the workbook as you normally would.
        var worksheet = workbook.Worksheets[0];
        foreach (var row in worksheet.Rows)
            foreach (var cell in row.AllocatedCells)
            {
                var value = cell.Value;
                // Process the value...
            }
    }
}
Imports GemBox.Spreadsheet

Module Program

    Sub Main()

        ' If using the Professional version, put your serial key below.
        SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY")

        ' Load the workbook using the streaming mode.
        Dim workbook = ExcelFile.Load("huge.xlsx", New XlsxLoadOptions() With {
            .StreamingMode = LoadStreamingMode.Read
        })
        
        ' Get values from the workbook as you normally would.
        Dim worksheet = workbook.Worksheets(0)
        For Each row In worksheet.Rows
            For Each cell In row.AllocatedCells
                Dim value = cell.Value
                ' Process the value...
            Next
        Next

    End Sub

End Module

The following chart shows the timeline of memory consumption when loading a very large (1 million rows, each having 50 cells, total size 50MB) and comparing the file in the default and the streaming mode.

Benchmark chart of memory that's required for loading a huge file
Benchmark chart of allocated memory for a huge file

Even though the streaming mode benefits from lower memory consumption, the limitation is that the file will load in a read-only mode. Modifying rows, column collection, and cells (values, styles, formula) is not generally supported and might have unpredictable behavior.

Tips for improving performance

Besides using the streaming mode, here are other recommendations for improving performance while developing with GemBox.Spreadsheet:

  • When reading large Excel files, the most efficient way to retrieve the data is by accessing it in sequence, from the first ExcelRow in a sheet to the last, and from the first ExcelCell in a row to the last. Avoid accessing cells randomly in workbooks with large data sets.
  • When writing large Excel files with a huge amount of numbers and dates, use one of the ExcelCell.SetValue overload methods rather than the ExcelCell.Value property to avoid boxing and unboxing value types.
  • Typically, sheets have the same number of columns in each row. In that case, to retrieve the columns count, use the Count value of the first ExcelRow.AllocatedCells rather than the ExcelWorksheet.CalculateMaxUsedColumns method.
  • When auto fitting columns, you can speed up the operation by reducing the number of cells measured with the ExcelColumn.AutoFit(Double, ExcelRow, ExcelRow) method. You can also use the first parameter, the scaling factor, to adjust the calculated column width.
  • When inserting or removing rows or columns, you can speed up the operation by disabling the ExcelFile.AutomaticFormulaUpdate option.

See also


Next steps

GemBox.Spreadsheet is a .NET component that enables you to read, write, edit, convert, and print spreadsheet files from your .NET applications using one simple API.

Download Buy