Read, merge, split PDF in Python
GemBox.Pdf is a .NET library that can process PDF files from any .NET application. It is also a COM-accessible library, so you can use it from Python.
When working with PDF files, we often need to merge documents on related topics or split a file to extract a particular page from it. The following example shows how to read a PDF file, merge several PDF files into one, and split a PDF file into multiple files using Python.
Not all members of GemBox.Pdf are COM-accessible, because COM Interop does not support some .NET features, such as static methods and method overloads. That is why you can use the ComHelper class, which provides alternatives for some members that cannot be called with COM Interop.
However, if you need to use many GemBox.Pdf members from Python, we recommend creating a .NET wrapper library. Your wrapper library should do all the work and expose a minimal set of classes and methods to the unmanaged code. This lets you use GemBox.Pdf's full capabilities while avoiding COM limitations, and it improves performance by reducing the number of COM Callable Wrappers created at runtime.
To use GemBox.Pdf in Python, you'll need to meet the GemBox.Pdf system requirements, add the installed GemBox.Pdf.dll to the COM registry with RegAsm.exe (the 32-bit or 64-bit version, matching your Python interpreter), and install the pywin32 extension for Windows:
:: Add GemBox.Pdf to COM registry for x86 (32-bit) applications.
C:\Windows\Microsoft.NET\Framework\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
:: Add GemBox.Pdf to COM registry for x64 (64-bit) applications.
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\RegAsm.exe [path to installed GemBox.Pdf.dll]
:: Install Python extension for Windows.
pip install pywin32
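Because RegAsm registers GemBox.Pdf separately for 32-bit and 64-bit processes, the registration has to match the bitness of the Python interpreter that runs the script. The following is a minimal sketch, not part of GemBox.Pdf, that reports the interpreter's bitness, picks the matching RegAsm.exe path from the commands above (assuming .NET Framework 4.x in the default Windows folder), and tries to create the GemBox.Pdf.ComHelper object. The function names python_bitness, regasm_for, and check_com_setup are hypothetical.

```python
import struct


def python_bitness():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def regasm_for(bitness):
    """Return the RegAsm.exe path for the given bitness (default .NET 4.x location assumed)."""
    framework = "Framework64" if bitness == 64 else "Framework"
    return "C:\\Windows\\Microsoft.NET\\" + framework + "\\v4.0.30319\\RegAsm.exe"


def check_com_setup(prog_id="GemBox.Pdf.ComHelper"):
    """Return "ok" if prog_id can be created through COM, otherwise a short hint."""
    try:
        # Both modules ship with pywin32; they are missing on non-Windows systems.
        import pywintypes
        import win32com.client
    except ImportError:
        return "pywin32 is not installed; run: pip install pywin32"
    try:
        win32com.client.Dispatch(prog_id)
    except pywintypes.com_error:
        bitness = python_bitness()
        return (prog_id + " is not registered for " + str(bitness)
                + "-bit Python; register it with " + regasm_for(bitness))
    return "ok"


if __name__ == "__main__":
    print(check_com_setup())
```

Running the script before the main example gives a quick hint about which of the two RegAsm commands is still missing.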
Working with PDF files in Python
import os
import win32com.client as COM
# Create ComHelper object.
comHelper = COM.Dispatch("GemBox.Pdf.ComHelper")
# If using the Professional version, put your serial key below.
comHelper.ComSetLicense("FREE-LIMITED-KEY")
# Input PDF files, located in the current working directory.
fileNames = ["\\MergeFile01.pdf", "\\MergeFile02.pdf", "\\MergeFile03.pdf"]
################
### Read PDF ###
################
# Load PDF file.
document1 = comHelper.Load(os.getcwd() + fileNames[0])
pages1 = document1.Pages
# Read text content from each PDF page.
for i1 in range(pages1.Count):
    page = pages1.Item(i1)
    print(page.Content.ToString() + "\n")
document1.Dispose()
#################
### Merge PDF ###
#################
# Create PdfDocument object.
document2 = COM.Dispatch("GemBox.Pdf.PdfDocument")
# Merge multiple PDF files into a single PDF file.
for fileName in fileNames:
    sourceDocument = comHelper.Load(os.getcwd() + fileName)
    sourcePages = sourceDocument.Pages
    # Append a copy of every page to the destination document.
    for i2 in range(sourcePages.Count):
        document2.Pages.AddClone(sourcePages.Item(i2))
    sourceDocument.Dispose()
comHelper.Save(document2, os.getcwd() + "\\Merge Files.pdf")
document2.Dispose()
#################
### Split PDF ###
#################
# Load PDF file.
document3 = comHelper.Load(os.getcwd() + "\\Merge Files.pdf")
pages3 = document3.Pages
# Split a single PDF file into multiple PDF files.
for i3 in range(pages3.Count):
    # Copy one page into a new document and save it as "Page<index>.pdf".
    destinationDocument = COM.Dispatch("GemBox.Pdf.PdfDocument")
    destinationDocument.Pages.AddClone(pages3.Item(i3))
    comHelper.Save(destinationDocument, os.getcwd() + "\\Page" + str(i3) + ".pdf")
    destinationDocument.Dispose()
document3.Dispose()
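The split step saves one page per file. If you would rather split a document into chunks of several pages, the page ranges can be computed separately from the COM calls. The following is a sketch under that assumption; plan_page_ranges, split_into_chunks, and the "Pages<first>-<last>.pdf" naming are hypothetical, and the COM part uses only the calls that already appear in the example (Dispatch, Pages.Item, Pages.AddClone, comHelper.Save, Dispose).

```python
import os


def plan_page_ranges(page_count, pages_per_file):
    """Return (start, stop) pairs of 0-based page indexes; stop is exclusive."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [(start, min(start + pages_per_file, page_count))
            for start in range(0, page_count, pages_per_file)]


def split_into_chunks(com_helper, dispatch, source_pages, pages_per_file, output_dir):
    """Hypothetical helper: save every `pages_per_file` pages into a separate PDF file."""
    for start, stop in plan_page_ranges(source_pages.Count, pages_per_file):
        destination = dispatch("GemBox.Pdf.PdfDocument")
        try:
            for index in range(start, stop):
                destination.Pages.AddClone(source_pages.Item(index))
            # File names use 1-based page numbers, e.g. "Pages1-2.pdf".
            file_name = "Pages" + str(start + 1) + "-" + str(stop) + ".pdf"
            com_helper.Save(destination, os.path.join(output_dir, file_name))
        finally:
            # Release the destination document even if cloning or saving fails.
            destination.Dispose()
```

In the example, it could be called as split_into_chunks(comHelper, COM.Dispatch, pages3, 2, os.getcwd()) before document3.Dispose(). Keeping the range arithmetic in a plain function makes it easy to check without a COM server.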
The example script calls comHelper.Load, comHelper.Save, and comHelper.ComSetLicense instead of the corresponding static or overloaded GemBox.Pdf members, because those members cannot be called with COM Interop.
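The example releases each document with an explicit Dispose() call, so an exception raised between loading and disposing a document would skip the cleanup. A minimal sketch of one way to guard against that, assuming any object with a Dispose() method, is a small context manager built on contextlib; the name disposing is hypothetical and not part of GemBox.Pdf or pywin32.

```python
from contextlib import contextmanager


@contextmanager
def disposing(com_object):
    """Yield com_object and always call its Dispose() method afterwards."""
    try:
        yield com_object
    finally:
        # Runs on normal exit and when the with-block raises an exception.
        com_object.Dispose()


# Possible usage with the objects from the example:
# with disposing(comHelper.Load(os.getcwd() + "\\Merge Files.pdf")) as document:
#     print(document.Pages.Count)
```

The standard contextlib.closing helper is not a drop-in replacement here, because it calls close() rather than Dispose().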