Convert between Word files and HTML pages in C# and VB.NET
With GemBox.Document, you can quickly and efficiently convert between Word documents and HTML pages using simple and straightforward C# or VB.NET code.
The following examples show how you can import and export HTML content to and from DOC, DOCX, RTF and XML formats.
Convert Word files to HTML or MHTML
GemBox.Document creates a well-formed HTML file from the Word document's rich content and images. The images are extracted as separate files to HtmlSaveOptions.FilesDirectoryPath and referenced relative to HtmlSaveOptions.FilesDirectorySrcPath.
Alternatively, you can export images directly into the HTML file as base64-encoded data (Data URL image sources) by using the HtmlSaveOptions.EmbedImages property.
The following example shows how you can convert a Word file to HTML with embedded images and semantic elements.
using GemBox.Document;
class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");
        // Load Word file (DOC, DOCX, RTF, XML) into DocumentModel object.
        var document = DocumentModel.Load("%InputFileName%");
        // Embed images as Data URLs and use HTML5 semantic elements.
        var saveOptions = new HtmlSaveOptions()
        {
            HtmlType = HtmlType.Html,
            EmbedImages = true,
            UseSemanticElements = true
        };
        // Save DocumentModel object to HTML (or MHTML) file.
        document.Save("Exported.html", saveOptions);
    }
}
Imports GemBox.Document
Module Program
    Sub Main()
        ' If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")
        ' Load Word file (DOC, DOCX, RTF, XML) into DocumentModel object.
        Dim document = DocumentModel.Load("%InputFileName%")
        ' Embed images as Data URLs and use HTML5 semantic elements.
        Dim saveOptions As New HtmlSaveOptions() With
        {
            .HtmlType = HtmlType.Html,
            .EmbedImages = True,
            .UseSemanticElements = True
        }
        ' Save DocumentModel object to HTML (or MHTML) file.
        document.Save("Exported.html", saveOptions)
    End Sub
End Module
You can also convert your Word file to the web archive format (MHTML), which is useful for creating a web page with concatenated resources or for creating an email message.
By default, GemBox.Document references images within MHTML files with Content-Location headers. However, some MHTML viewers, like Microsoft Outlook, fail to load such resources. In that case, you can switch to Content-ID (CID) references by using the HtmlSaveOptions.UseContentIdHeaders property.
Convert HTML pages to Word files
GemBox.Document supports reading input HTML files from a path, URL or stream by using one of the DocumentModel.Load methods, and reading HTML text by using the ContentRange.LoadText or ContentPosition.LoadText methods.
When loading HTML text or an HTML stream, you'll need to specify HtmlLoadOptions.BaseAddress in order to import images with relative paths.
The following example shows how you can convert an HTML file to a Word document.
using GemBox.Document;
class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");
        // Load input HTML file.
        DocumentModel document = DocumentModel.Load("%InputFileName%");
        // When reading any HTML content, a single Section element is created,
        // which can be used to specify the Word document's page options.
        // The same can also be achieved in the HTML document itself
        // by using CSS properties on the "@page" directive or the "<body>" element.
        Section section = document.Sections[0];
        PageSetup pageSetup = section.PageSetup;
        PageMargins pageMargins = pageSetup.PageMargins;
        // Remove all page margins.
        pageMargins.Top = pageMargins.Bottom = pageMargins.Left = pageMargins.Right = 0;
        // Save output Word file; the file extension determines the format (e.g. DOCX).
        document.Save("Output.%OutputFileType%");
    }
}
Imports GemBox.Document
Module Program
    Sub Main()
        ' If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY")
        ' Load input HTML file.
        Dim document As DocumentModel = DocumentModel.Load("%InputFileName%")
        ' When reading any HTML content, a single Section element is created,
        ' which can be used to specify the Word document's page options.
        ' The same can also be achieved in the HTML document itself
        ' by using CSS properties on the "@page" directive or the "<body>" element.
        Dim section As Section = document.Sections(0)
        Dim pageSetup As PageSetup = section.PageSetup
        Dim pageMargins As PageMargins = pageSetup.PageMargins
        ' Remove all page margins.
        With pageMargins
            .Left = 0
            .Right = 0
            .Top = 0
            .Bottom = 0
        End With
        ' Save output Word file; the file extension determines the format (e.g. DOCX).
        document.Save("Output.%OutputFileType%")
    End Sub
End Module
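The HTML-text path (ContentPosition.LoadText combined with HtmlLoadOptions.BaseAddress) can be sketched as follows. This is a minimal illustration, not the official sample: it appends an HTML string to the end of an existing document and sets BaseAddress so that a relative image path can be resolved. The HTML string, the folder C:\MyWebsite\ and the image images/logo.png are hypothetical placeholders; point BaseAddress at the location that your relative paths actually refer to.

```csharp
using GemBox.Document;

class Program
{
    static void Main()
    {
        // If using the Professional version, put your serial key below.
        ComponentInfo.SetLicense("FREE-LIMITED-KEY");

        // Hypothetical HTML fragment that references an image by a relative path.
        string html = "<h1>Report</h1>" +
                      "<p>This paragraph comes from <b>HTML text</b>.</p>" +
                      "<img src=\"images/logo.png\" />";

        // Hypothetical base folder used to resolve "images/logo.png".
        var loadOptions = new HtmlLoadOptions()
        {
            BaseAddress = @"C:\MyWebsite\"
        };

        // Create a document that already contains one section with a paragraph.
        var document = new DocumentModel();
        document.Sections.Add(
            new Section(document,
                new Paragraph(document, "Existing content before the HTML.")));

        // Insert the HTML content at the end of the document's content.
        document.Content.End.LoadText(html, loadOptions);

        // Save the result as a DOCX file.
        document.Save("HtmlText.docx");
    }
}
```

Appending at document.Content.End keeps the existing paragraph in place; loading at a different ContentPosition (for example, the end of a specific paragraph) inserts the HTML there instead.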