Convert Office Documents to HTML
Every Office Oxide handle has a `to_html()` method that emits clean, semantic HTML5 from any supported format. Use it for browser previews, email rendering, or quick visual diffs.
One-shot
Python
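```python
import office_oxide

html = office_oxide.to_html("report.docx")
# Write as UTF-8 and close the file so non-ASCII text survives on every platform.
with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)
```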
Rust
```rust
use office_oxide::to_html;
let html = to_html("report.docx")?;
std::fs::write("report.html", html)?;
```
JavaScript
```javascript
import { toHtml } from 'office-oxide';
import { writeFileSync } from 'node:fs';
writeFileSync('report.html', toHtml('report.docx'));
```
Go
```go
html, err := officeoxide.ToHTML("report.docx")
if err == nil {
	err = os.WriteFile("report.html", []byte(html), 0o644) // write only if conversion succeeded
}
```
C#
```csharp
File.WriteAllText("report.html", OfficeOxide.ToHtml("report.docx"));
```
Reusable handle
Python
```python
from office_oxide import Document

with Document.open("slides.pptx") as doc:
    html = doc.to_html()
```
JavaScript
```javascript
import { Document } from 'office-oxide';
using doc = Document.open('slides.pptx');
const html = doc.toHtml();
```
Rust
```rust
let doc = office_oxide::Document::open("slides.pptx")?;
let html = doc.to_html();
```
What gets emitted
The HTML is fragment-style: there is no `<html>`, `<head>`, or `<body>` wrapper. You decide where to mount it and what stylesheet to apply.
| Source | HTML element |
|---|---|
| Heading | `<h1>` … `<h6>` matching the source level |
| Paragraph | `<p>` |
| Bold / italic / underline | `<strong>`, `<em>`, `<u>` |
| List | `<ul>` / `<ol>` with `<li>` children |
| Table | `<table>` with `<thead>`, `<tbody>`, `<tr>`, `<th>`, `<td>` |
| Hyperlink | `<a href="...">` |
| Image | `<img src="..." alt="...">` |
| XLSX sheet | `<section data-sheet="name">` + `<table>` |
| PPTX slide | `<section data-slide="N">` + slide body + optional `<aside>` for notes |
Text taken from the document is HTML-escaped, so characters such as `<` and `&` in the source cannot inject markup, and embedding the result in a page is safe by default.
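Because sheets and slides arrive wrapped in `<section>` elements with `data-sheet` / `data-slide` attributes, the fragment can be post-processed with an ordinary HTML parser. Below is a minimal sketch using Python's standard-library `html.parser`, assuming you already have the fragment as a string. The `list_sections` helper and the sample fragment are hypothetical illustrations, not part of Office Oxide and not real converter output.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collects the data-sheet / data-slide attributes of <section> elements."""

    def __init__(self):
        super().__init__()
        self.sections = []  # (kind, value) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        for name, value in attrs:
            if name in ("data-sheet", "data-slide"):
                # Drop the "data-" prefix so kinds read "sheet" or "slide".
                self.sections.append((name[len("data-"):], value))


def list_sections(fragment):
    """Hypothetical helper: return every sheet/slide section in a fragment."""
    parser = SectionCollector()
    parser.feed(fragment)
    parser.close()
    return parser.sections


# Hand-written sample shaped like the PPTX row of the table (illustrative only).
sample = (
    '<section data-slide="1"><h1>Intro</h1><aside>Say hi</aside></section>'
    '<section data-slide="2"><p>Details</p></section>'
)
print(list_sections(sample))  # [('slide', '1'), ('slide', '2')]
```

The same helper works on XLSX output, returning `('sheet', name)` pairs instead.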
Wrapping for a standalone page
```python
from office_oxide import Document

with Document.open("report.docx") as doc:
    body = doc.to_html()

# to_html() returns a fragment, so supply the document shell yourself.
page = f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Report</title>
<link rel="stylesheet" href="docs.css">
</head><body>{body}</body></html>"""

with open("report.html", "w", encoding="utf-8") as f:
    f.write(page)
```
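`to_html()` escapes the document's own content, but the wrapper is yours: if the title or stylesheet path comes from user input (an uploaded file name, say), escape it before interpolating. A minimal sketch with a hypothetical `wrap_page` helper that takes the already-converted fragment as a string, so it runs without Office Oxide installed:

```python
import html


def wrap_page(body, title, stylesheet="docs.css"):
    """Hypothetical helper: wrap an HTML fragment in a minimal HTML5 page.

    `body` is assumed to be fragment output from to_html(), which is already
    escaped; `title` and `stylesheet` are escaped here because they may come
    from user input.
    """
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>\n"
        f'<link rel="stylesheet" href="{html.escape(stylesheet, quote=True)}">\n'
        f"</head><body>{body}</body></html>"
    )


page = wrap_page("<p>Hello</p>", "Q3 <draft> & notes")
```

Here the title renders as `Q3 &lt;draft&gt; &amp; notes`, while the fragment passes through untouched.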
Use cases
- In-browser preview of uploaded documents (`<input type="file">` → WASM → `<iframe srcdoc>`).
- Email rendering of generated reports.
- Diff views — HTML diffs render meaningfully in code-review tools.
- Search indexing with structure preserved (so headings can boost results).
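For search indexing, heading levels survive as `<h1>` … `<h6>`, so heading text can be pulled out and weighted more heavily than body text. Below is a minimal sketch with the standard-library `html.parser`. The `extract_weighted_text` name and the weighting scheme (8 minus the heading level, 1 for body text) are hypothetical choices for illustration, not anything Office Oxide provides.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class WeightedText(HTMLParser):
    """Splits a fragment into (text, weight) pairs; headings weigh more."""

    def __init__(self):
        super().__init__()
        self.level = None  # heading level while inside <hN>, otherwise None
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.level = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Hypothetical weighting: h1 -> 7 ... h6 -> 2, body text -> 1.
            weight = 8 - self.level if self.level else 1
            self.chunks.append((text, weight))


def extract_weighted_text(fragment):
    parser = WeightedText()
    parser.feed(fragment)
    parser.close()
    return parser.chunks


sample = "<h1>Annual report</h1><p>Revenue grew.</p><h2>Outlook</h2>"
print(extract_weighted_text(sample))
# [('Annual report', 7), ('Revenue grew.', 1), ('Outlook', 6)]
```

The pairs can then be fed to whatever search backend you use, mapping the weight onto its boost setting.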
See also
- Markdown extraction — when you want plain-text-friendly output
- IR extraction — structured JSON when you need to render your own
- WASM Quick Start — for in-browser conversion