Convert Legacy DOC, XLS, PPT to OOXML
Office Oxide is the only Rust or Python library that reads Word 97–2003 (.doc), Excel 97–2003 (.xls), and PowerPoint 97–2003 (.ppt) and writes their modern OOXML equivalents — without a JVM (Apache Tika), an external converter (LibreOffice headless), or a commercial license (Aspose).
save_as does the conversion in one call. Open a legacy file and save it with a modern extension; Office Oxide routes the content through its intermediate representation (IR) and writes a fresh OOXML container.
One-liner
Rust
use office_oxide::Document;
Document::open("old.doc")?.save_as("modern.docx")?;
Document::open("old.xls")?.save_as("modern.xlsx")?;
Document::open("old.ppt")?.save_as("modern.pptx")?;
Python
from office_oxide import Document
with Document.open("old.doc") as doc:
    doc.save_as("modern.docx")
with Document.open("old.xls") as doc:
    doc.save_as("modern.xlsx")
with Document.open("old.ppt") as doc:
    doc.save_as("modern.pptx")
JavaScript
import { Document } from 'office-oxide';
using doc = Document.open('old.xls');
doc.saveAs('modern.xlsx');
Go
doc, err := officeoxide.Open("old.xls")
if err != nil {
	log.Fatal(err) // do not continue with an unusable document
}
defer doc.Close()
doc.SaveAs("modern.xlsx")
C#
using var doc = Document.Open("old.xls");
doc.SaveAs("modern.xlsx");
Bulk migration
Migrate a whole corpus with a short loop that mirrors the source directory tree.
Python
from pathlib import Path
from office_oxide import Document
for src in Path("legacy").rglob("*"):
    if src.suffix.lower() in {".doc", ".xls", ".ppt"}:
        new_ext = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}[src.suffix.lower()]
        dst = Path("modern") / src.relative_to("legacy").with_suffix(new_ext)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Document.open(src) as doc:
            doc.save_as(dst)
        print(f"{src} → {dst}")
Rust
use office_oxide::Document;
use std::path::Path;
fn migrate(src: &Path, dst: &Path) -> office_oxide::Result<()> {
    Document::open(src)?.save_as(dst)?;
    Ok(())
}
For large corpora, call migrate from a rayon parallel iterator over the file list so that files are converted in parallel.
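For Python callers, a rough analogue of the parallel approach is a thread pool that also records failures instead of stopping at the first bad file. The sketch below is an illustration, not part of Office Oxide: `plan_migration`, `run_migration`, and the injected `convert` callable are hypothetical names, and whether threads give a real speedup depends on whether the binding releases the GIL during conversion, which is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable

# Legacy suffix -> modern suffix, matching the mapping used in the Python loop.
MODERN_EXT = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}


def plan_migration(src_root: Path, dst_root: Path) -> list[tuple[Path, Path]]:
    """Pair every legacy file under src_root with its mirrored destination."""
    pairs = []
    for src in sorted(src_root.rglob("*")):
        new_ext = MODERN_EXT.get(src.suffix.lower())
        if new_ext is not None and src.is_file():
            dst = dst_root / src.relative_to(src_root).with_suffix(new_ext)
            pairs.append((src, dst))
    return pairs


def run_migration(
    pairs: Iterable[tuple[Path, Path]],
    convert: Callable[[Path, Path], None],
    workers: int = 4,
) -> list[tuple[Path, Exception]]:
    """Convert each pair in a thread pool; return the (source, error) failures."""

    def one(pair: tuple[Path, Path]):
        src, dst = pair
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            convert(src, dst)
            return None
        except Exception as exc:  # keep going and report at the end
            return (src, exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(one, pairs))
    return [r for r in results if r is not None]


# Usage with the API shown above (assumed, not run here):
#   def convert(src, dst):
#       with Document.open(src) as doc:
#           doc.save_as(dst)
#   failures = run_migration(plan_migration(Path("legacy"), Path("modern")), convert)
```

Passing the converter in as a callable keeps the planning and error handling testable without any Office files on disk.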
Shell — using the CLI
find legacy/ -iname '*.doc' | parallel \
'office-oxide convert {} modern/{/.}.docx'
find legacy/ -iname '*.xls' | parallel \
'office-oxide convert {} modern/{/.}.xlsx'
find legacy/ -iname '*.ppt' | parallel \
'office-oxide convert {} modern/{/.}.pptx'
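The `find -iname` patterns above select files by name only, and real corpora often contain files whose extension does not match their content. A minimal pre-filter sketch in Python, assuming only the well-known 8-byte signature of the OLE2 compound-file container used by the 97–2003 binary formats; `looks_like_legacy_office` is a hypothetical helper, not an Office Oxide function.

```python
from pathlib import Path

# Signature of an OLE2 Compound File Binary container, the wrapper used by
# Word/Excel/PowerPoint 97-2003 binary files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_legacy_office(path: Path) -> bool:
    """Return True if the file starts with the OLE2 compound-file signature."""
    with open(path, "rb") as fh:
        return fh.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

The signature identifies the container, not the application: it does not distinguish .doc from .xls from .ppt, and other OLE2-based files (for example .msi or Outlook .msg) match as well, so it complements the extension check rather than replacing it.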
What survives the round-trip
Office Oxide preserves the content shape — paragraphs, tables, cells, slides, lists, headings — and the values inside them. A few categories don’t carry over because the legacy formats encode them in proprietary structures the IR doesn’t model:
| Category | Carried | Notes |
|---|---|---|
| Paragraph text | ✓ | Including bold/italic/underline runs |
| Lists | ✓ | Ordered + unordered |
| Tables | ✓ | Cells, row order, header row |
| XLSX cell values (string, number, bool) | ✓ | — |
| Sheet names | ✓ | — |
| Slide titles + body | ✓ | — |
| Hyperlinks | ✓ | — |
| Images | partial | DOC/PPT preserve inline images; XLS image anchors are dropped |
| Comments / revisions | — | Tracked changes are flattened |
| Formulas (XLS) | values only | Cached formula results survive; formula expressions don’t round-trip |
| WordArt, SmartArt, charts | — | Re-render in the target format if you need them |
| Encryption | — | Decrypt the legacy file first (e.g. via LibreOffice) |
For most LLM, indexing, and archival use cases, the content-level fidelity is what matters — and Office Oxide gets you a fully editable DOCX/XLSX/PPTX in milliseconds.
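After a bulk run it can be useful to confirm that each output really is an OOXML container. One cheap structural check, sketched below, relies on the fact that every OOXML package is a ZIP archive with a `[Content_Types].xml` part at its root. `is_ooxml_package` is a hypothetical helper, and passing it says nothing about content fidelity.

```python
import zipfile
from pathlib import Path


def is_ooxml_package(path: Path) -> bool:
    """Return True if path is an intact ZIP with a root [Content_Types].xml part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        has_content_types = "[Content_Types].xml" in zf.namelist()
        # testzip() returns the first corrupt member name, or None if all are fine.
        return has_content_types and zf.testzip() is None
```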
Performance
Per-file conversion runs at the same order of magnitude as text extraction. On a typical Word 97 .doc, expect single-digit milliseconds end-to-end.
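To check these figures against your own files, a small timing harness is enough. The sketch below times any zero-argument callable and reports the median wall-clock time in milliseconds; `time_conversion` is a hypothetical helper, and the conversion being timed is supplied by the caller.

```python
import statistics
import time
from typing import Callable


def time_conversion(convert: Callable[[], None], repeats: int = 5) -> float:
    """Return the median wall-clock time of convert() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        convert()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage with the API shown above (assumed, not run here):
#   def convert():
#       with Document.open("old.doc") as doc:
#           doc.save_as("modern.docx")
#   print(f"{time_conversion(convert):.2f} ms")
```

The median is used rather than the mean so that a single cold-cache first run does not dominate the result.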
See also
- Performance benchmarks — DOC/XLS/PPT extraction numbers
- Build from IR — what save_as does under the hood
- Migrate from python-docx — switching from a per-format Python library