
Convert Legacy DOC, XLS, PPT to OOXML

Office Oxide is the only Rust or Python library that reads Word 97–2003 (.doc), Excel 97–2003 (.xls), and PowerPoint 97–2003 (.ppt) and writes their modern OOXML equivalents — without a JVM (Apache Tika), an external converter (LibreOffice headless), or a commercial license (Aspose).

save_as does the conversion in one call: open a legacy file and save it with a modern extension. Office Oxide routes the content through its intermediate representation (IR) and writes a fresh OOXML container.

One-liner

Rust

use office_oxide::Document;

Document::open("old.doc")?.save_as("modern.docx")?;
Document::open("old.xls")?.save_as("modern.xlsx")?;
Document::open("old.ppt")?.save_as("modern.pptx")?;

Python

from office_oxide import Document

with Document.open("old.doc") as doc:
    doc.save_as("modern.docx")

with Document.open("old.xls") as doc:
    doc.save_as("modern.xlsx")

with Document.open("old.ppt") as doc:
    doc.save_as("modern.pptx")

JavaScript

import { Document } from 'office-oxide';

using doc = Document.open('old.xls');
doc.saveAs('modern.xlsx');

Go

doc, err := officeoxide.Open("old.xls")
if err != nil {
	log.Fatal(err) // don't defer Close on a document that failed to open
}
defer doc.Close()
doc.SaveAs("modern.xlsx")

C#

using var doc = Document.Open("old.xls");
doc.SaveAs("modern.xlsx");

Bulk migration

Migrate a whole corpus with a short loop that mirrors the source directory tree.

Python

from pathlib import Path
from office_oxide import Document

for src in Path("legacy").rglob("*"):
    if src.suffix.lower() in {".doc", ".xls", ".ppt"}:
        new_ext = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}[src.suffix.lower()]
        dst = Path("modern") / src.relative_to("legacy").with_suffix(new_ext)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Document.open(src) as doc:
            doc.save_as(dst)
        print(f"{src} -> {dst}")
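
In a large corpus, a single corrupt or encrypted file would abort the loop above. The following is a minimal sketch of a fault-tolerant variant. It assumes only the `Document.open` and `save_as` calls shown on this page. `migrate_tree`, `convert_with_office_oxide`, and `EXT_MAP` are hypothetical helper names, and catching the broad `Exception` is an assumption, since this page does not document the binding's exception types.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Legacy extension -> modern OOXML extension (same mapping as the loop above).
EXT_MAP = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}


def convert_with_office_oxide(src: Path, dst: Path) -> None:
    """Convert one file using the open/save_as calls shown on this page."""
    # Imported lazily so migrate_tree can be exercised without the library installed.
    from office_oxide import Document

    with Document.open(src) as doc:
        doc.save_as(dst)


def migrate_tree(
    src_root: Path,
    dst_root: Path,
    convert: Callable[[Path, Path], None] = convert_with_office_oxide,
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Convert every legacy file under src_root, mirroring the tree under dst_root.

    Returns (converted destinations, [(failed source, error message), ...]).
    """
    converted: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for src in sorted(Path(src_root).rglob("*")):
        new_ext = EXT_MAP.get(src.suffix.lower())
        if new_ext is None or not src.is_file():
            continue  # skip directories and non-legacy files
        dst = Path(dst_root) / src.relative_to(src_root).with_suffix(new_ext)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(src, dst)
        except Exception as exc:  # assumption: exact exception types are not documented here
            failed.append((src, str(exc)))
        else:
            converted.append(dst)
    return converted, failed
```

Calling `migrate_tree(Path("legacy"), Path("modern"))` returns the list of written files and a list of failures that can be logged or retried later, for example after decrypting the offending files.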

Rust

use office_oxide::Document;
use std::path::Path;

fn migrate(src: &Path, dst: &Path) -> office_oxide::Result<()> {
    Document::open(src)?.save_as(dst)?;
    Ok(())
}

Call migrate from a loop over your files. For large corpora, drive that loop with a rayon parallel iterator to convert files in parallel.
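
For the Python binding, a comparable fan-out can be sketched with the standard library's `concurrent.futures` instead of rayon. This is only an illustration: `migrate_parallel` is a hypothetical helper, and the conversion is passed in as a callable. Whether threads give a real speedup depends on whether the binding releases the GIL during conversion, which this page does not state. If it does not, `ProcessPoolExecutor` is the usual substitute, provided the callable is a picklable top-level function.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple

Job = Tuple[Path, Path]  # (source file, destination file)


def migrate_parallel(
    jobs: Iterable[Job],
    convert: Callable[[Path, Path], None],
    max_workers: int = 4,
) -> List[Tuple[Path, Optional[str]]]:
    """Run convert(src, dst) for each job on a thread pool.

    Returns one (src, error) pair per job in input order; error is None on success.
    """

    def run(job: Job) -> Tuple[Path, Optional[str]]:
        src, dst = job
        try:
            convert(src, dst)
            return src, None
        except Exception as exc:  # keep going so one bad file does not stop the batch
            return src, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even when jobs finish out of order.
        return list(pool.map(run, jobs))
```

Here `convert` would be a small function that wraps `Document.open(src)` and `doc.save_as(dst)` as in the Python examples earlier on this page.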

Shell — using the CLI

find legacy/ -iname '*.doc' | parallel \
  'office-oxide convert {} modern/{/.}.docx'

find legacy/ -iname '*.xls' | parallel \
  'office-oxide convert {} modern/{/.}.xlsx'

find legacy/ -iname '*.ppt' | parallel \
  'office-oxide convert {} modern/{/.}.pptx'

What survives the round-trip

Office Oxide preserves the content shape — paragraphs, tables, cells, slides, lists, headings — and the values inside them. A few categories don’t carry over because the legacy formats encode them in proprietary structures the IR doesn’t model:

| Category | Carried | Notes |
| --- | --- | --- |
| Paragraph text | yes | Including bold/italic/underline runs |
| Lists | yes | Ordered + unordered |
| Tables | yes | Cells, row order, header row |
| XLSX cell values (string, number, bool) | yes | |
| Sheet names | yes | |
| Slide titles + body | yes | |
| Hyperlinks | yes | |
| Images | partial | DOC/PPT preserve inline images; XLS image anchors are dropped |
| Comments / revisions | no | Tracked changes are flattened |
| Formulas (XLS) | values only | Cached formula results survive; formula expressions don't round-trip |
| WordArt, SmartArt, charts | no | Re-render in the target format if you need them |
| Encryption | no | Decrypt the legacy file first (e.g. via LibreOffice) |

For most LLM, indexing, and archival use cases, the content-level fidelity is what matters — and Office Oxide gets you a fully editable DOCX/XLSX/PPTX in milliseconds.
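
After a batch conversion it can be useful to confirm that each output really is an OOXML container. The sketch below relies on two facts about the format rather than on any Office Oxide API: an OOXML file is a ZIP archive, and its root holds a `[Content_Types].xml` part. `looks_like_ooxml` is a hypothetical helper name. The check is structural only and says nothing about content fidelity.

```python
import zipfile
from pathlib import Path
from typing import Union


def looks_like_ooxml(path: Union[str, Path]) -> bool:
    """Return True if path is a readable ZIP archive with an OOXML content-types part."""
    if not zipfile.is_zipfile(path):
        return False  # missing file, legacy binary, or otherwise not a ZIP
    with zipfile.ZipFile(path) as zf:
        has_content_types = "[Content_Types].xml" in zf.namelist()
        # testzip() returns None when every member passes its CRC check.
        return has_content_types and zf.testzip() is None
```

Running this over the output directory flags truncated or non-ZIP files before they reach an indexing or archival pipeline.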

Performance

Per-file conversion runs at the same order of magnitude as text extraction. On a typical Word 97 .doc, expect single-digit milliseconds end-to-end.
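
To check these figures against your own files, a small timing harness is enough. The sketch below is an illustration, not a benchmark shipped with Office Oxide: `time_conversions` is a hypothetical helper, and the conversion is passed in as a callable so the harness does not depend on any particular API.

```python
import statistics
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def time_conversions(
    jobs: Iterable[Tuple[Path, Path]],
    convert: Callable[[Path, Path], None],
) -> Dict[str, float]:
    """Time convert(src, dst) per file; return count, median and max in milliseconds."""
    timings_ms = []
    for src, dst in jobs:
        start = time.perf_counter()
        convert(src, dst)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    if not timings_ms:
        return {"count": 0, "median_ms": 0.0, "max_ms": 0.0}
    return {
        "count": len(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Pass a `convert` function that wraps `Document.open(src)` and `doc.save_as(dst)` as in the Python examples earlier on this page. The median is usually more informative than the mean because a few very large files can dominate the total.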

See also