Performance Benchmarks

Office Oxide is benchmarked on 6,062 files drawn from 11 independent public test suites — LibreOffice Core, Apache POI, Open XML SDK, ClosedXML, Pandoc, python-docx/python-pptx, Apache Tika, calamine, openpreserve, oletools, and the LibreOffice legacy corpus.

Methodology: single-thread, release build with LTO, warm disk cache (steady-state), median of three runs on an idle system.
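The measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual harness: `parse` is a hypothetical callable standing in for whichever library is under test, and the choices to take the median per file, to do one untimed warm-up read, and to exclude failed files from the timing statistics are all assumptions.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def time_file(parse: Callable[[str], object], path: str, runs: int = 3) -> float:
    """Return the median wall-clock time in milliseconds over `runs` calls."""
    parse(path)  # untimed warm-up so the file sits in the disk cache (assumption)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        parse(path)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def percentile(values: Iterable[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(parse: Callable[[str], object], paths: Iterable[str]) -> dict:
    """Compute mean, p99, and pass rate for one library over a corpus."""
    timings, total = [], 0
    for path in paths:
        total += 1
        try:
            timings.append(time_file(parse, path))
        except Exception:
            # Any exception counts as a non-passing file (assumption).
            continue
    return {
        "mean_ms": statistics.mean(timings) if timings else float("nan"),
        "p99_ms": percentile(timings, 99) if timings else float("nan"),
        "pass_rate": len(timings) / total if total else float("nan"),
    }
```

Calling `summarize(parse, paths)` once per library and per format would yield the three columns reported in the tables, under the assumptions stated above.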

DOCX — 2,538 files

| Library | Language | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- | --- |
| office_oxide | Rust | 0.8 ms | 3.9 ms | 98.9% | MIT |
| python-docx | Python | 11.8 ms | 98 ms | 95.1% | MIT |

Office Oxide is 14× faster than python-docx on the mean and 25× faster at the tail (p99). Pass rate is 3.8 percentage points higher.

XLSX — 1,802 files

| Library | Language | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- | --- |
| office_oxide | Rust | 5.0 ms | 40 ms | 97.8% | MIT |
| python-calamine | Rust/Python | 13.9 ms | 183 ms | 96.6% | MIT |
| openpyxl | Python | 94.5 ms | 698 ms | 96.2% | MIT |

Office Oxide is 2.8× faster on the mean than calamine (the next-fastest XLSX library) and about 19× faster than openpyxl (94.5 ms vs 5.0 ms). It also has the highest pass rate of the three.

PPTX — 806 files

| Library | Language | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- | --- |
| office_oxide | Rust | 0.7 ms | 3.9 ms | 98.4% | MIT |
| python-pptx | Python | 32.5 ms | 174 ms | 86.7% | MIT |

Office Oxide is 46× faster than python-pptx, with an 11.7 percentage-point higher pass rate. python-pptx struggles with PowerPoint files that diverge from its expected schema; office_oxide handles them transparently.
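As a worked example of how the headline comparisons follow from the table values, the sketch below recomputes the PPTX figures. The helper names are illustrative only, and the inputs are the rounded means and pass rates shown in the table, so results for other formats may differ slightly from figures computed on unrounded data.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def point_gap(candidate_pct: float, baseline_pct: float) -> float:
    """Pass-rate difference in percentage points, rounded to one decimal."""
    return round(candidate_pct - baseline_pct, 1)


# PPTX row values: python-pptx mean 32.5 ms, office_oxide mean 0.7 ms.
pptx_speedup = speedup(32.5, 0.7)
# Pass rates: office_oxide 98.4%, python-pptx 86.7%.
pptx_gap = point_gap(98.4, 86.7)

print(f"{pptx_speedup:.0f}x faster, {pptx_gap} percentage points higher")
# prints "46x faster, 11.7 percentage points higher"
```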

Legacy formats — 916 files

No other Rust or Python library reads all three of .doc, .xls, and .ppt without a JVM (Apache Tika) or external binaries (catdoc, antiword).

.doc — 246 files

| Library | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- |
| office_oxide | 0.3 ms | 3.4 ms | 94.7% | MIT |
| catdoc | 4.3 ms | 41 ms | 90.2% | GPL-2.0 |
| antiword | 4.5 ms | 66 ms | 76.8% | GPL-2.0 |

.xls — 494 files

| Library | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- |
| office_oxide | 2.8 ms | 75 ms | 99.2% | MIT |
| xls2csv (catdoc) | 6.9 ms | 58 ms | 84.0% | GPL-2.0 |
| python-calamine | 9.0 ms | 96 ms | 90.7% | MIT |
| xlrd | 36.6 ms | 503 ms | 93.1% | BSD-3 |

xls2csv has a tighter p99 (58 ms vs 75 ms) because it emits truncated or lossy output on complex sheets. Compared with xls2csv, Office Oxide is 2.4× faster on the mean and passes 15 percentage points more of the corpus.

.ppt — 176 files

| Library | Mean | p99 | Pass Rate | License |
| --- | --- | --- | --- | --- |
| office_oxide | 0.7 ms | 6.6 ms | 100% | MIT |
| catppt (catdoc) | 2.8 ms | 8 ms | 77.8% | GPL-2.0 |

Pass rate — 98.4% across 6,062 files

The 97 non-passing files are all invalid inputs:

| Category | Count | Notes |
| --- | --- | --- |
| Invalid ZIP / CFB archive | 43 | Truncated, missing EOCD, bad CFB magic |
| Missing required part | 21 | Encrypted, password-protected, or stream absent |
| Malformed XML | 18 | XML bombs, ill-formed tags, fuzz-corrupted content |
| Invalid CFB header | 15 | WordPerfect / IBM DisplayWrite / Excel 3/4 misnamed as .doc/.xls, CVE-exploit fixtures |

Zero failures on legitimate Word 97+ / Excel 97+ / PowerPoint 97+ files. Zero panics, zero timeouts, zero false negatives on valid documents.
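The overall pass rate can be checked directly from the failure categories listed above. The snippet below is only an arithmetic sketch using the counts as published; the dictionary and variable names are illustrative.

```python
# Failure counts per category, as listed in the breakdown.
failures = {
    "Invalid ZIP / CFB archive": 43,
    "Missing required part": 21,
    "Malformed XML": 18,
    "Invalid CFB header": 15,
}

total_files = 6062
failed = sum(failures.values())
pass_rate = (total_files - failed) / total_files

print(f"{failed} non-passing files, {pass_rate:.1%} pass rate")
# prints "97 non-passing files, 98.4% pass rate"
```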

Corpus

| Source | Files | License |
| --- | --- | --- |
| LibreOffice Core | 2,185 | MPL-2.0 |
| Apache POI | 1,298 | Apache-2.0 |
| Open XML SDK | 707 | MIT |
| ClosedXML | 371 | MIT |
| Pandoc | 224 | GPL-2.0 |
| python-docx + python-pptx | 111 | MIT |
| Apache Tika | 108 | Apache-2.0 |
| calamine | 28 | MIT |
| openpreserve | 20 | CC0 |
| oletools | 17 | BSD-2 |
| LibreOffice (legacy) | 12 | MPL-2.0 |
| Total | 6,062 | |

Reproducible benchmarks live in bench_rust/ and bench_python.py. The full methodology and per-file breakdown are in BENCHMARKS.md.

See also