Performance Benchmarks
Office Oxide is benchmarked on 6,062 files drawn from 11 independent public test suites — LibreOffice Core, Apache POI, Open XML SDK, ClosedXML, Pandoc, python-docx/python-pptx, Apache Tika, calamine, openpreserve, oletools, and the LibreOffice legacy corpus.
Methodology: single-thread, release build with LTO, warm disk cache (steady-state), median of three runs on an idle system.
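The methodology above can be sketched in Python as follows. This is a minimal illustration, not the project's actual harness (that lives in bench_rust/ and bench_python.py). The names `time_file` and `summarize` are hypothetical, `parse` stands in for whichever library call is being measured, and both the nearest-rank p99 and the rule that any exception counts as a failure are assumptions made for the example.

```python
import math
import statistics
import time
from typing import Callable, Iterable, Optional


def time_file(parse: Callable[[str], object], path: str, runs: int = 3) -> Optional[float]:
    """Return the median wall-clock time in ms over `runs` parses, or None on failure."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            parse(path)
        except Exception:
            # Assumption: any exception marks the file as non-passing.
            return None
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def summarize(per_file_ms: Iterable[Optional[float]]) -> dict:
    """Aggregate per-file medians into mean, p99 (nearest rank), and pass rate."""
    results = list(per_file_ms)
    passed = sorted(t for t in results if t is not None)
    if not passed:
        return {"mean_ms": None, "p99_ms": None, "pass_rate": 0.0}
    # Nearest rank: smallest value with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(passed)))
    return {
        "mean_ms": statistics.fmean(passed),
        "p99_ms": passed[rank - 1],
        "pass_rate": 100.0 * len(passed) / len(results),
    }
```

Taking the median per file damps one-off scheduling noise, while the mean and p99 across files describe typical and tail behaviour over the corpus.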
DOCX — 2,538 files
| Library | Language | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|---|
| office_oxide | Rust | 0.8 ms | 3.9 ms | 98.9% | MIT |
| python-docx | Python | 11.8 ms | 98 ms | 95.1% | MIT |
Office Oxide is 14× faster than python-docx on the mean and 25× faster at the tail (p99). Pass rate is 3.8 percentage points higher.
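The comparison figures quoted under each table follow from simple arithmetic on the reported columns. Below is a minimal sketch using the DOCX numbers. The helper names `speedup` and `pass_rate_gap` are hypothetical, and because the inputs are the rounded table values, the ratios can differ slightly from ones computed on raw timings.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def pass_rate_gap(candidate_pct: float, baseline_pct: float) -> float:
    """Difference in pass rate, in percentage points."""
    return candidate_pct - baseline_pct


# DOCX table values: office_oxide vs python-docx.
mean_ratio = speedup(11.8, 0.8)     # ratio of mean times
p99_ratio = speedup(98.0, 3.9)      # ratio of p99 times
gap = pass_rate_gap(98.9, 95.1)     # percentage points

print(f"mean ratio: {mean_ratio:.2f}, p99 ratio: {p99_ratio:.2f}, gap: {gap:.1f} pp")
```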
XLSX — 1,802 files
| Library | Language | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|---|
| office_oxide | Rust | 5.0 ms | 40 ms | 97.8% | MIT |
| python-calamine | Rust/Python | 13.9 ms | 183 ms | 96.6% | MIT |
| openpyxl | Python | 94.5 ms | 698 ms | 96.2% | MIT |
Office Oxide is 2.8× faster than python-calamine (the next-fastest XLSX library) and 18× faster than openpyxl on the mean. It also has the highest pass rate of the three.
PPTX — 806 files
| Library | Language | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|---|
| office_oxide | Rust | 0.7 ms | 3.9 ms | 98.4% | MIT |
| python-pptx | Python | 32.5 ms | 174 ms | 86.7% | MIT |
Office Oxide is 46× faster than python-pptx on the mean, with a pass rate 11.7 percentage points higher. python-pptx struggles with PowerPoint files that diverge from its expected schema; Office Oxide handles them transparently.
Legacy formats — 916 files
No other Rust or Python library reads all three of .doc, .xls, and .ppt without a JVM (Apache Tika) or external binaries (catdoc, antiword).
.doc — 246 files
| Library | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|
| office_oxide | 0.3 ms | 3.4 ms | 94.7% | MIT |
| catdoc | 4.3 ms | 41 ms | 90.2% | GPL-2.0 |
| antiword | 4.5 ms | 66 ms | 76.8% | GPL-2.0 |
.xls — 494 files
| Library | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|
| office_oxide | 2.8 ms | 75 ms | 99.2% | MIT |
| xls2csv (catdoc) | 6.9 ms | 58 ms | 84.0% | GPL-2.0 |
| python-calamine | 9.0 ms | 96 ms | 90.7% | MIT |
| xlrd | 36.6 ms | 503 ms | 93.1% | BSD-3 |
xls2csv has a tighter p99 (58 ms vs 75 ms) because it emits truncated/lossy output on complex sheets. Office Oxide is 2.4× faster on the mean and passes 15 percentage points more of the corpus.
.ppt — 176 files
| Library | Mean | p99 | Pass Rate | License |
|---|---|---|---|---|
| office_oxide | 0.7 ms | 6.6 ms | 100% | MIT |
| catppt (catdoc) | 2.8 ms | 8 ms | 77.8% | GPL-2.0 |
Pass rate — 98.4% across 6,062 files
The 97 non-passing files are all invalid inputs:
| Category | Count | Notes |
|---|---|---|
| Invalid ZIP / CFB archive | 43 | Truncated, missing EOCD, bad CFB magic |
| Missing required part | 21 | Encrypted, password-protected, or stream absent |
| Malformed XML | 18 | XML bombs, ill-formed tags, fuzz-corrupted content |
| Invalid CFB header | 15 | WordPerfect / IBM DisplayWrite / Excel 3/4 misnamed as .doc/.xls, CVE-exploit fixtures |
Zero failures on legitimate Word 97+ / Excel 97+ / PowerPoint 97+ files. Zero panics, zero timeouts, zero false negatives on valid documents.
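The headline pass rate can be re-derived from the per-format file counts and the failure breakdown above. This short check uses only numbers quoted in this section; the variable names are illustrative.

```python
# Files per format, from the section headings.
files_by_format = {"DOCX": 2538, "XLSX": 1802, "PPTX": 806, ".doc": 246, ".xls": 494, ".ppt": 176}

# Non-passing files by category, from the table above.
failures = {
    "Invalid ZIP / CFB archive": 43,
    "Missing required part": 21,
    "Malformed XML": 18,
    "Invalid CFB header": 15,
}

total = sum(files_by_format.values())
failed = sum(failures.values())
pass_rate = 100.0 * (total - failed) / total

print(f"{failed} of {total} files fail, pass rate {pass_rate:.1f}%")
# prints "97 of 6062 files fail, pass rate 98.4%"
```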
Corpus
| Source | Files | License |
|---|---|---|
| LibreOffice Core | 2,185 | MPL-2.0 |
| Apache POI | 1,298 | Apache-2.0 |
| Open XML SDK | 707 | MIT |
| ClosedXML | 371 | MIT |
| Pandoc | 224 | GPL-2.0 |
| python-docx + python-pptx | 111 | MIT |
| Apache Tika | 108 | Apache-2.0 |
| calamine | 28 | MIT |
| openpreserve | 20 | CC0 |
| oletools | 17 | BSD-2 |
| LibreOffice (legacy) | 12 | MPL-2.0 |
| Total | 6,062 | |
Reproducible benchmarks live in bench_rust/ and bench_python.py. The full methodology and per-file breakdown are in BENCHMARKS.md.
See also
- vs python-docx — DOCX migration
- vs openpyxl — XLSX migration
- vs python-pptx — PPTX migration
- vs Apache Tika — when you want to drop the JVM