Question 1

What is the fastest Python library for DOCX, XLSX, and PPTX?

Accepted Answer

In a benchmark of 6,062 real-world files, Office Oxide was the fastest of the libraries compared. DOCX text extraction averages 0.8ms (vs 11.8ms for python-docx, 14× faster). XLSX averages 5.0ms (vs 94.5ms for openpyxl, 18× faster). PPTX averages 0.7ms (vs 32.5ms for python-pptx, 46× faster).
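
The multipliers quoted above can be rechecked from the mean times alone. The sketch below only does that arithmetic; it does not call Office Oxide, and the dictionary keys are labels chosen for illustration. Because the published means are rounded, the ratios are approximate, and the quoted figures correspond to the ratio truncated to a whole number.

```python
import math

# Mean extraction times in ms, copied from the answer above:
# (Office Oxide, comparison library)
means_ms = {
    "DOCX vs python-docx": (0.8, 11.8),
    "XLSX vs openpyxl": (5.0, 94.5),
    "PPTX vs python-pptx": (0.7, 32.5),
}

# Speedup = comparison library's mean divided by Office Oxide's mean.
speedups = {name: other / oxide for name, (oxide, other) in means_ms.items()}

for name, ratio in speedups.items():
    # Truncating to a whole number gives the multiplier quoted in the text.
    print(f"{name}: {ratio:.2f}x (quoted as {math.floor(ratio)}x)")
```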

Question 2

Is Office Oxide free for commercial use?

Accepted Answer

Yes. Office Oxide is dual-licensed MIT OR Apache-2.0 — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL or copyleft restrictions.

Question 3

Does Office Oxide handle legacy .doc, .xls, and .ppt files?

Accepted Answer

Yes. Office Oxide reads all six formats: DOCX, XLSX, PPTX, plus legacy DOC, XLS, PPT. It is the only Rust or Python library that supports all three legacy formats without a JVM (Apache Tika) or external binaries (catdoc, antiword).

Question 4

Can Office Oxide convert documents to Markdown?

Accepted Answer

Yes. Every supported format has a built-in to_markdown() that preserves headings, tables, lists, and structure, which makes it well suited to LLM and RAG pipelines. No separate package is needed.

Question 5

How does Office Oxide compare to calamine and openpyxl for XLSX?

Accepted Answer

On 1,802 XLSX files: Office Oxide averages 5.0ms (97.8% pass rate). python-calamine averages 13.9ms (96.6%). openpyxl averages 94.5ms (96.2%). Office Oxide is 2.8× faster than calamine and 18× faster than openpyxl, with the highest pass rate.
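
To put the pass rates in concrete terms, they can be converted into an approximate number of files each library failed on. This is plain arithmetic on the figures above, not output from any of the libraries, and because the published rates are rounded to one decimal place the counts are estimates only.

```python
XLSX_FILES = 1802

# Pass rates and mean times (ms) copied from the answer above.
pass_rates = {"Office Oxide": 0.978, "python-calamine": 0.966, "openpyxl": 0.962}
mean_ms = {"Office Oxide": 5.0, "python-calamine": 13.9, "openpyxl": 94.5}

# Estimated number of files that did not parse, per library.
approx_failures = {
    lib: round(XLSX_FILES * (1 - rate)) for lib, rate in pass_rates.items()
}

# Ratio of python-calamine's mean to Office Oxide's mean, to one decimal place.
calamine_ratio = round(mean_ms["python-calamine"] / mean_ms["Office Oxide"], 1)

for lib, failed in approx_failures.items():
    print(f"{lib}: roughly {failed} of {XLSX_FILES} files failed")
print(f"python-calamine / Office Oxide mean time: {calamine_ratio}x")
```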

Question 6

Does Office Oxide work in the browser?

Accepted Answer

Yes. Office Oxide ships a WASM build (office-oxide-wasm on npm) that runs in any browser or bundler. Process Office documents client-side with no server round-trips — useful for privacy-sensitive workloads.

Format	Output
DOCX	Body text in document order, plus headers and footers; soft hyphens stripped
XLSX	Cell values across every sheet, tab-separated within a row, blank line between sheets
PPTX	Slide title, body placeholders, table cells, and notes, with one paragraph block per slide
DOC	Same shape as DOCX, parsed directly from the CFB piece table
XLS	Same shape as XLSX, parsed directly from BIFF8 records
PPT	Same shape as PPTX, parsed from the PowerPoint Document stream
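
Because the XLSX and XLS text output has a regular shape (tabs between cells, a blank line between sheets), it can be split back into rows and cells with plain string handling. The sketch below assumes the output looks exactly as the table describes; `split_sheets` is a hypothetical helper and `sample` is made-up text, not real library output. A sheet that itself contains an empty row would be ambiguous under this scheme.

```python
def split_sheets(text: str) -> list[list[list[str]]]:
    """Split extracted spreadsheet text into sheets -> rows -> cells."""
    sheets = []
    # A blank line (two consecutive newlines) separates sheets.
    for block in text.strip("\n").split("\n\n"):
        # Within a sheet, each line is a row and tabs separate the cells.
        rows = [line.split("\t") for line in block.split("\n") if line]
        if rows:
            sheets.append(rows)
    return sheets


# Hypothetical extracted text for a two-sheet workbook.
sample = "name\tqty\napple\t3\n\nregion\ttotal\nEU\t10\n"

sheets = split_sheets(sample)
for index, rows in enumerate(sheets, start=1):
    print(f"sheet {index}: {len(rows)} rows, first row = {rows[0]}")
```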

Format	Mean	p99	Pass rate
DOCX (2,538 files)	0.8ms	3.9ms	98.9%
XLSX (1,802 files)	5.0ms	40ms	97.8%
PPTX (806 files)	0.7ms	3.9ms	98.4%
DOC (246 files)	0.3ms	3.4ms	94.7%
XLS (494 files)	2.8ms	75ms	99.2%
PPT (176 files)	0.7ms	6.6ms	100%
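
The per-format file counts in this table add up to the 6,062-file corpus, and the per-format means can be combined into a single file-weighted mean. The sketch below is arithmetic on the table only; the overall mean is not a published figure, and it inherits the rounding of the per-format means.

```python
# (file count, mean ms) per format, copied from the table above.
results = {
    "DOCX": (2538, 0.8),
    "XLSX": (1802, 5.0),
    "PPTX": (806, 0.7),
    "DOC": (246, 0.3),
    "XLS": (494, 2.8),
    "PPT": (176, 0.7),
}

total_files = sum(count for count, _ in results.values())

# Weight each format's mean by how many files it contributed.
weighted_mean_ms = (
    sum(count * mean for count, mean in results.values()) / total_files
)

print(f"total files: {total_files}")
print(f"file-weighted mean: {weighted_mean_ms:.2f} ms")
```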

Extract Text from Office Documents

One-shot helper

Rust, Python, JavaScript, Go, C#

Reusable handle

Rust, Python, JavaScript

What you get per format

From bytes (no temp file)

Python, JavaScript, Rust

Performance

See also