What is the fastest Python library for DOCX, XLSX, and PPTX?

Office Oxide is the fastest. DOCX text extraction averages 0.8ms (vs 11.8ms for python-docx — 14× faster). XLSX averages 5.0ms (vs 94.5ms for openpyxl — 18× faster). PPTX averages 0.7ms (vs 32.5ms for python-pptx — 46× faster). Benchmarked on 6,062 real-world files.
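Each multiplier above is the baseline library's mean time divided by Office Oxide's mean time. A quick check using only the numbers quoted here gives ratios of 14.75, 18.9, and about 46.4, so the headline figures are rounded down. The dictionary layout and the `speedups` helper are illustrative only:

```python
# Mean per-file extraction times (ms) quoted above: (office_oxide, baseline, baseline name).
MEANS_MS = {
    "DOCX": (0.8, 11.8, "python-docx"),
    "XLSX": (5.0, 94.5, "openpyxl"),
    "PPTX": (0.7, 32.5, "python-pptx"),
}


def speedups(means):
    """Ratio of the baseline library's mean time to Office Oxide's mean time."""
    return {fmt: baseline / oxide for fmt, (oxide, baseline, _) in means.items()}


for fmt, ratio in speedups(MEANS_MS).items():
    print(f"{fmt}: {ratio:.2f}x faster than {MEANS_MS[fmt][2]}")
```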

Is Office Oxide free for commercial use?

Yes. Office Oxide is dual-licensed MIT OR Apache-2.0 — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL or copyleft restrictions.

Does Office Oxide handle legacy .doc, .xls, and .ppt files?

Yes. Office Oxide reads all six formats: DOCX, XLSX, PPTX, plus legacy DOC, XLS, PPT. It is the only Rust or Python library that supports all three legacy formats without a JVM (Apache Tika) or external binaries (catdoc, antiword).

Can Office Oxide convert documents to Markdown?

Yes. Every supported format has a built-in to_markdown() that preserves headings, tables, lists, and structure, which makes it well suited to LLM and RAG pipelines. No separate package is needed.

How does Office Oxide compare to calamine and openpyxl for XLSX?

On 1,802 XLSX files: Office Oxide averages 5.0ms (97.8% pass rate). python-calamine averages 13.9ms (96.6%). openpyxl averages 94.5ms (96.2%). Office Oxide is 2.8× faster than calamine and 18× faster than openpyxl, with the highest pass rate.

Does Office Oxide work in the browser?

Yes. Office Oxide ships a WASM build (office-oxide-wasm on npm) that runs in any browser or bundler. Process Office documents client-side with no server round-trips — useful for privacy-sensitive workloads.

Migrate from python-pptx

Office Oxide reads PPTX 46× faster than python-pptx (0.7 ms vs 32.5 ms mean across 806 files), with an 11.7 percentage-point higher pass rate. It also reads legacy .ppt directly — python-pptx cannot.

When to migrate

Switch if you do any of these:

Extract slide text, notes, or tables out of .pptx for ingestion / RAG
Convert decks to Markdown or HTML for previews
Run find-and-replace templating (“Q3 → Q4”, “{{quarter}}”, “{{growth}}”)
Need .ppt support without shelling out to LibreOffice
Want one library that also covers .docx, .xlsx, and legacy formats

Stay on python-pptx if:

You build complex PPTX from scratch with custom layouts, animations, transitions, and shape geometry
You need fine-grained control over slide layout XML

Install

pip uninstall python-pptx
pip install office-oxide

Side-by-side cheat sheet

Read all slide text

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                for run in para.runs:
                    print(run.text)

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    text = doc.plain_text()
print(text)

Iterate by slide

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for i, slide in enumerate(prs.slides, 1):
    title = slide.shapes.title.text if slide.shapes.title else "(no title)"
    print(f"slide {i}: {title}")

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()

for i, section in enumerate(ir["sections"], 1):
    print(f"slide {i}: {section.get('title') or '(no title)'}")

Each IR section corresponds to one slide. section["title"] comes from the title placeholder.

Read tables on slides

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_table:
            for row in shape.table.rows:
                cells = [c.text for c in row.cells]
                print(cells)

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()

for section in ir["sections"]:
    for el in section["elements"]:
        if el["kind"] == "Table":
            for row in el["rows"]:
                print(row)
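Building on the IR walk above, the sketch below exports every slide table as CSV text, tagged with its slide number and title. It assumes (the page does not say) that each entry in el["rows"] is a list of cell strings; slide_tables_to_csv is a hypothetical helper that takes the dict returned by to_ir():

```python
import csv
import io


def slide_tables_to_csv(ir):
    """Hypothetical helper: (slide number, title, CSV text) for every table in an IR dict.

    Assumes the IR shape used above: ir["sections"] -> section["elements"],
    table elements with kind == "Table", and "rows" as lists of cell strings.
    """
    results = []
    for slide_no, section in enumerate(ir["sections"], 1):
        title = section.get("title") or "(no title)"
        for el in section.get("elements", []):
            if el.get("kind") != "Table":
                continue
            buf = io.StringIO()
            # csv handles quoting of cells that contain commas or quotes
            csv.writer(buf, lineterminator="\n").writerows(el["rows"])
            results.append((slide_no, title, buf.getvalue()))
    return results
```

Usage: `for slide_no, title, text in slide_tables_to_csv(doc.to_ir()): ...`, writing each text to its own .csv file if needed.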

Read speaker notes

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    if slide.has_notes_slide:
        print(slide.notes_slide.notes_text_frame.text)

office_oxide

plain_text() and to_markdown() include notes by default — they’re appended at the end of each slide section. If you need notes separately, use the format-specific accessor:

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    pptx = doc.as_pptx()
    for i, slide in enumerate(pptx.slides(), 1):
        notes = slide.notes()
        if notes:
            print(f"slide {i} notes: {notes}")

Templating (find and replace)

python-pptx has no first-class API for this. The common pattern is to walk every shape's text frame and rewrite the runs, which easily breaks when a placeholder is split across several runs.

office_oxide

from office_oxide import EditableDocument

with EditableDocument.open("deck_template.pptx") as ed:
    ed.replace_text("{{quarter}}", "Q4 2026")
    ed.replace_text("{{growth}}",  "+18.4%")
    ed.save("q4_deck.pptx")

replace_text walks every <a:t> across every slide and notes-slide, and preserves all unmodified OPC parts (images, charts, layouts, themes).
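To illustrate the technique (this is not Office Oxide's implementation), here is a minimal standard-library sketch: it opens the .pptx as the zip package it is, rewrites the text of each <a:t> element in slide and notes-slide parts, and copies every other part unchanged. The helper name replace_in_pptx is hypothetical; it assumes UTF-8 XML parts and only matches placeholders contained in a single <a:t> run, so a placeholder split across runs (the cross-run problem mentioned earlier) is missed:

```python
import html
import re
import zipfile
from xml.sax.saxutils import escape

# Slide and notes-slide parts inside the .pptx zip package.
TEXT_PARTS = re.compile(r"^ppt/(slides/slide|notesSlides/notesSlide)\d+\.xml$")
# An opening <a:t> or <a:t ...> tag (not a self-closing <a:t/>), its text, and </a:t>.
A_T = re.compile(rb"(<a:t\b[^>]*?(?<!/)>)(.*?)(</a:t>)", re.DOTALL)


def replace_in_pptx(src, dst, replacements):
    """Hypothetical helper: replace placeholders inside each <a:t> run.

    Every other part (images, charts, layouts, themes) is copied unchanged.
    Returns how many <a:t> elements changed.
    """
    changed = 0

    def fix_run(m):
        nonlocal changed
        text = html.unescape(m.group(2).decode("utf-8"))  # assumes UTF-8 parts
        new = text
        for old, repl in replacements.items():
            new = new.replace(old, repl)
        if new == text:
            return m.group(0)  # untouched runs keep their original bytes
        changed += 1
        return m.group(1) + escape(new).encode("utf-8") + m.group(3)

    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if TEXT_PARTS.match(info.filename):
                data = A_T.sub(fix_run, data)
            zout.writestr(info, data)  # same part name and compression method
    return changed
```

Usage: `replace_in_pptx("deck_template.pptx", "q4_deck.pptx", {"{{quarter}}": "Q4 2026", "{{growth}}": "+18.4%"})`.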

Convert to Markdown / HTML

python-pptx — none built-in.

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    md   = doc.to_markdown()
    html = doc.to_html()

The Markdown output has one ## Slide N section per slide, containing the slide's body content with its notes appended as blockquotes.
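For RAG ingestion it is often convenient to embed one chunk per slide. A minimal sketch, assuming the layout described above (a `## Slide N` heading per slide, notes as `>` blockquote lines); split_slides is a hypothetical helper, not part of office_oxide:

```python
import re

# "## Slide N" headings, as described above (layout assumed, not verified).
SLIDE_HEADING = re.compile(r"^## Slide (\d+)[ \t]*$", re.MULTILINE)


def split_slides(md):
    """Hypothetical helper: one chunk per slide, with '>' lines collected as notes."""
    headings = list(SLIDE_HEADING.finditer(md))
    chunks = []
    for i, h in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(md)
        body, notes = [], []
        for line in md[h.end():end].splitlines():
            if line.startswith(">"):
                notes.append(line[1:].strip())  # drop the blockquote marker
            elif line.strip():
                body.append(line)  # blank lines are dropped
        chunks.append({"slide": int(h.group(1)), "body": "\n".join(body),
                       "notes": "\n".join(notes)})
    return chunks
```

Usage: `chunks = split_slides(doc.to_markdown())`, then embed each chunk's body (and optionally its notes) separately.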

Reading legacy .ppt

python-pptx can’t open .ppt. Office Oxide reads them directly:

from office_oxide import Document

with Document.open("legacy.ppt") as doc:
    print(doc.plain_text())
    doc.save_as("modern.pptx")    # one-line migration

Performance

Library	Mean	p99	Pass Rate
office_oxide	0.7 ms	3.9 ms	98.4%
python-pptx	32.5 ms	174 ms	86.7%

At the mean per-file times above, a 100,000-deck ingestion that takes python-pptx about 54 minutes finishes in about 70 seconds with office_oxide.
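The 54-minute and 70-second figures are the mean per-file times multiplied by the deck count. A minimal sketch of that arithmetic, assuming sequential single-process parsing and ignoring I/O and p99 tail effects:

```python
def batch_seconds(files, mean_ms):
    """Total parse time in seconds, assuming files are processed one at a time."""
    return files * mean_ms / 1000.0


DECKS = 100_000
oxide_s = batch_seconds(DECKS, 0.7)   # office_oxide mean from the table
pptx_s = batch_seconds(DECKS, 32.5)   # python-pptx mean from the table

print(f"office_oxide: {oxide_s:.0f} s")
print(f"python-pptx:  {pptx_s / 60:.1f} min")
```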

What’s lost

EditableDocument covers the templating use case. For richer PPTX construction — adding slides, custom layouts, charts, animations — drop into office_oxide.pptx::create::PptxBuilder, or stay on python-pptx for the creation step and use office_oxide for ingestion.
