Migrate from python-pptx
Office Oxide reads PPTX 46× faster than python-pptx (0.7 ms vs 32.5 ms mean across 806 files), with an 11.7 percentage-point higher pass rate. It also reads legacy .ppt directly — python-pptx cannot.
When to migrate
Switch if you do any of these:
- Extract slide text, notes, or tables out of .pptx for ingestion / RAG
- Convert decks to Markdown or HTML for previews
- Run find-and-replace templating (“Q3 → Q4”, “{{quarter}}”, “{{growth}}”)
- Need .ppt support without shelling out to LibreOffice
- Want one library that also covers .docx, .xlsx, and legacy formats
Stay on python-pptx if:
- You build complex PPTX from scratch with custom layouts, animations, transitions, and shape geometry
- You need fine-grained control over slide layout XML
Install
pip uninstall python-pptx
pip install office-oxide
Side-by-side cheat sheet
Read all slide text
python-pptx
from pptx import Presentation
prs = Presentation("deck.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                for run in para.runs:
                    print(run.text)
office_oxide
from office_oxide import Document
with Document.open("deck.pptx") as doc:
    text = doc.plain_text()
    print(text)
Iterate by slide
python-pptx
prs = Presentation("deck.pptx")
for i, slide in enumerate(prs.slides, 1):
    title = slide.shapes.title.text if slide.shapes.title else "(no title)"
    print(f"slide {i}: {title}")
office_oxide
with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()
    for i, section in enumerate(ir["sections"], 1):
        print(f"slide {i}: {section.get('title') or '(no title)'}")
Each IR section corresponds to one slide. section["title"] comes from the title placeholder.
Read tables on slides
python-pptx
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_table:
            for row in shape.table.rows:
                cells = [c.text for c in row.cells]
                print(cells)
office_oxide
with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()
    for section in ir["sections"]:
        for el in section["elements"]:
            if el["kind"] == "Table":
                for row in el["rows"]:
                    print(row)
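Once the tables are in IR form, exporting them is ordinary dictionary handling. The sketch below is a minimal illustration that runs against a hand-written, IR-shaped dictionary rather than a real deck. It assumes each entry in el["rows"] is a list of cell strings, which this page does not specify; the "Paragraph" kind and the helper name tables_to_csv are hypothetical. Adjust the row handling to whatever doc.to_ir() actually returns.

```python
import csv
import io


def tables_to_csv(ir):
    """Yield (slide_number, csv_text) for each Table element in an IR-shaped dict.

    Assumption: every row is a list of cell strings.
    """
    for slide_no, section in enumerate(ir.get("sections", []), 1):
        for el in section.get("elements", []):
            if el.get("kind") != "Table":
                continue
            buf = io.StringIO()
            csv.writer(buf, lineterminator="\n").writerows(el.get("rows", []))
            yield slide_no, buf.getvalue()


# Hand-written stand-in for doc.to_ir(); not output from a real file.
sample_ir = {
    "sections": [
        {"title": "Intro", "elements": [{"kind": "Paragraph"}]},  # hypothetical kind
        {
            "title": "Numbers",
            "elements": [
                {"kind": "Table", "rows": [["Region", "Growth"], ["EMEA", "+18.4%"]]}
            ],
        },
    ]
}

exported = list(tables_to_csv(sample_ir))
for slide_no, csv_text in exported:
    print(f"slide {slide_no}:\n{csv_text}")
```

Keeping the slide number alongside each table makes it easy to trace an exported row back to its source slide during ingestion.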
Read speaker notes
python-pptx
for slide in prs.slides:
    if slide.has_notes_slide:
        print(slide.notes_slide.notes_text_frame.text)
office_oxide
plain_text() and to_markdown() include notes by default — they’re appended at the end of each slide section. If you need notes separately, use the format-specific accessor:
with Document.open("deck.pptx") as doc:
    pptx = doc.as_pptx()
    for i, slide in enumerate(pptx.slides(), 1):
        notes = slide.notes()
        if notes:
            print(f"slide {i} notes: {notes}")
Templating (find and replace)
python-pptx — no first-class API; common pattern is to walk every shape’s text frame and rewrite. Easy to break on cross-run matches.
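To see why cross-run matches break the hand-rolled approach, here is a small stand-alone illustration. It uses a plain list of strings in place of a paragraph's runs, so it needs no library; the helper names are hypothetical and this is not python-pptx code. PowerPoint can split a placeholder across runs (for example after a formatting change mid-word), and a per-run replace then never sees the full token.

```python
def replace_per_run(runs, old, new):
    """Naive approach: replace inside each run independently."""
    return [r.replace(old, new) for r in runs]


def replace_across_runs(runs, old, new):
    """Workaround: join the runs, replace, keep the result in the first run.

    Trade-off: the remaining runs are emptied, so any per-run formatting
    they carried no longer applies to the text.
    """
    if not runs:
        return runs
    joined_text = "".join(runs).replace(old, new)
    return [joined_text] + [""] * (len(runs) - 1)


# The placeholder {{quarter}} is split across two runs.
runs = ["Revenue for {{qua", "rter}} grew"]

naive = replace_per_run(runs, "{{quarter}}", "Q4")
joined = replace_across_runs(runs, "{{quarter}}", "Q4")

print(naive)   # placeholder is still there, split in two
print(joined)  # placeholder replaced, text collapsed into the first run
```

The join-and-collapse workaround fixes the match but flattens formatting, which is the usual reason to prefer a library-level replace.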
office_oxide
from office_oxide import EditableDocument
with EditableDocument.open("deck_template.pptx") as ed:
    ed.replace_text("{{quarter}}", "Q4 2026")
    ed.replace_text("{{growth}}", "+18.4%")
    ed.save("q4_deck.pptx")
replace_text walks every <a:t> across every slide and notes-slide, and preserves all unmodified OPC parts (images, charts, layouts, themes).
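For readers unfamiliar with the underlying XML, the following sketch shows what "walking every <a:t>" looks like in principle, using only the standard library on a hand-written fragment. It is not office_oxide's implementation: it replaces text within individual <a:t> nodes only and does nothing about placeholders split across runs. The function name replace_in_text_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, which defines <a:t> text nodes.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
A_T = f"{{{A_NS}}}t"  # ElementTree's "{namespace}tag" form

# Hand-written fragment standing in for a slide part; real slides carry more structure.
SLIDE_XML = """<p:txBody
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:p><a:r><a:t>Results for {{quarter}}</a:t></a:r></a:p>
  <a:p><a:r><a:t>Growth: {{growth}}</a:t></a:r></a:p>
</p:txBody>"""


def replace_in_text_nodes(root, old, new):
    """Replace `old` with `new` in every <a:t> node; return how many nodes changed."""
    changed = 0
    for node in root.iter(A_T):
        if node.text and old in node.text:
            node.text = node.text.replace(old, new)
            changed += 1
    return changed


root = ET.fromstring(SLIDE_XML)
changed = replace_in_text_nodes(root, "{{quarter}}", "Q4 2026")
texts = [node.text for node in root.iter(A_T)]
print(changed, texts)
```

Only the text nodes are touched; everything else in the tree is left as parsed, which mirrors the idea of preserving unmodified parts.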
Convert to Markdown / HTML
python-pptx — none built-in.
office_oxide
with Document.open("deck.pptx") as doc:
    md = doc.to_markdown()
    html = doc.to_html()
The Markdown output is one ## Slide N section per slide, with body content and notes appended as blockquotes.
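Because each slide starts with its own heading, the Markdown can be split back into per-slide chunks with a short regular expression. This is a sketch under assumptions: it expects headings that begin with "## Slide N" as described above, the sample string is hand-written rather than real to_markdown() output, and split_slides is a hypothetical helper name.

```python
import re

# Matches a heading line that starts with "## Slide <number>".
SLIDE_HEADING = re.compile(r"^## Slide (\d+)\b.*$", re.MULTILINE)


def split_slides(markdown):
    """Return a list of (slide_number, body) tuples, one per slide heading."""
    matches = list(SLIDE_HEADING.finditer(markdown))
    chunks = []
    for i, match in enumerate(matches):
        # A slide's body runs until the next slide heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        chunks.append((int(match.group(1)), body))
    return chunks


# Hand-written sample in the described shape; not output from a real deck.
sample = """## Slide 1
Quarterly review

> Speaker note: open with the headline number.

## Slide 2
Roadmap
"""

chunks = split_slides(sample)
for number, body in chunks:
    print(number, repr(body))
```

Each chunk keeps its notes blockquote attached to the slide it belongs to, which is usually what slide-aware retrieval wants.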
Reading legacy .ppt
python-pptx can’t open .ppt. Office Oxide reads them directly:
from office_oxide import Document
with Document.open("legacy.ppt") as doc:
    print(doc.plain_text())
    doc.save_as("modern.pptx")  # one-line migration
Performance
| Library | Mean | p99 | Pass Rate |
|---|---|---|---|
| office_oxide | 0.7 ms | 3.9 ms | 98.4% |
| python-pptx | 32.5 ms | 174 ms | 86.7% |
At these mean times, a 100,000-deck ingestion that takes python-pptx about 54 minutes finishes in about 70 seconds with office_oxide.
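The 100,000-deck estimate follows directly from the mean times in the table. The arithmetic below assumes sequential processing at the mean per-file time and ignores disk I/O, parallelism, and failed files, so treat it as a back-of-the-envelope check rather than a benchmark.

```python
decks = 100_000

# Mean per-file read times from the table above, in milliseconds.
mean_ms = {"office_oxide": 0.7, "python-pptx": 32.5}

# Total wall time in seconds if every deck took exactly the mean.
totals_s = {name: decks * ms / 1000 for name, ms in mean_ms.items()}
speedup = mean_ms["python-pptx"] / mean_ms["office_oxide"]

for name, seconds in totals_s.items():
    print(f"{name}: {seconds:.0f} s ({seconds / 60:.1f} min)")
print(f"speedup: about {speedup:.0f}x")
```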
What’s lost
EditableDocument covers the templating use case. For richer PPTX construction — adding slides, custom layouts, charts, animations — drop into office_oxide.pptx::create::PptxBuilder, or stay on python-pptx for the creation step and use office_oxide for ingestion.
See also
- Replace text — replace_text semantics and run-boundary handling
- Office for RAG — slide-aware chunking
- Performance benchmarks