
Migrate from python-pptx

Office Oxide reads PPTX 46× faster than python-pptx (0.7 ms vs 32.5 ms mean across 806 files), with an 11.7 percentage-point higher pass rate. It also reads legacy .ppt directly — python-pptx cannot.

When to migrate

Switch if you:

  • Extract slide text, notes, or tables out of .pptx for ingestion / RAG
  • Convert decks to Markdown or HTML for previews
  • Run find-and-replace templating (“Q3 → Q4”, “{{quarter}}”, “{{growth}}”)
  • Need .ppt support without shelling out to LibreOffice
  • Want one library that also covers .docx, .xlsx, and legacy formats

Stay on python-pptx if:

  • You build complex PPTX from scratch with custom layouts, animations, transitions, and shape geometry
  • You need fine-grained control over slide layout XML

Install

pip uninstall python-pptx
pip install office-oxide

Side-by-side cheat sheet

Read all slide text

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                for run in para.runs:
                    print(run.text)

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    text = doc.plain_text()
print(text)

Iterate by slide

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for i, slide in enumerate(prs.slides, 1):
    title = slide.shapes.title.text if slide.shapes.title else "(no title)"
    print(f"slide {i}: {title}")

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()

for i, section in enumerate(ir["sections"], 1):
    print(f"slide {i}: {section.get('title') or '(no title)'}")

Each IR section corresponds to one slide. section["title"] comes from the title placeholder.
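As a minimal sketch, the loop above can be wrapped in a helper that works on any dict shaped like the IR shown here. Only the "sections" and "title" keys are taken from the example; the helper name slide_outline and the hand-built sample dict are hypothetical, and the real IR may carry more fields.

```python
def slide_outline(ir: dict) -> list[str]:
    """Return one 'slide N: title' line per IR section."""
    lines = []
    for i, section in enumerate(ir.get("sections", []), 1):
        # A missing or empty title falls back to a placeholder label.
        title = section.get("title") or "(no title)"
        lines.append(f"slide {i}: {title}")
    return lines


# Hand-built stand-in for doc.to_ir(); only the keys used above are assumed.
sample_ir = {
    "sections": [
        {"title": "Q4 Review", "elements": []},
        {"title": None, "elements": []},
    ]
}

for line in slide_outline(sample_ir):
    print(line)
```

Returning a list rather than printing directly makes the outline easy to log, diff between deck versions, or feed into an index.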

Read tables on slides

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_table:
            for row in shape.table.rows:
                cells = [c.text for c in row.cells]
                print(cells)

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    ir = doc.to_ir()

for section in ir["sections"]:
    for el in section["elements"]:
        if el["kind"] == "Table":
            for row in el["rows"]:
                print(row)

Read speaker notes

python-pptx

from pptx import Presentation

prs = Presentation("deck.pptx")
for slide in prs.slides:
    if slide.has_notes_slide:
        print(slide.notes_slide.notes_text_frame.text)

office_oxide

plain_text() and to_markdown() include notes by default — they’re appended at the end of each slide section. If you need notes separately, use the format-specific accessor:

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    pptx = doc.as_pptx()
    for i, slide in enumerate(pptx.slides(), 1):
        notes = slide.notes()
        if notes:
            print(f"slide {i} notes: {notes}")

Templating (find and replace)

python-pptx has no first-class API for this. The common pattern is to walk every shape’s text frame and rewrite the text of each run, which silently misses any match that spans more than one run.
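To illustrate the cross-run problem, here is a small sketch that models a paragraph's runs as a plain list of strings. Both function names are hypothetical and no PPTX library is involved; it only shows why a per-run replace misses a placeholder that has been split across runs, and one common workaround.

```python
def replace_per_run(runs: list[str], old: str, new: str) -> list[str]:
    """Naive approach: replace inside each run independently."""
    return [r.replace(old, new) for r in runs]


def replace_across_runs(runs: list[str], old: str, new: str) -> list[str]:
    """Join the runs, replace, and put the result in the first run.

    This finds matches that straddle run boundaries, at the cost of
    collapsing the paragraph's text onto the first run's formatting.
    """
    if not runs:
        return []
    joined = "".join(runs)
    if old not in joined:
        return list(runs)  # nothing to do; keep the runs as they were
    return [joined.replace(old, new)] + [""] * (len(runs) - 1)


# A placeholder can end up split across two runs after editing in PowerPoint.
runs = ["Revenue grew {{gro", "wth}} in Q4"]

print(replace_per_run(runs, "{{growth}}", "+18.4%"))      # placeholder untouched
print(replace_across_runs(runs, "{{growth}}", "+18.4%"))  # placeholder replaced
```

With python-pptx the same idea would read each run.text from paragraph.runs and write the results back. The trade-off is that the merged text takes on the first run's formatting.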

office_oxide

from office_oxide import EditableDocument

with EditableDocument.open("deck_template.pptx") as ed:
    ed.replace_text("{{quarter}}", "Q4 2026")
    ed.replace_text("{{growth}}",  "+18.4%")
    ed.save("q4_deck.pptx")

replace_text walks every <a:t> element across every slide and notes slide, and preserves all unmodified OPC parts (images, charts, layouts, themes).
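For readers unfamiliar with slide XML, the sketch below shows what an <a:t> element is, using only the Python standard library. The XML fragment is a hypothetical, simplified paragraph, and the loop is an illustration of the idea rather than how office_oxide implements replace_text.

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs in slide XML.
NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}

# Hypothetical fragment of a slide's text body: one paragraph, two runs.
FRAGMENT = """
<a:p xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:r><a:rPr b="1"/><a:t>Quarter: </a:t></a:r>
  <a:r><a:t>{{quarter}}</a:t></a:r>
</a:p>
""".strip()

paragraph = ET.fromstring(FRAGMENT)
text_nodes = paragraph.findall(".//a:t", NS)

# Replace text inside each <a:t>, leaving run properties (<a:rPr>) untouched.
for node in text_nodes:
    node.text = (node.text or "").replace("{{quarter}}", "Q4 2026")

print([node.text for node in text_nodes])
```

Because only the text nodes change, the bold flag on the first run survives the replacement.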

Convert to Markdown / HTML

python-pptx has no built-in converter.

office_oxide

from office_oxide import Document

with Document.open("deck.pptx") as doc:
    md   = doc.to_markdown()
    html = doc.to_html()

The Markdown output has one ## Slide N section per slide: body content first, then speaker notes appended as blockquotes.
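If the Markdown is headed for a RAG pipeline, one option is to chunk it per slide. The sketch below assumes headings look exactly like "## Slide N" at the start of a line, as described above; the function name split_slides and the sample text are hypothetical, so check the pattern against real to_markdown() output before relying on it.

```python
import re


def split_slides(markdown: str) -> dict[int, str]:
    """Map slide number -> Markdown body, splitting on '## Slide N' headings."""
    # With one capture group, re.split yields [preamble, num, body, num, body, ...].
    parts = re.split(r"^## Slide (\d+)[^\n]*\n", markdown, flags=re.MULTILINE)
    slides = {}
    for num, body in zip(parts[1::2], parts[2::2]):
        slides[int(num)] = body.strip()
    return slides


# Hand-written sample in the assumed shape: body first, notes as a blockquote.
sample = """## Slide 1

Q4 Review

> Open with the headline number.

## Slide 2

- Revenue up 18.4%
"""

chunks = split_slides(sample)
for number, body in chunks.items():
    print(number, repr(body))
```

Keying chunks by slide number keeps a stable reference back to the source deck when a retrieved passage needs to be cited.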

Reading legacy .ppt

python-pptx can’t open .ppt. Office Oxide reads them directly:

from office_oxide import Document

with Document.open("legacy.ppt") as doc:
    print(doc.plain_text())
    doc.save_as("modern.pptx")    # one-line migration

Performance

Library        Mean      p99       Pass rate
office_oxide   0.7 ms    3.9 ms    98.4%
python-pptx    32.5 ms   174 ms    86.7%

At these mean read times, a 100,000-deck ingestion that takes python-pptx about 54 minutes finishes in about 70 seconds with office_oxide.
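The estimate follows directly from the mean read times in the table. A quick check, assuming decks are read sequentially on a single thread and ignoring I/O wait and failed files:

```python
DECKS = 100_000

# Mean read time per file in milliseconds, taken from the table above.
MEAN_MS = {"office_oxide": 0.7, "python-pptx": 32.5}

for library, mean_ms in MEAN_MS.items():
    total_s = DECKS * mean_ms / 1000  # milliseconds -> seconds
    print(f"{library}: {total_s:.0f} s ({total_s / 60:.1f} min)")

# Ratio of the two means, which is where the "46x" figure comes from.
speedup = MEAN_MS["python-pptx"] / MEAN_MS["office_oxide"]
print(f"speedup: {speedup:.1f}x")
```

Real pipelines will see different totals depending on file sizes, storage latency, and parallelism, so treat these numbers as a rough scaling guide.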

What’s lost

EditableDocument covers the templating use case. For richer PPTX construction — adding slides, custom layouts, charts, animations — drop into office_oxide.pptx::create::PptxBuilder, or stay on python-pptx for the creation step and use office_oxide for ingestion.

See also