Migrate from openpyxl
Office Oxide reads XLSX 18× faster than openpyxl (5.0 ms vs 94.5 ms mean across 1,802 files), with the highest pass rate of any tested library. It also reads legacy .xls directly — something openpyxl can’t do at all.
When to migrate
Switch if you do any of these:
- Read cells, rows, sheets, or tables out of .xlsx for ingestion / RAG / dashboards
- Convert spreadsheets to Markdown or HTML
- Need .xls support without adding xlrd (deprecated since 2.0) or shelling out to LibreOffice
- Want to also process .docx, .pptx, or legacy DOC/PPT without more dependencies
- Use EditableDocument to write cells in templates
Stay on openpyxl if:
- You build complex XLSX from scratch with charts, conditional formatting, named styles, and pivot tables (openpyxl is the strongest pure-Python option for full-featured creation)
- You need formula evaluation in pure Python
Install
pip uninstall openpyxl
pip install office-oxide
Side-by-side cheat sheet
Open a workbook
openpyxl
from openpyxl import load_workbook
wb = load_workbook("budget.xlsx", data_only=True)
office_oxide
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    ...
Iterate sheet cells
openpyxl
from openpyxl import load_workbook
wb = load_workbook("budget.xlsx", data_only=True)
for sheet in wb.worksheets:
    for row in sheet.iter_rows(values_only=True):
        print(row)
office_oxide
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    ir = doc.to_ir()
    for section in ir["sections"]:
        print(f"# {section.get('title')}")
        for el in section["elements"]:
            if el["kind"] == "Table":
                for row in el["rows"]:
                    print(row)
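When the table rows are needed downstream rather than printed, the same walk can be wrapped in a generator. The sketch below is a minimal illustration that assumes only the IR shape shown above (`sections`, `title`, `elements`, `kind`, `rows`). The helper name `iter_table_rows` and the hand-built `sample_ir` are hypothetical, and the real output of `to_ir()` may carry more fields.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_table_rows(ir: Dict[str, Any]) -> Iterator[Tuple[Optional[str], List[Any]]]:
    """Yield (section title, row) for every table row in an IR dict.

    Hypothetical helper: it relies only on the keys used in the snippet above.
    """
    for section in ir.get("sections", []):
        title = section.get("title")
        for el in section.get("elements", []):
            # Non-table elements (paragraphs, headings, ...) are skipped.
            if el.get("kind") == "Table":
                for row in el.get("rows", []):
                    yield title, row


# Hand-built stand-in for doc.to_ir(); an assumption, not real library output.
sample_ir = {
    "sections": [
        {
            "title": "Q4",
            "elements": [
                {"kind": "Paragraph", "text": "notes"},
                {"kind": "Table", "rows": [["Item", "Cost"], ["Rent", 1200]]},
            ],
        }
    ]
}

rows = list(iter_table_rows(sample_ir))
```

In a real pipeline, `sample_ir` would be replaced by the value returned from `doc.to_ir()` inside the `with Document.open(...)` block.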
For richer per-cell access (types, formulas, merged cells) drop into the XLSX module:
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    xlsx = doc.as_xlsx()
    for sheet in xlsx.sheets():
        for cell in sheet.cells():
            print(cell.address(), cell.value(), cell.value_type())
Read a single cell
openpyxl
from openpyxl import load_workbook
wb = load_workbook("budget.xlsx", data_only=True)
sheet = wb["Q4"]
val = sheet["B5"].value
office_oxide
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    val = doc.as_xlsx().sheet("Q4").cell("B5").value()
Write cells (templating)
openpyxl
from openpyxl import load_workbook
wb = load_workbook("template.xlsx")
ws = wb["Summary"]
ws["A1"] = "Total"
ws["B1"] = 42.5
ws["C1"] = True
wb.save("filled.xlsx")
office_oxide
from office_oxide import EditableDocument
with EditableDocument.open("template.xlsx") as ed:
    ed.set_cell(0, "A1", "Total")  # sheet 0 = first sheet
    ed.set_cell(0, "B1", 42.5)
    ed.set_cell(0, "C1", True)
    ed.save("filled.xlsx")
The first argument to set_cell, sheet_index, is zero-based; the second, cell_ref, uses standard A1 notation. To resolve a sheet name to its index, read the workbook once and call sheets().
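The name-to-index lookup described above can be sketched as follows. This is an illustration, not part of the library: `sheet_index` and `fill_by_name` are hypothetical helpers, and the sketch assumes that the order returned by `sheets()` matches the index `set_cell` expects. Only calls that appear elsewhere on this page are used.

```python
from typing import Any, Dict, Sequence


def sheet_index(names: Sequence[str], target: str) -> int:
    """Return the zero-based position of `target` in `names` (hypothetical helper)."""
    try:
        return list(names).index(target)
    except ValueError:
        raise KeyError(f"no sheet named {target!r}; available: {list(names)}") from None


def fill_by_name(template: str, out: str, sheet: str, values: Dict[str, Any]) -> None:
    """Write `values` (A1 ref -> value) into the named sheet of a template."""
    # Imported lazily so sheet_index() can be used without office_oxide installed.
    from office_oxide import Document, EditableDocument

    # Read the workbook once to learn the sheet order.
    with Document.open(template) as doc:
        names = [s.name() for s in doc.as_xlsx().sheets()]
    # Assumption: sheets() order is the same order set_cell indexes into.
    idx = sheet_index(names, sheet)

    with EditableDocument.open(template) as ed:
        for ref, value in values.items():
            ed.set_cell(idx, ref, value)
        ed.save(out)
```

A call such as `fill_by_name("template.xlsx", "filled.xlsx", "Summary", {"A1": "Total", "B1": 42.5})` would then mirror the openpyxl version, which addresses the sheet by name.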
Convert to Markdown / HTML
openpyxl: nothing built in; you'd render the rows yourself.
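For comparison, a hand-rolled renderer on the openpyxl side might look like the sketch below. The function names `rows_to_gfm` and `workbook_to_markdown` are hypothetical. The sketch treats the first row of each sheet as the header, which is an assumption about the data, and it does no alignment or number formatting.

```python
from typing import Any, Iterable, List, Sequence


def _cell(value: Any) -> str:
    """Render one cell: empty for None, pipes escaped, newlines flattened."""
    if value is None:
        return ""
    return str(value).replace("|", "\\|").replace("\n", " ")


def rows_to_gfm(rows: Iterable[Sequence[Any]]) -> str:
    """Render rows as a GFM table, treating the first row as the header."""
    data: List[List[Any]] = [list(r) for r in rows]
    if not data:
        return ""
    width = max(len(r) for r in data)
    if width == 0:
        return ""
    # Pad ragged rows so every line has the same number of columns.
    padded = [r + [None] * (width - len(r)) for r in data]
    lines = ["| " + " | ".join(_cell(c) for c in padded[0]) + " |"]
    lines.append("|" + "|".join(["---"] * width) + "|")
    for r in padded[1:]:
        lines.append("| " + " | ".join(_cell(c) for c in r) + " |")
    return "\n".join(lines)


def workbook_to_markdown(path: str) -> str:
    """One '## <sheet title>' section per sheet, each followed by a GFM table."""
    # Imported lazily so rows_to_gfm() works without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path, data_only=True, read_only=True)
    try:
        parts = [
            f"## {ws.title}\n\n{rows_to_gfm(ws.iter_rows(values_only=True))}"
            for ws in wb.worksheets
        ]
    finally:
        wb.close()
    return "\n\n".join(parts)
```

This covers only the simple case; merged cells, dates, and empty trailing rows would each need extra handling.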
office_oxide
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    md = doc.to_markdown()  # one ## section per sheet, GFM tables
Read sheet names
openpyxl
from openpyxl import load_workbook
wb = load_workbook("budget.xlsx")
print(wb.sheetnames)
office_oxide
from office_oxide import Document
with Document.open("budget.xlsx") as doc:
    print([s.name() for s in doc.as_xlsx().sheets()])
Reading legacy .xls
openpyxl can't open .xls. The historical workaround was xlrd, which has supported only .xls since 2.0 and is no longer actively maintained.
office_oxide
from office_oxide import Document
with Document.open("legacy.xls") as doc:
    print(doc.plain_text())
    doc.save_as("modern.xlsx")  # one-line migration
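The one-line conversion extends naturally to a directory tree. The following is a minimal sketch under stated assumptions: `plan_conversions` and `convert_tree` are hypothetical helpers, `Document.open` and `save_as` are used exactly as in the snippet above, and passing paths as plain strings is an assumption about what the library accepts.

```python
from pathlib import Path
from typing import Iterator, Tuple


def plan_conversions(root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (source .xls, target .xlsx) pairs, skipping targets that already exist."""
    for src in sorted(root.rglob("*.xls")):
        dst = src.with_suffix(".xlsx")
        if not dst.exists():
            yield src, dst


def convert_tree(root: str) -> int:
    """Convert every pending .xls under `root`; return the number of files written."""
    # Imported lazily so plan_conversions() can be dry-run without office_oxide.
    from office_oxide import Document

    count = 0
    for src, dst in plan_conversions(Path(root)):
        with Document.open(str(src)) as doc:
            doc.save_as(str(dst))
        count += 1
    return count
```

Keeping the planning step separate makes it easy to dry-run the migration and review the list of files before anything is written.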
Performance
| Library | Mean | p99 | Pass Rate |
|---|---|---|---|
| office_oxide | 5.0 ms | 40 ms | 97.8% |
| python-calamine | 13.9 ms | 183 ms | 96.6% |
| openpyxl | 94.5 ms | 698 ms | 96.2% |
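A pipeline that reads 100,000 spreadsheets a day spends, at the mean times above, about 2.6 hours in openpyxl (100,000 × 94.5 ms ≈ 9,450 s) versus about 8.3 minutes in office_oxide (100,000 × 5.0 ms = 500 s), assuming sequential reads.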
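As a rough cross-check, the mean read times in the table can be scaled to a daily batch. This sketch assumes strictly sequential, single-process reads and ignores any I/O or processing outside the parser; the constant and function names are illustrative.

```python
FILES_PER_DAY = 100_000

# Mean read time per file in milliseconds, copied from the table above.
MEAN_MS = {"office_oxide": 5.0, "python-calamine": 13.9, "openpyxl": 94.5}


def daily_seconds(mean_ms: float, files: int = FILES_PER_DAY) -> float:
    """Total sequential read time in seconds for `files` reads at `mean_ms` each."""
    return mean_ms * files / 1000.0


totals = {lib: daily_seconds(ms) for lib, ms in MEAN_MS.items()}

for lib, secs in totals.items():
    print(f"{lib:16s} {secs / 60:7.1f} min  ({secs / 3600:.2f} h)")
```

Spreading the reads across parallel workers would reduce the wall-clock figures, but the ratio between libraries would stay the same.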
What’s lost
EditableDocument.set_cell writes raw cell values; it does not modify number formats, conditional formatting, charts, or named ranges (those parts are preserved verbatim). For XLSX construction with full styling, use openpyxl or drop into office_oxide.xlsx::create::XlsxBuilder.
See also
- Set XLSX cells: full type matrix and edge cases
- Migrate from xlrd: for legacy .xls
- Performance benchmarks