Migrate from xlrd
xlrd was the standard Python library for reading legacy .xls (Excel 97–2003) files. Version 2.0 (2020) removed support for every other format, including .xlsx, so projects that read both formats now need a second library. The community workarounds (pinning xlrd<2.0, shelling out to LibreOffice, switching to python-calamine) each have caveats.
Office Oxide reads .xls directly, about 13× faster than xlrd in the benchmark below, with a higher pass rate. It can also convert .xls to .xlsx in one line.
When to migrate
Switch if any of these apply:
- You’re still on xlrd<2.0 and want a maintained library
- You need both .xls and .xlsx from one library
- You want to migrate the corpus to .xlsx once and stop dealing with the legacy format
- You also need .doc, .ppt, .docx, or .pptx, which the same install covers
Install
pip uninstall xlrd
pip install office-oxide
Side-by-side cheat sheet
Open a .xls
xlrd
import xlrd
book = xlrd.open_workbook("legacy.xls")
sheet = book.sheet_by_index(0)
office_oxide
from office_oxide import Document
with Document.open("legacy.xls") as doc:
    xls = doc.as_xls()
    sheet = xls.sheets()[0]
Iterate cells
xlrd
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        print(sheet.cell_value(row, col))
office_oxide
for cell in sheet.cells():
    print(cell.address(), cell.value())
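Code ported from xlrd usually indexes cells by zero-based `(row, col)`, while the loop above yields addresses. A minimal bridge, assuming `cell.address()` returns an A1-style string such as `"B3"` (that format is an assumption here, not confirmed by this guide): the hypothetical helpers below parse an address and rebuild an xlrd-style dense grid, padding gaps with `""` as xlrd does for empty cells.

```python
import re
from typing import Any, Iterable

# Hypothetical helper pattern: optional "$" anchors, column letters, then row digits ("B3", "$AA$10").
_A1 = re.compile(r"^\$?([A-Za-z]+)\$?(\d+)$")


def a1_to_row_col(address: str) -> tuple[int, int]:
    """Convert an A1-style address to zero-based (row, col), matching xlrd's indexing."""
    match = _A1.match(address)
    if match is None:
        raise ValueError(f"not an A1-style address: {address!r}")
    letters, digits = match.groups()
    row = int(digits)
    if row < 1:
        raise ValueError(f"row numbers start at 1: {address!r}")
    col = 0
    for ch in letters.upper():
        # Bijective base 26: A=1 ... Z=26, so "AA" is 27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return row - 1, col - 1


def dense_rows(cells: Iterable[tuple[str, Any]]) -> list[list[Any]]:
    """Build an xlrd-style list of rows from (address, value) pairs; missing cells become ""."""
    placed = {a1_to_row_col(addr): value for addr, value in cells}
    if not placed:
        return []
    nrows = max(r for r, _ in placed) + 1
    ncols = max(c for _, c in placed) + 1
    return [[placed.get((r, c), "") for c in range(ncols)] for r in range(nrows)]
```

Inside the `with` block, `dense_rows((cell.address(), cell.value()) for cell in sheet.cells())` would give a `rows` list that older code can keep indexing as `rows[r][c]`.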
Read all cells as a table (most common case)
xlrd
import xlrd
book = xlrd.open_workbook("legacy.xls")
sheet = book.sheet_by_index(0)
rows = [
    [sheet.cell_value(r, c) for c in range(sheet.ncols)]
    for r in range(sheet.nrows)
]
office_oxide
from office_oxide import Document
with Document.open("legacy.xls") as doc:
    ir = doc.to_ir()
    # First sheet → first section → first table
    table = next(el for el in ir["sections"][0]["elements"] if el["kind"] == "Table")
    rows = table["rows"]
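The snippet above stops at the first table, and `next(...)` without a default raises `StopIteration` when there is none. A sketch that walks every section instead, relying only on the IR keys used above (`"sections"`, `"elements"`, `"kind"`, `"rows"`); `iter_tables` is a hypothetical helper and `sample_ir` is made-up data, not real `to_ir()` output.

```python
from typing import Any, Iterator


def iter_tables(ir: dict[str, Any]) -> Iterator[tuple[int, list[Any]]]:
    """Yield (section_index, rows) for every Table element, in document order.

    Uses .get() so a section without "elements" is skipped instead of raising KeyError.
    """
    for index, section in enumerate(ir.get("sections", [])):
        for element in section.get("elements", []):
            if element.get("kind") == "Table":
                yield index, element.get("rows", [])


# Made-up IR fragment shaped like the snippet above (illustration only).
sample_ir = {
    "sections": [
        {"elements": [{"kind": "Table", "rows": [["a", 1], ["b", 2]]}]},
        {"elements": []},
        {"elements": [{"kind": "Paragraph"}, {"kind": "Table", "rows": [["c", 3]]}]},
    ]
}

tables = list(iter_tables(sample_ir))
```

`next(iter_tables(ir), None)` returns `None` for a workbook with no tables rather than raising.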
Sheet names
xlrd
import xlrd

book = xlrd.open_workbook("legacy.xls")
print(book.sheet_names())
office_oxide
from office_oxide import Document

with Document.open("legacy.xls") as doc:
    print([s.name() for s in doc.as_xls().sheets()])
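xlrd code often picks sheets with `book.sheet_by_name("Data")`. A small shim for that pattern, assuming each sheet object exposes `name()` as in the snippet above; `sheet_by_name` and `FakeSheet` are hypothetical names, and `FakeSheet` exists only so the sketch runs without a workbook.

```python
from typing import Iterable, Protocol, TypeVar


class Named(Protocol):
    def name(self) -> str: ...


S = TypeVar("S", bound=Named)


def sheet_by_name(sheets: Iterable[S], wanted: str) -> S:
    """Return the first sheet whose name() equals `wanted`; raise KeyError if none matches."""
    for sheet in sheets:
        if sheet.name() == wanted:
            return sheet
    raise KeyError(wanted)


class FakeSheet:
    """Stand-in for a real sheet object, for illustration only."""

    def __init__(self, name: str) -> None:
        self._name = name

    def name(self) -> str:
        return self._name


sheets = [FakeSheet("Summary"), FakeSheet("Data")]
```

Inside the `with` block this would read `sheet_by_name(doc.as_xls().sheets(), "Data")`. xlrd raises `XLRDError` for an unknown sheet name while this shim raises `KeyError`, so existing `except` clauses need updating.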
Convert .xls → .xlsx (one line)
If your downstream tooling already speaks .xlsx, the cleanest migration path is to convert the corpus once and never touch .xls again:
from office_oxide import Document
with Document.open("legacy.xls") as doc:
    doc.save_as("modern.xlsx")
For a whole directory:
from pathlib import Path
from office_oxide import Document
for src in Path("legacy").rglob("*.xls"):
    dst = src.with_suffix(".xlsx")
    with Document.open(src) as doc:
        doc.save_as(dst)
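That loop has two rough edges: pathlib matches glob patterns case-sensitively on POSIX systems by default, so `rglob("*.xls")` skips files named `REPORT.XLS`, and a single unreadable file aborts the whole run. A sketch of a more forgiving batch under stated assumptions: `convert_tree` is a hypothetical helper, skipping targets that already exist is a design choice for re-runnable jobs, and the conversion itself is passed in as `convert` so the sketch runs without Office Oxide installed.

```python
from pathlib import Path
from typing import Callable


def convert_tree(
    root: Path,
    convert: Callable[[Path, Path], None],
    overwrite: bool = False,
) -> tuple[list[Path], dict[Path, Exception]]:
    """Convert every .xls under `root` (any letter case) to a sibling .xlsx.

    Returns (written .xlsx paths, {source path: error}) so one bad file does not stop the run.
    """
    converted: list[Path] = []
    failures: dict[Path, Exception] = {}
    # sorted() materializes the listing before any new files are written.
    for src in sorted(root.rglob("*")):
        # Accept ".xls", ".XLS", ".Xls", ... but not ".xlsx".
        if not src.is_file() or src.suffix.lower() != ".xls":
            continue
        dst = src.with_suffix(".xlsx")
        if dst.exists() and not overwrite:
            continue  # already converted on an earlier run
        try:
            convert(src, dst)
        except Exception as exc:  # record the failure and keep going
            failures[src] = exc
        else:
            converted.append(dst)
    return converted, failures
```

With Office Oxide, `convert` would be a small function running the `Document.open(src)` / `doc.save_as(dst)` pair from the loop above; the returned failures dict lists the files to inspect by hand.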
Performance
| Library | .xls mean | .xls p99 | Pass rate |
|---|---|---|---|
| office_oxide | 2.8 ms | 75 ms | 99.2% |
| python-calamine | 9.0 ms | 96 ms | 90.7% |
| xlrd | 36.6 ms | 503 ms | 93.1% |
On mean .xls latency, Office Oxide is about 13× faster than xlrd (2.8 ms vs 36.6 ms), and its pass rate is 6.1 percentage points higher (99.2% vs 93.1%).
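The headline numbers are a ratio and a difference taken from the table: 36.6 ms / 2.8 ms ≈ 13.1, and 99.2 − 93.1 = 6.1 points. A short sketch that recomputes both for every row, handy if you rerun the comparison on your own corpus; `compare` is a hypothetical helper and the values are copied from the table.

```python
# Mean .xls latency (ms) and pass rate (%) copied from the Performance table.
RESULTS: dict[str, tuple[float, float]] = {
    "office_oxide": (2.8, 99.2),
    "python-calamine": (9.0, 90.7),
    "xlrd": (36.6, 93.1),
}


def compare(results: dict[str, tuple[float, float]], baseline: str = "office_oxide") -> dict[str, tuple[float, float]]:
    """Return {library: (mean latency / baseline mean, baseline pass rate - pass rate in points)}."""
    base_mean, base_pass = results[baseline]
    return {
        name: (mean_ms / base_mean, base_pass - pass_rate)
        for name, (mean_ms, pass_rate) in results.items()
    }


for name, (factor, gap) in compare(RESULTS).items():
    print(f"{name}: {factor:.1f}x the baseline mean latency, pass rate {gap:.1f} pp below baseline")
```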
What’s lost
The IR does not surface formula expressions, defined names, or shared formula caches. Cached formula results do survive, and those are what most downstream tools actually need. For formula expressions, drop into the format-specific xls module, or convert to .xlsx and use the xlsx module.
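On the convert-to-.xlsx route, Office Oxide's xlsx module is the path this guide points to. As a library-independent cross-check, formula text in an .xlsx worksheet part (for example `xl/worksheets/sheet1.xml`) lives in `<f>` elements next to the cached result in `<v>`. The stdlib-only sketch below lists them, assuming the converted file keeps formulas (verify this on your own files) and that you already know the worksheet's part name; the real sheet-to-part mapping lives in `xl/workbook.xml` and its relationships. `read_formulas` is a hypothetical helper.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Main SpreadsheetML namespace used by .xlsx worksheet parts (ECMA-376).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_formulas(xlsx, sheet_part: str = "xl/worksheets/sheet1.xml") -> Dict[str, Tuple[str, Optional[str]]]:
    """Map cell reference -> (formula text, cached value) for one worksheet part.

    `xlsx` may be a path or a binary file object. Cells that only point at a
    shared formula (<f t="shared" si="..."/>) carry no text, so they map to "".
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_part))
    found: Dict[str, Tuple[str, Optional[str]]] = {}
    for cell in root.iterfind("m:sheetData/m:row/m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is None:
            continue  # plain value, no formula
        cached = cell.find("m:v", NS)
        found[cell.get("r")] = (formula.text or "", cached.text if cached is not None else None)
    return found
```

Only the anchor cell of a shared formula holds its text; dependent cells show up with an empty expression and need the anchor's formula shifted by hand.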
See also
- Migrate from openpyxl (for .xlsx)
- Conversion: legacy → OOXML (what save_as does)
- Performance benchmarks