
Migrate from xlrd

xlrd was the standard Python library for reading legacy .xls (Excel 97–2003) files. Version 2.0 (2020) dropped every format except .xls, and the project points users to other libraries for newer formats. The community workarounds (pinning xlrd<2.0, shelling out to LibreOffice, switching to python-calamine) each have caveats.

Office Oxide reads .xls directly, about 13× faster than xlrd on average (2.8 ms vs 36.6 ms mean), with a higher pass rate (99.2% vs 93.1%). As a bonus, you can also convert .xls → .xlsx in one line.

When to migrate

Switch if any of these apply:

  • You’re still on xlrd<2.0 and want a maintained library
  • You need both .xls and .xlsx from one library
  • You want to migrate the corpus to .xlsx once and stop dealing with the legacy format
  • You also need .doc, .ppt, .docx, or .pptx — covered by the same install

Install

pip uninstall xlrd
pip install office-oxide

Side-by-side cheat sheet

Open a .xls

xlrd

import xlrd

book = xlrd.open_workbook("legacy.xls")
sheet = book.sheet_by_index(0)

office_oxide

from office_oxide import Document

with Document.open("legacy.xls") as doc:
    xls = doc.as_xls()
    sheet = xls.sheets()[0]

Iterate cells

xlrd

for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        print(sheet.cell_value(row, col))

office_oxide

for cell in sheet.cells():
    print(cell.address(), cell.value())
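
Code ported from xlrd often still works in zero-based (row, col) pairs. Assuming cell.address() returns A1-style strings such as "B3" (the format is not stated here, so treat this as an assumption), a small helper can translate them. The name a1_to_rowcol is hypothetical and not part of either library.

```python
import re

# Hypothetical helper; assumes A1-style addresses such as "B3" or "AA10".
_A1 = re.compile(r"^([A-Za-z]+)([1-9][0-9]*)$")


def a1_to_rowcol(address: str) -> tuple[int, int]:
    """Convert an A1-style address to zero-based (row, col), as xlrd indexes cells."""
    # Drop absolute-reference markers so "$C$2" is handled like "C2".
    match = _A1.match(address.replace("$", ""))
    if match is None:
        raise ValueError(f"not an A1-style address: {address!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1
```

With this, a1_to_rowcol("B3") gives (2, 1), the same pair you would pass to xlrd's sheet.cell_value(row, col).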

Read all cells as a table (most common case)

xlrd

import xlrd

book = xlrd.open_workbook("legacy.xls")
sheet = book.sheet_by_index(0)
rows = [
    [sheet.cell_value(r, c) for c in range(sheet.ncols)]
    for r in range(sheet.nrows)
]

office_oxide

from office_oxide import Document

with Document.open("legacy.xls") as doc:
    ir = doc.to_ir()

# First sheet → first section → first table
table = next(el for el in ir["sections"][0]["elements"] if el["kind"] == "Table")
rows = table["rows"]
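
The next(...) lookup raises StopIteration when the first section contains no table element. A more defensive sketch, assuming only the IR shape visible in the snippet above (sections, elements, kind, rows); first_table_rows is a hypothetical helper, not part of office_oxide:

```python
def first_table_rows(ir: dict, section_index: int = 0) -> list:
    """Return the rows of the first Table element in a section, or [] if none.

    Assumes the IR shape used above: ir["sections"][i]["elements"] is a list
    of dicts, and table elements carry kind == "Table" plus a "rows" list.
    """
    sections = ir.get("sections", [])
    if not 0 <= section_index < len(sections):
        return []
    for element in sections[section_index].get("elements", []):
        if element.get("kind") == "Table":
            return element.get("rows", [])
    return []
```

Returning an empty list keeps downstream loops working on sheets without data, at the cost of hiding the difference between "no table" and "empty table"; raise instead if that distinction matters to you.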

Sheet names

xlrd

book = xlrd.open_workbook("legacy.xls")
print(book.sheet_names())

office_oxide

with Document.open("legacy.xls") as doc:
    print([s.name() for s in doc.as_xls().sheets()])

Convert .xls → .xlsx (one line)

If your downstream tooling already speaks .xlsx, the cleanest migration path is to convert the corpus once and never touch .xls again:

from office_oxide import Document

with Document.open("legacy.xls") as doc:
    doc.save_as("modern.xlsx")

For a whole directory:

from pathlib import Path
from office_oxide import Document

for src in Path("legacy").rglob("*.xls"):
    dst = src.with_suffix(".xlsx")
    with Document.open(src) as doc:
        doc.save_as(dst)
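
The loop above leaves two things open: rglob("*.xls") can miss files named .XLS on case-sensitive filesystems, and a re-run repeats work already done. A sketch that plans the work first using only pathlib; plan_conversions is a hypothetical helper, and the conversion itself remains the Document.open(...) / save_as(...) call shown above.

```python
from pathlib import Path


def plan_conversions(root: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs for .xls files not yet converted."""
    pairs = []
    for src in sorted(root.rglob("*")):
        # Match the suffix case-insensitively so LEGACY.XLS is included.
        if not src.is_file() or src.suffix.lower() != ".xls":
            continue
        dst = src.with_suffix(".xlsx")
        if dst.exists():
            continue  # already converted on an earlier run
        pairs.append((src, dst))
    return pairs
```

Iterating over plan_conversions(Path("legacy")) then yields the same src and dst values the loop above computes, minus the files that already have an .xlsx next to them.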

Performance

Library           .xls mean   p99      Pass rate
office_oxide      2.8 ms      75 ms    99.2%
python-calamine   9.0 ms      96 ms    90.7%
xlrd              36.6 ms     503 ms   93.1%

Office Oxide is 13× faster than xlrd and has a 6.1 percentage-point higher pass rate.
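
The headline figures follow directly from the table. A quick check of the arithmetic, using only the numbers above:

```python
# Mean latency (ms) and pass rate (%) copied from the table above.
results = {
    "office_oxide": (2.8, 99.2),
    "python-calamine": (9.0, 90.7),
    "xlrd": (36.6, 93.1),
}

base_mean, base_pass = results["office_oxide"]
for name, (mean_ms, pass_rate) in results.items():
    if name == "office_oxide":
        continue
    speedup = mean_ms / base_mean      # ratio of mean latencies
    pass_gap = base_pass - pass_rate   # difference in percentage points
    print(f"{name}: {speedup:.1f}x slower on average, {pass_gap:.1f} points lower pass rate")
```

The xlrd ratio is 36.6 / 2.8 ≈ 13.1, which rounds to the 13× quoted, and 99.2 − 93.1 gives the 6.1-point gap.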

What’s lost

The formula expressions, defined names, and shared formula caches that xlrd exposes are not surfaced through the IR. Cached formula results survive, which is what most downstream tools actually need. For formula expressions, drop into the format-specific xls module, or convert to .xlsx and use the xlsx module.

See also