Rust Office Library — Quick Start

office_oxide is a pure-Rust crate for parsing, converting, and editing Office documents — DOCX, XLSX, PPTX, plus the legacy binary formats DOC, XLS, PPT. One crate, one unified Document handle, zero native dependencies.

Install

[dependencies]
office_oxide = "0.1.0"

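Optional features (use one of these entries in place of the plain dependency line above, or list both features in a single entry):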

office_oxide = { version = "0.1.0", features = ["mmap"] }       # memory-mapped opens
office_oxide = { version = "0.1.0", features = ["parallel"] }   # rayon-based parallel parse helpers

Read a document

use office_oxide::Document;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = Document::open("report.docx")?;
    println!("{}", doc.plain_text());
    Ok(())
}

Or the one-shot helper:

let text = office_oxide::extract_text("report.docx")?;

Core API

The unified Document handle works the same for every format — extension detection plus magic-byte sniffing pick the right parser.

use office_oxide::{Document, DocumentFormat};

let doc = Document::open("file.xlsx")?;
assert_eq!(doc.format(), DocumentFormat::Xlsx);

let plain = doc.plain_text();
let md    = doc.to_markdown();
let html  = doc.to_html();
let ir    = doc.to_ir();             // format-agnostic IR

doc.save_as("file.docx")?;            // legacy → OOXML works too

Document::open accepts any AsRef<Path>. Document::from_reader takes a reader that implements Read + Seek + Send + 'static, together with an explicit DocumentFormat.
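
To illustrate the magic-byte half of format detection, here is a minimal std-only sketch. It is not office_oxide's actual implementation; the Container enum and sniff_container function are hypothetical names introduced for this example. It relies only on two well-known signatures: ZIP local file headers start with PK\x03\x04, and Compound File Binary (CFB) files start with D0 CF 11 E0 A1 B1 1A E1.

```rust
/// Container kinds that can be told apart from the first bytes of a file.
/// (Hypothetical type for illustration; not part of office_oxide.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Container {
    /// ZIP archive: DOCX, XLSX, and PPTX are all ZIP-based OOXML packages.
    Zip,
    /// Compound File Binary: legacy DOC, XLS, and PPT.
    Cfb,
    /// Neither signature matched.
    Unknown,
}

/// ZIP local file header signature, "PK\x03\x04".
const ZIP_MAGIC: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];
/// Full 8-byte CFB header signature.
const CFB_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Classify a buffer by its leading bytes.
fn sniff_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else if bytes.starts_with(&CFB_MAGIC) {
        Container::Cfb
    } else {
        Container::Unknown
    }
}
```

Magic bytes alone only identify the container: a DOCX and an XLSX both sniff as ZIP. Telling them apart takes a second signal, such as the file extension or an explicit DocumentFormat.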

Module-level shortcuts for the common paths:

let text = office_oxide::extract_text("file.docx")?;
let md   = office_oxide::to_markdown("file.pptx")?;
let html = office_oxide::to_html("file.xlsx")?;

Format-specific access

When you need richer per-format data — sheets, slides, table cells — unwrap the inner document:

if let Some(xlsx) = doc.as_xlsx() {
    for sheet in xlsx.sheets() {
        println!("sheet: {}", sheet.name());
    }
}

The same pattern works for as_docx, as_pptx, as_doc, as_xls, and as_ppt.

Editing

EditableDocument performs a read-modify-write cycle while preserving every unmodified OPC part (images, charts, styles, relationships) verbatim. Editing is supported for DOCX, XLSX, and PPTX.

use office_oxide::edit::EditableDocument;

let mut doc = EditableDocument::open("template.docx")?;
let n = doc.replace_text("{{name}}", "Alice");
println!("{n} replacements");
doc.save("out.docx")?;

replace_text walks <w:t> elements in DOCX and <a:t> elements in PPTX and returns the number of replacements made. For XLSX it returns 0; use set_cell instead.
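
The per-element model described above can be sketched with plain strings standing in for the text runs. This is an illustration only, not the crate's code: replace_in_run and replace_in_runs are hypothetical helpers, and each String plays the role of the text inside one <w:t> or <a:t> element.

```rust
/// Replace every occurrence of `from` in one text run and report how many
/// occurrences were replaced. (Hypothetical helper for illustration.)
fn replace_in_run(run: &str, from: &str, to: &str) -> (String, usize) {
    // An empty pattern would match between every character; treat it as a no-op.
    if from.is_empty() {
        return (run.to_string(), 0);
    }
    // `matches` and `replace` both scan for non-overlapping occurrences,
    // so the count agrees with what `replace` rewrites.
    let count = run.matches(from).count();
    (run.replace(from, to), count)
}

/// Apply the replacement to each run independently and sum the counts.
fn replace_in_runs(runs: &mut [String], from: &str, to: &str) -> usize {
    let mut total = 0;
    for run in runs.iter_mut() {
        let (new_text, n) = replace_in_run(run.as_str(), from, to);
        *run = new_text;
        total += n;
    }
    total
}
```

In this simplified model, a placeholder that an editor has split across two runs (for example "{{na" followed by "me}}") is not matched. Whether office_oxide handles that case is not stated here, so check the returned count against what you expect from your template.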

Set XLSX cells

use office_oxide::edit::EditableDocument;
use office_oxide::xlsx::edit::CellValue;

let mut wb = EditableDocument::open("budget.xlsx")?;
wb.set_cell(0, "B2", CellValue::Number(42.0))?;
wb.set_cell(0, "A1", CellValue::String("Total".into()))?;
wb.set_cell(0, "C1", CellValue::Boolean(true))?;
wb.set_cell(0, "D1", CellValue::Empty)?;
wb.save("budget.xlsx")?;

Sheet indices are zero-based; cell refs use standard spreadsheet notation (A1, AA12).
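
If you generate cell refs programmatically, it can help to see how A1 notation maps to numeric positions. The sketch below uses only the standard library; parse_cell_ref is a hypothetical helper, not an office_oxide function. It converts a reference such as AA12 into zero-based (row, column) indices.

```rust
/// Parse an A1-style reference into zero-based (row, column) indices.
/// Returns None for malformed input such as "", "12", "A0", or "A1B".
/// (Hypothetical helper for illustration; not part of office_oxide.)
fn parse_cell_ref(cell: &str) -> Option<(u32, u32)> {
    // Split at the first non-letter: the letters name the column,
    // everything after them must be the row number.
    let split = cell
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(cell.len());
    let (letters, digits) = cell.split_at(split);
    if letters.is_empty()
        || digits.is_empty()
        || !digits.bytes().all(|b| b.is_ascii_digit())
    {
        return None;
    }

    // Column letters form a bijective base-26 number: A=1, Z=26, AA=27.
    let mut col: u32 = 0;
    for b in letters.bytes() {
        let digit = u32::from(b.to_ascii_uppercase() - b'A') + 1;
        col = col.checked_mul(26)?.checked_add(digit)?;
    }

    // Rows are 1-based in the notation, so row 0 is invalid.
    let row: u32 = digits.parse().ok()?;
    if row == 0 {
        return None;
    }
    Some((row - 1, col - 1))
}
```

Under this scheme B2 is row 1, column 1, and AA12 is row 11, column 26. The helper does not enforce any spreadsheet-specific row or column limits.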

Format-agnostic IR

DocumentIR is the structural bridge between formats — it powers to_html, save_as, and legacy-format conversion. It implements Serialize / Deserialize, so you can emit JSON for downstream tooling.

let legacy = Document::open("old.doc")?;
legacy.save_as("migrated.docx")?;     // CFB → OOXML in one line

Open from bytes

use std::io::Cursor;
use office_oxide::{Document, DocumentFormat};

let bytes = std::fs::read("file.pptx")?;
let doc = Document::from_reader(Cursor::new(bytes), DocumentFormat::Pptx)?;

Memory-mapped opens

With the mmap feature, Document::open_mmap avoids copying large OOXML files into the heap:

let doc = Document::open_mmap("huge.xlsx")?;

Only DOCX/XLSX/PPTX are mmap-able; the legacy CFB parsers require owned buffers.

Errors

All fallible entry points return office_oxide::Result<T> — i.e. Result<T, OfficeError>. The error enum covers IO, parse, unsupported-format, and extraction failures.

use office_oxide::{Document, OfficeError};

match Document::open("weird.file") {
    Ok(doc) => println!("{}", doc.plain_text()),
    Err(OfficeError::UnsupportedFormat(ext)) => eprintln!("cannot open .{ext}"),
    Err(e) => eprintln!("failed: {e}"),
}

Troubleshooting

| Symptom | Likely cause |
| --- | --- |
| UnsupportedFormat("(none)") | Path has no extension. Open via from_reader with an explicit DocumentFormat. |
| Garbled DOC text | Source is encrypted or uses an uncommon piece-table encoding. Verify that the file starts with the CFB magic D0 CF 11 E0. |
| Missing hyperlinks in DOCX | Hyperlinks resolve via relationship IDs in the document's .rels part. Verify the .rels sidecar is present in the ZIP. |
| Stack overflow on tiny-stack threads | office_oxide spawns a 16 MB parse thread when RLIMIT_STACK < 12 MB; in custom thread pools, set Builder::stack_size(16 * 1024 * 1024). |
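
For the stack-overflow case, the sketch below shows one way to run work on a thread with a 16 MB stack. It assumes the Builder mentioned above is std::thread::Builder (other pool builders expose a similar stack_size setting). run_with_large_stack and PARSE_STACK_BYTES are hypothetical names for this example, and the closure you pass in would contain your own parsing call.

```rust
use std::thread;

/// Stack size for the worker thread: 16 MB, the value suggested above.
const PARSE_STACK_BYTES: usize = 16 * 1024 * 1024;

/// Run `work` on a dedicated thread with a 16 MB stack and return its result.
/// (Hypothetical helper for illustration; not part of office_oxide.)
fn run_with_large_stack<T, F>(work: F) -> std::io::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // `spawn` returns an io::Error if the OS cannot create the thread.
    let handle = thread::Builder::new()
        .name("office-parse".into())
        .stack_size(PARSE_STACK_BYTES)
        .spawn(work)?;

    // If the worker panicked, re-raise the same panic on the calling thread.
    match handle.join() {
        Ok(value) => Ok(value),
        Err(payload) => std::panic::resume_unwind(payload),
    }
}
```

A caller would wrap the parsing work in a closure, for example run_with_large_stack(|| 40 + 2) in place of a real parse, and handle the io::Result that reports thread-creation failure.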

See also