Rust Office Library — Quick Start
office_oxide is a pure-Rust crate for parsing, converting, and editing Office documents — DOCX, XLSX, PPTX, plus the legacy binary formats DOC, XLS, PPT. One crate, one unified Document handle, zero native dependencies.
Install
[dependencies]
office_oxide = "0.1.0"
Optional features:
office_oxide = { version = "0.1.0", features = ["mmap"] } # memory-mapped opens
office_oxide = { version = "0.1.0", features = ["parallel"] } # rayon-based parallel parse helpers
Read a document
use office_oxide::Document;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = Document::open("report.docx")?;
    println!("{}", doc.plain_text());
    Ok(())
}
Or the one-shot helper:
let text = office_oxide::extract_text("report.docx")?;
Core API
The unified Document handle works the same for every format — extension detection plus magic-byte sniffing pick the right parser.
use office_oxide::{Document, DocumentFormat};
let doc = Document::open("file.xlsx")?;
assert_eq!(doc.format(), DocumentFormat::Xlsx);
let plain = doc.plain_text();
let md = doc.to_markdown();
let html = doc.to_html();
let ir = doc.to_ir(); // format-agnostic IR
doc.save_as("file.docx")?; // legacy → OOXML works too
Document::open accepts any AsRef<Path>. Document::from_reader takes a reader that is Read + Seek + Send + 'static, together with an explicit DocumentFormat.
Module-level shortcuts for the common paths:
let text = office_oxide::extract_text("file.docx")?;
let md = office_oxide::to_markdown("file.pptx")?;
let html = office_oxide::to_html("file.xlsx")?;
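The detection step mentioned at the top of this section (extension plus magic-byte sniffing) can be sketched with the standard library alone. This is an illustration of the general technique, not office_oxide's actual implementation: `Container`, `sniff_container`, and `guess_format` are hypothetical names. The only facts it relies on are the ZIP local-file-header signature (`PK\x03\x04`) and the 8-byte CFB signature.

```rust
use std::path::Path;

/// Container families that can be told apart from the first bytes alone.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Container {
    /// ZIP package: DOCX, XLSX, and PPTX all start this way.
    Zip,
    /// Compound File Binary: legacy DOC, XLS, and PPT.
    Cfb,
    Unknown,
}

/// ZIP local-file-header signature, "PK\x03\x04".
const ZIP_MAGIC: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];
/// Full 8-byte CFB signature.
const CFB_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Classify a file by its leading bytes (hypothetical helper).
fn sniff_container(header: &[u8]) -> Container {
    if header.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else if header.starts_with(&CFB_MAGIC) {
        Container::Cfb
    } else {
        Container::Unknown
    }
}

/// Hypothetical tie-breaker: the magic bytes pick the container family,
/// the (case-insensitive) extension picks the member within that family.
/// Returns None when the two disagree or either is unrecognized.
fn guess_format(path: &str, header: &[u8]) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match (sniff_container(header), ext.as_str()) {
        (Container::Zip, "docx") => Some("docx"),
        (Container::Zip, "xlsx") => Some("xlsx"),
        (Container::Zip, "pptx") => Some("pptx"),
        (Container::Cfb, "doc") => Some("doc"),
        (Container::Cfb, "xls") => Some("xls"),
        (Container::Cfb, "ppt") => Some("ppt"),
        _ => None,
    }
}

fn main() {
    let zip_header: [u8; 6] = [0x50, 0x4B, 0x03, 0x04, 0x14, 0x00];
    println!("{:?}", guess_format("report.docx", &zip_header));
}
```

Magic bytes alone cannot separate DOCX from XLSX or PPTX, since all three are ZIP packages; that is why a sketch like this still needs the extension (or a look inside the package) to finish the job.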
Format-specific access
When you need richer per-format data — sheets, slides, table cells — unwrap the inner document:
if let Some(xlsx) = doc.as_xlsx() {
    for sheet in xlsx.sheets() {
        println!("sheet: {}", sheet.name());
    }
}
The same pattern works for as_docx, as_pptx, as_doc, as_xls, and as_ppt.
Editing
EditableDocument runs read-modify-write while preserving every unmodified OPC part (images, charts, styles, relationships) verbatim. Editing is supported for DOCX, XLSX, and PPTX.
use office_oxide::edit::EditableDocument;
let mut doc = EditableDocument::open("template.docx")?;
let n = doc.replace_text("{{name}}", "Alice");
println!("{n} replacements");
doc.save("out.docx")?;
replace_text walks <w:t> elements in DOCX and <a:t> elements in PPTX. It returns the number of replacements (0 for XLSX — use set_cell instead).
Set XLSX cells
use office_oxide::edit::EditableDocument;
use office_oxide::xlsx::edit::CellValue;
let mut wb = EditableDocument::open("budget.xlsx")?;
wb.set_cell(0, "B2", CellValue::Number(42.0))?;
wb.set_cell(0, "A1", CellValue::String("Total".into()))?;
wb.set_cell(0, "C1", CellValue::Boolean(true))?;
wb.set_cell(0, "D1", CellValue::Empty)?;
wb.save("budget.xlsx")?;
Sheet indices are zero-based; cell refs use standard spreadsheet notation (A1, AA12).
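To make the notation concrete, here is a minimal sketch of how an A1-style reference maps to zero-based row and column indices. `parse_cell_ref` is a hypothetical helper written for this illustration; it is not part of office_oxide, and the crate may validate references differently (for example, this sketch rejects absolute references such as `$A$1`).

```rust
/// Parse an A1-style reference into zero-based (row, column) indices.
/// Returns None for malformed input such as "1A", "A0", "A", or "".
fn parse_cell_ref(cell: &str) -> Option<(u32, u32)> {
    // The column letters end at the first non-alphabetic character.
    let split = cell.find(|c: char| !c.is_ascii_alphabetic())?;
    let (letters, digits) = cell.split_at(split);
    if letters.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Column letters are bijective base-26: A = 1 ... Z = 26, AA = 27.
    let mut col: u32 = 0;
    for b in letters.bytes() {
        let digit = u32::from(b.to_ascii_uppercase() - b'A') + 1;
        col = col.checked_mul(26)?.checked_add(digit)?;
    }

    // Rows are one-based in the notation; row 0 does not exist.
    let row: u32 = digits.parse().ok()?;
    if row == 0 {
        return None;
    }
    Some((row - 1, col - 1))
}

fn main() {
    for cell in ["A1", "B2", "AA12"] {
        println!("{cell} -> {:?}", parse_cell_ref(cell));
    }
}
```

The column letters form a bijective base-26 number (there is no zero digit), which is why `AA` follows `Z` directly.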
Format-agnostic IR
DocumentIR is the structural bridge between formats — it powers to_html, save_as, and legacy-format conversion. It implements Serialize / Deserialize, so you can emit JSON for downstream tooling.
let legacy = Document::open("old.doc")?;
legacy.save_as("migrated.docx")?; // CFB → OOXML in one line
Open from bytes
use std::io::Cursor;
use office_oxide::{Document, DocumentFormat};
let bytes = std::fs::read("file.pptx")?;
let doc = Document::from_reader(Cursor::new(bytes), DocumentFormat::Pptx)?;
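The reader bounds are worth a closer look. The sketch below uses a hypothetical `peek_header` function that carries the same `Read + Seek + Send + 'static` bounds the text lists for `from_reader`; it says nothing about how office_oxide uses the reader internally. It shows why `Cursor<Vec<u8>>` qualifies (it owns its buffer) and what `Seek` buys a parser (the ability to rewind after inspecting the header).

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper with the same bounds as the reader-based constructor.
/// Reads up to 8 header bytes, then rewinds so a parser could start at byte 0.
fn peek_header<R>(mut reader: R) -> std::io::Result<(Vec<u8>, R)>
where
    R: Read + Seek + Send + 'static,
{
    let mut header = Vec::with_capacity(8);
    // `take(8)` caps the read; shorter inputs simply yield fewer bytes.
    reader.by_ref().take(8).read_to_end(&mut header)?;
    reader.seek(SeekFrom::Start(0))?;
    Ok((header, reader))
}

fn main() -> std::io::Result<()> {
    // An owned Vec<u8> inside a Cursor satisfies 'static.
    let bytes: Vec<u8> = b"PK\x03\x04rest-of-archive".to_vec();
    let (header, mut reader) = peek_header(Cursor::new(bytes))?;
    println!("header bytes: {header:02X?}");
    println!("position after rewind: {}", reader.stream_position()?);
    Ok(())
}
```

A `Cursor` over a slice borrowed from a local buffer would not compile against these bounds, because the borrow is not `'static`; moving the `Vec<u8>` into the cursor, as above, avoids that.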
Memory-mapped opens
With the mmap feature, Document::open_mmap avoids copying large OOXML files into the heap:
let doc = Document::open_mmap("huge.xlsx")?;
Only DOCX/XLSX/PPTX are mmap-able; the legacy CFB parsers require owned buffers.
Errors
All fallible entry points return office_oxide::Result<T> — i.e. Result<T, OfficeError>. The error enum covers IO, parse, unsupported-format, and extraction failures.
use office_oxide::{Document, OfficeError};
match Document::open("weird.file") {
    Ok(doc) => println!("{}", doc.plain_text()),
    Err(OfficeError::UnsupportedFormat(ext)) => eprintln!("cannot open .{ext}"),
    Err(e) => eprintln!("failed: {e}"),
}
Troubleshooting
| Symptom | Likely cause |
|---|---|
| UnsupportedFormat("(none)") | Path has no extension. Open via from_reader with an explicit DocumentFormat. |
| Garbled DOC text | Source is encrypted or uses an uncommon piece-table encoding. Verify CFB magic D0 CF 11 E0. |
| Missing hyperlinks in DOCX | Hyperlink targets resolve through relationship IDs (r:id). Verify that the .rels part (word/_rels/document.xml.rels) is present in the ZIP. |
| Stack overflow on tiny-stack threads | office_oxide spawns a 16 MB parse thread when RLIMIT_STACK < 12 MB; in custom thread pools, set Builder::stack_size(16 * 1024 * 1024). |
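For the stack-size row, a minimal sketch of the workaround, assuming the `Builder` in question is `std::thread::Builder`. `run_with_large_stack` and `depth` are hypothetical names; `depth` is only a stand-in for a recursive parser, and in real code the closure would call into office_oxide instead.

```rust
use std::thread;

/// 16 MB, matching the stack size quoted in the table above.
const PARSE_STACK_BYTES: usize = 16 * 1024 * 1024;

/// Run `work` on a dedicated thread with a 16 MB stack and return its result.
fn run_with_large_stack<T, F>(work: F) -> std::io::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("office-parse".into())
        .stack_size(PARSE_STACK_BYTES)
        .spawn(work)?; // fails only if the OS cannot create the thread

    // If the worker panicked, re-raise the panic on the calling thread.
    match handle.join() {
        Ok(value) => Ok(value),
        Err(payload) => std::panic::resume_unwind(payload),
    }
}

/// Stand-in for a recursive parser: one stack frame per level.
fn depth(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

fn main() -> std::io::Result<()> {
    let levels = run_with_large_stack(|| depth(20_000))?;
    println!("recursed {levels} levels");
    Ok(())
}
```

Thread pools that build their own workers usually expose an equivalent stack-size setting on their builder; the point is that the size has to be chosen where the thread is created, since it cannot be raised afterwards.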
See also
- Python Quick Start — the same API in Python
- Performance benchmarks — full numbers across 6,062 files
- Architecture — module layout and design decisions
- Crate on crates.io, docs on docs.rs