
Office Oxide CLI — Quick Start

office-oxide is a command-line tool for fast, local processing of Office documents. It is built on the same Rust core that powers the library and runs entirely on your machine, with no cloud services and no external dependencies.

Install

Cargo (any platform):

cargo install office_oxide_cli

cargo-binstall (pre-built binary):

cargo binstall office_oxide_cli

From source:

git clone https://github.com/yfedoseev/office_oxide
cd office_oxide
cargo install --path crates/office_oxide_cli

The installed binary is office-oxide.
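
If the shell cannot find the binary after installation, a PATH check is a reasonable first step. The sketch below defines have_office_oxide, a hypothetical helper that is not part of the CLI, using the POSIX command -v builtin. Cargo normally places installed binaries in ~/.cargo/bin, so that directory needs to be on PATH.

```shell
# Hypothetical helper (not part of office-oxide): succeeds when the
# office-oxide binary is reachable through PATH, fails otherwise.
have_office_oxide() {
  command -v office-oxide >/dev/null 2>&1
}
```

A script could call have_office_oxide at the top and print a hint about ~/.cargo/bin when it fails.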

Quick Start

# Extract plain text
office-oxide text report.docx

# Convert to Markdown
office-oxide markdown data.xlsx -o data.md

# Convert to HTML
office-oxide html slides.pptx -o slides.html

# Dump the format-agnostic IR as JSON
office-oxide ir document.docx -o document.ir.json

# Convert legacy DOC → modern DOCX
office-oxide convert old.doc modern.docx

Run office-oxide --help for the full flag list, or office-oxide <command> --help for any specific command.

Commands

Command    Description
text       Extract plain UTF-8 text
markdown   Convert to GitHub-flavored Markdown
html       Convert to semantic HTML
ir         Dump the format-agnostic IR as JSON
convert    Convert between formats (legacy → OOXML, OOXML → OOXML)
info       Show format, page/sheet/slide counts, and metadata

All commands accept any of the six supported formats: .docx, .xlsx, .pptx, .doc, .xls, .ppt.
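
When scripting over a mixed directory, it can help to filter for these six extensions before calling the CLI. The following is a minimal sketch; is_supported_office_file is a hypothetical helper name, and it checks only the file extension. The CLI itself may detect formats by content (the info example on mystery.bin suggests so), which this sketch does not attempt.

```shell
# Hypothetical helper (not part of office-oxide): succeeds when the file
# name ends in one of the six extensions listed above, case-insensitively.
is_supported_office_file() {
  # Lowercase the name so REPORT.DOCX and report.docx are treated alike.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.docx|*.xlsx|*.pptx|*.doc|*.xls|*.ppt) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, a loop over a directory could skip any file for which is_supported_office_file fails.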

Global options

-o, --output <PATH>   Output file (defaults to stdout for text outputs)
-v, --verbose         Show timing information
-q, --quiet           Suppress non-essential output
    --json            Wrap output in a JSON envelope

Examples

Extract text from a spreadsheet:

office-oxide text quarterly.xlsx

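Migrate a corpus of legacy .doc files in parallel (requires GNU parallel):

# Create the output directory first, in case the CLI does not
mkdir -p modern
find legacy/ -iname '*.doc' | \
  parallel 'office-oxide convert {} modern/{/.}.docx'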
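
Where GNU parallel is not available, the same migration can be sketched sequentially with find -exec. The function name migrate_docs and the OFFICE_OXIDE_BIN override variable are hypothetical, introduced here only for illustration. The only CLI call is the convert command shown above.

```shell
# Convert every .doc under $1 into a .docx with the same base name in $2.
# OFFICE_OXIDE_BIN is a hypothetical override (not a CLI feature) that
# lets the function point at a binary outside PATH.
migrate_docs() {
  src=$1
  dst=$2
  bin=${OFFICE_OXIDE_BIN:-office-oxide}
  mkdir -p "$dst" || return 1
  # Pass file names as arguments so spaces in paths survive intact.
  find "$src" -type f -iname '*.doc' -exec sh -c '
    bin=$1; dst=$2; shift 2
    for f in "$@"; do
      base=$(basename "$f")
      "$bin" convert "$f" "$dst/${base%.*}.docx" || echo "failed: $f" >&2
    done
  ' sh "$bin" "$dst" {} +
}
```

Calling migrate_docs legacy modern mirrors the parallel example. Files that share a base name in different subdirectories would overwrite each other in the flat output directory.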

Convert a deck for an LLM pipeline:

office-oxide markdown deck.pptx -o deck.md

Inspect a file:

office-oxide info mystery.bin
# format: xlsx, sheets: 4, named_ranges: 12, ...

Pipe through jq:

office-oxide ir report.docx | jq '.sections[].title'

Stdin / stdout

text, markdown, html, and ir write to stdout by default — handy for pipelines:

office-oxide text report.docx | grep -i "executive summary"

When --output is given, the result is written to that file instead.
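
Because the text command writes to stdout, it composes with standard Unix tools. As a small sketch, the hypothetical function doc_word_count below pipes extracted text into wc. The OFFICE_OXIDE_BIN variable is likewise an assumption made for this example, not a feature of the CLI.

```shell
# Print the number of whitespace-separated words in a document's text.
# OFFICE_OXIDE_BIN is a hypothetical override for the binary location.
doc_word_count() {
  # tr strips the padding some wc implementations add around the count.
  "${OFFICE_OXIDE_BIN:-office-oxide}" text "$1" | wc -w | tr -d ' '
}
```

Usage would look like doc_word_count report.docx, which prints a single integer.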

See also