Introduction
unimorph-rs is a Rust toolkit for working with UniMorph morphological data. It provides a command-line interface and a Rust library for downloading, querying, and analyzing morphological inflection data across 180+ languages.
What is UniMorph?
UniMorph is a collaborative project providing morphological paradigms for the world's languages. Each language dataset contains entries mapping lemmas (dictionary forms) to their inflected forms along with morphological feature annotations.
For example, in Spanish:
| Lemma | Form | Features |
|---|---|---|
| hablar | hablo | V;IND;PRS;1;SG |
| hablar | hablas | V;IND;PRS;2;SG |
| hablar | habla | V;IND;PRS;3;SG |
| hablar | hablamos | V;IND;PRS;1;PL |
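To make the entry structure concrete, here is a minimal sketch of parsing one such entry, assuming the three-column tab-separated layout (`lemma<TAB>form<TAB>features`) used by UniMorph data files, with feature tags separated by semicolons. The `Entry` type and `parse_line` function are hypothetical names for illustration and are not part of the unimorph-rs API.

```rust
/// One UniMorph entry: a lemma, an inflected form, and its feature tags.
/// (Hypothetical type for illustration; not the unimorph-rs API.)
#[derive(Debug, PartialEq)]
struct Entry {
    lemma: String,
    form: String,
    features: Vec<String>,
}

/// Parse one line of the assumed form `lemma<TAB>form<TAB>features`.
/// Returns `None` unless the line has exactly three non-empty fields.
fn parse_line(line: &str) -> Option<Entry> {
    // Drop a trailing newline or carriage return, if any.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut fields = line.split('\t');
    let lemma = fields.next()?;
    let form = fields.next()?;
    let tags = fields.next()?;
    if fields.next().is_some() || lemma.is_empty() || form.is_empty() || tags.is_empty() {
        return None;
    }
    Some(Entry {
        lemma: lemma.to_string(),
        form: form.to_string(),
        // Feature tags are separated by semicolons, e.g. "V;IND;PRS;1;SG".
        features: tags.split(';').map(str::to_string).collect(),
    })
}

fn main() {
    let entry = parse_line("hablar\thablo\tV;IND;PRS;1;SG").expect("well-formed line");
    println!("{} -> {} {:?}", entry.lemma, entry.form, entry.features);
}
```

Splitting the feature string into individual tags up front makes later filtering (for example, by part of speech or number) a simple membership check.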
Features
- Fast lookups: SQLite-backed storage with indexed queries
- 180+ languages: Access to all UniMorph language datasets
- Transparent decompression: Handles `.xz`, `.gz`, and `.zip` compressed datasets automatically
- Flexible querying: Search by lemma, form, features, or part of speech
- Multiple output formats: Table, JSON, TSV for scripting
- Pipe-friendly: Output designed for Unix pipelines
- Offline-first: Data cached locally after download
- Library + CLI: Use as a Rust library or command-line tool
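As a rough illustration of why indexed lookups are fast, the sketch below builds two in-memory indexes over the Spanish rows shown earlier: one keyed by lemma and one keyed by form. It uses a `HashMap` as a stand-in for the SQLite indexes and does not show the actual unimorph-rs storage layer; `Row`, `ROWS`, and `build_index` are hypothetical names.

```rust
use std::collections::HashMap;

/// (lemma, form, features) triple. Hypothetical type for illustration.
type Row = (&'static str, &'static str, &'static str);

/// Rows taken from the Spanish example table.
const ROWS: [Row; 4] = [
    ("hablar", "hablo", "V;IND;PRS;1;SG"),
    ("hablar", "hablas", "V;IND;PRS;2;SG"),
    ("hablar", "habla", "V;IND;PRS;3;SG"),
    ("hablar", "hablamos", "V;IND;PRS;1;PL"),
];

/// Map each key (chosen by `key`) to the positions of the rows that carry it,
/// so a lookup does not need to scan every row.
fn build_index(
    rows: &[Row],
    key: impl Fn(&Row) -> &'static str,
) -> HashMap<&'static str, Vec<usize>> {
    let mut index: HashMap<&'static str, Vec<usize>> = HashMap::new();
    for (i, row) in rows.iter().enumerate() {
        index.entry(key(row)).or_default().push(i);
    }
    index
}

fn main() {
    let by_lemma = build_index(&ROWS, |r| r.0);
    let by_form = build_index(&ROWS, |r| r.1);

    // Lemma -> all inflected forms.
    if let Some(hits) = by_lemma.get("hablar") {
        for &i in hits {
            println!("{}\t{}", ROWS[i].1, ROWS[i].2);
        }
    }

    // Surface form -> its lemma and features.
    if let Some(hits) = by_form.get("hablamos") {
        for &i in hits {
            println!("{}\t{}", ROWS[i].0, ROWS[i].2);
        }
    }
}
```

A database index plays the same role on disk: it lets a query jump to the matching rows by lemma or by form instead of reading the whole table.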
Use Cases
- Language learners: Look up conjugations and declensions
- NLP researchers: Training data for morphological models
- Lexicographers: Verify inflection paradigms
- Educators: Build conjugation practice tools
- Linguists: Cross-linguistic morphological analysis
Quick Example
```bash
# Download the Hebrew dataset
unimorph download heb

# Look up all forms of a verb
unimorph inflect -l heb כתב

# Analyze a surface form
unimorph analyze -l heb כתבתי

# Search for plural masculine forms
unimorph search -l heb --contains PL,MASC --limit 10
```
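The feature filter in the last command can be pictured as a set-membership test. The sketch below assumes that `--contains PL,MASC` keeps entries whose feature bundle includes every listed tag; that AND semantics is an assumption, not confirmed behavior of the CLI. `contains_all` is a hypothetical helper, and the feature bundles are illustrative rather than taken from a real dataset.

```rust
use std::collections::HashSet;

/// Returns true when every tag in `wanted` (comma-separated, e.g. "PL,MASC")
/// appears in `bundle` (semicolon-separated, e.g. "N;PL;MASC").
/// Assumption: all listed tags must be present (logical AND).
fn contains_all(bundle: &str, wanted: &str) -> bool {
    let have: HashSet<&str> = bundle.split(';').map(str::trim).collect();
    wanted
        .split(',')
        .map(str::trim)
        .filter(|tag| !tag.is_empty())
        .all(|tag| have.contains(tag))
}

fn main() {
    // Illustrative feature bundles, not taken from a real dataset.
    let bundles = ["N;PL;MASC", "N;SG;MASC", "V;PST;3;PL;MASC"];
    for bundle in bundles {
        println!("{bundle}: {}", contains_all(bundle, "PL,MASC"));
    }
}
```

Comparing whole tags rather than substrings avoids false matches between tags that share a prefix.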
Getting Started
Head to the Installation guide to get started, or jump straight to the Quick Start for a hands-on introduction.