stats

Show dataset statistics.

Alias: st

Synopsis

unimorph stats [OPTIONS] [LANG]

Description

Displays statistics about a downloaded language dataset, including entry counts, unique lemmas, unique forms, and unique feature combinations.

Arguments

Argument    Description
[LANG]      Language code (ISO 639-3). Optional if a default language is configured.

Options

Option    Description
--json    Output as JSON

Examples

Basic Statistics

unimorph stats heb
Statistics for heb:
  Total entries:    33177
  Unique lemmas:    1176
  Unique forms:     27286
  Unique features:  55
  Imported at:      2024-01-15 10:30:00 UTC

JSON Output

unimorph stats heb --json
{
  "total_entries": 33177,
  "unique_lemmas": 1176,
  "unique_forms": 27286,
  "unique_features": 55
}
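
A derived figure such as a rough forms-per-lemma ratio can be computed from the JSON fields. The sketch below is only an illustration: it uses the sample JSON from this page in place of a live call, and it assumes jq and awk are installed. The variable names are arbitrary.

```shell
# Sample JSON copied from the example on this page. In real use this would
# come from:  stats_json=$(unimorph stats heb --json)
stats_json='{"total_entries": 33177, "unique_lemmas": 1176, "unique_forms": 27286, "unique_features": 55}'

# Pull the two counts out as plain integers.
lemmas=$(printf '%s' "$stats_json" | jq -r '.unique_lemmas')
forms=$(printf '%s' "$stats_json" | jq -r '.unique_forms')

# Rough ratio of distinct forms to distinct lemmas, to one decimal place.
forms_per_lemma=$(awk -v f="$forms" -v l="$lemmas" 'BEGIN { printf "%.1f", f / l }')
echo "forms per lemma: $forms_per_lemma"
```

The ratio is only approximate as a measure of paradigm size, since one surface form can belong to more than one lemma.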

Compare Languages

for lang in heb ita fin deu; do
  echo "=== $lang ==="
  unimorph stats "$lang"
  echo
done

Scripting

# Get entry count
unimorph stats heb --json | jq '.total_entries'

# Compare sizes
unimorph list | while read -r lang; do
  count=$(unimorph stats "$lang" --json | jq '.total_entries')
  echo "$lang: $count"
done | sort -t: -k2 -rn

Understanding the Statistics

Metric             Description
Total entries      Number of (lemma, form, features) triples
Unique lemmas      Number of distinct dictionary forms
Unique forms       Number of distinct surface forms
Unique features    Number of distinct feature bundle combinations
Imported at        When the dataset was downloaded
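
To make the four counts concrete, here is a minimal sketch that computes them by hand with standard shell tools. The rows are invented for illustration and are not taken from any real dataset; the lemma, form, features tab-separated layout follows the triples described in the table. How unimorph itself computes these numbers is not shown here.

```shell
# Made-up rows in lemma<TAB>form<TAB>features layout (illustration only).
# printf reuses its format string for each group of three arguments.
sample=$(printf '%s\t%s\t%s\n' \
  walk walks  'V;PRS;3;SG' \
  walk walked 'V;PST' \
  talk talks  'V;PRS;3;SG' \
  talk talked 'V;PST' \
  put  put    'V;NFIN' \
  put  put    'V;PST')

# Total entries: one per row.
total=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
# Unique lemmas, forms, and feature bundles: distinct values per column.
lemmas=$(printf '%s\n' "$sample" | cut -f1 | sort -u | wc -l | tr -d ' ')
forms=$(printf '%s\n' "$sample" | cut -f2 | sort -u | wc -l | tr -d ' ')
features=$(printf '%s\n' "$sample" | cut -f3 | sort -u | wc -l | tr -d ' ')

echo "entries=$total lemmas=$lemmas forms=$forms features=$features"
```

In this sample the form "put" appears in two rows with different feature bundles, so the total entry count (6) is higher than the unique form count (5).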

See Also

  • info - More detailed language information
  • list - List all cached languages