Fuzzy Functions
Fuzzy matching and string similarity functions.
Summary
| Function | Signature | Description |
|---|---|---|
| damerau_levenshtein | string, string -> number | Damerau-Levenshtein distance |
| hamming | string, string -> number\|null | Hamming distance (number of differing positions). Returns null if strings have different lengths |
| jaro | string, string -> number | Jaro similarity (0-1) |
| jaro_winkler | string, string -> number | Jaro-Winkler similarity (0-1) |
| levenshtein | string, string -> number | Levenshtein edit distance |
| normalized_damerau_levenshtein | string, string -> number | Normalized Damerau-Levenshtein similarity (0-1) |
| normalized_levenshtein | string, string -> number | Normalized Levenshtein (0-1) |
| osa_distance | string, string -> number | Optimal String Alignment distance (like Levenshtein but allows adjacent transpositions) |
| sorensen_dice | string, string -> number | Sorensen-Dice coefficient (0-1) |
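Functions
damerau_levenshtein
Damerau-Levenshtein distance: the minimum number of insertions, deletions, substitutions, and adjacent transpositions needed to turn one string into the other.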
Signature: string, string -> number
Examples:
# Single transposition
damerau_levenshtein('ab', 'ba') -> 1
# Identical strings
damerau_levenshtein('hello', 'hello') -> 0
# Multiple edits
damerau_levenshtein('ca', 'abc') -> 2
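The examples above can be reproduced with the textbook unrestricted Damerau-Levenshtein recurrence. The following Python sketch is an illustration only, not the library's implementation; the helper names (`last_row`, `last_col`) are hypothetical. It assumes the unrestricted variant, which is the reading that gives 2 for 'ca' vs 'abc'.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (illustrative sketch)."""
    max_dist = len(a) + len(b)
    last_row = {}  # last row (1-based) in which each character of a was seen
    # The table carries one extra sentinel row and column filled with max_dist.
    d = [[max_dist] * (len(b) + 2) for _ in range(len(a) + 2)]
    for i in range(len(a) + 1):
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[1][j + 1] = j
    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where a[i-1] matched b
        for j in range(1, len(b) + 1):
            row_k = last_row.get(b[j - 1], 0)
            col_l = last_col
            cost = 1
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,    # substitution (or match)
                d[i + 1][j] + 1,   # insertion
                d[i][j + 1] + 1,   # deletion
                # transposition, plus the edits between the swapped characters
                d[row_k][col_l] + (i - row_k - 1) + 1 + (j - col_l - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("ca", "abc"))  # 2
```

The distance for 'ca' vs 'abc' is 2 because the pair can be transposed ('ca' to 'ac') and a 'b' then inserted between the swapped characters, which the restricted OSA variant does not permit.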
CLI Usage:
hamming
Hamming distance (number of differing positions). Returns null if the strings have different lengths.
Signature: string, string -> number|null
Examples:
# Three differing positions
hamming('karolin', 'kathrin') -> 3
# Identical strings
hamming('hello', 'hello') -> 0
# Different lengths
hamming('hello', 'hi') -> null
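As a rough illustration of the behavior described above, here is a minimal Python sketch. It is not the library's code; Python's `None` stands in for `null`, and it assumes positions are compared per Unicode code point.

```python
from typing import Optional


def hamming(a: str, b: str) -> Optional[int]:
    """Count positions where a and b differ; None when lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


print(hamming("karolin", "kathrin"))  # 3
print(hamming("hello", "hi"))         # None
```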
CLI Usage:
jaro
Jaro similarity (0-1), where 1 means the strings are identical and 0 means they share no matching characters.
Signature: string, string -> number
Examples:
# Similar words
jaro('hello', 'hallo') -> 0.866...
# Identical strings
jaro('hello', 'hello') -> 1.0
# Completely different
jaro('abc', 'xyz') -> 0.0
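To show where a value such as 0.866... comes from, the sketch below follows the standard Jaro definition: characters match if they are equal and lie within a sliding window, and half the number of out-of-order matches counts as transpositions. This is an assumption about the method, not a copy of the library's code.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1] (illustrative sketch)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    a_seq = [c for c, hit in zip(a, a_matched) if hit]
    b_seq = [c for c, hit in zip(b, b_matched) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro("hello", "hallo"), 3))  # 0.867
```

For 'hello' vs 'hallo' there are 4 matches and no transpositions, so the score is (4/5 + 4/5 + 4/4) / 3.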
CLI Usage:
jaro_winkler
Jaro-Winkler similarity (0-1): the Jaro similarity with a bonus for a shared prefix.
Signature: string, string -> number
Examples:
# Similar words
jaro_winkler('hello', 'hallo') -> 0.88
# Identical strings
jaro_winkler('hello', 'hello') -> 1.0
# Completely different
jaro_winkler('abc', 'xyz') -> 0.0
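The prefix bonus can be sketched separately from the Jaro score itself. The helper below, `winkler_boost`, is a hypothetical name, and the defaults (scale 0.1, prefix capped at 4 characters) are the conventional Winkler parameters, assumed here rather than taken from the library. Some implementations apply the bonus only above a score threshold; this sketch applies it unconditionally.

```python
def winkler_boost(
    jaro_score: float,
    a: str,
    b: str,
    prefix_scale: float = 0.1,  # assumed conventional scaling factor
    max_prefix: int = 4,        # assumed conventional prefix cap
) -> float:
    """Apply the Winkler common-prefix bonus to a precomputed Jaro score."""
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return jaro_score + prefix * prefix_scale * (1.0 - jaro_score)


# Jaro('hello', 'hallo') is 2.6 / 3; the shared prefix is just 'h'.
print(round(winkler_boost(2.6 / 3, "hello", "hallo"), 2))  # 0.88
```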
CLI Usage:
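levenshtein
Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.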
Signature: string, string -> number
Examples:
# Classic example
levenshtein('kitten', 'sitting') -> 3
# Identical strings
levenshtein('hello', 'hello') -> 0
# All different
levenshtein('abc', 'def') -> 3
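For readers who want to check the values above by hand, here is a minimal Python sketch of the classic two-row dynamic programming approach. It is an illustration under the standard definition, not the library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows (illustrative sketch)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution (or match)
                )
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```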
CLI Usage:
normalized_damerau_levenshtein
Normalized Damerau-Levenshtein similarity (0-1), where 1 means the strings are identical.
Signature: string, string -> number
Examples:
# Identical strings
normalized_damerau_levenshtein('hello', 'hello') -> 1.0
# Transposition
normalized_damerau_levenshtein('ab', 'ba') -> 0.5
# Completely different
normalized_damerau_levenshtein('abc', 'xyz') -> 0.0
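The values above are consistent with dividing the Damerau-Levenshtein distance by the longer string's length and subtracting from 1. This formula is an assumption inferred from the examples, not a statement of the library's code; $d_{\mathrm{DL}}$ denotes the Damerau-Levenshtein distance.

```latex
\mathrm{sim}(a, b) = 1 - \frac{d_{\mathrm{DL}}(a, b)}{\max(|a|, |b|)},
\qquad
\mathrm{sim}(\texttt{ab}, \texttt{ba}) = 1 - \frac{1}{2} = 0.5,
\qquad
\mathrm{sim}(\texttt{abc}, \texttt{xyz}) = 1 - \frac{3}{3} = 0
```

Because the swap in 'ab' vs 'ba' counts as a single edit, the score is 0.5; with plain Levenshtein distance (2 edits) the same formula would give 0.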
CLI Usage:
normalized_levenshtein
Normalized Levenshtein similarity (0-1), where 1 means the strings are identical.
Signature: string, string -> number
Examples:
# One edit
normalized_levenshtein('ab', 'abc') -> 0.666...
# Identical
normalized_levenshtein('hello', 'hello') -> 1.0
# All different
normalized_levenshtein('abc', 'xyz') -> 0.0
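The normalization step can be sketched on its own, given an already computed edit distance. The helper name `normalized_similarity` is hypothetical. The sketch assumes the score is a similarity computed as 1 - distance / max length, which is the reading that reproduces the 0.666... value for 'ab' vs 'abc' (one edit, longer length 3).

```python
def normalized_similarity(distance: int, a: str, b: str) -> float:
    """Map an edit distance to a 0-1 similarity (illustrative sketch)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - distance / longest


# 'ab' -> 'abc' needs one insertion, so the Levenshtein distance is 1.
print(round(normalized_similarity(1, "ab", "abc"), 3))  # 0.667
```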
CLI Usage:
osa_distance
Optimal String Alignment distance (like Levenshtein, but also allows adjacent transpositions; no substring may be edited more than once).
Signature: string, string -> number
Examples:
# Single transposition
osa_distance('ab', 'ba') -> 1
# Identical strings
osa_distance('hello', 'hello') -> 0
# Multiple edits
osa_distance('ca', 'abc') -> 3
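The difference from unrestricted Damerau-Levenshtein shows up in the last example. The Python sketch below uses the standard restricted-transposition recurrence; it is an illustration, not the library's implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance (illustrative sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition; the swapped pair cannot be edited again.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ca", "abc"))  # 3
```

Here 'ca' vs 'abc' costs 3 because swapping 'ca' to 'ac' and then inserting 'b' inside the swapped pair would edit the same substring twice, which this recurrence never considers.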
CLI Usage:
sorensen_dice
Sorensen-Dice coefficient (0-1), computed over character bigrams.
Signature: string, string -> number
Examples:
# Similar words
sorensen_dice('night', 'nacht') -> 0.25
# Identical strings
sorensen_dice('hello', 'hello') -> 1.0
# No common bigrams
sorensen_dice('abc', 'xyz') -> 0.0
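The 0.25 result for 'night' vs 'nacht' follows from counting shared character bigrams. The Python sketch below is an illustration under that assumption; it treats bigrams as a multiset and does no special handling of whitespace or case, which the actual function may do differently.

```python
from collections import Counter


def sorensen_dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over character bigrams (illustrative sketch)."""
    if a == b:
        return 1.0
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0  # neither string is long enough to form a bigram
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2 * overlap / total


# 'night' -> ni, ig, gh, ht; 'nacht' -> na, ac, ch, ht; only 'ht' is shared.
print(sorensen_dice("night", "nacht"))  # 0.25
```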
CLI Usage: