Index Management

polars-redis provides typed helpers for creating and managing RediSearch indexes, eliminating the need to write raw FT.CREATE commands.

Overview

Instead of manually constructing FT.CREATE commands:

# Manual approach
FT.CREATE users_idx ON HASH PREFIX 1 user: SCHEMA \
    name TEXT SORTABLE \
    age NUMERIC SORTABLE \
    status TAG

Use the typed Python API:

from polars_redis import Index, TextField, NumericField, TagField

idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[
        TextField("name", sortable=True),
        NumericField("age", sortable=True),
        TagField("status"),
    ]
)
idx.create("redis://localhost:6379")

Field Types

TextField

Full-text search fields with stemming, phonetic matching, and relevance scoring.

from polars_redis import TextField

# Basic text field
TextField("title")

# With options
TextField("title", sortable=True, weight=2.0)  # Higher relevance weight
TextField("body", nostem=True)  # Disable stemming
TextField("name", phonetic="dm:en")  # Double Metaphone English
TextField("content", withsuffixtrie=True)  # Enable suffix queries (*word)

Option	Type	Default	Description
`sortable`	bool	False	Enable sorting on this field
`nostem`	bool	False	Disable stemming
`weight`	float	1.0	Relevance weight for scoring
`phonetic`	str	None	Phonetic algorithm (dm:en, dm:fr, dm:pt, dm:es)
`noindex`	bool	False	Store but don't index
`withsuffixtrie`	bool	False	Enable suffix queries

NumericField

Numeric fields for range queries.

from polars_redis import NumericField

NumericField("age")
NumericField("price", sortable=True)
NumericField("score", noindex=True)  # Store only

Option	Type	Default	Description
`sortable`	bool	False	Enable sorting
`noindex`	bool	False	Store but don't index

TagField

Exact-match fields for categories, tags, and statuses. Values are not tokenized or stemmed.

from polars_redis import TagField

TagField("status")
TagField("tags", separator="|")  # Pipe-separated values
TagField("code", casesensitive=True)  # Case-sensitive matching

Option	Type	Default	Description
`separator`	str	","	Character separating multiple values
`casesensitive`	bool	False	Case-sensitive matching
`sortable`	bool	False	Enable sorting
`noindex`	bool	False	Store but don't index
`withsuffixtrie`	bool	False	Enable suffix queries
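
As a rough illustration of what "not tokenized or stemmed" means for a TAG value, the sketch below mimics RediSearch's documented tag handling: split on the separator, trim surrounding whitespace, and fold case unless case-sensitive matching is requested. `split_tags` is a hypothetical helper, not part of polars-redis, and dropping empty entries is a choice made by this sketch.

```python
def split_tags(value: str, separator: str = ",", casesensitive: bool = False) -> list[str]:
    """Split a stored field value into the individual tags an index would see."""
    tags = []
    for part in value.split(separator):
        tag = part.strip()  # surrounding whitespace is not part of the tag
        if not tag:
            continue  # sketch choice: ignore empty entries such as "a,,b"
        tags.append(tag if casesensitive else tag.lower())
    return tags


print(split_tags("Red, Green,BLUE"))
print(split_tags("A|b", separator="|", casesensitive=True))
```

The first call yields three lower-cased tags; the second keeps the original case because `casesensitive=True`.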

GeoField

Geographic point fields (longitude/latitude) for radius queries. For polygon queries, see GeoShapeField.

from polars_redis import GeoField

GeoField("location")
# Store as "longitude,latitude": HSET key location "-122.4194,37.7749"
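
Because the stored value puts longitude first, it is easy to swap the two coordinates by accident. A minimal sketch of a formatting helper follows; `geo_value` is a hypothetical name, and the range checks use plain geographic bounds (Redis may enforce tighter limits).

```python
def geo_value(lon: float, lat: float) -> str:
    """Format a point as the "lon,lat" string used in the HSET example."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return f"{lon},{lat}"


print(geo_value(-122.4194, 37.7749))  # -122.4194,37.7749
```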

VectorField

Vector fields for similarity search with embeddings.

from polars_redis import VectorField

# For sentence-transformers (384 dim)
VectorField("embedding", algorithm="HNSW", dim=384, distance_metric="COSINE")

# For OpenAI embeddings (1536 dim)
VectorField("embedding", algorithm="HNSW", dim=1536, distance_metric="COSINE", m=32)

# Flat index (brute force, exact results)
VectorField("embedding", algorithm="FLAT", dim=384, distance_metric="L2")

Option	Type	Default	Description
`algorithm`	str	"HNSW"	"HNSW" (approximate) or "FLAT" (exact)
`dim`	int	384	Vector dimension
`distance_metric`	str	"COSINE"	"COSINE", "L2", or "IP" (inner product)
`initial_cap`	int	None	Initial index capacity
`m`	int	None	HNSW: maximum edges per node (RediSearch default: 16)
`ef_construction`	int	None	HNSW: construction-time search width
`ef_runtime`	int	None	HNSW: query-time search width
`block_size`	int	None	FLAT: block size
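
Vectors indexed from hashes are stored as raw binary blobs whose length must match `dim`. The sketch below shows one way to pack and unpack such a blob with the standard library, assuming 32-bit little-endian floats; this page does not state which element type the library uses, and `pack_vector` and `unpack_vector` are hypothetical helpers.

```python
import struct


def pack_vector(values: list[float], dim: int) -> bytes:
    """Pack a vector into a little-endian float32 blob (assumed element type)."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return struct.pack(f"<{dim}f", *values)


def unpack_vector(blob: bytes) -> list[float]:
    """Inverse of pack_vector: 4 bytes per float32 element."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


blob = pack_vector([0.5, -1.0, 2.0, 0.25], dim=4)
print(len(blob))  # 16 bytes: 4 elements * 4 bytes each
print(unpack_vector(blob))
```

Checking the length before writing catches dimension mismatches early, before the document reaches Redis.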

GeoShapeField

Polygon and complex geometry fields.

from polars_redis import GeoShapeField

GeoShapeField("boundary")
GeoShapeField("region", coord_system="FLAT")  # Cartesian X/Y coordinates (RediSearch's default is SPHERICAL)
# Store as WKT: HSET key boundary "POLYGON((-122 37, -122 38, -121 38, -121 37, -122 37))"
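
A WKT polygon ring must end on the point where it starts, which is easy to forget when building the string by hand. The following sketch builds the string from a list of points and closes the ring if needed; `polygon_wkt` is a hypothetical helper, not a polars-redis function.

```python
def polygon_wkt(points: list[tuple[float, float]]) -> str:
    """Build a single-ring WKT POLYGON, closing the ring if necessary."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings repeat the first point at the end
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


print(polygon_wkt([(-122, 37), (-122, 38), (-121, 38), (-121, 37)]))
# POLYGON((-122 37, -122 38, -121 38, -121 37, -122 37))
```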

Creating Indexes

Basic Creation

from polars_redis import Index, TextField, NumericField, TagField

idx = Index(
    name="products_idx",
    prefix="product:",
    schema=[
        TextField("name", sortable=True, weight=2.0),
        TextField("description"),
        NumericField("price", sortable=True),
        TagField("category"),
    ]
)

# Create the index
idx.create("redis://localhost:6379")

JSON Indexes

For RedisJSON documents:

idx = Index(
    name="products_idx",
    prefix="product:",
    schema=[
        TextField("name"),
        NumericField("price"),
    ],
    on="JSON",  # Index JSON documents instead of hashes
)
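
In raw RediSearch, an index declared ON JSON addresses each field by a JSONPath expression and usually aliases it with AS. This page does not show the command the library generates for JSON indexes, so the following is only a sketch of one plausible shape, assuming every field sits at the top level of the document. `json_schema_tokens` is a hypothetical helper.

```python
def json_schema_tokens(fields: list[tuple[str, str]]) -> list[str]:
    """Render (name, type) pairs as '$.name AS name TYPE' schema tokens."""
    tokens: list[str] = []
    for name, field_type in fields:
        # Assumption: the field is a top-level key, so its JSONPath is $.<name>
        tokens += [f"$.{name}", "AS", name, field_type]
    return tokens


command = ["FT.CREATE", "products_idx", "ON", "JSON", "PREFIX", "1", "product:", "SCHEMA"]
command += json_schema_tokens([("name", "TEXT"), ("price", "NUMERIC")])
print(" ".join(command))
```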

Index Options

idx = Index(
    name="articles_idx",
    prefix="article:",
    schema=[TextField("content")],

    # Multiple prefixes
    # prefix=["article:", "post:"],

    # Language and stemming
    language="english",
    # language_field="lang",  # Per-document language

    # Stopwords
    stopwords=["the", "a", "an"],  # Custom stopwords
    # stopwords=[],  # Disable stopwords

    # Memory optimizations
    nooffsets=True,  # Save memory, disable exact phrase search
    nofreqs=True,  # Save memory, affects scoring

    # Skip scanning existing keys
    skipinitialscan=True,
)

Idempotent Operations

ensure_exists()

Create the index if it doesn't exist (safe for concurrent access):

idx = Index(name="users_idx", prefix="user:", schema=[TextField("name")])

# Safe to call multiple times
idx.ensure_exists("redis://localhost:6379")
idx.ensure_exists("redis://localhost:6379")  # No-op
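
One common way to make index creation safe under concurrency is to attempt the create and tolerate an "already exists" error, rather than checking first and creating second (two clients could both pass the check). The sketch below demonstrates that pattern against an in-memory stand-in; `FakeSearchServer` and this `ensure_exists` function are hypothetical and are not the library's implementation.

```python
class FakeSearchServer:
    """In-memory stand-in for a server that rejects duplicate index names."""

    def __init__(self) -> None:
        self.indexes: dict[str, str] = {}

    def create(self, name: str, definition: str) -> None:
        if name in self.indexes:
            raise RuntimeError("Index already exists")
        self.indexes[name] = definition


def ensure_exists(server: FakeSearchServer, name: str, definition: str) -> bool:
    """Create the index; return True if created, False if it was already there."""
    try:
        server.create(name, definition)
    except RuntimeError as exc:
        if "already exists" in str(exc).lower():
            return False  # another caller won the race; nothing to do
        raise  # unrelated errors must still surface
    return True


server = FakeSearchServer()
print(ensure_exists(server, "users_idx", "SCHEMA name TEXT"))  # True
print(ensure_exists(server, "users_idx", "SCHEMA name TEXT"))  # False
```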

create() with if_not_exists

idx.create("redis://localhost:6379", if_not_exists=True)

Check Existence

if idx.exists("redis://localhost:6379"):
    print("Index already exists")

Auto-Create with search_hashes()

Pass an Index object to search_hashes() for automatic index creation:

from polars_redis import Index, TextField, NumericField, TagField, col, search_hashes
import polars as pl

idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[
        TextField("name", sortable=True),
        NumericField("age", sortable=True),
        TagField("status"),
    ]
)

# Index is auto-created if it doesn't exist
df = search_hashes(
    "redis://localhost:6379",
    index=idx,  # Pass Index object instead of string
    query=col("age") > 30,
    schema={"name": pl.Utf8, "age": pl.Int64, "status": pl.Utf8},
).collect()

To disable auto-creation:

df = search_hashes(
    "redis://localhost:6379",
    index=idx,
    query=col("age") > 30,
    schema={"name": pl.Utf8, "age": pl.Int64, "status": pl.Utf8},
    create_index=False,  # Don't auto-create; the index must already exist
).collect()
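
For intuition about what a predicate such as `col("age") > 30` might turn into on the server side, the sketch below renders simple comparisons as RediSearch query strings. The query syntax (`@field:[min max]` for numeric ranges, `@field:{value}` for tags, whitespace for AND) is standard RediSearch; the `Col` and `Expr` classes are hypothetical and the library's actual translation is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    query: str

    def __and__(self, other: "Expr") -> "Expr":
        # RediSearch treats whitespace-separated clauses as an intersection
        return Expr(f"{self.query} {other.query}")


class Col:
    """Hypothetical column reference that renders comparisons as query text."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: float) -> Expr:
        return Expr(f"@{self.name}:[({value} +inf]")  # "(" marks an exclusive bound

    def __ge__(self, value: float) -> Expr:
        return Expr(f"@{self.name}:[{value} +inf]")

    def __lt__(self, value: float) -> Expr:
        return Expr(f"@{self.name}:[-inf ({value}]")

    def __eq__(self, value: object) -> Expr:  # type: ignore[override]
        if isinstance(value, str):
            # Assumption: string equality targets a TAG field
            return Expr(f"@{self.name}:{{{value}}}")
        return Expr(f"@{self.name}:[{value} {value}]")


print((Col("age") > 30).query)
print(((Col("department") == "engineering") & (Col("salary") > 125000)).query)
```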

Schema Inference

From DataFrame

Infer an index schema from an existing DataFrame:

import polars as pl
from polars_redis import Index

df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "department": ["engineering", "sales"],
    "salary": [100000.0, 85000.0],
})

# Infer schema (strings default to TAG)
idx = Index.from_frame(df, "employees_idx", "employee:")

# Specify which string fields should be TEXT (full-text search)
idx = Index.from_frame(
    df,
    "employees_idx",
    "employee:",
    text_fields=["name"],  # name becomes TEXT, department stays TAG
    sortable=["age", "salary"],
)

Type mapping:

Polars Type	Default Field Type
Int8-Int64, UInt8-UInt64, Float32, Float64	NUMERIC
Utf8, String	TAG (or TEXT if in text_fields)
Boolean	TAG
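
The mapping in the table can be sketched as a small lookup. To keep the example dependency-free, dtypes are represented by their names as plain strings instead of Polars dtype objects; `infer_field_types` is a hypothetical helper, and raising on unlisted dtypes is a choice of this sketch, since the table does not say how other types are handled.

```python
NUMERIC_DTYPES = {
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
}
STRING_DTYPES = {"Utf8", "String"}


def infer_field_types(schema: dict[str, str], text_fields: tuple[str, ...] = ()) -> dict[str, str]:
    """Map column dtype names to index field types, following the table above."""
    result: dict[str, str] = {}
    for name, dtype in schema.items():
        if dtype in NUMERIC_DTYPES:
            result[name] = "NUMERIC"
        elif dtype in STRING_DTYPES:
            result[name] = "TEXT" if name in text_fields else "TAG"
        elif dtype == "Boolean":
            result[name] = "TAG"
        else:
            raise TypeError(f"no mapping for dtype {dtype!r} (column {name!r})")
    return result


schema = {"name": "String", "age": "Int64", "department": "String", "salary": "Float64"}
print(infer_field_types(schema, text_fields=("name",)))
```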

From Schema Dict

Create an index from a Polars schema dictionary:

schema = {"name": pl.Utf8, "age": pl.Int64, "active": pl.Boolean}

idx = Index.from_schema(
    schema,
    "users_idx",
    "user:",
    text_fields=["name"],
    sortable=["age"],
)

From Existing Index

Load an existing index definition from Redis:

idx = Index.from_redis("redis://localhost:6379", "existing_idx")
if idx:
    print(f"Index has {len(idx.schema)} fields")
    print(f"Prefix: {idx.prefix}")

Validation and Migration

Validate Schema

Check if a Polars schema matches an index:

schema = {"name": pl.Utf8, "age": pl.Int64, "email": pl.Utf8}

warnings = idx.validate_schema(schema)
for warning in warnings:
    print(warning)
# Output:
# Field 'email' in schema but not in index
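
The check behind that warning can be pictured as a simple membership test. This sketch only reproduces the one message shown above; `find_unindexed_fields` is a hypothetical function and the real `validate_schema` may report other mismatches as well.

```python
def find_unindexed_fields(schema_columns: list[str], index_fields: list[str]) -> list[str]:
    """Return one warning per schema column that the index does not cover."""
    known = set(index_fields)
    return [
        f"Field '{column}' in schema but not in index"
        for column in schema_columns
        if column not in known
    ]


for warning in find_unindexed_fields(["name", "age", "email"], ["name", "age"]):
    print(warning)
```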

Compare Schemas (diff)

Compare desired schema to existing index:

idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[
        TextField("name"),
        NumericField("age"),
        TagField("status"),  # New field
    ]
)

diff = idx.diff("redis://localhost:6379")
print(diff)
# + status (TAG)
# ~ age: TEXT -> NUMERIC
#   name (unchanged)
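
A diff like the one above can be computed by comparing two name-to-type mappings. The sketch below reproduces the same three line formats, grouped as added, changed, then unchanged; `diff_fields` is a hypothetical function, and the `-` line for removed fields is an assumed format that does not appear in the output above.

```python
def diff_fields(desired: dict[str, str], existing: dict[str, str]) -> list[str]:
    """Describe how the desired fields differ from the existing ones."""
    added = [f"+ {n} ({t})" for n, t in desired.items() if n not in existing]
    changed = [
        f"~ {n}: {existing[n]} -> {t}"
        for n, t in desired.items()
        if n in existing and existing[n] != t
    ]
    removed = [f"- {n} ({t})" for n, t in existing.items() if n not in desired]
    unchanged = [
        f"  {n} (unchanged)"
        for n, t in desired.items()
        if existing.get(n) == t
    ]
    return added + changed + removed + unchanged


desired = {"name": "TEXT", "age": "NUMERIC", "status": "TAG"}
existing = {"name": "TEXT", "age": "TEXT"}
print("\n".join(diff_fields(desired, existing)))
```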

Migration

Destructive Operation

RediSearch's FT.ALTER can only add fields to an existing schema; it cannot change or remove them. Any other schema change requires dropping and recreating the index, which re-indexes all matching documents.

diff = idx.diff(url)
if diff.has_changes:
    print(f"Changes needed:\n{diff}")

    # Migrate (drop and recreate)
    idx.migrate(url, drop_existing=True)
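
In terms of raw commands, a drop-and-recreate migration amounts to an FT.DROPINDEX followed by a fresh FT.CREATE. The sketch below only builds that command plan so it can be inspected; `migration_commands` is a hypothetical helper, not how `migrate()` is necessarily implemented. FT.DROPINDEX without the DD flag leaves the underlying documents in place.

```python
def migration_commands(name: str, create_tokens: list[str], has_changes: bool) -> list[list[str]]:
    """Return the commands for a drop-and-recreate migration, or [] if none needed."""
    if not has_changes:
        return []
    return [
        ["FT.DROPINDEX", name],  # no DD flag: documents are kept
        ["FT.CREATE", name, *create_tokens],
    ]


plan = migration_commands(
    "users_idx",
    ["ON", "HASH", "PREFIX", "1", "user:", "SCHEMA", "name", "TEXT", "age", "NUMERIC"],
    has_changes=True,
)
for command in plan:
    print(" ".join(command))
```

Between the two commands, queries against the index fail, which is one reason to run the diff first and migrate only when it reports changes.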

Dropping Indexes

# Drop index only (keep data)
idx.drop("redis://localhost:6379")

# Drop index and delete all indexed documents
idx.drop("redis://localhost:6379", delete_docs=True)

Index Information

Get details about an existing index:

info = idx.info("redis://localhost:6379")
if info:
    print(f"Documents indexed: {info.num_docs}")
    print(f"Index type: {info.on_type}")
    print(f"Prefixes: {info.prefixes}")
    print(f"Fields: {len(info.fields)}")
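
Under the RESP2 protocol, the raw FT.INFO reply is a flat list of alternating keys and values, which is why a typed wrapper is convenient. A minimal sketch of turning such a reply into a dictionary follows; `pairs_to_dict` is a hypothetical helper and the sample reply is abbreviated and made up for illustration.

```python
def pairs_to_dict(reply: list) -> dict:
    """Convert a flat [key, value, key, value, ...] reply into a dict."""
    if len(reply) % 2 != 0:
        raise ValueError("expected an even number of items")
    return {str(reply[i]): reply[i + 1] for i in range(0, len(reply), 2)}


# Illustrative, abbreviated reply; a real FT.INFO reply has many more entries
sample_reply = ["index_name", "users_idx", "num_docs", "5"]
parsed = pairs_to_dict(sample_reply)
print(parsed["index_name"], int(parsed["num_docs"]))
```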

Debugging

View the generated FT.CREATE command:

idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[
        TextField("name", sortable=True),
        NumericField("age"),
    ]
)

print(str(idx))
# FT.CREATE users_idx ON HASH PREFIX 1 user: SCHEMA name TEXT SORTABLE age NUMERIC
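
To make the relationship between the typed definition and the command text concrete, here is a minimal sketch of how such a rendering could work for hash indexes with a single prefix. `FieldSpec` and `render_ft_create` are hypothetical names; this is not the library's implementation and it handles only the SORTABLE option.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    name: str
    kind: str  # "TEXT", "NUMERIC" or "TAG"
    sortable: bool = False

    def tokens(self) -> list[str]:
        out = [self.name, self.kind]
        if self.sortable:
            out.append("SORTABLE")
        return out


def render_ft_create(name: str, prefix: str, fields: list[FieldSpec]) -> str:
    """Render a single-prefix hash index definition as an FT.CREATE string."""
    tokens = ["FT.CREATE", name, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for spec in fields:
        tokens += spec.tokens()
    return " ".join(tokens)


print(render_ft_create(
    "users_idx",
    "user:",
    [FieldSpec("name", "TEXT", sortable=True), FieldSpec("age", "NUMERIC")],
))
# FT.CREATE users_idx ON HASH PREFIX 1 user: SCHEMA name TEXT SORTABLE age NUMERIC
```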

Complete Example

import polars as pl
import polars_redis as pr
from polars_redis import Index, TextField, NumericField, TagField, col

url = "redis://localhost:6379"

# 1. Create sample data
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice Smith", "Bob Jones", "Charlie Brown", "Diana Ross", "Eve Wilson"],
    "age": [30, 25, 35, 28, 32],
    "department": ["engineering", "sales", "engineering", "marketing", "engineering"],
    "salary": [120000, 85000, 140000, 95000, 130000],
})

# 2. Write to Redis
pr.write_hashes(df, url, key_column="id", key_prefix="employee:")

# 3. Define index
idx = Index(
    name="employees_idx",
    prefix="employee:",
    schema=[
        TextField("name", sortable=True),
        NumericField("age", sortable=True),
        TagField("department"),
        NumericField("salary", sortable=True),
    ]
)

# 4. Create index (idempotent)
idx.ensure_exists(url)

# 5. Query with predicate pushdown
engineers = pr.search_hashes(
    url,
    index=idx,
    query=(col("department") == "engineering") & (col("salary") > 125000),
    schema={"name": pl.Utf8, "age": pl.Int64, "department": pl.Utf8, "salary": pl.Float64},
).collect()

print(engineers)
# shape: (2, 4)
# +-----------------+-----+-------------+----------+
# | name            | age | department  | salary   |
# | ---             | --- | ---         | ---      |
# | str             | i64 | str         | f64      |
# +=================+=====+=============+==========+
# | Charlie Brown   | 35  | engineering | 140000.0 |
# | Eve Wilson      | 32  | engineering | 130000.0 |
# +-----------------+-----+-------------+----------+

# 6. Clean up
idx.drop(url)

Rust API

The Rust API provides equivalent functionality with a builder pattern:

use polars_redis::{Index, TextField, NumericField, TagField};

let idx = Index::new("users_idx")
    .with_prefix("user:")
    .with_field(TextField::new("name").sortable())
    .with_field(NumericField::new("age").sortable())
    .with_field(TagField::new("status"));

// Create
idx.create("redis://localhost:6379")?;

// Or ensure exists
idx.ensure_exists("redis://localhost:6379")?;

// Check existence
if idx.exists("redis://localhost:6379")? {
    println!("Index exists");
}

// Drop
idx.drop("redis://localhost:6379")?;

See the Rust API Reference for full details.