Advanced Search Options

For fine-grained control over search behavior, use SearchOptions. This is useful for full-text search applications that need highlighting, summarization, or custom scoring.

SearchOptions

import polars as pl
import polars_redis as pr
from polars_redis.options import SearchOptions

opts = SearchOptions(
    index="articles_idx",
    query="python programming",
    verbatim=True,           # Match terms exactly (disables stemming)
    language="english",      # Stemming language, used when verbatim is off
    scorer="BM25",           # Scoring algorithm
    dialect=4,               # RediSearch dialect
)

df = pr.search_hashes(
    "redis://localhost:6379",
    options=opts,
    schema={"title": pl.Utf8, "body": pl.Utf8},
).collect()

Highlighting

Wrap matching terms in custom tags for display:

opts = SearchOptions(
    index="articles_idx",
    query="python",
).with_highlight(
    fields=["title", "body"],
    open_tag="<em>",
    close_tag="</em>",
)

df = pr.search_hashes(url, options=opts, schema=schema).collect()
# Results have matching terms wrapped: "<em>Python</em> is a great language"
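
To make the effect of open_tag and close_tag concrete, here is a minimal pure-Python sketch of term highlighting. It is only an illustration: RediSearch performs the real highlighting server-side and can also match stemmed forms, which this sketch does not attempt. The helper name highlight_terms is hypothetical and not part of polars_redis.

```python
import re


def highlight_terms(text, terms, open_tag="<em>", close_tag="</em>"):
    """Wrap whole-word, case-insensitive matches of `terms` in tags.

    Hypothetical helper for illustration; not part of polars_redis.
    """
    if not terms:
        return text
    # Escape each term so regex metacharacters are matched literally.
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(rf"\b({alternation})\b", re.IGNORECASE)
    # Keep the original casing of the matched text inside the tags.
    return pattern.sub(lambda m: f"{open_tag}{m.group(1)}{close_tag}", text)


print(highlight_terms("Python is a great language", ["python"]))
# <em>Python</em> is a great language
```

Because the tags are plain strings, any pair works, for example "**" and "**" for Markdown output.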

Summarization

Generate text snippets with matched terms (useful for search result previews):

opts = SearchOptions(
    index="articles_idx",
    query="machine learning",
).with_summarize(
    fields=["body"],
    frags=3,          # Number of fragments
    len=30,           # Fragment length in words
    separator="...",  # Between fragments
)
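
The three parameters interact as follows: up to frags windows of roughly len words are cut around matches and joined with separator. The sketch below imitates that behavior in pure Python to show what each parameter controls. It is an approximation under stated assumptions, not RediSearch's actual fragment selection (which also ranks fragments); the function name summarize is hypothetical, and the length parameter is called length here only to avoid shadowing Python's built-in len inside the function.

```python
def summarize(text, terms, frags=3, length=20, separator="..."):
    """Return up to `frags` windows of about `length` words around matches.

    Hypothetical helper for illustration; not part of polars_redis.
    """
    words = text.split()
    wanted = {t.lower() for t in terms}
    fragments = []
    next_free = 0  # index of the first word not yet covered by a fragment
    for i, word in enumerate(words):
        if len(fragments) == frags:
            break
        # Skip words already inside a fragment and words that do not match.
        if i < next_free or word.strip(".,;:!?").lower() not in wanted:
            continue
        # Center the window on the match without overlapping the previous one.
        start = max(next_free, i - length // 2)
        end = min(len(words), start + length)
        fragments.append(" ".join(words[start:end]))
        next_free = end
    return separator.join(fragments)


text = (
    "Redis is fast. Machine learning models need features. Polars makes "
    "feature pipelines simple, and learning curves stay short."
)
print(summarize(text, ["learning"], frags=2, length=4, separator=" ... "))
```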

Relevance Scores

Include relevance scores in results:

opts = SearchOptions(
    index="articles_idx",
    query="python tutorial",
).with_score(True, "_relevance")

df = pr.search_hashes(url, options=opts, schema=schema).collect()
# Results include a _relevance column holding each document's score from the active scorer
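
For intuition about what a BM25 relevance score measures, the following is a minimal sketch of the textbook Okapi BM25 formula. It is an assumption that this matches the server's numbers only loosely: RediSearch tokenizes, stems, and normalizes differently, so the values in the score column will not equal these. The function name bm25_scores and the default k1 and b values are illustrative choices, not taken from polars_redis.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with textbook BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1; long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["python tutorial for beginners", "cooking recipes", "python python tutorial"]
for doc, score in zip(docs, bm25_scores("python tutorial", docs)):
    print(f"{score:.3f}  {doc}")
```

Documents that contain no query term score 0, and a short document that repeats a query term outranks a longer one that mentions it once.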

Query Modifiers Reference

| Option | Description |
| --- | --- |
| verbatim | Disable stemming for exact term matching |
| no_stopwords | Include stop words in the query |
| language | Language for stemming (e.g., "english", "spanish", "french") |
| scorer | Scoring function: "BM25", "TFIDF", "DISMAX" |
| expander | Query expander: "SYNONYM" |
| slop | Default slop for phrase queries |
| in_order | Require phrase terms in order |
| dialect | RediSearch dialect version (1-4) |

Filtering Options Reference

| Option | Description |
| --- | --- |
| in_keys | Limit search to specific document keys |
| in_fields | Limit search to specific fields |
| timeout_ms | Query timeout in milliseconds |
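
The options in both reference tables correspond to arguments of the RediSearch FT.SEARCH command (VERBATIM, NOSTOPWORDS, INKEYS, INFIELDS, SLOP, TIMEOUT, INORDER, LANGUAGE, EXPANDER, SCORER, DIALECT). The sketch below shows one plausible mapping from option names to command arguments. How polars_redis builds the command internally is an assumption here, and build_ft_search_args is a hypothetical helper, not a library function.

```python
def build_ft_search_args(index, query, *, verbatim=False, no_stopwords=False,
                         in_keys=None, in_fields=None, slop=None,
                         timeout_ms=None, in_order=False, language=None,
                         expander=None, scorer=None, dialect=None):
    """Translate option values into an FT.SEARCH argument list.

    Hypothetical helper for illustration; not part of polars_redis.
    """
    args = ["FT.SEARCH", index, query]
    # Flags take no value.
    if verbatim:
        args.append("VERBATIM")
    if no_stopwords:
        args.append("NOSTOPWORDS")
    # INKEYS and INFIELDS are prefixed with the number of items that follow.
    if in_keys:
        args += ["INKEYS", str(len(in_keys)), *in_keys]
    if in_fields:
        args += ["INFIELDS", str(len(in_fields)), *in_fields]
    if slop is not None:
        args += ["SLOP", str(slop)]
    if timeout_ms is not None:
        args += ["TIMEOUT", str(timeout_ms)]
    if in_order:
        args.append("INORDER")
    # Remaining options are simple keyword/value pairs.
    for keyword, value in (("LANGUAGE", language), ("EXPANDER", expander),
                           ("SCORER", scorer), ("DIALECT", dialect)):
        if value is not None:
            args += [keyword, str(value)]
    return args


print(" ".join(build_ft_search_args(
    "articles_idx", "python", in_fields=["title", "body"], timeout_ms=500,
)))
# FT.SEARCH articles_idx python INFIELDS 2 title body TIMEOUT 500
```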

Smart Scan

smart_scan() checks whether a RediSearch index covers the requested key pattern and picks the execution strategy accordingly.

Basic Usage

import polars as pl
from polars_redis import smart_scan

# Auto-detect index and use FT.SEARCH if available
df = smart_scan(
    url,
    "user:*",
    schema={"name": pl.Utf8, "age": pl.Int64},
).filter(pl.col("age") > 30).collect()

If an index exists for the pattern, it uses FT.SEARCH. Otherwise, it falls back to SCAN.

Query Explanation

See how a query will execute before running it:

from polars_redis import explain_scan

plan = explain_scan(url, "user:*", schema={"name": pl.Utf8})
print(plan.explain())
# Strategy: SEARCH
# Index: users_idx
#   Prefixes: user:
#   Type: HASH
# Server Query: *

Execution Strategies

| Strategy | When Used | Description |
| --- | --- | --- |
| SEARCH | Index found | Uses FT.SEARCH for server-side filtering |
| SCAN | No index | Falls back to Redis SCAN |
| HYBRID | Partial pushdown | FT.SEARCH + client-side filtering |
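
The difference between the three strategies comes down to how many filter predicates can be pushed to the server. The sketch below illustrates that decision for simple numeric comparisons, using RediSearch's numeric range syntax (for example @age:[(30 +inf] for age > 30). The planner shown here is a simplified assumption, not the library's actual logic, and plan_query with its tuple-based predicate format is hypothetical.

```python
# RediSearch numeric range syntax; "(" marks an exclusive bound.
_RANGE_TEMPLATES = {
    ">": "@{f}:[({v} +inf]",
    ">=": "@{f}:[{v} +inf]",
    "<": "@{f}:[-inf ({v}]",
    "<=": "@{f}:[-inf {v}]",
    "==": "@{f}:[{v} {v}]",
}


def plan_query(predicates, indexed_numeric_fields):
    """Split (field, op, value) predicates into server-side and client-side parts.

    `indexed_numeric_fields` is a set of field names, or None when no index
    exists. Returns (strategy, server_query, residual_predicates).
    Hypothetical helper for illustration; not part of polars_redis.
    """
    if indexed_numeric_fields is None:
        # No index: every predicate must be evaluated client-side after SCAN.
        return "SCAN", None, list(predicates)
    pushed, residual = [], []
    for field, op, value in predicates:
        if field in indexed_numeric_fields and op in _RANGE_TEMPLATES:
            pushed.append(_RANGE_TEMPLATES[op].format(f=field, v=value))
        else:
            residual.append((field, op, value))
    # "*" matches every document when nothing could be pushed down.
    server_query = " ".join(pushed) or "*"
    return ("HYBRID" if residual else "SEARCH"), server_query, residual


print(plan_query([("age", ">", 30), ("name", "==", "Ada")], {"age"}))
# ('HYBRID', '@age:[(30 +inf]', [('name', '==', 'Ada')])
```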

Index Discovery

from polars_redis import list_indexes, find_index_for_pattern

# List all RediSearch indexes
indexes = list_indexes(url)
for idx in indexes:
    print(f"{idx.name}: prefixes={idx.prefixes}")

# Find index for specific pattern
idx = find_index_for_pattern(url, "user:*")
if idx:
    print(f"Found index: {idx.name}")
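
Matching a key pattern against index prefixes can be pictured as a prefix comparison: an index covers a pattern when the pattern's literal part (everything before the first wildcard) starts with one of the index's prefixes. The sketch below shows that idea. It is an assumption about how such matching could work, not the library's implementation; match_index is a hypothetical helper, and the plain dict stands in for the objects returned by list_indexes. Backslash-escaped wildcards are not handled.

```python
def match_index(pattern, indexes):
    """Return the name of the index whose prefix covers `pattern`, or None.

    `indexes` maps index name -> list of key prefixes. When several prefixes
    match, the longest (most specific) one wins.
    Hypothetical helper for illustration; not part of polars_redis.
    """
    # Literal part of the glob pattern, up to the first wildcard character.
    literal = pattern
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            literal = pattern[:i]
            break
    best = None  # (index name, matching prefix)
    for name, prefixes in indexes.items():
        for prefix in prefixes:
            if literal.startswith(prefix):
                if best is None or len(prefix) > len(best[1]):
                    best = (name, prefix)
    return best[0] if best else None


indexes = {"users_idx": ["user:"], "orders_idx": ["order:"]}
print(match_index("user:*", indexes))     # users_idx
print(match_index("session:*", indexes))  # None
```

A pattern such as "u*" is not covered by the "user:" prefix, because it could also match keys outside the index.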

Explicit Index Control

# Use a specific index by name
df = smart_scan(
    url, "user:*",
    schema={"name": pl.Utf8},
    index="users_idx",
    auto_detect_index=False,
).collect()

# Use an Index object (auto-creates if needed)
from polars_redis import Index, TextField, NumericField

idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[TextField("name"), NumericField("age")],
)

df = smart_scan(url, "user:*", schema=schema, index=idx).collect()
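
For reference, an index definition like the one above corresponds to a RediSearch FT.CREATE command. The sketch below builds that command from a name, a prefix, and a list of fields. Whether polars_redis issues exactly this command when it auto-creates the index is an assumption, and build_ft_create_args with its tuple-based field list is a hypothetical stand-in for the Index, TextField, and NumericField classes.

```python
_FIELD_TYPES = {"TEXT", "NUMERIC", "TAG", "GEO", "VECTOR"}


def build_ft_create_args(name, prefix, fields, on="HASH"):
    """Build an FT.CREATE argument list for a single-prefix index.

    `fields` is a list of (field_name, field_type) tuples.
    Hypothetical helper for illustration; not part of polars_redis.
    """
    # PREFIX is followed by the number of prefixes, then the prefixes.
    args = ["FT.CREATE", name, "ON", on, "PREFIX", "1", prefix, "SCHEMA"]
    for field_name, field_type in fields:
        if field_type not in _FIELD_TYPES:
            raise ValueError(f"unsupported field type: {field_type}")
        args += [field_name, field_type]
    return args


print(" ".join(build_ft_create_args(
    "users_idx", "user:", [("name", "TEXT"), ("age", "NUMERIC")],
)))
# FT.CREATE users_idx ON HASH PREFIX 1 user: SCHEMA name TEXT age NUMERIC
```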

Graceful Degradation

When RediSearch is unavailable, smart_scan falls back to SCAN without errors:

# Works whether RediSearch is available or not
df = smart_scan(url, "user:*", schema=schema).collect()
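
The fallback can be pictured as a try/except around the search path: a Redis server without the RediSearch module rejects FT.* commands as unknown, and the caller then switches to a plain key scan. The sketch below shows that pattern with stand-in callables instead of a real connection. The names UnknownCommandError and run_with_fallback are hypothetical, and how smart_scan actually detects a missing module is an assumption.

```python
class UnknownCommandError(Exception):
    """Stand-in for the error a Redis client raises for an unknown command."""


def run_with_fallback(search, scan):
    """Try the index-backed path first; fall back to scanning keys.

    Returns (strategy, result). Hypothetical helper for illustration.
    """
    try:
        return "SEARCH", search()
    except UnknownCommandError:
        # RediSearch module is not loaded: degrade to a plain key scan.
        return "SCAN", scan()


def search_without_module():
    # Simulates a server that does not know the FT.SEARCH command.
    raise UnknownCommandError("ERR unknown command 'FT.SEARCH'")


def scan_keys():
    return ["user:1", "user:2"]


print(run_with_fallback(search_without_module, scan_keys))
# ('SCAN', ['user:1', 'user:2'])
```

Only the "unknown command" case is caught here; other errors, such as a dropped connection, still propagate to the caller.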

search_hashes Parameters Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | Redis connection URL |
| index | str | required | RediSearch index name |
| query | str or Expr | "*" | Query string or expression |
| schema | dict | required | Field names to Polars dtypes |
| include_key | bool | True | Include Redis key as column |
| key_column_name | str | "_key" | Name of key column |
| include_ttl | bool | False | Include TTL as column |
| ttl_column_name | str | "_ttl" | Name of TTL column |
| batch_size | int | 1000 | Documents per batch |
| sort_by | str | None | Field to sort by |
| sort_ascending | bool | True | Sort direction |
| options | SearchOptions | None | Advanced search options |

smart_scan Parameters Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | Redis connection URL |
| pattern | str | "*" | Key pattern to match |
| schema | dict | required | Field names to Polars dtypes |
| index | str or Index | None | Force use of specific index |
| include_key | bool | True | Include Redis key as column |
| key_column_name | str | "_key" | Name of key column |
| include_ttl | bool | False | Include TTL as column |
| ttl_column_name | str | "_ttl" | Name of TTL column |
| batch_size | int | 1000 | Documents per batch |
| auto_detect_index | bool | True | Auto-detect matching indexes |