Advanced Search Options
For fine-grained control over search behavior, use SearchOptions. This is useful for full-text search applications that need highlighting, summarization, or custom scoring.
SearchOptions
import polars as pl
import polars_redis as pr
from polars_redis.options import SearchOptions
opts = SearchOptions(
    index="articles_idx",
    query="python programming",
    verbatim=True,       # Disable stemming
    language="english",  # Stemming language
    scorer="BM25",       # Scoring algorithm
    dialect=4,           # RediSearch dialect
)
df = pr.search_hashes(
    "redis://localhost:6379",
    options=opts,
    schema={"title": pl.Utf8, "body": pl.Utf8},
).collect()
Highlighting
Wrap matching terms in custom tags for display:
opts = SearchOptions(
    index="articles_idx",
    query="python",
).with_highlight(
    fields=["title", "body"],
    open_tag="<em>",
    close_tag="</em>",
)
df = pr.search_hashes(url, options=opts, schema=schema).collect()
# Results have matching terms wrapped: "<em>Python</em> is a great language"
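To see what highlighted output looks like without a running server, the effect can be approximated on the client. The sketch below is not how RediSearch or polars_redis implement highlighting (the server also matches stemmed forms, for example); `highlight_terms` is a hypothetical helper that wraps case-insensitive whole-word matches in the given tags.

```python
import re


def highlight_terms(text, terms, open_tag="<em>", close_tag="</em>"):
    """Wrap whole-word, case-insensitive matches of `terms` in tags (illustrative only)."""
    if not terms:
        return text
    # Escape each term so punctuation inside a term is matched literally.
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    # Re-insert the matched text itself so the original casing is preserved.
    return pattern.sub(lambda m: f"{open_tag}{m.group(1)}{close_tag}", text)


highlighted = highlight_terms("Python is a great language", ["python"])
print(highlighted)
```

If the tagged text is later rendered as HTML, the surrounding field content should be escaped separately, since the tags are inserted into otherwise raw text.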
Summarization
Generate text snippets with matched terms (useful for search result previews):
opts = SearchOptions(
    index="articles_idx",
    query="machine learning",
).with_summarize(
    fields=["body"],
    frags=3,          # Number of fragments
    len=30,           # Fragment length in words
    separator="...",  # Between fragments
)
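The roles of `frags`, `len`, and `separator` may be easier to see in a small client-side imitation. This is only a sketch under simplifying assumptions: the server chooses and ranks fragments with its own logic, whereas the hypothetical `summarize` helper below simply takes a window of words around each match in document order. The parameter is called `length` here to avoid shadowing Python's built-in `len`.

```python
def summarize(text, terms, frags=3, length=30, separator="..."):
    """Return up to `frags` windows of `length` words around matched terms."""
    words = text.split()
    wanted = {term.lower() for term in terms}
    fragments = []
    covered_until = -1  # index of the last word already used in a fragment
    for i, word in enumerate(words):
        if len(fragments) == frags:
            break
        # Ignore surrounding punctuation when comparing against the terms.
        normalized = word.strip(".,;:!?\"'()").lower()
        if i <= covered_until or normalized not in wanted:
            continue
        # Center the window on the match without overlapping the previous one.
        start = max(i - length // 2, covered_until + 1)
        end = min(start + length, len(words))
        fragments.append(" ".join(words[start:end]))
        covered_until = end - 1
    return separator.join(fragments)


text = (
    "Machine learning is fun. Many people study statistics first. "
    "Deep learning builds on it."
)
snippet = summarize(text, ["learning"], frags=2, length=3)
print(snippet)
```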
Relevance Scores
Include relevance scores in results:
opts = SearchOptions(
    index="articles_idx",
    query="python tutorial",
).with_score(True, "_relevance")
df = pr.search_hashes(url, options=opts, schema=schema).collect()
# Results include _relevance column with BM25 scores
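For intuition about what a BM25 score expresses, here is a minimal sketch of the textbook Okapi BM25 formula over a toy in-memory corpus. It is an illustration only: the constants `k1` and `b`, the whitespace tokenizer, and the IDF variant are assumptions, and the values the server returns may be computed or normalized differently.

```python
import math


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
            tf = tokens.count(term)
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["python tutorial for beginners", "advanced python", "cooking recipes"]
scores = bm25_scores(docs, "python tutorial")
print(scores)
```

A document matching both query terms scores highest, one matching a single common term scores lower, and a document with no matching term scores zero.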
Query Modifiers Reference
| Option | Description |
|---|---|
| `verbatim` | Disable stemming for exact term matching |
| `no_stopwords` | Include stop words in the query |
| `language` | Language for stemming (e.g., "english", "spanish", "french") |
| `scorer` | Scoring function: "BM25", "TFIDF", "DISMAX" |
| `expander` | Query expander: "SYNONYM" |
| `slop` | Default slop for phrase queries |
| `in_order` | Require phrase terms in order |
| `dialect` | RediSearch dialect version (1-4) |
Filtering Options Reference
| Option | Description |
|---|---|
| `in_keys` | Limit search to specific document keys |
| `in_fields` | Limit search to specific fields |
| `timeout_ms` | Query timeout in milliseconds |
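The options in the two reference tables correspond closely to arguments of the RediSearch `FT.SEARCH` command. The sketch below shows that correspondence as a plain function that builds an argument list. `build_ft_search_args` is a hypothetical helper, and whether polars_redis assembles its commands this way internally is an assumption.

```python
def build_ft_search_args(
    index,
    query,
    *,
    verbatim=False,
    no_stopwords=False,
    in_keys=None,
    in_fields=None,
    slop=None,
    timeout_ms=None,
    in_order=False,
    language=None,
    expander=None,
    scorer=None,
    dialect=None,
):
    """Build an FT.SEARCH argument list from option values (illustrative only)."""
    args = ["FT.SEARCH", index, query]
    if verbatim:
        args.append("VERBATIM")
    if no_stopwords:
        args.append("NOSTOPWORDS")
    if in_keys:
        # INKEYS and INFIELDS take a count followed by that many values.
        args += ["INKEYS", str(len(in_keys)), *in_keys]
    if in_fields:
        args += ["INFIELDS", str(len(in_fields)), *in_fields]
    if slop is not None:
        args += ["SLOP", str(slop)]
    if timeout_ms is not None:
        args += ["TIMEOUT", str(timeout_ms)]
    if in_order:
        args.append("INORDER")
    if language:
        args += ["LANGUAGE", language]
    if expander:
        args += ["EXPANDER", expander]
    if scorer:
        args += ["SCORER", scorer]
    if dialect is not None:
        args += ["DIALECT", str(dialect)]
    return args


command = build_ft_search_args(
    "articles_idx",
    "python programming",
    verbatim=True,
    language="english",
    scorer="BM25",
    dialect=4,
)
print(" ".join(command))
```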
Smart Scan
smart_scan() detects whether a RediSearch index covers the requested key pattern and picks the execution strategy accordingly.
Basic Usage
import polars as pl
from polars_redis import smart_scan
url = "redis://localhost:6379"
# Auto-detect index and use FT.SEARCH if available
df = smart_scan(
    url,
    "user:*",
    schema={"name": pl.Utf8, "age": pl.Int64},
).filter(pl.col("age") > 30).collect()
If an index exists for the pattern, it uses FT.SEARCH. Otherwise, it falls back to SCAN.
Query Explanation
See how a query will execute before running it:
from polars_redis import explain_scan
plan = explain_scan(url, "user:*", schema={"name": pl.Utf8})
print(plan.explain())
# Strategy: SEARCH
# Index: users_idx
# Prefixes: user:
# Type: HASH
# Server Query: *
Execution Strategies
| Strategy | When Used | Description |
|---|---|---|
| SEARCH | Index found | Uses FT.SEARCH for server-side filtering |
| SCAN | No index | Falls back to Redis SCAN |
| HYBRID | Partial pushdown | FT.SEARCH + client-side filtering |
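The table can be read as a small decision rule. The following sketch encodes it directly; `Strategy` and `choose_strategy` are hypothetical names used for illustration, and the assumption that HYBRID applies only when an index exists follows from its use of FT.SEARCH.

```python
from enum import Enum


class Strategy(Enum):
    SEARCH = "SEARCH"
    SCAN = "SCAN"
    HYBRID = "HYBRID"


def choose_strategy(index_found, fully_pushed_down=True):
    """Pick an execution strategy following the table above (illustrative only)."""
    if not index_found:
        # No usable index: iterate keys with SCAN and filter on the client.
        return Strategy.SCAN
    if fully_pushed_down:
        # The whole filter can be expressed as a server-side query.
        return Strategy.SEARCH
    # Part of the filter runs in FT.SEARCH, the remainder on the client.
    return Strategy.HYBRID


print(choose_strategy(index_found=True).value)
print(choose_strategy(index_found=True, fully_pushed_down=False).value)
print(choose_strategy(index_found=False).value)
```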
Index Discovery
from polars_redis import list_indexes, find_index_for_pattern
# List all RediSearch indexes
indexes = list_indexes(url)
for idx in indexes:
    print(f"{idx.name}: prefixes={idx.prefixes}")
# Find index for specific pattern
idx = find_index_for_pattern(url, "user:*")
if idx:
    print(f"Found index: {idx.name}")
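One plausible way to match a key pattern against index prefixes is to compare the literal part of the pattern with each prefix. The sketch below works on plain dictionaries and does not contact Redis; `match_index_by_prefix` is a hypothetical helper, and the library's actual matching rules may differ.

```python
import re


def match_index_by_prefix(pattern, index_prefixes):
    """Return the first index whose prefix covers every key the pattern can match."""
    # Keep only the literal text before the first glob metacharacter.
    literal = re.split(r"[*?\[]", pattern, maxsplit=1)[0]
    for name, prefixes in index_prefixes.items():
        # Every key matching the pattern starts with `literal`, so the index
        # covers the pattern when `literal` itself starts with an index prefix.
        if any(literal.startswith(prefix) for prefix in prefixes):
            return name
    return None


known = {"users_idx": ["user:"], "orders_idx": ["order:"]}
print(match_index_by_prefix("user:*", known))
print(match_index_by_prefix("session:*", known))
```

A bare `*` pattern has an empty literal part, so under this rule it only matches an index declared with an empty prefix.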
Explicit Index Control
# Use a specific index by name
df = smart_scan(
    url, "user:*",
    schema={"name": pl.Utf8},
    index="users_idx",
    auto_detect_index=False,
).collect()
# Use an Index object (auto-creates if needed)
from polars_redis import Index, TextField, NumericField
idx = Index(
    name="users_idx",
    prefix="user:",
    schema=[TextField("name"), NumericField("age")],
)
df = smart_scan(url, "user:*", schema=schema, index=idx).collect()
Graceful Degradation
When RediSearch is unavailable, smart_scan falls back to SCAN without errors:
# Works whether RediSearch is available or not
df = smart_scan(url, "user:*", schema=schema).collect()
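The fallback behaviour follows a common try-then-degrade pattern, sketched here with plain callables instead of Redis calls. `SearchUnavailable` and `run_with_fallback` are hypothetical names; which errors the library actually catches is not documented here.

```python
class SearchUnavailable(Exception):
    """Hypothetical error raised when FT.SEARCH cannot be used."""


def run_with_fallback(search, scan):
    """Try the indexed path first and fall back to the scan path on failure."""
    try:
        return "SEARCH", search()
    except SearchUnavailable:
        return "SCAN", scan()


def broken_search():
    # Stand-in for a server without the RediSearch module loaded.
    raise SearchUnavailable("RediSearch module not loaded")


strategy, rows = run_with_fallback(broken_search, lambda: ["user:1", "user:2"])
print(strategy, rows)
```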
search_hashes Parameters Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | required | Redis connection URL |
| `index` | str | required | RediSearch index name |
| `query` | str or Expr | `"*"` | Query string or expression |
| `schema` | dict | required | Field names to Polars dtypes |
| `include_key` | bool | `True` | Include Redis key as column |
| `key_column_name` | str | `"_key"` | Name of key column |
| `include_ttl` | bool | `False` | Include TTL as column |
| `ttl_column_name` | str | `"_ttl"` | Name of TTL column |
| `batch_size` | int | `1000` | Documents per batch |
| `sort_by` | str | `None` | Field to sort by |
| `sort_ascending` | bool | `True` | Sort direction |
| `options` | SearchOptions | `None` | Advanced search options |
smart_scan Parameters Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | required | Redis connection URL |
| `pattern` | str | `"*"` | Key pattern to match |
| `schema` | dict | required | Field names to Polars dtypes |
| `index` | str or Index | `None` | Force use of specific index |
| `include_key` | bool | `True` | Include Redis key as column |
| `key_column_name` | str | `"_key"` | Name of key column |
| `include_ttl` | bool | `False` | Include TTL as column |
| `ttl_column_name` | str | `"_ttl"` | Name of TTL column |
| `batch_size` | int | `1000` | Documents per batch |
| `auto_detect_index` | bool | `True` | Auto-detect matching indexes |