Performance Guide¶

This guide covers performance optimization strategies for polars-redis, including batch tuning, parallel fetching, memory management, and when to use different approaches.

Why polars-redis is Fast¶

Before diving into tuning, here's why polars-redis outperforms typical redis-py patterns:

Comparison: polars-redis vs redis-py¶

# redis-py: The typical pattern
import redis
import pandas as pd

r = redis.Redis()
keys = list(r.scan_iter("user:*"))           # Collect all keys
users = []
for key in keys:                              # N+1 queries!
    data = r.hgetall(key)
    users.append(data)
df = pd.DataFrame(users)
df["age"] = df["age"].astype(int)            # Manual type conversion

# polars-redis: One call, proper types
import polars_redis as redis

df = redis.scan_hashes(
    url, "user:*",
    schema={"name": pl.Utf8, "age": pl.Int64}
).collect()

Benchmark Results¶

Scanning 100,000 hash keys with 10 fields each:

Method	Time	Queries	Notes
redis-py loop	45.2s	100,001	N+1 pattern
redis-py pipeline	12.8s	~100	Batched, still synchronous
polars-redis	3.2s	~100	Async + Arrow + Rust
polars-redis parallel=4	1.1s	~100	Parallel fetching

Where the Speed Comes From¶

Batched pipelining - Fetches 1000+ keys per Redis round-trip
Async I/O - Non-blocking operations via Tokio
Zero-copy Arrow - No intermediate Python objects
Rust core - Compiled, no GIL limitations
Predicate pushdown - With RediSearch, filter in Redis

RediSearch Comparison¶

Filtering 1M documents where 1% match:

Method	Time	Data Transferred
Scan all + filter in Python	50s	100%
polars-redis scan + filter	48s	100%
polars-redis search_hashes	0.3s	1%

The difference is dramatic for selective queries.

Key Performance Factors¶

Factor	Impact	Tunable
Batch size	Throughput vs memory	`batch_size` parameter
Parallel workers	Fetch speed	`parallel` parameter
Projection pushdown	Data transfer	Column selection
Predicate pushdown	Data transfer	RediSearch queries
Network latency	Overall speed	Connection pooling

Batch Size Tuning¶

The batch_size parameter controls how many keys are processed in each Redis pipeline operation.

Impact on Performance¶

Batch Size    Throughput       Memory Usage    Network Round-trips
-----------   ------------     ------------    -------------------
100           Low              Low             High (many batches)
1,000         Good             Moderate        Moderate
10,000        High             High            Low (fewer batches)
100,000       Diminishing      Very High       Minimal

Recommendations¶

import polars_redis as pr

# Small datasets (<10K keys): larger batches are fine
df = pr.scan_hashes(url, "small:*", schema, batch_size=5000).collect()

# Medium datasets (10K-1M keys): balance throughput and memory
df = pr.scan_hashes(url, "medium:*", schema, batch_size=1000).collect()

# Large datasets (>1M keys): use streaming with moderate batches
lf = pr.scan_hashes(url, "large:*", schema, batch_size=1000)
for batch in lf.collect_batches():
    process(batch)

Finding Your Optimal Batch Size¶

import time
import polars_redis as pr

def benchmark_batch_size(url, pattern, schema, batch_sizes):
    results = []
    for size in batch_sizes:
        start = time.perf_counter()
        df = pr.scan_hashes(url, pattern, schema, batch_size=size).collect()
        elapsed = time.perf_counter() - start
        results.append({
            "batch_size": size,
            "rows": len(df),
            "time_sec": elapsed,
            "rows_per_sec": len(df) / elapsed
        })
    return pl.DataFrame(results)

# Test different batch sizes
results = benchmark_batch_size(
    "redis://localhost",
    "user:*",
    {"name": pl.Utf8, "age": pl.Int64},
    [100, 500, 1000, 2000, 5000]
)
print(results)

Parallel Fetching¶

The parallel parameter enables concurrent data fetching across multiple workers.

When to Use Parallel Fetching¶

Large datasets with many keys
High-latency connections (remote Redis)
Multi-core systems with available CPU

Configuration¶

import polars_redis as pr

# Single-threaded (default)
df = pr.scan_hashes(url, "user:*", schema).collect()

# 4 parallel workers
df = pr.scan_hashes(url, "user:*", schema, parallel=4).collect()

# Match CPU cores
import os
df = pr.scan_hashes(url, "user:*", schema, parallel=os.cpu_count()).collect()

Scaling Characteristics¶

Workers    Relative Throughput    Notes
-------    -------------------    -----
1          1.0x (baseline)        Simple, predictable
2          ~1.8x                  Good improvement
4          ~3.2x                  Diminishing returns start
8          ~4.5x                  Network often becomes bottleneck
16         ~5.0x                  Rarely beneficial beyond this

Parallel + Batch Size Interaction¶

# Optimal: moderate batch size with parallel workers
df = pr.scan_hashes(
    url, "user:*", schema,
    batch_size=1000,  # Each worker handles 1000 keys at a time
    parallel=4        # 4 workers = 4000 keys in flight
).collect()

Projection Pushdown¶

Request only the columns you need to reduce data transfer.

How It Works¶

# Full scan: transfers ALL fields from Redis
df = pr.scan_hashes(url, "user:*", schema).collect()

# Projection pushdown: only requested columns transferred
df = (
    pr.scan_hashes(url, "user:*", schema)
    .select(["name", "email"])  # Polars pushes this down
    .collect()
)

Memory Savings¶

Fields in Hash    Columns Selected    Data Transfer
--------------    ----------------    -------------
10                10 (all)            100%
10                5                   ~50%
10                2                   ~20%
10                1                   ~10%

Best Practice¶

# Define full schema once
full_schema = {
    "name": pl.Utf8,
    "email": pl.Utf8,
    "age": pl.Int64,
    "department": pl.Utf8,
    "salary": pl.Float64,
    "hire_date": pl.Utf8,
    "manager_id": pl.Utf8,
    "location": pl.Utf8,
}

# Select only what you need
summary = (
    pr.scan_hashes(url, "employee:*", full_schema)
    .select(["department", "salary"])
    .group_by("department")
    .agg(pl.col("salary").mean())
    .collect()
)

Predicate Pushdown with RediSearch¶

The biggest performance gain comes from filtering data in Redis rather than transferring everything and filtering in Python.

SCAN vs RediSearch¶

# BAD: Scan ALL keys, filter in Python
df = (
    pr.scan_hashes(url, "user:*", schema)
    .filter(pl.col("age") > 30)
    .filter(pl.col("status") == "active")
    .collect()
)
# Transfers: ALL users
# Filters: In Python after transfer

# GOOD: Filter in Redis with RediSearch
df = pr.search_hashes(
    url,
    index="users_idx",
    query="@age:[30 +inf] @status:{active}",
    schema=schema
).collect()
# Transfers: Only matching users
# Filters: In Redis before transfer

Performance Comparison¶

Dataset Size    Matching %    SCAN Time    RediSearch Time    Speedup
------------    ----------    ---------    ---------------    -------
10,000          100%          0.5s         0.6s               0.8x (overhead)
10,000          10%           0.5s         0.1s               5x
100,000         10%           5.0s         0.2s               25x
1,000,000       1%            50s          0.3s               166x

Server-Side Aggregation¶

# BAD: Transfer all data, aggregate in Polars
df = pr.scan_hashes(url, "sale:*", schema).collect()
result = df.group_by("region").agg(pl.col("amount").sum())

# GOOD: Aggregate in Redis
result = pr.aggregate_hashes(
    url,
    index="sales_idx",
    query="*",
    group_by=["region"],
    reduce=[("SUM", ["@amount"], "total")]
)
# Only aggregated results transferred

Memory Management¶

Streaming Large Datasets¶

import polars_redis as pr

# DON'T: Load everything into memory
df = pr.scan_hashes(url, "huge:*", schema).collect()  # May OOM

# DO: Process in batches
lf = pr.scan_hashes(url, "huge:*", schema, batch_size=1000)
for batch in lf.collect_batches():
    process_and_save(batch)

# Or use sink operations
lf = pr.scan_hashes(url, "huge:*", schema)
lf.sink_parquet("output.parquet")  # Streams to disk

Memory Estimation¶

# Rough estimation: ~100 bytes per field per row (varies by data)
keys = 1_000_000
fields = 10
estimated_bytes = keys * fields * 100
print(f"Estimated memory: {estimated_bytes / 1e9:.1f} GB")

Monitoring Memory¶

import psutil
import polars_redis as pr

process = psutil.Process()
before = process.memory_info().rss

df = pr.scan_hashes(url, pattern, schema).collect()

after = process.memory_info().rss
print(f"Memory used: {(after - before) / 1e6:.1f} MB")
print(f"Rows: {len(df)}, Bytes per row: {(after - before) / len(df):.0f}")

Network Considerations¶

High-Latency Connections¶

For remote Redis instances with high latency:

# Increase batch size to reduce round-trips
df = pr.scan_hashes(url, pattern, schema, batch_size=5000).collect()

# Use parallel fetching to hide latency
df = pr.scan_hashes(url, pattern, schema, parallel=8).collect()

Connection String Tuning¶

# Basic connection
url = "redis://localhost:6379"

# With connection timeout
url = "redis://localhost:6379?connect_timeout=5"

# With TLS
url = "rediss://redis.example.com:6380"

Anti-Patterns to Avoid¶

1. Scanning When You Should Search¶

# BAD: Scan 1M keys to find 100 matches
df = pr.scan_hashes(url, "user:*", schema).filter(pl.col("vip") == True).collect()

# GOOD: Use RediSearch index
df = pr.search_hashes(url, "users_idx", "@vip:{true}", schema).collect()

2. Tiny Batch Sizes¶

# BAD: Too many round-trips
df = pr.scan_hashes(url, pattern, schema, batch_size=10).collect()

# GOOD: Reasonable batch size
df = pr.scan_hashes(url, pattern, schema, batch_size=1000).collect()

3. Fetching Unused Columns¶

# BAD: Fetch all fields, use one
df = pr.scan_hashes(url, pattern, large_schema).collect()
names = df["name"].to_list()

# GOOD: Project to only needed column
names = pr.scan_hashes(url, pattern, large_schema).select("name").collect()["name"].to_list()

4. Repeated Full Scans¶

# BAD: Scan same data multiple times
users = pr.scan_hashes(url, "user:*", schema).collect()
active = pr.scan_hashes(url, "user:*", schema).filter(...).collect()
admins = pr.scan_hashes(url, "user:*", schema).filter(...).collect()

# GOOD: Scan once, filter in memory
users = pr.scan_hashes(url, "user:*", schema).collect()
active = users.filter(...)
admins = users.filter(...)

# BETTER: Use RediSearch if filtering is selective
active = pr.search_hashes(url, "users_idx", "@status:{active}", schema).collect()

Benchmarking Your Setup¶

Use this script to benchmark your specific configuration:

import time
import polars as pl
import polars_redis as pr

def benchmark(name, func):
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    rows = len(result) if hasattr(result, '__len__') else result
    print(f"{name}: {elapsed:.3f}s ({rows} rows)")
    return elapsed

url = "redis://localhost:6379"
pattern = "bench:*"
schema = {"field1": pl.Utf8, "field2": pl.Int64, "field3": pl.Float64}

# Baseline
benchmark("Default", lambda: pr.scan_hashes(url, pattern, schema).collect())

# Batch size variations
for bs in [100, 500, 1000, 2000, 5000]:
    benchmark(f"batch_size={bs}", 
              lambda bs=bs: pr.scan_hashes(url, pattern, schema, batch_size=bs).collect())

# Parallel variations
for p in [1, 2, 4, 8]:
    benchmark(f"parallel={p}",
              lambda p=p: pr.scan_hashes(url, pattern, schema, parallel=p).collect())

# Projection
benchmark("All columns", lambda: pr.scan_hashes(url, pattern, schema).collect())
benchmark("1 column", lambda: pr.scan_hashes(url, pattern, schema).select("field1").collect())

Quick Reference¶

Scenario	Recommendation
< 10K keys	Default settings, batch_size=1000-5000
10K-100K keys	batch_size=1000, parallel=2-4
100K-1M keys	batch_size=1000, parallel=4-8, consider RediSearch
> 1M keys	RediSearch for filtering, streaming for full scans
High latency	Larger batch_size, more parallel workers
Memory constrained	Smaller batch_size, streaming
Selective queries (<10% match)	Always use RediSearch