polars-redis

Query Redis like a database. Transform with Polars. Write back without ETL.

New to Polars?

Polars is a lightning-fast DataFrame library for Python and Rust, designed for performance and ease of use. Check out the Polars User Guide to get started.

polars-redis brings Redis into your Polars analytics workflows as a first-class data source: scan hashes, JSON documents, strings, sets, lists, sorted sets, streams, and time series directly into LazyFrames with projection pushdown and batched iteration.

What is polars-redis?

polars-redis makes Redis just another connector alongside Parquet, CSV, and databases:

import polars as pl
import polars_redis as redis

url = "redis://localhost:6379"

# Enrich Redis data with an external source, then write back
users = redis.read_hashes(url, "user:*", {"user_id": pl.Utf8, "region": pl.Utf8})
purchases = pl.read_parquet("purchases.parquet")

high_value = (
    users.join(purchases, on="user_id")
    .group_by("user_id")
    .agg(pl.col("amount").sum().alias("lifetime_value"))
    .filter(pl.col("lifetime_value") > 1000)
)

redis.write_hashes(high_value, url, key_prefix="whale:")

When to Use What

| Your data | Use | Why |
| --- | --- | --- |
| User profiles, configs | scan_hashes() | Field-level access, projection pushdown |
| Nested documents | scan_json() | Full document structure |
| Counters, flags, caches | scan_strings() | Simple key-value |
| Tags, memberships | scan_sets() | Unique members |
| Queues, recent items | scan_lists() | Ordered elements |
| Leaderboards, rankings | scan_zsets() | Score-based ordering |
| Event logs | scan_streams() | Timestamped entries |
| Metrics | scan_timeseries() | Server-side aggregation |
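
A minimal sketch of two of these scanners. Only scan_hashes is demonstrated on this page, so the exact parameters of scan_zsets and scan_sets below are assumptions patterned after its signature and the row layouts described above:

import polars_redis as redis

url = "redis://localhost:6379"

# Assumed to yield one row per member with a score column, per the table above
top10 = (
    redis.scan_zsets(url, pattern="leaderboard:*")
    .sort("score", descending=True)
    .head(10)
    .collect()
)

# Assumed to yield one row per unique set member
tags = redis.scan_sets(url, pattern="tags:*").collect()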

Features

  • Scan Redis data as Polars LazyFrames
    • Hashes, JSON documents, strings, sets, lists, sorted sets, streams, and time series
    • Projection pushdown (only fetch requested fields)
    • Batched iteration for memory efficiency
    • Parallel fetching for improved throughput
  • DataFrame caching with automatic chunking
    • Cache DataFrames in Redis with Arrow IPC or Parquet format
    • Built-in compression (lz4, zstd, gzip, snappy)
    • Auto-chunking for large DataFrames (no 512MB limit)
    • @cache decorator for function memoization
    • TTL support for automatic expiration
  • Real-time streaming from Pub/Sub and Streams
    • collect_pubsub() - Collect messages into DataFrames
    • read_stream() - Consumer group support with acknowledgment
    • Batch iterators for continuous processing
  • RediSearch integration for predicate pushdown
    • search_hashes() - Server-side filtering with FT.SEARCH
    • aggregate_hashes() - Server-side aggregation with FT.AGGREGATE
    • Polars-like query builder: col("age") > 30 (see the sketch after this list)
  • Write DataFrames to Redis
    • Hashes, JSON documents, strings, sets, lists, and sorted sets
    • Write modes: fail, replace, append
    • Optional TTL support
  • Schema inference from existing Redis data
  • Metadata columns for keys, TTL, and row indices
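
A hedged sketch of the RediSearch path. The col query builder expression and the search_hashes name come from the list above; the import location of col, the index name, and the keyword arguments are illustrative assumptions patterned after scan_hashes:

import polars as pl
import polars_redis as redis
from polars_redis import col  # import location of the query builder is assumed

url = "redis://localhost:6379"

# Server-side filtering via FT.SEARCH: only matching rows leave Redis.
# The index name and keyword arguments are illustrative assumptions.
adults = redis.search_hashes(
    url,
    index="idx:users",
    query=col("age") > 30,
    schema={"name": pl.Utf8, "age": pl.Int64},
)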

Supported Redis Types

| Redis Type | Scan | Write | Notes |
| --- | --- | --- | --- |
| Hash | Yes | Yes | Field-level projection pushdown |
| JSON | Yes | Yes | Requires RedisJSON module |
| String | Yes | Yes | Configurable value type |
| Set | Yes | Yes | One row per member |
| List | Yes | Yes | One row per element, optional position |
| Sorted Set | Yes | Yes | One row per member with score |
| Stream | Yes | No | One row per entry with timestamp |
| TimeSeries | Yes | No | Server-side aggregation support |
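
On the write side, only write_hashes is demonstrated on this page. A minimal sketch, assuming the fail/replace/append write modes listed under Features are exposed as a mode keyword (that parameter name is an assumption):

import polars as pl
import polars_redis as redis

url = "redis://localhost:6379"

profiles = pl.DataFrame({"user_id": ["u1", "u2"], "region": ["eu", "us"]})

# write_hashes appears elsewhere on this page; the mode keyword is an
# assumption mapping onto the documented fail/replace/append write modes
redis.write_hashes(profiles, url, key_prefix="user:", mode="replace")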

Quick Start

import polars as pl
import polars_redis as redis

url = "redis://localhost:6379"

# Scan with schema
lf = redis.scan_hashes(
    url,
    pattern="user:*",
    schema={"name": pl.Utf8, "age": pl.Int64, "active": pl.Boolean},
)

# Filter and collect (projection pushdown applies)
active_users = lf.filter(pl.col("active")).select(["name", "age"]).collect()

# Write with TTL (1 hour)
redis.write_hashes(active_users, url, key_prefix="cache:user:", ttl=3600)

DataFrame Caching

Cache expensive computations with a single decorator - no manual serialization needed:

import polars as pl
import polars_redis as redis

@redis.cache(url="redis://localhost:6379", ttl=3600)
def expensive_query(start_date: str, end_date: str) -> pl.DataFrame:
    # Complex computation stands in here
    df = pl.DataFrame({"start": [start_date], "end": [end_date]})
    return df

# First call: computes and caches
result = expensive_query("2024-01-01", "2024-12-31")

# Second call: instant cache hit
result = expensive_query("2024-01-01", "2024-12-31")

Or cache directly with full control:

# Cache with compression and TTL
redis.cache_dataframe(df, url, "result", compression="zstd", ttl=3600)

# Retrieve later
df = redis.get_cached_dataframe(url, "result")

Why polars-redis for caching?

  • No boilerplate - One line vs 10+ lines of manual PyArrow/pickle serialization
  • Auto-chunking - Handles DataFrames of any size (bypasses Redis 512MB limit)
  • Built-in compression - lz4, zstd, gzip, snappy with configurable levels
  • Cache metadata - cache_info(), cache_ttl(), cache_exists() (see the sketch after this list)
  • Native Polars - No pandas conversion overhead
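
The metadata helpers named above, sketched with assumed signatures (a url plus the cache key, mirroring get_cached_dataframe):

import polars_redis as redis

url = "redis://localhost:6379"

# The three helper names come from the list above; the (url, key) argument
# order is an assumption patterned after get_cached_dataframe
if redis.cache_exists(url, "result"):
    print(redis.cache_info(url, "result"))  # cache metadata
    print(redis.cache_ttl(url, "result"))   # remaining time-to-live in seconds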

Requirements

  • Python 3.9+
  • Redis 7.0+ (RedisJSON module for JSON support)

License

MIT or Apache-2.0