Query Files¶
Query files let you store JMESPath expressions externally, making them reusable, version-controlled, and easier to maintain. jpx supports two formats:
- Simple query files - A single expression in a plain text file
- Query libraries (.jpx) - Multiple named queries in one file
Simple Query Files¶
The simplest approach: put your expression in a file and reference it with -Q:
# Create a query file
echo 'users[?active].{name: name, email: email}' > active-users.txt
# Use it
jpx -Q active-users.txt data.json
This is useful for:
- Long, complex expressions you don't want to retype
- Sharing queries across scripts
- Version-controlling important queries
Query Libraries (.jpx)¶
For projects with multiple related queries, use a .jpx query library file. This format lets you define named queries with optional descriptions:
-- :name active-users
-- :desc Get all active users with their contact info
users[?active].{name: name, email: email}
-- :name admin-emails
-- :desc Extract just the admin email addresses
users[?role == `admin`].email
-- :name user-stats
-- :desc Summary statistics about users
{
total: length(users),
active: length(users[?active]),
admins: length(users[?role == `admin`])
}
Using Query Libraries¶
There are two ways to run a query from a library:
# Colon syntax (concise)
jpx -Q queries.jpx:active-users data.json
# Separate flag (explicit)
jpx -Q queries.jpx --query active-users data.json
Both are equivalent. Use whichever feels more natural.
Listing Available Queries¶
See what queries are in a library:
jpx -Q queries.jpx --list-queries
Output:
Queries in queries.jpx:
NAME DESCRIPTION
----------- ----------------------------------------
active-users Get all active users with their contact info
admin-emails Extract just the admin email addresses
user-stats Summary statistics about users
Use: jpx -Q queries.jpx:<query-name> <input>
Validating Queries¶
Check that all queries in a library are syntactically valid:
jpx -Q queries.jpx --check
This is useful in CI pipelines to catch syntax errors before deployment.
.jpx Format Specification¶
The .jpx format is inspired by SQLDelight and HugSQL patterns: store multiple named queries in a single file with comment-based metadata directives.
Directives¶
| Directive | Required | Description |
|---|---|---|
| -- :name <name> | Yes | Starts a new query. The name is used for lookup via -Q file.jpx:name. |
| -- :desc <text> | No | Adds a description to the current query. Shown by --list-queries. |
| -- <anything> | - | Plain comment. Ignored by the parser. |
A query definition begins at its -- :name directive and ends at the next -- :name directive or end of file.
Grammar¶
library = { query }+
query = name_line [ desc_line ] { comment | blank | expr_line }+
name_line = "-- :name " NAME
desc_line = "-- :desc " TEXT
comment = "-- " TEXT | "--"
blank = empty or whitespace-only line
expr_line = any non-blank line not starting with "-- "
- NAME: Non-empty text after -- :name, trimmed of leading/trailing whitespace
- TEXT: Arbitrary text for the rest of the line
- Expression lines are joined with newlines in order, then the result is trimmed
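The grammar can be illustrated with a minimal Python sketch. This is not jpx's actual parser: the `Query` class and `parse_library` function are hypothetical names, error handling is left out, and the sketch assumes directives are recognized after stripping surrounding whitespace. It is also slightly more permissive than the grammar, since it accepts a -- :desc line anywhere inside a query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Query:
    name: str
    desc: Optional[str]
    expr: str


def parse_library(text: str) -> List[Query]:
    """Parse .jpx text into queries, following the grammar above (sketch)."""
    queries: List[Query] = []
    name: Optional[str] = None
    desc: Optional[str] = None
    lines: List[str] = []

    def flush() -> None:
        # Close the query currently being collected, if there is one.
        if name is not None:
            queries.append(Query(name, desc, "\n".join(lines).strip()))

    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("-- :name"):
            flush()
            name = stripped[len("-- :name"):].strip()
            desc, lines = None, []
        elif stripped.startswith("-- :desc"):
            desc = stripped[len("-- :desc"):].strip()
        elif stripped == "--" or stripped.startswith("-- "):
            continue  # plain comment
        elif stripped == "":
            continue  # blank line
        elif name is not None:
            lines.append(raw.rstrip())  # keep leading indentation
    flush()
    return queries
```

Each expression line keeps its leading indentation, and only the joined result is trimmed, which matches the last rule in the list above.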
Multi-Line Expressions¶
Expression lines are concatenated with newline separators, preserving indentation. The final expression is trimmed of leading and trailing whitespace. This lets you write readable multi-line constructions:
-- :name user-report
{
total: length(users),
active: length(users[?active]),
admins: length(users[?role == `admin`])
}
The resulting expression is the five lines from { to }, joined with newlines.
Blank Lines and Comments Within Queries¶
Blank lines between -- :name directives are ignored; they don't become part of the expression. You can use them freely for visual separation:
-- :name query-a
length(@)

-- This blank line above and this comment are ignored
-- :name query-b
keys(@)
Comments (-- lines) between directives are also ignored.
Error Reporting¶
The parser reports errors with line numbers for easy debugging:
| Error | Example | Message |
|---|---|---|
| Empty name | -- :name | Empty query name at line 3 |
| No expression | -- :name foo followed by -- :name bar | Query 'foo' has no expression at line 1 |
| Duplicate name | Two -- :name stats | Duplicate query name 'stats' at line 7 |
| No queries | File with only comments | No queries found. Use '-- :name <query-name>' to define queries. |
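The checks behind these messages can be sketched in Python. This is an illustration under assumptions, not jpx's implementation: `check_library` is a hypothetical name, it collects every problem instead of stopping at the first one (the real parser may stop early), and it assumes a duplicate is reported at the line of the repeated directive.

```python
from typing import List

NAME = "-- :name"


def check_library(text: str) -> List[str]:
    """Return error messages shaped like those in the table above (sketch)."""
    errors: List[str] = []
    seen = set()
    current = None      # (name, line number) of the query being read
    has_expr = False
    found_any = False

    def close() -> None:
        # A named query that collected no expression lines is an error.
        if current is not None and not has_expr:
            errors.append(
                f"Query '{current[0]}' has no expression at line {current[1]}"
            )

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(NAME):
            close()
            found_any = True
            name = line[len(NAME):].strip()
            current, has_expr = None, False
            if not name:
                errors.append(f"Empty query name at line {lineno}")
            elif name in seen:
                errors.append(f"Duplicate query name '{name}' at line {lineno}")
            else:
                seen.add(name)
                current = (name, lineno)
        elif line and line != "--" and not line.startswith("-- "):
            has_expr = True  # any non-blank, non-comment line is expression text
    close()
    if not found_any:
        errors.append(
            "No queries found. Use '-- :name <query-name>' to define queries."
        )
    return errors
```

Line numbers are 1-based and point at the -- :name directive that introduced the problem.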
Detection Logic¶
jpx decides how to treat a query file based on these rules:
- File extension: Files ending in .jpx are always treated as query libraries
- Content sniffing: If the first non-empty line starts with -- :name, the file is treated as a library regardless of extension
- Fallback: Everything else is a simple single-expression file
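The three rules can be read as a short decision function. The Python sketch below is illustrative only: `is_query_library` is a hypothetical name, and treating the extension check as case-sensitive is an assumption that the rules above do not settle.

```python
from pathlib import Path


def is_query_library(path: str, text: str) -> bool:
    """Apply the three detection rules in order (sketch)."""
    # Rule 1: the .jpx extension always means a library.
    # Assumption: the comparison is case-sensitive.
    if Path(path).suffix == ".jpx":
        return True
    # Rule 2: content sniffing on the first non-empty line only.
    for line in text.splitlines():
        if line.strip():
            return line.strip().startswith("-- :name")
    # Rule 3: everything else (including an empty file) is a simple file.
    return False
```

Under this reading, a .txt file that opens with a plain comment before its first -- :name line falls through to the simple-file case, because only the first non-empty line is inspected.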
Colon Syntax and Windows Paths¶
The colon syntax -Q file.jpx:query-name is shorthand for -Q file.jpx --query query-name.
On Windows, paths containing drive letters (e.g., C:\queries\file.jpx) are handled correctly: the parser recognizes drive-letter prefixes and only splits on the final colon that separates the file path from the query name:
# Windows
jpx -Q C:\queries\file.jpx:my-query data.json
# Unix
jpx -Q /opt/queries/file.jpx:my-query data.json
If your query name contains a colon, use the explicit --query flag instead.
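A rough Python sketch of the splitting rule follows. The function name `split_query_arg` is hypothetical, and two details are assumptions rather than documented behavior: a drive prefix is recognized as a single letter, a colon, and then a path separator, and a trailing colon with no name selects nothing.

```python
from typing import Optional, Tuple


def split_query_arg(arg: str) -> Tuple[str, Optional[str]]:
    """Split a -Q argument into (file path, query name or None) (sketch)."""
    idx = arg.rfind(":")
    if idx == -1:
        return arg, None
    # If the final colon is the drive-letter colon (C:\ or C:/),
    # the argument holds no query name at all.
    if idx == 1 and arg[0].isalpha() and arg[2:3] in ("\\", "/"):
        return arg, None
    path, name = arg[:idx], arg[idx + 1:]
    if not name:
        return arg, None  # assumption: a trailing colon selects nothing
    return path, name
```

Because the split happens on the final colon, a query name that itself contains a colon would be cut in the wrong place, which is why the explicit --query flag is the safer choice in that case.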
Real-World Examples¶
NLP Analysis Library¶
Create reusable text processing pipelines:
-- :name clean-html
-- :desc Strip HTML tags and normalize whitespace
regex_replace(@, `<[^>]+>`, ` `) | collapse_whitespace(@)
-- :name extract-keywords
-- :desc Get top keywords from text (stemmed, no stopwords)
tokens(@) | remove_stopwords(@) | stems(@) | frequencies(@)
-- :name title-keywords
-- :desc Extract keywords from article titles
hits[*].title | join(` `, @) | tokens(@) | remove_stopwords(@) | stems(@) | frequencies(@)
-- :name reading-stats
-- :desc Get reading time and word count
{
word_count: word_count(@),
reading_time: reading_time(@),
sentence_count: sentence_count(@)
}
Use it:
# Clean HTML from a field
jpx 'story_text' data.json | jpx -Q nlp.jpx:clean-html
# Analyze Hacker News titles
jpx -Q nlp.jpx:title-keywords hn_front.json
API Response Processing¶
Standardize how you extract data from APIs:
-- :name github-repos
-- :desc Extract repo summary from GitHub API response
[*].{
name: name,
stars: stargazers_count,
language: language,
description: description | default(@, `"No description"`)
}
-- :name github-issues
-- :desc Format GitHub issues for display
[*].{
number: number,
title: title,
state: state,
author: user.login,
labels: labels[*].name | join(`, `, @)
}
-- :name paginated-total
-- :desc Get total from paginated API response
{
count: length(items),
total: total_count,
has_more: length(items) < total_count
}
Data Transformation Library¶
Common transformations for ETL pipelines:
-- :name flatten-nested
-- :desc Flatten nested user records for CSV export
[*].{
id: id,
name: profile.name,
email: profile.email,
city: profile.address.city,
country: profile.address.country,
created: metadata.created_at
}
-- :name aggregate-by-status
-- :desc Group and count records by status
group_by(@, &status) | map(&{ status: [0].status, count: length(@) }, @)
-- :name enrich-timestamps
-- :desc Add formatted date fields
[*] | map(&merge(@, {
created_date: format_datetime(created_at, `%Y-%m-%d`),
created_time: format_datetime(created_at, `%H:%M:%S`)
}), @)
Log Analysis Library¶
Queries for processing structured logs:
-- :name errors-only
-- :desc Filter to just error-level logs
[?level == `error` || level == `ERROR`]
-- :name errors-by-service
-- :desc Count errors grouped by service name
[?level == `error`] | group_by(@, &service) | map(&{ service: [0].service, count: length(@) }, @)
-- :name recent-errors
-- :desc Errors from the last hour with context
[?level == `error`] | sort_by(@, &timestamp) | reverse(@) | [:20].{
time: timestamp,
service: service,
message: message,
trace_id: trace_id
}
-- :name slow-requests
-- :desc Requests taking longer than 1 second
[?duration_ms > `1000`] | sort_by(@, &duration_ms) | reverse(@)
Bundled Examples¶
jpx ships with several example .jpx files you can reference or copy:
| File | Description |
|---|---|
| examples/nlp.jpx | Text processing and NLP pipelines |
| examples/hacker-news.jpx | Hacker News API analysis queries |
| examples/github.jpx | GitHub API response processing |
| examples/logs.jpx | Structured log analysis |
| examples/data-transforms.jpx | Common ETL transformations |
Additional query libraries used in tests:
| File | Description |
|---|---|
| crates/jpx/queries/user-summary.jpx | User data aggregation |
| crates/jpx/queries/server-stats.jpx | Server metrics queries |
| crates/jpx/queries/order-report.jpx | Order data reporting |
Best Practices¶
Naming Conventions¶
Use clear, descriptive names:
- active-users not au or query1
- errors-by-service not err-svc
- Use kebab-case for consistency
Add Descriptions¶
Always add -- :desc lines. They show up in --list-queries and help others (and future you) understand what each query does.
Organize by Domain¶
Group related queries into domain-specific libraries:
- nlp.jpx - Text processing pipelines
- api.jpx - API response transformations
- logs.jpx - Log analysis queries
- etl.jpx - Data transformation queries
Version Control¶
Query libraries are plain text files, which makes them a good fit for git:
- Track changes to important queries
- Review query changes in PRs
- Share queries across your team
Validate in CI¶
Add query validation to your CI pipeline:
# .github/workflows/validate.yml
- name: Validate query libraries
  run: |
    for f in queries/*.jpx; do
      jpx -Q "$f" --check || exit 1
    done
CLI Reference¶
| Option | Description |
|---|---|
| -Q, --query-file <FILE> | Load expression from file or query library |
| --query <NAME> | Select a named query from a .jpx library |
| --list-queries | List all queries in a .jpx file |
| --check | Validate all queries without running |
Migration from Simple Files¶
If you have many simple query files, consolidate them:
# Before: multiple files
queries/
  active-users.txt
  admin-emails.txt
  user-stats.txt
# After: one library
queries/
  users.jpx
Just add -- :name headers to combine them:
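One way to script that step is sketched below in Python. It is a minimal illustration, not a jpx feature: `build_library` is a hypothetical helper, and using each file's stem as the query name is an assumption about how you want the queries named.

```python
from pathlib import Path
from typing import Iterable


def build_library(files: Iterable[Path]) -> str:
    """Combine simple query files into .jpx library text (sketch)."""
    sections = []
    for path in sorted(files):
        expr = path.read_text(encoding="utf-8").strip()
        # The file stem becomes the query name,
        # e.g. active-users.txt -> active-users.
        sections.append(f"-- :name {path.stem}\n{expr}\n")
    # A blank line between sections keeps the library readable.
    return "\n".join(sections)
```

Writing the result of `build_library(Path("queries").glob("*.txt"))` to queries/users.jpx yields one -- :name header per original file, each followed by that file's expression. Adding -- :desc lines by hand afterwards makes the library easier to browse with --list-queries.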