Real-World Datasets

Learn jpx by working with real data from public APIs. Each example includes:

  • How to fetch the data
  • Data structure overview
  • Progressive examples from basic to advanced
  • Practical use cases

Available Guides

| Guide | Description | Key Features |
|---|---|---|
| Standard JMESPath Only | Portable queries using only spec functions | 26 built-in functions, no extensions |
| NLP Text Processing | Text analysis pipelines | Tokenization, stemming, stopwords, normalization |
| Hacker News | Tech discussions via Algolia API | NLP on real content, topic detection, vocabulary analysis |
| USGS Earthquakes | Real-time seismic data | Geo functions, statistics, filtering |
| Nobel Prize API | Laureates and prizes | Multilingual data, text processing, dates |
| NASA Near Earth Objects | Asteroids and comets | Nested data, unit conversions, risk analysis |
| Project Management | Synthetic project data | Comprehensive function coverage, all categories |
| Large Datasets & Parquet | 200K Chicago crime records | Parquet I/O, group_by, geo, token savings |

Quick Start

    # Fetch earthquake data
    curl -s "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&limit=20&minmagnitude=5" > quakes.json

    # Try a query
    jpx 'features[*].{mag: properties.mag, place: properties.place}' quakes.json
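
For readers who want to see what the projection in the query above computes, here is a rough Python equivalent. It is a sketch only: the `sample` document is hand-made in the GeoJSON FeatureCollection shape with placeholder values, not real USGS output, and the `project` helper is a hypothetical name introduced for illustration.

```python
import json

# Hand-made sample in the GeoJSON FeatureCollection shape.
# Values are illustrative placeholders, not real USGS data.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"properties": {"mag": 5.4, "place": "Sample Region A"}},
    {"properties": {"mag": 6.1, "place": "Sample Region B"}}
  ]
}
""")


def project(feature):
    """Mirror {mag: properties.mag, place: properties.place}.

    Missing fields become None, similar to how JMESPath yields null.
    """
    props = feature.get("properties") or {}
    return {"mag": props.get("mag"), "place": props.get("place")}


# Equivalent of features[*].{...}: apply the projection to every feature.
projected = [project(f) for f in sample["features"]]
print(json.dumps(projected, indent=2))
```

Each feature collapses to a two-key object, which is the same reshaping the multiselect hash in the jpx expression performs.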

What You'll Learn

Filtering & Selection

  • Complex filter expressions with multiple conditions
  • Nested field access patterns
  • Text-based filtering with contains, starts_with
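
The filtering ideas above can be sketched in plain Python to make the logic concrete. This is an illustration under assumptions: the records are made-up placeholders, and `matches` is a hypothetical helper standing in for a filter expression that combines a numeric comparison with `contains`.

```python
# Made-up records for illustration; not real earthquake data.
records = [
    {"mag": 5.2, "place": "Sample Coast North"},
    {"mag": 6.3, "place": "Sample Coast South"},
    {"mag": 6.8, "place": "Offshore Example Ridge"},
]


def matches(rec, min_mag, needle):
    """Combine two conditions: a magnitude threshold AND a substring test."""
    mag = rec.get("mag")
    place = rec.get("place") or ""
    return mag is not None and mag >= min_mag and needle in place


# Multiple conditions, similar in spirit to a filter with && in JMESPath.
strong_coastal = [r for r in records if matches(r, 6.0, "Coast")]

# Prefix test, similar in spirit to starts_with.
sample_places = [r["place"] for r in records if r["place"].startswith("Sample")]

print(strong_coastal)
print(sample_places)
```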

Statistics & Aggregation

  • avg, median, stddev for numeric analysis
  • min, max, min_by, max_by for extremes
  • length and counting patterns
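
As a reference point for checking query results, the aggregations listed above can be reproduced with Python's standard library. This is a sketch on made-up values. Whether jpx's `stddev` is the population or the sample standard deviation is not stated here, so the population form (`pstdev`) used below is an assumption.

```python
import statistics

# Made-up magnitudes for illustration.
quakes = [
    {"place": "A", "mag": 5.0},
    {"place": "B", "mag": 5.5},
    {"place": "C", "mag": 6.0},
    {"place": "D", "mag": 7.5},
]

mags = [q["mag"] for q in quakes]

summary = {
    "count": len(mags),
    "avg": statistics.mean(mags),
    "median": statistics.median(mags),
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample form if that matches your tool's definition.
    "stddev": statistics.pstdev(mags),
    "min": min(mags),
    "max": max(mags),
}

# min_by / max_by return the whole record, not just the number.
weakest = min(quakes, key=lambda q: q["mag"])
strongest = max(quakes, key=lambda q: q["mag"])

print(summary)
print(weakest["place"], strongest["place"])
```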

Geographic Calculations

  • geo_distance_km for distance calculations
  • Coordinate extraction and formatting
  • Distance-based sorting
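
To make the distance calculation concrete, here is a haversine sketch in Python. It is not claimed to be how `geo_distance_km` is implemented; the great-circle formula and the mean Earth radius constant are assumptions chosen for illustration, and the points are made up. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up features; GeoJSON coordinate order is [longitude, latitude].
features = [
    {"name": "mid", "coordinates": [2.0, 0.0]},
    {"name": "near", "coordinates": [1.0, 0.0]},
    {"name": "far", "coordinates": [0.0, 3.0]},
]

ref_lat, ref_lon = 0.0, 0.0  # hypothetical reference point


def distance_from_ref(feature):
    lon, lat = feature["coordinates"][:2]  # extract and reorder
    return haversine_km(ref_lat, ref_lon, lat, lon)


# Distance-based sorting: nearest first.
by_distance = sorted(features, key=distance_from_ref)
print([(f["name"], round(distance_from_ref(f), 1)) for f in by_distance])
```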

Date/Time Operations

  • Unix timestamp conversion with from_unixtime
  • Date formatting with format_datetime
  • Date range filtering
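
The date operations above can be sketched in Python as follows. The example assumes timestamps in epoch milliseconds, which is how the USGS GeoJSON feed reports `properties.time`; other APIs may use seconds. The events and the helper name `from_epoch_ms` are made up for illustration.

```python
from datetime import datetime, timezone


def from_epoch_ms(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Made-up events with millisecond timestamps.
events = [
    {"id": "a", "time": 1704067200000},  # 2024-01-01T00:00:00Z
    {"id": "b", "time": 1706745600000},  # 2024-02-01T00:00:00Z
    {"id": "c", "time": 1709251200000},  # 2024-03-01T00:00:00Z
]

# Formatting: render each timestamp as a readable string.
formatted = {e["id"]: from_epoch_ms(e["time"]).strftime("%Y-%m-%d %H:%M") for e in events}

# Date range filtering: keep events in the half-open interval [start, end).
start = datetime(2024, 1, 15, tzinfo=timezone.utc)
end = datetime(2024, 2, 15, tzinfo=timezone.utc)
in_range = [e["id"] for e in events if start <= from_epoch_ms(e["time"]) < end]

print(formatted)
print(in_range)
```

Comparing timezone-aware datetimes, rather than formatted strings, avoids mistakes when the data and the filter bounds are expressed in different zones.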

Data Transformation

  • Reshaping nested structures
  • Flattening for export
  • CSV/TSV output for spreadsheets
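
As an illustration of the reshape, flatten, export sequence above, here is a Python sketch that turns nested GeoJSON-style features into flat CSV rows. The features are made-up placeholders, the column names are arbitrary choices, and `flatten` is a hypothetical helper.

```python
import csv
import io

# Made-up features in a GeoJSON-like shape: [longitude, latitude, depth].
features = [
    {"id": "ev1",
     "properties": {"mag": 5.4, "place": "Sample Region A"},
     "geometry": {"coordinates": [10.0, 20.0, 5.0]}},
    {"id": "ev2",
     "properties": {"mag": 6.1, "place": "Sample Region B"},
     "geometry": {"coordinates": [-30.5, 40.25, 12.0]}},
]


def flatten(feature):
    """Pull nested fields up into one flat record suitable for a CSV row."""
    lon, lat, depth = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        "mag": feature["properties"]["mag"],
        "place": feature["properties"]["place"],
        "lat": lat,
        "lon": lon,
        "depth": depth,
    }


rows = [flatten(f) for f in features]

# Write to an in-memory buffer; swap in open("output.csv", "w", newline="")
# to write a file instead.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

For TSV output, passing `delimiter="\t"` to `csv.DictWriter` is enough.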

Pipeline Patterns

  • Multi-step transformations
  • Sorting and limiting results
  • Building summary reports
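
The pipeline pattern above (filter, then sort, then limit, then summarize) can be sketched step by step in Python. The data and the threshold of 5.0 are made up for illustration; each intermediate variable corresponds to one stage of a piped expression.

```python
# Made-up input records.
quakes = [
    {"place": "A", "mag": 5.1},
    {"place": "B", "mag": 6.4},
    {"place": "C", "mag": 4.2},
    {"place": "D", "mag": 7.0},
    {"place": "E", "mag": 5.9},
]

# Step 1: filter.
significant = [q for q in quakes if q["mag"] >= 5.0]

# Step 2: sort, strongest first.
ranked = sorted(significant, key=lambda q: q["mag"], reverse=True)

# Step 3: limit to the top three.
top3 = ranked[:3]

# Step 4: build a summary report from the earlier stages.
report = {
    "total": len(quakes),
    "significant": len(significant),
    "top_places": [q["place"] for q in top3],
    "max_mag": top3[0]["mag"] if top3 else None,
}
print(report)
```

Keeping each stage in its own variable makes it easy to inspect intermediate results, which is the same benefit you get from building a query one pipe segment at a time.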

Tips for Working with APIs

  1. Save data locally for faster iteration:

    curl -s "API_URL" > data.json
    jpx 'expression' data.json
    

  2. Explore structure first:

    jpx 'keys(@)' data.json          # Top-level keys
    jpx '@[0]' data.json             # First element (arrays)
    jpx 'type(@)' data.json          # Data type
    

  3. Use --compact for pipelines:

    jpx -c 'expression' data.json | jpx 'next_expression'
    

  4. Export for analysis:

    jpx --csv 'transform' data.json > output.csv
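
The "explore structure first" tip can also be mirrored in Python when you want a quick look at an unfamiliar response. This is a sketch: `jmespath_type` and `describe` are hypothetical helpers, and the type names follow the JMESPath specification (`number`, `string`, `boolean`, `array`, `object`, `null`).

```python
def jmespath_type(value):
    """Map a decoded JSON value to its JMESPath type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def describe(doc):
    """Summarize the top level: type, plus keys (objects) or first element (arrays)."""
    info = {"type": jmespath_type(doc)}
    if isinstance(doc, dict):
        info["keys"] = list(doc)
    elif isinstance(doc, list):
        info["length"] = len(doc)
        info["first"] = doc[0] if doc else None
    return info


# Made-up documents for illustration.
print(describe({"type": "FeatureCollection", "features": []}))
print(describe([1, 2]))
```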
    

More Data Sources

Looking for more datasets to practice with? Check out: