Real-World Datasets
Learn jpx by working with real data from public APIs. Each guide includes:
- How to fetch the data
- Data structure overview
- Progressive examples from basic to advanced
- Practical use cases
Available Guides
| Guide | Description | Key Features |
|---|---|---|
| Standard JMESPath Only | Portable queries using only spec functions | 26 built-in functions, no extensions |
| NLP Text Processing | Text analysis pipelines | Tokenization, stemming, stopwords, normalization |
| Hacker News | Tech discussions via Algolia API | NLP on real content, topic detection, vocabulary analysis |
| USGS Earthquakes | Real-time seismic data | Geo functions, statistics, filtering |
| Nobel Prize API | Laureates and prizes | Multilingual data, text processing, dates |
| NASA Near Earth Objects | Asteroids and comets | Nested data, unit conversions, risk analysis |
| Project Management | Synthetic project data | Comprehensive function coverage, all categories |
| Large Datasets & Parquet | 200K Chicago crime records | Parquet I/O, group_by, geo, token savings |
Quick Start
# Fetch earthquake data
curl -s "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&limit=20&minmagnitude=5" > quakes.json
# Try a query
jpx 'features[*].{mag: properties.mag, place: properties.place}' quakes.json
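To show the shape that query returns, here is a rough Python equivalent run on a tiny hand-written sample. This is not jpx syntax. The sample values are made up for illustration; only the field names `features`, `properties.mag`, and `properties.place` come from the query above.

```python
# Hypothetical sample mimicking the GeoJSON layout the query above expects.
sample = {
    "features": [
        {"properties": {"mag": 5.4, "place": "Example Ridge"}},
        {"properties": {"mag": 6.1, "place": "Sample Trench"}},
    ]
}


def project(doc):
    """Mirror features[*].{mag: properties.mag, place: properties.place}.

    Missing fields become None, much as JMESPath yields null for them.
    """
    return [
        {
            "mag": feature.get("properties", {}).get("mag"),
            "place": feature.get("properties", {}).get("place"),
        }
        for feature in doc.get("features", [])
    ]


rows = project(sample)
print(rows)
```

Each feature collapses to a two-key object, which is usually easier to scan than the full GeoJSON record.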
What You'll Learn
Filtering & Selection
- Complex filter expressions with multiple conditions
- Nested field access patterns
- Text-based filtering with contains and starts_with
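The filtering ideas in this list can be sketched in plain Python to show what such queries select. The records and place names below are invented; the guides themselves express these filters as jpx queries rather than Python.

```python
# Hypothetical, already-flattened earthquake records.
quakes = [
    {"mag": 6.2, "place": "10 km S of Exampletown, Alaska"},
    {"mag": 5.1, "place": "South of the Sample Islands"},
    {"mag": 4.8, "place": "22 km N of Demo City, Alaska"},
]

# Multiple conditions combined: a magnitude threshold AND a substring
# match (the analogue of a contains() check on the place field).
strong_alaska = [
    q for q in quakes if q["mag"] >= 5 and "Alaska" in q["place"]
]

# Prefix match, the analogue of a starts_with() check.
south = [q for q in quakes if q["place"].startswith("South")]

print(strong_alaska)
print(south)
```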
Statistics & Aggregation
- avg, median, stddev for numeric analysis
- min, max, min_by, max_by for extremes
- length and counting patterns
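As a way to sanity-check aggregate results, the same statistics can be computed with Python's standard library. The magnitudes are invented. This page does not say whether stddev in jpx is the population or the sample form, so the sketch computes both; compare against whichever matches the tool's output.

```python
import statistics

# Hypothetical records; only the "mag" values matter for the statistics.
quakes = [
    {"place": "A", "mag": 5.1},
    {"place": "B", "mag": 5.4},
    {"place": "C", "mag": 6.2},
    {"place": "D", "mag": 4.8},
    {"place": "E", "mag": 5.5},
]
mags = [q["mag"] for q in quakes]

mean_mag = statistics.mean(mags)      # counterpart of avg
median_mag = statistics.median(mags)  # counterpart of median
pop_sd = statistics.pstdev(mags)      # population standard deviation
sample_sd = statistics.stdev(mags)    # sample standard deviation

# Extremes: the whole record with the largest or smallest magnitude,
# the counterpart of max_by / min_by.
strongest = max(quakes, key=lambda q: q["mag"])
weakest = min(quakes, key=lambda q: q["mag"])

count = len(quakes)  # counterpart of length
print(mean_mag, median_mag, pop_sd, sample_sd, strongest, weakest, count)
```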
Geographic Calculations
- geo_distance_km for distance calculations
- Coordinate extraction and formatting
- Distance-based sorting
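For intuition about distance queries, here is a minimal great-circle (haversine) sketch in Python. It assumes geo_distance_km returns a great-circle distance in kilometres; the exact formula and argument order jpx uses are not stated on this page, so treat this as a cross-check rather than a description of the tool. The features are invented, and the coordinate order follows GeoJSON: longitude, latitude, then an optional third value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical GeoJSON-style features: coordinates are [lon, lat, depth].
features = [
    {"id": "a", "geometry": {"coordinates": [0.0, 1.0, 10.0]}},
    {"id": "b", "geometry": {"coordinates": [0.0, 3.0, 5.0]}},
    {"id": "c", "geometry": {"coordinates": [0.0, 2.0, 7.0]}},
]
origin_lat, origin_lon = 0.0, 0.0


def distance_from_origin(feature):
    # Coordinate extraction: GeoJSON stores longitude first.
    lon, lat = feature["geometry"]["coordinates"][:2]
    return haversine_km(origin_lat, origin_lon, lat, lon)


# Distance-based sorting, nearest first.
nearest_first = sorted(features, key=distance_from_origin)
print([f["id"] for f in nearest_first])
```

The most common mistake with GeoJSON input is swapping latitude and longitude, so it is worth checking one known distance before trusting a sorted list.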
Date/Time Operations
- Unix timestamp conversion with from_unixtime
- Date formatting with format_datetime
- Date range filtering
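The three date operations can be cross-checked in Python as follows. The sketch assumes timestamps in milliseconds since the Unix epoch, as in the USGS feed's `properties.time` field; this page does not state which unit from_unixtime expects, so check that before comparing results. The event records are invented.

```python
from datetime import datetime, timezone


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical events spaced one day apart.
events = [
    {"id": "a", "time": 1700000000000},
    {"id": "b", "time": 1700086400000},
    {"id": "c", "time": 1700172800000},
]

# Formatting: render the first timestamp as a readable UTC string.
stamp = from_epoch_ms(events[0]["time"]).strftime("%Y-%m-%d %H:%M:%S")

# Date range filtering: keep events in the half-open range [start, end).
start = datetime(2023, 11, 15, tzinfo=timezone.utc)
end = datetime(2023, 11, 16, tzinfo=timezone.utc)
in_range = [e for e in events if start <= from_epoch_ms(e["time"]) < end]

print(stamp)
print([e["id"] for e in in_range])
```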
Data Transformation
- Reshaping nested structures
- Flattening for export
- CSV/TSV output for spreadsheets
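A small Python sketch of the reshape, flatten, and export steps shows the target layout for a spreadsheet. The records and the output column names (`lat`, `lon`, `depth_km`) are chosen here for illustration and are not taken from the guides.

```python
import csv
import io

# Hypothetical nested records in a GeoJSON-like layout.
features = [
    {
        "id": "a",
        "properties": {"mag": 5.4, "place": "Example Ridge"},
        "geometry": {"coordinates": [-150.1, 61.2, 10.0]},
    },
    {
        "id": "b",
        "properties": {"mag": 6.1, "place": "Sample Trench"},
        "geometry": {"coordinates": [142.3, 38.0, 35.5]},
    },
]


def flatten(feature):
    """Turn one nested feature into a flat, single-level row."""
    lon, lat, depth = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        "mag": feature["properties"]["mag"],
        "place": feature["properties"]["place"],
        "lat": lat,
        "lon": lon,
        "depth_km": depth,
    }


rows = [flatten(f) for f in features]

# Write CSV to an in-memory buffer; pass delimiter="\t" for TSV instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```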
Pipeline Patterns
- Multi-step transformations
- Sorting and limiting results
- Building summary reports
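The pipeline shape described in this list (filter, then sort, then limit, then summarize) looks like this in plain Python. The data and the summary field names are invented; in jpx the same stages would be chained inside a query.

```python
# Hypothetical flattened records.
quakes = [
    {"place": "A", "mag": 5.1},
    {"place": "B", "mag": 6.4},
    {"place": "C", "mag": 4.7},
    {"place": "D", "mag": 5.9},
    {"place": "E", "mag": 6.0},
]

# Step 1: filter to significant events.
significant = [q for q in quakes if q["mag"] >= 5.0]

# Step 2: sort by magnitude, strongest first.
ranked = sorted(significant, key=lambda q: q["mag"], reverse=True)

# Step 3: limit to the top three.
top3 = ranked[:3]

# Step 4: build a compact summary report.
summary = {
    "total": len(quakes),
    "significant": len(significant),
    "strongest": top3[0]["place"],
    "top_places": [q["place"] for q in top3],
}
print(summary)
```

Keeping each stage as its own named step makes it easy to inspect intermediate results when a report looks wrong.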
Tips for Working with APIs
- Save data locally for faster iteration.
- Explore structure first.
- Use --compact for pipelines.
- Export for analysis.
More Data Sources
Looking for more datasets to practice with? Check out:
- Awesome JSON Datasets - Curated list of public JSON APIs
- Public APIs - Collective list of free APIs
- NASA Open APIs - Space and Earth science data
- OpenWeatherMap - Weather data
- GitHub API - Repository and user data