Pipeline

Pipeline is the main entry point. It takes your data, study area, and method choice, runs the interpolation, and returns a result object with grids, metrics, and export helpers.

from geointerpo import Pipeline

result = Pipeline(
    data=...,      # step 1 — point data
    boundary=...,  # step 2 — study area
    method=...,    # step 3 — interpolation method(s)
).run()

Step 1: Provide point data (`data=`)

| Value | What it loads |
| --- | --- |
| `"stations.csv"` | CSV with longitude, latitude, and value columns |
| `"stations.shp"` | Any vector file: `.shp`, `.geojson`, `.gpkg`, `.zip` |
| `my_gdf` | A GeoDataFrame already in memory |
| `"meteostat"` | Live weather data from Meteostat |
| `"openaq"` | Live air quality data from OpenAQ |
| `"openmeteo"` | Live reanalysis data from Open-Meteo |
| `"sample"` | Built-in synthetic dataset; no network needed |

CSV files: geointerpo detects column names automatically. It recognises `lon`, `longitude`, or `x` as the longitude column and `lat`, `latitude`, or `y` as the latitude column. To override detection, pass `lon_col=`, `lat_col=`, and `value_col=`:

Pipeline(data="my_data.csv", lon_col="x", lat_col="y", value_col="temp", ...)
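
The detection rule described above can be sketched in a few lines. This is only an illustration, assuming a case-insensitive match against the alias lists named in the text. The helper `detect_columns` and its alias tuples are hypothetical and are not geointerpo's internals.

```python
import csv
import io

# Aliases taken from the text above; the matching rule itself is an assumption.
LON_ALIASES = ("lon", "longitude", "x")
LAT_ALIASES = ("lat", "latitude", "y")


def detect_columns(header, lon_col=None, lat_col=None):
    """Return (lon_name, lat_name) for a CSV header, honouring explicit overrides."""
    # Map lower-cased names back to the spelling used in the file.
    lowered = {name.strip().lower(): name for name in header}

    def pick(override, aliases, label):
        if override is not None:
            if override not in header:
                raise KeyError(f"{label} column {override!r} not in {header}")
            return override
        for alias in aliases:  # earlier aliases win when several are present
            if alias in lowered:
                return lowered[alias]
        raise KeyError(f"no {label} column found in {header}")

    return (
        pick(lon_col, LON_ALIASES, "longitude"),
        pick(lat_col, LAT_ALIASES, "latitude"),
    )


sample = io.StringIO("Longitude,Latitude,temp\n-114.07,51.05,21.4\n")
header = next(csv.reader(sample))
print(detect_columns(header))
```

An explicit override skips the alias search entirely, which is why passing `lon_col=` and `lat_col=` is the safer choice when a file has several coordinate-like columns.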

Live API sources: pass `variable=` and `date=` alongside the source name:

Pipeline(data="meteostat",  variable="temperature",  date="2024-07-15", ...)
Pipeline(data="openaq",     variable="pm25",          date="2024-07-15", ...)
Pipeline(data="openmeteo",  variable="precipitation", date="2024-07-15", ...)

Step 2: Define the study area (`boundary=`)

See Boundaries for all input formats. Quick options:

boundary="Calgary, Alberta, Canada"      # place name (Nominatim)
boundary=(-114.5, 50.8, -113.8, 51.3)   # (min_lon, min_lat, max_lon, max_lat)
boundary="data/region.geojson"           # file path
boundary=my_gdf                          # GeoDataFrame
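
With the tuple form, the easiest mistake is swapping the coordinate order. The sketch below checks a bounding box before it is passed on. `validate_bbox` is a hypothetical helper written for this page, not part of geointerpo, and it assumes the box does not cross the antimeridian.

```python
def validate_bbox(bbox):
    """Check a (min_lon, min_lat, max_lon, max_lat) tuple and return it as floats."""
    if len(bbox) != 4:
        raise ValueError("bbox needs exactly four numbers")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    if not (-180.0 <= min_lon < max_lon <= 180.0):
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not (-90.0 <= min_lat < max_lat <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    return (min_lon, min_lat, max_lon, max_lat)


calgary = validate_bbox((-114.5, 50.8, -113.8, 51.3))

# Latitude-first order is rejected because -114.5 is not a valid latitude.
try:
    validate_bbox((50.8, -114.5, 51.3, -113.8))
except ValueError as err:
    print("rejected:", err)
```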

Step 3: Choose a method (`method=`)

method="kriging"                              # single method
method=["idw", "kriging", "spline", "rbf"]   # run multiple and compare

See Methods for all 28 accepted method keys.

Override parameters per method:

Pipeline(
    ...,  # data= and boundary= as in steps 1 and 2
    method=["idw", "kriging", "rbf"],
    method_params={
        "idw":     {"power": 3},
        "kriging": {"variogram_model": "spherical", "nlags": 12},
        "rbf":     {"kernel": "thin_plate_spline"},
    },
)
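
One way to picture per-method overrides is as a dictionary merge on top of each method's defaults. The sketch below assumes that model. The `DEFAULTS` values and the `resolve_params` helper are invented for illustration, and whether geointerpo rejects overrides for methods that are not being run is not stated here.

```python
# Hypothetical defaults, invented for illustration only.
DEFAULTS = {
    "idw": {"power": 2},
    "kriging": {"variogram_model": "linear", "nlags": 6},
    "rbf": {"kernel": "linear"},
}


def resolve_params(methods, method_params=None):
    """Merge user overrides onto per-method defaults for the requested methods."""
    method_params = method_params or {}
    unknown = set(method_params) - set(methods)
    if unknown:
        # This sketch chooses to fail loudly on overrides that would never be used.
        raise KeyError(f"overrides given for methods not being run: {sorted(unknown)}")
    return {
        name: {**DEFAULTS.get(name, {}), **method_params.get(name, {})}
        for name in methods
    }


resolved = resolve_params(
    ["idw", "kriging"],
    {"kriging": {"variogram_model": "spherical", "nlags": 12}},
)
print(resolved)
```

Keys that are not overridden keep their default, so an override only needs to list the parameters that change.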

All parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `data` | str / Path / GeoDataFrame | required | Point data source (see table above, plus `"era5"` and `"nasapower"`) |
| `boundary` | str / tuple / GeoDataFrame | `None` | Study area |
| `method` | str or list[str] | `"kriging"` | Method key(s) to run; see Methods for all 28 keys |
| `variable` | str | `"value"` | Column name or API variable |
| `date` | str | yesterday | ISO date `"YYYY-MM-DD"` for API sources |
| `lon_col` | str | `"lon"` | CSV longitude column |
| `lat_col` | str | `"lat"` | CSV latitude column |
| `value_col` | str | `"value"` | CSV or GeoDataFrame value column |
| `resolution` | float or str | `0.25` | Grid cell size: float degrees, or a string such as `"5km"` or `"500m"` |
| `padding_deg` | float | `0.5` | Degrees of padding around boundary bbox |
| `method_params` | dict | `{}` | Per-method parameter overrides |
| `clip_to_boundary` | bool | `True` | Mask grid cells outside the boundary polygon |
| `include_dem` | bool | `False` | Add SRTM elevation as a covariate for ML and RK methods |
| `cv_folds` | int | `5` | Number of spatial cross-validation folds |
| `boundary_provider` | str | `"nominatim"` | `"nominatim"` or `"osmnx"` |
| `search_radius` | SearchRadius | `None` | Restrict the local station neighbourhood used per prediction |
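
To see how `resolution` and `padding_deg` interact, the sketch below builds the cell-centre coordinates of a grid from a bounding box. It assumes the padding is added on all four sides and that the cell count is rounded up so the padded box is fully covered. Both are assumptions about the behaviour, and `grid_axes` is a hypothetical helper.

```python
import math


def grid_axes(bbox, resolution=0.25, padding_deg=0.5):
    """Return (lons, lats) cell-centre coordinates for a padded bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    west, south = min_lon - padding_deg, min_lat - padding_deg
    east, north = max_lon + padding_deg, max_lat + padding_deg

    # round() guards against float noise such as 6.000000000000001 -> 7 cells.
    n_cols = math.ceil(round((east - west) / resolution, 9))
    n_rows = math.ceil(round((north - south) / resolution, 9))

    lons = [west + (i + 0.5) * resolution for i in range(n_cols)]
    lats = [south + (j + 0.5) * resolution for j in range(n_rows)]
    return lons, lats


lons, lats = grid_axes((0.0, 0.0, 1.0, 2.0), resolution=0.5, padding_deg=0.5)
print(len(lons), "columns x", len(lats), "rows")
```

Halving `resolution` roughly quadruples the number of cells, so runtime for per-cell methods grows accordingly.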

Work with results

Pipeline.run() returns an InterpolationResult:

result.grid            # xr.DataArray — primary method surface (WGS-84)
result.grids           # dict[str, xr.DataArray] — one grid per method
result.variance_grids  # dict[str, xr.DataArray] — variance/std surfaces (kriging, cokriging, SGS)
result.stations        # gpd.GeoDataFrame — your input points
result.cv_metrics      # dict[str, dict] — RMSE, MAE, bias, r, n per method
result.boundary        # gpd.GeoDataFrame or None
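
The scores stored in `cv_metrics` are standard error statistics. As a reference for how to read them, the sketch below computes the same five quantities from paired observed and predicted values. The sign convention for bias (predicted minus observed) is an assumption, and `cv_scores` is a hypothetical helper rather than the library's implementation.

```python
import math


def cv_scores(observed, predicted):
    """Compute RMSE, MAE, bias, Pearson r, and n for held-out predictions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,  # positive means over-prediction in this sketch
        # r is undefined when either series is constant.
        "r": cov / math.sqrt(var_o * var_p) if var_o > 0 and var_p > 0 else math.nan,
        "n": n,
    }


scores = cv_scores([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(scores)
```

The example shows why `r` alone can mislead: predictions that are offset by a constant still correlate perfectly, while RMSE and bias expose the offset.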

Visualize:

result.plot()                     # side-by-side matplotlib figure (requires [viz])
result.plot_interactive()         # zoomable Plotly or leafmap map (requires [interactive])
result.metrics_table()            # pandas DataFrame with RMSE, MAE, bias, r

Rank methods:

# Which method scored best on RMSE?
print(result.best_method())       # e.g. 'kriging'
print(result.best_method(by="r")) # rank by correlation instead

# Full ranked table
print(result.rank_methods())
#            rmse   mae   bias     r  n  rank
# kriging    1.23  0.91  -0.02  0.97  6     1
# idw        1.81  1.34   0.11  0.94  6     2
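
Ranking depends on the direction of each metric: lower is better for RMSE and MAE, higher is better for `r`. The sketch below ranks a `cv_metrics`-shaped dictionary under that assumption, using the figures from the table above. `rank_by_metric` is a hypothetical helper, and treating bias as "closest to zero wins" is this sketch's choice.

```python
def rank_by_metric(cv_metrics, by="rmse"):
    """Return method names ordered from best to worst for one metric."""

    def sort_key(method):
        score = cv_metrics[method][by]
        if by == "r":
            return -score      # higher correlation is better
        if by == "bias":
            return abs(score)  # closest to zero is better
        return score           # rmse, mae: lower is better

    return sorted(cv_metrics, key=sort_key)


metrics = {
    "idw": {"rmse": 1.81, "mae": 1.34, "bias": 0.11, "r": 0.94, "n": 6},
    "kriging": {"rmse": 1.23, "mae": 0.91, "bias": -0.02, "r": 0.97, "n": 6},
}
print(rank_by_metric(metrics))
print(rank_by_metric(metrics, by="r"))
```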

Export:

result.save("outputs/")
# writes: <method>.tif, cv_metrics.csv, interpolation_comparison.png

Resolution strings

resolution= accepts float degrees or human-readable metric strings:

Pipeline(..., resolution=0.1)      # 0.1° ≈ 11 km
Pipeline(..., resolution="5km")    # exactly 5 km (≈ 0.045°)
Pipeline(..., resolution="500m")   # 500 m (≈ 0.0045°)
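
The degree figures in the comments above follow from the length of one degree of latitude, roughly 111 km. The sketch below reproduces that arithmetic. `resolution_to_degrees` and the constant are assumptions made for illustration. How geointerpo converts internally is not stated here, and a degree of longitude shrinks with the cosine of latitude, so any single factor is an approximation.

```python
import re

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def resolution_to_degrees(resolution):
    """Convert a float (degrees) or a string like '5km' / '500m' to degrees."""
    if isinstance(resolution, (int, float)):
        return float(resolution)

    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(km|m)\s*", resolution.lower())
    if match is None:
        raise ValueError(f"unrecognised resolution: {resolution!r}")

    value, unit = float(match.group(1)), match.group(2)
    kilometres = value if unit == "km" else value / 1000.0
    return kilometres / KM_PER_DEGREE


for spec in (0.1, "5km", "500m"):
    print(spec, "->", round(resolution_to_degrees(spec), 4), "degrees")
```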

Variance and uncertainty grids

Methods that support uncertainty quantification populate result.variance_grids:

# Kriging — populated automatically
result = Pipeline(data=gdf, method="kriging").run()
var_da = result.variance_grids["kriging"]   # xr.DataArray, same shape as grid

# ML uncertainty — call the interpolator directly
from geointerpo.interpolators.ml import MLInterpolator
model = MLInterpolator(method="rf").fit(gdf)
mean, lower, upper = model.predict_with_uncertainty(bbox, resolution=0.1, alpha=0.1)
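
A variance surface can be turned into lower and upper bounds per cell. The sketch below does this for a single cell under two assumptions that the text above does not confirm: prediction errors are Gaussian, and `alpha` is the two-sided miss rate, so `alpha=0.1` gives a 90% interval. `prediction_interval` is a hypothetical helper. If a grid stores standard deviations rather than variances, skip the square root.

```python
import math
from statistics import NormalDist


def prediction_interval(mean, variance, alpha=0.1):
    """Return (lower, upper) bounds of a two-sided Gaussian interval."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.645 for alpha=0.1
    half_width = z * math.sqrt(max(variance, 0.0))  # clamp tiny negative variances
    return mean - half_width, mean + half_width


lower, upper = prediction_interval(mean=20.0, variance=4.0, alpha=0.1)
print(round(lower, 2), round(upper, 2))
```

The same arithmetic applies element-wise to whole arrays, so it carries over to an `xr.DataArray` of means and variances.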

New data sources (v0.2)

Two new remote sources are available. NASA POWER needs no account or API key; ERA5 needs a free CDS account and the `cdsapi` package:

# NASA POWER — free REST API, no account
Pipeline(data="nasapower", variable="temperature", date="2024-07-15", boundary=bbox)

# ERA5 — free CDS account + pip install cdsapi
Pipeline(data="era5", variable="temperature", date="2024-07-15", boundary=bbox)

Limit the search radius

Use `SearchRadius` to control which stations each prediction point uses. It mirrors the ArcGIS Spatial Analyst `SearchRadius` parameter:

from geointerpo import Pipeline, SearchRadius

# Use the 15 nearest stations
Pipeline(..., search_radius=SearchRadius.variable(n=15))

# Use all stations within 100 km
Pipeline(..., search_radius=SearchRadius.fixed(distance_m=100_000))

Note

SearchRadius.variable(n=12) is the ArcGIS default.

Info

`SearchRadius.variable` selects the nearest `n` stations separately for each grid cell. `SearchRadius.fixed` uses all stations within `distance_m` metres of each grid cell.
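
The two neighbourhood rules can be sketched with great-circle distances. This is an illustration of the selection logic only, not geointerpo's implementation: `haversine_m`, `nearest_n`, and `within_distance` are hypothetical helpers, and a spherical Earth is assumed.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_n(stations, lon, lat, n):
    """Variable-style search: the n closest stations, however far away they are."""
    return sorted(stations, key=lambda s: haversine_m(lon, lat, s[0], s[1]))[:n]


def within_distance(stations, lon, lat, distance_m):
    """Fixed-style search: every station inside the radius, possibly none."""
    return [s for s in stations if haversine_m(lon, lat, s[0], s[1]) <= distance_m]


stations = [(-114.0, 51.0), (-113.0, 51.0), (-110.0, 51.0)]  # (lon, lat)
print(nearest_n(stations, -114.0, 51.0, n=2))
print(within_distance(stations, -120.0, 51.0, distance_m=100_000))
```

The variable rule always returns `n` stations when at least `n` exist, while the fixed rule can return an empty list for cells far from any station.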

Warning

Fixed-radius search can leave NaN gaps if no stations fall inside the radius, or if too few local stations are available for the chosen interpolator.
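
The gap behaviour is easy to reproduce with a small inverse-distance-weighting function. The sketch below works in projected coordinates (metres) and returns NaN when fewer than `min_points` stations fall inside the radius. `local_idw` and its `min_points` argument are hypothetical and only illustrate the mechanism.

```python
import math


def local_idw(stations, x, y, radius, power=2.0, min_points=1):
    """IDW estimate at (x, y) from (sx, sy, value) stations within radius."""
    nearby = []
    for sx, sy, value in stations:
        distance = math.hypot(sx - x, sy - y)
        if distance > radius:
            continue
        if distance == 0.0:
            return value  # prediction point coincides with a station
        nearby.append((distance ** -power, value))

    # Too few local stations: leave a gap instead of extrapolating.
    if len(nearby) < max(min_points, 1):
        return math.nan

    total_weight = sum(weight for weight, _ in nearby)
    return sum(weight * value for weight, value in nearby) / total_weight


stations = [(0.0, 0.0, 10.0), (1000.0, 0.0, 20.0)]
print(local_idw(stations, 500.0, 0.0, radius=600.0))   # midway between both stations
print(local_idw(stations, 5000.0, 0.0, radius=600.0))  # no station in range
```

Widening the radius or switching to a nearest-`n` search are the two usual ways to close such gaps.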

Note

Search-radius neighbourhoods are applied by the deterministic and geostatistical interpolators (`idw`, `kriging`, `natural_neighbor`, `nearest`, `linear`, `cubic`, `rbf`, `spline`, `spline_tension`, `trend`). ML-based methods (`gp`, `rf`, `gbm`, `rk`) remain global and ignore `search_radius`.

Tip

Local search can be noticeably slower for methods that need to refit a local model per grid cell, especially kriging and spline-style methods.