How do I …#

Data In/Out#

Import IDF file#

With imod.idf.open():

da = imod.idf.open("bottom_l1.idf")

Import multiple IDF files#

With imod.idf.open():

da = imod.idf.open("bottom_l*.idf")

Import IPF file#

With imod.ipf.read():

df = imod.ipf.read("timeseries.ipf")

Import netCDF file as Dataset#

ds = xr.open_dataset("dataset.nc")

Import a single netCDF variable as DataArray#

da = xr.open_dataarray("variable.nc")

Convert structured data to UGRID netCDF#

With imod.util.to_ugrid2d():

ugrid_ds = imod.util.to_ugrid2d(da)
ugrid_ds.to_netcdf("ds_ugrid.nc")

Data modification#

If-then-else#

Remove all values over 5.0:

da = da.where(da < 5.0)

Let’s say we want to replace all values < 5 by a 1, and all values >= 5 by a 2. The easiest way to do this is by using imod.util.where():

condition = old < 5.0
new = imod.util.where(condition, if_true=1, if_false=2)

This can also be done with xarray directly, but is less convenient:

condition = old < 5.0
new = xr.full_like(old, 2.0)
new = new.where(condition, other=1)

Alternatively:

condition = old < 5
new = xr.where(condition, x=1.0, y=2.0)

Note

When condition does not have the same dimension as x or y, you may end up with an unexpected dimension order; da = da.where(...) always preserves the dimension order of da. Using imod.util.where() avoids this.

Note

Xarray uses NaN (Not-A-Number) for nodata / fill values. NaN values have special properties in inequality operations: (np.nan < 5) is False, but also (np.nan >= 5) is False as well.

For this reason, imod.util.where() defaults to keeping NaNs by default: imod.util.where(old < 5.0, old, 5.0) will preserve the NaN values, while xr.where(old < 5.0, old, 5.0) will fill the NaN values with 5.0

Conditional evaluation#

Let’s say we want to select values between 5 and 10.

Avoid:

condition1 = da > 5
condition2 = da < 10
condition = condition1 and condition2

Do instead:

condition1 = da > 5
condition2 = da < 10
condition = condition1 & condition2

The reason is that and does work on the individual values, but expects something like a boolean (True or False). To do element-wise conditional evaluation on the individual values, use:

  • and: &

  • or: |

  • not: ~

  • xor: ^

To check there are no NaNs anywhere, use “reductions” such as .all() or .any() to reduce the array to a single boolean:

has_nodata = da.notnull().any()

Arithmetic#

da3 = da1 * da2 + 5.0

Make sure the grids have the same spatial coordinates.

Change cellsize (and extent)#

Nearest neighbor:

regridder = imod.prepare.Regridder(source, destination, method="nearest")
out = regridder.regrid(source)

Area weighted mean:

regridder = imod.prepare.Regridder(source, destination, method="mean")
out = regridder.regrid(source)

Change time resolution#

From e.g. hourly data to daily average:

new = da.resample(time="1D").mean()

See xarray documentation on resampling.

Select along a single layer#

sel() is “key” selection, this selects the layer named “1”:

da_layer1 = da.sel(layer=1)

isel() is “index” selection, this selects the first layer:

da_firstlayer = da.isel(layer=0)

Select part of the data#

Generally, raster data is y-descending, so ymax comes before ymin:

da_selection = da.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))

Create an empty raster#

For just a two dimensional x-y grid:

da = imod.util.empty_2d(dx, xmin, xmax, dy, ymin, ymax)

For a three dimensional version:

da = imod.util.empty_3d(dx, xmin, xmax, dy, ymin, ymax, layer=[1, 2, 3])

For a time varying 2d grid:

da = imod.util.empty_2d_transient(
    dx, xmin, xmax, dy, ymin, ymax, time=pd.date_range("2020-01-01", "2020-02-01")
)

For a time varying 3d grid:

da = imod.util.empty_3d_transient(
    dx,
    xmin,
    xmax,
    dy,
    ymin,
    ymax,
    layer=[1, 2, 3],
    time=pd.date_range("2020-01-01", "2020-02-01")
)

Fill/Interpolate nodata#

To do nearest neighbor interpolation:

new = imod.prepare.fill(da_with_gaps)

To do interpolation along a single dimension:

new = da_with_gaps.interpolate_na(dim="x")

See the xarray documentation on interpolation of NaN values.

To do interpolation in time, see Change time resolution.

To do Laplace interplation (using a linear equation, similar to a groundwater model with constant conductivity):

da = imod.prepare.laplace_interpolate(with_gaps)

Rasterize polygon data#

A geopandas GeoDataFrame can be rasterized by providing a sample DataArray for like in imod.prepare.rasterize():

rasterized = imod.prepare.rasterize(
    geodataframe,
    column="column_name_in_geodataframe",
    like=like,
 )

For large vector datasets, reading the files into a geodataframe can take longer dan the rasterization step. To avoid this, it’s possible to skip loading the data altogether with imod.prepare.gdal_rasterize()

rasterized = imod.prepare.gdal_rasterize(
    path="path-to-vector-data.shp",
    column="column_name_in_vector_data",
    like=like,
 )

Smooth data#

We can use a convolution to smooth:

kernel = np.ones((1, 10, 10))
kernel /= kernel.sum()
da.values = scipy.ndimage.convolve(da.values, kernel)

Zonal statistics#

To compute a mean:

mean = da.groupby(zones).mean("stacked_y_x")

To compute a sum:

sum = da.groupby(zones).sum("stacked_y_x")

Note

This is not the most efficient way of computing zonal statistics. If it takes a long time or consumes a lot of memory, see e.g. xarray-spatial’s zonal stats function.

Force loading into memory / dask array to numpy array#

da = da.compute()

Alternatively:

da = da.load()

Select a single variable from a dataset#

Select "kd" from dataset ds:

da_kd = ds["kd"]

Select points (from a vector dataset)#

geometry = geodataframe.geometry
x = geometry.x
y = geometry.y
selection = imod.select.points_values(da, x=x, y=y)

For time series analysis, converting into a pandas DataFrame may be useful:

df = selection.to_dataframe()

Sum properties over layers#

total_kd = da_kd.sum("layer")

Plot a timeseries for a single cell#

transient_da.sel(x=x, y=y).plot()

Plot head of one layer at one time#

transient_da.sel(layer=1, time="2020-01-01").plot()

Unstructured#

imod uses xugrid to represent data on unstructured grids. Where possible, the functions of methods of xugrid match those xarray. However, working with the unstructured topology requires slightly different commands: any operation dealing the unstructured (x, y) dimensions requires the .ugrid accessor.

Generate a mesh#

We’ve developed a separate package called pandamesh to help generated (x, y) unstructured meshes based on vector data in the form of geopandas GeoDataFrames.

import geopandas as gpd
import pandamesh as pm
import xugrid as xu

gdf = gpd.read_file("my-study-area.shp")
gdf["cellsize"] = 100.0
mesher = pm.TriangleMesher(gdf)
vertices, cells = mesher.generate()
grid = xugrid.Ugrid2d(vertices, -1, cells)

It can be installed with:

pip install pandamesh

Note

One of the dependencies of pandamesh, the Python bindings to triangle, does not have the (binary) wheels for Python 3.9 and higher yet.

Plot a timeseries for a single cell#

transient_uda.ugrid.sel(x=x, y=y).plot()

Plot head of one layer at one time#

transient_uda.sel(layer=1, time="2020-01-01").ugrid.plot()

Note

Since layer and time do not depend on the unstructured topology, they may be indexed using the standard xarray methods, without the .ugrid accessor.

Fill/Interpolate nodata#

To do Laplace interplation (using a linear equation, similar to a groundwater model with constant conductivity):

uda = with_gaps.ugrid.laplace_interpolate()

Select points (from a vector dataset)#

geometry = geodataframe.geometry
x = geometry.x
y = geometry.y
selection = uda.ugrid.select_points(x=x, y=y)

For time series analysis, converting into a pandas DataFrame may be useful:

df = selection.to_dataframe()