Vector data and Geopandas#

Geospatial data primarily comes in two forms: raster data and vector data. This guide focuses on the latter.

Typical examples of file formats containing vector data are:

  • ESRI shapefile

  • GeoJSON

  • Geopackage

Vector data consist of vertices (corner points), optionally connected by paths. The three primary categories of vector data are:

  • Points

  • Lines

  • Polygons

In groundwater modeling, typical examples of each are:

  • Pumping wells, observation wells, boreholes

  • Canals, ditches, waterways

  • Lakes, administrative boundaries, land use

These data consist of geospatial coordinates, indicating the location in space and a number of attributes: for a canal, this could be parameters like its width, depth, and water level. In GIS software like QGIS, the geometry is visible in the map view, and the attributes can inspected via e.g. the attribute table.

In Python, such data can be represented by a geopandas.GeoDataFrame. Essentially, geopandas is a pandas DataFrame to store tabular data (the attribute table), and adds a geometry column to store the geospatial coordinates.

import geopandas as gpd
import numpy as np

import imod

tempdir = imod.util.temporary_directory()
gdf = imod.data.lakes_shp(tempdir / "lake")
gdf.iloc[:5, -3:]  # first 5 rows, last 3 columns
SHAPE_Leng SHAPE_Area geometry
0 4689.155578 511471.386406 POLYGON ((108774.371 466627.430, 108774.375 46...
1 2050.843018 86091.854890 POLYGON ((115938.892 463013.165, 115926.865 46...
2 5023.190894 625149.040467 POLYGON ((111380.977 448065.855, 111378.115 44...
3 3724.550747 467233.192689 POLYGON ((117298.904 478782.595, 117297.744 47...
4 6834.063594 809445.407846 POLYGON ((112096.692 450851.187, 112098.812 45...


This geodataframe contains all the data from the shapefile. Note the geometry column. The geometry can be plotted:

gdf.plot()
02 vector data

Out:

<AxesSubplot:>

A GeoDataFrame of points can also be easily generated from pairs of x and y coordinates.

x = np.arange(90_000.0, 120_000.0, 1000.0)
y = np.arange(450_000.0, 480_000.0, 1000.0)

geometry = gpd.points_from_xy(x, y)
points_gdf = gpd.GeoDataFrame(geometry=geometry)

points_gdf.plot()
02 vector data

Out:

<AxesSubplot:>

An important feature of every geometry is its geometry type:

gdf.geom_type

Out:

0     Polygon
1     Polygon
2     Polygon
3     Polygon
4     Polygon
       ...
72    Polygon
73    Polygon
74    Polygon
75    Polygon
76    Polygon
Length: 77, dtype: object

As expected, the points are of the type … Point:

points_gdf.geom_type

Out:

0     Point
1     Point
2     Point
3     Point
4     Point
5     Point
6     Point
7     Point
8     Point
9     Point
10    Point
11    Point
12    Point
13    Point
14    Point
15    Point
16    Point
17    Point
18    Point
19    Point
20    Point
21    Point
22    Point
23    Point
24    Point
25    Point
26    Point
27    Point
28    Point
29    Point
dtype: object

Input and output#

Geopandas supports many vector file formats. It wraps fiona, which in turns wraps OGR, which is a part of GDAL. For example, the lake polygons above are loaded from an ESRI Shapefile:

filenames = [path.name for path in (tempdir / "lake").iterdir()]
print("\n".join(filenames))

Out:

lakes.prj
lakes.dbf
lakes.shx
lakes.shp
lakes.cpg

They can be easily stored into more modern formats as well, such as GeoPackage:

points_gdf.to_file(tempdir / "points.gpkg")
filenames = [path.name for path in tempdir.iterdir()]
print("\n".join(filenames))

Out:

/opt/conda/envs/imod/lib/python3.10/site-packages/geopandas/io/file.py:362: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  pd.Int64Index,
points.gpkg
lake

… and back:

back = gpd.read_file(tempdir / "points.gpkg")
back
geometry
0 POINT (90000.00000 450000.00000)
1 POINT (91000.00000 451000.00000)
2 POINT (92000.00000 452000.00000)
3 POINT (93000.00000 453000.00000)
4 POINT (94000.00000 454000.00000)
5 POINT (95000.00000 455000.00000)
6 POINT (96000.00000 456000.00000)
7 POINT (97000.00000 457000.00000)
8 POINT (98000.00000 458000.00000)
9 POINT (99000.00000 459000.00000)
10 POINT (100000.00000 460000.00000)
11 POINT (101000.00000 461000.00000)
12 POINT (102000.00000 462000.00000)
13 POINT (103000.00000 463000.00000)
14 POINT (104000.00000 464000.00000)
15 POINT (105000.00000 465000.00000)
16 POINT (106000.00000 466000.00000)
17 POINT (107000.00000 467000.00000)
18 POINT (108000.00000 468000.00000)
19 POINT (109000.00000 469000.00000)
20 POINT (110000.00000 470000.00000)
21 POINT (111000.00000 471000.00000)
22 POINT (112000.00000 472000.00000)
23 POINT (113000.00000 473000.00000)
24 POINT (114000.00000 474000.00000)
25 POINT (115000.00000 475000.00000)
26 POINT (116000.00000 476000.00000)
27 POINT (117000.00000 477000.00000)
28 POINT (118000.00000 478000.00000)
29 POINT (119000.00000 479000.00000)


Conversion to raster#

From the perspective of MODFLOW groundwater modeling, we are often interested in the properties of cells in specific polygons or zones. Refer to the examples or the API reference for imod.prepare.

GeoPandas provides a full suite of vector based GIS operations, such as intersections, spatial joins, or plotting.

Total running time of the script: ( 0 minutes 1.206 seconds)

Gallery generated by Sphinx-Gallery