API reference

The geodatasets package has top-level functions that cover 95% of use cases, as well as lower-level tooling for handling the database of dataset metadata.

Top-level API

In most cases, you will be using get_path() to download the data and get the path to the local storage, get_url() to get the link to the original dataset in its online location, and fetch() to pre-download data to the local storage.
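A typical workflow passes the result of get_path() straight to a file reader. The snippet below is an illustrative sketch rather than part of this package's API: geopandas is an assumed external dependency, and any reader that accepts a file path would work equally well.

>>> import geodatasets
>>> import geopandas  # assumption: geopandas is installed
>>> path = geodatasets.get_path('geoda airbnb')  # downloads on the first call only
>>> gdf = geopandas.read_file(path)  # read the cached file from local storage
>>> geodatasets.get_url('geoda airbnb')  # link only, nothing is downloaded
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'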

geodatasets.get_path(name)

Get the absolute path to a file in the local storage.

If it’s not in the local storage, it will be downloaded.

name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.

For Datasets containing multiple files, the archive is automatically extracted.

Parameters:
name : str

Name of the data item. Formatting does not matter.

See also

get_url
fetch

Examples

When it does not exist in the cache yet, it gets downloaded first:

>>> path = geodatasets.get_path('GeoDa AirBnB')
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
>>> path
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'

Every other call returns the path directly:

>>> path2 = geodatasets.get_path("geoda_airbnb")
>>> path2
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'

geodatasets.get_url(name)

Get the URL from which the dataset can be fetched.

name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.

No data is downloaded.

Parameters:
name : str

Name of the data item. Formatting does not matter.

Returns:
str

link to the online dataset

See also

get_path

Examples

>>> geodatasets.get_url('GeoDa AirBnB')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'
>>> geodatasets.get_url('geoda_airbnb')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'

geodatasets.fetch(name)

Download the data to the local storage.

This is useful when you expect to need some data later but want to avoid downloading it at that time.

name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.

For Datasets containing multiple files, the archive is automatically extracted.

Parameters:
name : str, list

Name of the data item(s). Formatting does not matter.

See also

get_path

Examples

>>> geodatasets.fetch('nybb')
Downloading file 'nybb_22c.zip' from 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Original' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'nybb_22c/nybb.shp' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.shx' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.dbf' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.prj' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
>>> geodatasets.fetch(['geoda airbnb', 'geoda guerry'])
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
Downloading file 'guerry.zip' from 'https://geodacenter.github.io/data-and-lab//data/guerry.zip' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'guerry/guerry.shp' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.dbf' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.shx' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.prj' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'

Database-level API

The database of dataset metadata is handled via custom dict-based classes.

class geodatasets.Dataset

A dict with attribute access that can also be called to update its keys.

class geodatasets.Bunch

A dict with attribute access.

Bunch is used to store Dataset objects.
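The two classes combine as follows: the database is a nested Bunch of Bunches with Dataset leaves, so items can be reached via attribute access. A brief sketch, assuming the geoda.airbnb entry used in the examples below; whether calling a Dataset mutates it in place or returns an updated copy is not specified here, so the comment on the call is an assumption.

>>> from geodatasets import data  # data is the Bunch holding the whole database
>>> data.geoda.airbnb.name  # attribute access through nested Bunch objects
'geoda.airbnb'
>>> updated = data.geoda.airbnb(note="example")  # calling a Dataset updates keys
>>> updated.note
'example'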

Methods

filter([keyword, name, geometry_type, function])

Return a subset of the Bunch matching the filter conditions.

flatten()

Return the nested Bunch collapsed into a one-level dictionary.

query_name(name)

Return a Dataset based on the name query.

filter(keyword: str | None = None, name: str | None = None, geometry_type: str | None = None, function: Callable[[Dataset], bool] = None) → Bunch

Return a subset of the Bunch matching the filter conditions.

Each Dataset within the Bunch is checked against one or more specified conditions; it is kept if all of them are satisfied and removed if at least one is not met.

Parameters:
keyword : str (optional)

The condition returns True if the keyword string is present in any string value of a Dataset object. The comparison is not case sensitive.

name : str (optional)

The condition returns True if the name string is present in the name attribute of a Dataset object. The comparison is not case sensitive.

geometry_type : str (optional)

The condition returns True if the Dataset's geometry_type matches the given geometry_type. Possible options are ["Point", "LineString", "Polygon", "Mixed"]. The comparison is not case sensitive.

function : callable (optional)

A custom function that takes a Dataset as an argument and returns bool. If function is given, the other parameters are ignored.

Returns:
filtered : Bunch

Examples

>>> from geodatasets import data

You can filter all Point datasets:

>>> points = data.filter(geometry_type="Point")

Or all datasets with chicago in the name:

>>> chicago_datasets = data.filter(name="chicago")

You can use keyword search to find all datasets in CSV format:

>>> csv_datasets = data.filter(keyword="csv")

You can combine multiple conditions to find datasets with chicago in the name and a Polygon geometry type:

>>> chicago_polygons = data.filter(name="chicago", geometry_type="Polygon")

You can also pass a custom function that takes a Dataset and returns a boolean value, for example to find all datasets with nrows smaller than 100:

>>> def small_data(dataset):
...     if hasattr(dataset, "nrows") and dataset.nrows < 100:
...         return True
...     return False
>>> small = data.filter(function=small_data)

flatten() → dict

Return the nested Bunch collapsed into a one-level dictionary.

Dictionary keys are Dataset names (e.g. geoda.airbnb) and values are Dataset objects.

Returns:
flattened : dict

dictionary of Dataset objects
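As an illustrative sketch, assuming the geoda.airbnb entry mentioned above:

>>> from geodatasets import data
>>> flat = data.flatten()
>>> flat["geoda.airbnb"].name
'geoda.airbnb'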

query_name(name: str) → Dataset

Return a Dataset based on the name query.

Returns a matching Dataset from the Bunch if the name contains the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters. See examples for details.

Parameters:
namestr

Name of the data item. Formatting does not matter.

Returns:
match : Dataset
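
Examples

The following sketch illustrates the matching rule described above, reusing the dataset from the earlier examples; all three spellings should resolve to the same item:

>>> from geodatasets import data
>>> data.query_name('GeoDa AirBnB').name
'geoda.airbnb'
>>> data.query_name('geoda_airbnb').name
'geoda.airbnb'
>>> data.query_name('geoda airbnb').name
'geoda.airbnb'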