API reference

The geodatasets package has top-level functions that cover 95% of use cases, plus additional tooling for handling the database.

Top-level API

In most cases, you will use get_path() to download the data and get the path to the local storage, get_url() to get the link to the original dataset in its online location, and fetch() to pre-download data to the local storage.

geodatasets.get_path(name)

Get the absolute path to a file in the local storage.

If it’s not in the local storage, it will be downloaded.

name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.

For Datasets containing multiple files, the archive is automatically extracted.

Parameters:

name (str): Name of the data item. Formatting does not matter.
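The "return the local path, download only if missing" behaviour described above can be illustrated with a small standalone sketch. This is not the package's implementation: `cached_path` and `fake_download` are hypothetical names, and the download step is replaced by a stand-in that writes dummy bytes so the example runs offline.

```python
import tempfile
from pathlib import Path


def cached_path(cache_dir, file_name, download):
    """Return the absolute path to file_name, downloading it only if missing.

    `download` is a hypothetical callable that writes the file to the
    given target path; it stands in for the real network transfer.
    """
    target = Path(cache_dir) / file_name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    return str(target.resolve())


calls = []


def fake_download(target):
    # Stand-in for a real download: record the call and write dummy bytes.
    calls.append(target.name)
    target.write_bytes(b"dummy")


with tempfile.TemporaryDirectory() as tmp:
    first = cached_path(tmp, "nybb_22c.zip", fake_download)
    # The second call finds the file in the cache and skips the download.
    second = cached_path(tmp, "nybb_22c.zip", fake_download)
```

Both calls return the same absolute path, and the stand-in download runs only once.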

See also

get_path

Examples

>>> geodatasets.fetch('nybb')
Downloading file 'nybb_22c.zip' from 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Original' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'nybb_22c/nybb.shp' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.shx' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.dbf' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.prj' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'

>>> geodatasets.fetch(['geoda airbnb', 'geoda guerry'])
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
Downloading file 'guerry.zip' from 'https://geodacenter.github.io/data-and-lab//data/guerry.zip' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'guerry/guerry.shp' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.dbf' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.shx' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.prj' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'

Database-level API

The database of dataset metadata is handled via custom dict-based classes.

class geodatasets.Dataset(*args, **kwargs)

A dict with attribute access that can also be called to update keys.

Attributes:

path: Get the absolute path to a file in the local storage.

property path: str

Get the absolute path to a file in the local storage.

If it’s not in the local storage, it will be downloaded.

For Datasets containing multiple files, the archive is automatically extracted.

Returns:

str: local path
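To make the "dict with attribute access that can be called to update keys" idea concrete, here is a minimal sketch of such a class. `AttrDict` is a hypothetical stand-in, not the Dataset class itself, and the choice to return an updated copy on call (rather than modify the object in place) is an assumption of this sketch. The field values are purely illustrative.

```python
class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can be read as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError as err:
            # hasattr() expects AttributeError, not KeyError.
            raise AttributeError(key) from err

    def __call__(self, **kwargs):
        # Assumption of this sketch: calling returns an updated copy
        # and leaves the original object untouched.
        new = type(self)(self)
        new.update(kwargs)
        return new


item = AttrDict(name="geoda.airbnb", geometry_type="Polygon")
updated = item(nrows=77)  # illustrative value, not real metadata
```

Here `item.name` and `item["name"]` refer to the same value, and `updated` carries the extra `nrows` key while `item` does not.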

class geodatasets.Bunch

A dict with attribute-access

Bunch is used to store Dataset objects.

Methods

`filter`([keyword, name, geometry_type, function])	Return a subset of the `Bunch` matching the filter conditions
`flatten`()	Return the nested `Bunch` collapsed into a one-level dictionary.
`query_name`(name)	Return `Dataset` based on the name query

filter(keyword: str | None = None, name: str | None = None, geometry_type: str | None = None, function: Callable[[Dataset], bool] = None) → Bunch

Return a subset of the Bunch matching the filter conditions

Each Dataset within a Bunch is checked against one or more specified conditions and kept if they are satisfied or removed if at least one condition is not met.

Parameters:

keyword (str, optional): Condition returns True if the keyword string is present in any string value in a Dataset object. The comparison is not case sensitive.
name (str, optional): Condition returns True if the name string is present in the name attribute of a Dataset object. The comparison is not case sensitive.
geometry_type (str, optional): Condition returns True if Dataset.geometry_type() matches the geometry_type. Possible options are ["Point", "LineString", "Polygon", "Mixed"]. The comparison is not case sensitive.
function (callable, optional): Custom function that takes a Dataset as an argument and returns a bool. If function is given, other parameters are ignored.

Returns:

filtered: Bunch
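The matching rule described above (every given condition must hold, and a custom function overrides the other parameters) can be sketched on plain dictionaries. This is an illustration under stated assumptions, not the library's code: `matches` is a hypothetical helper, the records and their names are invented for the example, and keeping everything when no condition is given is a choice of this sketch only.

```python
def matches(dataset, keyword=None, name=None, geometry_type=None, function=None):
    """Decide whether one dataset record (a plain dict here) passes the filter."""
    if function is not None:
        # A custom function takes precedence; the other conditions are ignored.
        return bool(function(dataset))

    checks = []
    if keyword is not None:
        # Case-insensitive search across every string value of the record.
        checks.append(any(
            keyword.lower() in value.lower()
            for value in dataset.values()
            if isinstance(value, str)
        ))
    if name is not None:
        checks.append(name.lower() in dataset.get("name", "").lower())
    if geometry_type is not None:
        checks.append(dataset.get("geometry_type", "").lower() == geometry_type.lower())
    # Keep the record only if every requested condition holds.
    # With no condition at all, this sketch keeps the record.
    return all(checks)


# Invented records, used only to exercise the rule.
records = [
    {"name": "demo.chicago_parks", "geometry_type": "Polygon", "filename": "parks.zip", "nrows": 50},
    {"name": "demo.chicago_crimes", "geometry_type": "Point", "filename": "crimes.csv", "nrows": 500},
    {"name": "demo.boston_roads", "geometry_type": "LineString", "filename": "roads.gpkg", "nrows": 80},
]

chicago_polygons = [r["name"] for r in records if matches(r, name="Chicago", geometry_type="polygon")]
csv_records = [r["name"] for r in records if matches(r, keyword="CSV")]
small = [r["name"] for r in records if matches(r, name="chicago", function=lambda d: d["nrows"] < 100)]
```

In the last call the `name` condition has no effect because a function is supplied, so the Boston record is kept as well.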

Examples

>>> from geodatasets import data

You can filter all Point datasets:

>>> points = data.filter(geometry_type="Point")

Or all datasets with chicago in the name:

>>> chicago_datasets = data.filter(name="chicago")

You can use keyword search to find all datasets in a CSV format:

>>> csv_datasets = data.filter(keyword="csv")

You can combine multiple conditions to find datasets with chicago in the name and of Polygon geometry type:

>>> chicago_polygons = data.filter(name="chicago", geometry_type="Polygon")

You can also pass a custom function that takes a Dataset and returns a boolean value. For example, to find all datasets with nrows smaller than 100:

>>> def small_data(dataset):
...    if hasattr(dataset, "nrows") and dataset.nrows < 100:
...        return True
...    return False
>>> small = data.filter(function=small_data)

flatten() → dict

Return the nested Bunch collapsed into a one-level dictionary.

Dictionary keys are Dataset names (e.g. geoda.airbnb) and the values are Dataset objects.

Returns:

flattened (dict): dictionary of Dataset objects
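The collapsing step can be pictured with nested plain dictionaries. The sketch below is an assumption-laden illustration, not the method's source: it treats any record carrying a "url" key as a leaf, and the nested layout and the `geoda.guerry` name are assumed for the example.

```python
def flatten(nested):
    """Collapse nested groups into {dataset name: dataset record}."""
    flat = {}
    for value in nested.values():
        if "url" in value:
            # Leaf: assumed here to be any record carrying a "url" key.
            flat[value["name"]] = value
        else:
            # Group: descend one level and merge its leaves.
            flat.update(flatten(value))
    return flat


# Assumed nested layout: one group containing two dataset records.
nested = {
    "geoda": {
        "airbnb": {
            "name": "geoda.airbnb",
            "url": "https://geodacenter.github.io/data-and-lab//data/airbnb.zip",
        },
        "guerry": {
            "name": "geoda.guerry",
            "url": "https://geodacenter.github.io/data-and-lab//data/guerry.zip",
        },
    },
}

flat = flatten(nested)
```

The result has one level only, keyed by the dotted dataset names, and its values are the same record objects as in the nested structure.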

query_name(name: str) → Dataset

Return Dataset based on the name query

Returns a matching Dataset from the Bunch if the name contains the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters. See examples for details.

Parameters:

name (str): Name of the data item. Formatting does not matter.

Returns:

match: Dataset
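One way to read the matching rule is that both the query and the stored names are reduced to lowercase letters and digits before they are compared. The sketch below follows that reading. It is not the library's implementation: `normalize`, the exact set of characters dropped, and the use of ValueError for a failed lookup are all assumptions, and the `geoda.guerry` key is assumed for the example.

```python
import re


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def query_name(flat, name):
    """Return the item whose normalized key equals the normalized query."""
    lookup = {normalize(key): item for key, item in flat.items()}
    try:
        return lookup[normalize(name)]
    except KeyError:
        # Assumption: a failed lookup is reported as ValueError.
        raise ValueError(f"No matching item found for the query '{name}'.") from None


# A flat mapping of names to records, standing in for a flattened Bunch.
flat = {
    "geoda.airbnb": {"name": "geoda.airbnb"},
    "geoda.guerry": {"name": "geoda.guerry"},
}

airbnb = query_name(flat, "geoda airbnb")
guerry = query_name(flat, "GeoDa-Guerry")
```

Under this reading, "geoda airbnb", "GeoDa-Airbnb" and "geoda.airbnb" all resolve to the same item, because they normalize to the same string.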