API reference¶
The geodatasets package has top-level functions that cover most use cases, plus other tooling for handling the database.
Top-level API¶
In most cases, you will use get_path() to download the data and get the path to the local storage, get_url() to get the link to the original dataset in its online location, and fetch() to pre-download data to the local storage.
- geodatasets.get_path(name)¶
Get the absolute path to a file in the local storage.
If it’s not in the local storage, it will be downloaded. name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name, irrespective of letter case, spaces, dashes and other characters. For Datasets containing multiple files, the archive is automatically extracted.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
Examples
When it does not exist in the cache yet, it gets downloaded first:
>>> path = geodatasets.get_path('GeoDa AirBnB')
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
>>> path
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'
Every other call returns the path directly:
>>> path2 = geodatasets.get_path("geoda_airbnb")
>>> path2
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'
- geodatasets.get_url(name)¶
Get the URL from which the dataset can be fetched.
name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name, irrespective of letter case, spaces, dashes and other characters. No data is downloaded.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
- Returns:
- str
Link to the online dataset.
Examples
>>> geodatasets.get_url('GeoDa AirBnB')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'
>>> geodatasets.get_url('geoda_airbnb')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'
- geodatasets.fetch(name)¶
Download the data to the local storage.
This is useful when you expect to need some data later but want to avoid downloading it at that time.
name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name, irrespective of letter case, spaces, dashes and other characters. For Datasets containing multiple files, the archive is automatically extracted.
- Parameters:
- name : str or list
Name of the data item(s). Formatting does not matter.
Examples
>>> geodatasets.fetch('nybb')
Downloading file 'nybb_22c.zip' from 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Original' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'nybb_22c/nybb.shp' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.shx' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.dbf' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.prj' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
>>> geodatasets.fetch(['geoda airbnb', 'geoda guerry'])
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
Downloading file 'guerry.zip' from 'https://geodacenter.github.io/data-and-lab//data/guerry.zip' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'guerry/guerry.shp' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.dbf' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.shx' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.prj' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Database-level API¶
The database of dataset metadata is handled via custom dict-based classes.
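As a rough sketch of what "dict-based with attribute access" means, the behavior can be illustrated with plain Python (this is an illustration only, not the library's implementation; the real classes add querying, filtering, and fetching on top):

```python
# Minimal sketch of a dict with attribute access, the idea behind the
# Dataset and Bunch classes. Illustrative only; not the geodatasets source.
class AttrDict(dict):
    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so dict methods
        # like .items() keep working; unknown keys raise AttributeError.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err


# Nested AttrDicts allow both data["geoda"]["airbnb"] and data.geoda.airbnb.
data = AttrDict(geoda=AttrDict(airbnb=AttrDict(name="geoda.airbnb")))
assert data.geoda.airbnb.name == "geoda.airbnb"
assert data["geoda"]["airbnb"] is data.geoda.airbnb
```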
- class geodatasets.Dataset(*args, **kwargs)¶
A dict with attribute access that can be called to update keys.
- Attributes:
path
Get the absolute path to a file in the local storage.
- property path: str¶
Get the absolute path to a file in the local storage.
If it’s not in the local storage, it will be downloaded.
For Datasets containing multiple files, the archive is automatically extracted.
- Returns:
- str
Local path.
- class geodatasets.Bunch¶
A dict with attribute access.
Bunch is used to store Dataset objects.
Methods
filter([keyword, name, geometry_type, function])
Return a subset of the Bunch matching the filter conditions.
flatten()
Return the nested Bunch collapsed into a one-level dictionary.
query_name(name)
Return a Dataset based on the name query.
- filter(keyword: str | None = None, name: str | None = None, geometry_type: str | None = None, function: Callable[[Dataset], bool] = None) → Bunch¶
Return a subset of the Bunch matching the filter conditions.
Each Dataset within a Bunch is checked against one or more specified conditions and kept if all are satisfied, or removed if at least one condition is not met.
- Parameters:
- keyword : str (optional)
Condition returns True if the keyword string is present in any string value in a Dataset object. The comparison is not case sensitive.
- name : str (optional)
Condition returns True if the name string is present in the name attribute of the Dataset object. The comparison is not case sensitive.
- geometry_type : str (optional)
Condition returns True if Dataset.geometry_type() matches the geometry_type. Possible options are ["Point", "LineString", "Polygon", "Mixed"]. The comparison is not case sensitive.
- function : callable (optional)
Custom function that takes a Dataset as an argument and returns bool. If function is given, other parameters are ignored.
- Returns:
- filtered : Bunch
Examples
>>> from geodatasets import data
You can filter all Point datasets:
>>> points = data.filter(geometry_type="Point")
Or all datasets with chicago in the name:
>>> chicago_datasets = data.filter(name="chicago")
You can use keyword search to find all datasets in a CSV format:
>>> csv_datasets = data.filter(keyword="csv")
You can combine multiple conditions to find datasets with chicago in the name and a Polygon geometry type:
>>> chicago_polygons = data.filter(name="chicago", geometry_type="Polygon")
You can also pass a custom function that takes a Dataset and returns a boolean value. You can then find all datasets with nrows smaller than 100:
>>> def small_data(dataset):
...     if hasattr(dataset, "nrows") and dataset.nrows < 100:
...         return True
...     return False
>>> small = data.filter(function=small_data)
- flatten() → dict¶
Return the nested Bunch collapsed into a one-level dictionary.
Dictionary keys are Dataset names (e.g. geoda.airbnb) and values are Dataset objects.
- Returns:
- flattened : dict
Dictionary of Dataset objects.
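The collapse into dotted one-level keys can be sketched as a recursive walk. In this sketch a leaf is assumed to be any dict carrying a "url" key; that is an assumption made here for illustration, not how the library distinguishes Dataset from Bunch:

```python
def flatten(bunch, prefix=""):
    """Sketch of collapsing nested groups into dotted one-level keys."""
    flat = {}
    for key, value in bunch.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and "url" not in value:
            flat.update(flatten(value, dotted))  # recurse into a nested group
        else:
            flat[dotted] = value  # a leaf: keep it under its dotted name
    return flat


nested = {
    "geoda": {
        "airbnb": {"url": "https://geodacenter.github.io/data-and-lab//data/airbnb.zip"},
        "guerry": {"url": "https://geodacenter.github.io/data-and-lab//data/guerry.zip"},
    }
}
assert sorted(flatten(nested)) == ["geoda.airbnb", "geoda.guerry"]
```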
- query_name(name: str) → Dataset¶
Return a Dataset based on the name query.
Returns a matching Dataset from the Bunch if the name contains the same letters in the same order as the item’s name, irrespective of letter case, spaces, dashes and other characters. See examples for details.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
- Returns:
- match : Dataset
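One plausible reading of the matching rule ("same letters in the same order, ignoring case, spaces, dashes and other characters") is a normalization step that strips everything but letters and digits before comparing. The normalize helper below is hypothetical, not the library's implementation:

```python
import re

def normalize(name):
    # Keep only letters and digits, lowercased; everything else is ignored.
    return re.sub(r"[^a-z0-9]", "", name.lower())


# All of these spellings normalize to the same string, so under the rule
# described above they would resolve to the same item.
assert normalize("GeoDa AirBnB") == normalize("geoda_airbnb") == normalize("geoda.airbnb")
```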