API reference
The geodatasets package provides top-level functions that cover 95% of use cases, along with
lower-level tooling for working with the database of dataset metadata.
Top-level API
In most cases, you will use get_path() to download the data and get its path
in the local storage, get_url() to get the link to the original dataset in its
online location, and fetch() to pre-download data to the local storage.
- geodatasets.get_path(name)
Get the absolute path to a file in the local storage.
If it’s not in the local storage, it will be downloaded.
name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.
For Datasets containing multiple files, the archive is automatically extracted.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
Examples
When the dataset does not exist in the cache yet, it is downloaded first:
>>> path = geodatasets.get_path('GeoDa AirBnB')
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
>>> path
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'
Every subsequent call returns the path directly:
>>> path2 = geodatasets.get_path("geoda_airbnb")
>>> path2
'/Users/martin/Library/Caches/geodatasets/airbnb.zip'
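The download-once behaviour shown above can be pictured with a minimal sketch. This is not the geodatasets implementation: cached_path, fake_download and the temporary cache directory are hypothetical names used for illustration only, and no network access takes place.

```python
import tempfile
from pathlib import Path


def cached_path(filename, cache_dir, download):
    """Return the absolute path to `filename` inside `cache_dir`.

    `download` is a hypothetical callable that writes the file to the
    target path. It is only called when the file is not cached yet.
    """
    target = Path(cache_dir).resolve() / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    return str(target)


calls = []


def fake_download(target):
    # Stand-in for a real network download (assumption for this sketch).
    calls.append(target.name)
    target.write_bytes(b"placeholder")


cache = tempfile.mkdtemp()
first = cached_path("airbnb.zip", cache, fake_download)   # triggers the download
second = cached_path("airbnb.zip", cache, fake_download)  # served from the cache
```

Both calls return the same absolute path, and the download callable runs only once.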
- geodatasets.get_url(name)
Get the URL from which the dataset can be fetched.
name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.
No data is downloaded.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
- Returns:
- str
Link to the online dataset.
Examples
>>> geodatasets.get_url('GeoDa AirBnB')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'
>>> geodatasets.get_url('geoda_airbnb')
'https://geodacenter.github.io/data-and-lab//data/airbnb.zip'
- geodatasets.fetch(name)
Download the data to the local storage.
This is useful when you expect to need some data later but want to avoid downloading it at that time.
name is queried using query_name(), so it only needs to contain the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters.
For Datasets containing multiple files, the archive is automatically extracted.
- Parameters:
- name : str, list
Name of the data item(s). Formatting does not matter.
Examples
>>> geodatasets.fetch('nybb')
Downloading file 'nybb_22c.zip' from 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Original' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'nybb_22c/nybb.shp' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.shx' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.dbf' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
Extracting 'nybb_22c/nybb.prj' from '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip' to '/Users/martin/Library/Caches/geodatasets/nybb_22c.zip.unzip'
>>> geodatasets.fetch(['geoda airbnb', 'geoda guerry'])
Downloading file 'airbnb.zip' from 'https://geodacenter.github.io/data-and-lab//data/airbnb.zip' to '/Users/martin/Library/Caches/geodatasets'.
Downloading file 'guerry.zip' from 'https://geodacenter.github.io/data-and-lab//data/guerry.zip' to '/Users/martin/Library/Caches/geodatasets'.
Extracting 'guerry/guerry.shp' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.dbf' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.shx' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Extracting 'guerry/guerry.prj' from '/Users/martin/Library/Caches/geodatasets/guerry.zip' to '/Users/martin/Library/Caches/geodatasets/guerry.zip.unzip'
Database-level API
The database of dataset metadata is handled via custom dict-based classes.
- class geodatasets.Dataset(*args, **kwargs)
A dict with attribute access that can be called to update keys.
- Attributes:
path: Get the absolute path to a file in the local storage.
- property path: str
Get the absolute path to a file in the local storage.
If it’s not in the local storage, it will be downloaded.
For Datasets containing multiple files, the archive is automatically extracted.
- Returns:
- str
Local path.
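As a rough illustration of what "a dict with attribute access that can be called to update keys" means, here is a minimal sketch. AttrDict is a hypothetical stand-in, not the Dataset class itself, and it assumes that calling the object returns an updated copy rather than modifying the original in place.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a dict with attribute access that can be called."""

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails, so dict methods
        # such as .keys() keep working; fall back to the dictionary items.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

    def __call__(self, **kwargs):
        # Assumption: return an updated copy and leave the original untouched.
        new = AttrDict(self)
        new.update(kwargs)
        return new


item = AttrDict(name="geoda.airbnb", filename="airbnb.zip")
updated = item(filename="airbnb_v2.zip")  # made-up value, for illustration only
```

Here item.filename and item["filename"] refer to the same value, and updated carries the new filename while item is unchanged.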
- class geodatasets.Bunch
A dict with attribute access.
Bunch is used to store Dataset objects.
Methods
filter([keyword, name, geometry_type, function]): Return a subset of the Bunch matching the filter conditions.
flatten(): Return the nested Bunch collapsed into a one-level dictionary.
query_name(name): Return Dataset based on the name query.
- filter(keyword: str | None = None, name: str | None = None, geometry_type: str | None = None, function: Callable[[Dataset], bool] = None) -> Bunch
Return a subset of the Bunch matching the filter conditions.
Each Dataset within a Bunch is checked against one or more specified conditions and kept if all of them are satisfied, or removed if at least one condition is not met.
- Parameters:
- keyword : str (optional)
Condition returns True if the keyword string is present in any string value in a Dataset object. The comparison is not case sensitive.
- name : str (optional)
Condition returns True if the name string is present in the name attribute of a Dataset object. The comparison is not case sensitive.
- geometry_type : str (optional)
Condition returns True if Dataset.geometry_type() matches the geometry_type. Possible options are ["Point", "LineString", "Polygon", "Mixed"]. The comparison is not case sensitive.
- function : callable (optional)
Custom function taking a Dataset as an argument and returning a bool. If function is given, other parameters are ignored.
- Returns:
- filtered : Bunch
Examples
>>> from geodatasets import data
You can filter all Point datasets:
>>> points = data.filter(geometry_type="Point")
Or all datasets with chicago in the name:
>>> chicago_datasets = data.filter(name="chicago")
You can use keyword search to find all datasets in a CSV format:
>>> csv_datasets = data.filter(keyword="csv")
You can combine multiple conditions to find datasets with chicago in the name and of Polygon geometry type:
>>> chicago_polygons = data.filter(name="chicago", geometry_type="Polygon")
You can also pass a custom function that takes a Dataset and returns a boolean value. You can then find all datasets with nrows smaller than 100:
>>> def small_data(dataset):
...     if hasattr(dataset, "nrows") and dataset.nrows < 100:
...         return True
...     return False
>>> small = data.filter(function=small_data)
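The way the conditions combine can be sketched on plain dictionaries. This is an illustration of the documented semantics under stated assumptions, not the library code: matches is a hypothetical helper, geometry_type is treated as a plain dictionary key, and the records are made up rather than taken from the geodatasets database.

```python
def matches(dataset, keyword=None, name=None, geometry_type=None, function=None):
    """Return True if `dataset` (a plain dict here) passes every given condition."""
    if function is not None:
        # A custom function takes precedence; other parameters are ignored.
        return bool(function(dataset))
    if keyword is not None and not any(
        keyword.lower() in value.lower()
        for value in dataset.values()
        if isinstance(value, str)
    ):
        return False
    if name is not None and name.lower() not in dataset["name"].lower():
        return False
    if (
        geometry_type is not None
        and dataset.get("geometry_type", "").lower() != geometry_type.lower()
    ):
        return False
    return True


# Made-up records for illustration; they are not real geodatasets entries.
records = [
    {"name": "demo.chicago_health", "geometry_type": "Polygon", "nrows": 77},
    {"name": "demo.chicago_shops", "geometry_type": "Point", "nrows": 148},
    {"name": "demo.boston_parcels", "geometry_type": "Polygon", "nrows": 500},
]

# Both conditions must hold; the comparison ignores letter case.
chicago_polygons = [
    r["name"] for r in records if matches(r, name="chicago", geometry_type="polygon")
]

# With a custom function, the name condition is ignored.
small = [
    r["name"]
    for r in records
    if matches(r, name="boston", function=lambda d: d["nrows"] < 100)
]
```

Only demo.chicago_health satisfies both conditions in the first query. The second query also returns only demo.chicago_health, because the name argument is ignored once a function is supplied.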
- flatten() -> dict
Return the nested Bunch collapsed into a one-level dictionary.
Dictionary keys are Dataset names (e.g. geoda.airbnb) and its values are Dataset objects.
- Returns:
- flattened : dict
Dictionary of Dataset objects.
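A minimal sketch of the flattening idea on plain nested dictionaries follows. It assumes that a dictionary carrying a "name" key stands for a dataset; flatten_nested is a hypothetical helper, and the demo group and the entries themselves are illustrative.

```python
def flatten_nested(nested):
    """Collapse nested dicts into one level, keyed by each dataset's "name" value."""
    flat = {}
    for value in nested.values():
        if "name" in value:
            # Assumption: a dict that carries a "name" key represents a dataset.
            flat[value["name"]] = value
        else:
            # Otherwise it is a group of datasets, so descend into it.
            flat.update(flatten_nested(value))
    return flat


# Illustrative structure: groups at the top level, datasets below them.
nested = {
    "geoda": {
        "airbnb": {"name": "geoda.airbnb"},
        "guerry": {"name": "geoda.guerry"},
    },
    "demo": {"points": {"name": "demo.points"}},
}

flat = flatten_nested(nested)
```

The result is a single-level dictionary whose keys are the dotted dataset names.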
- query_name(name: str) -> Dataset
Return Dataset based on the name query.
Returns a matching Dataset from the Bunch if the name contains the same letters in the same order as the item’s name irrespective of the letter case, spaces, dashes and other characters. See examples for details.
- Parameters:
- name : str
Name of the data item. Formatting does not matter.
- Returns:
- match : Dataset
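The matching rule can be sketched as a comparison of normalized names. This is a hedged illustration rather than the library's code: normalize and find_by_name are hypothetical helpers, and treating digits like letters and raising ValueError when nothing matches are assumptions of this sketch.

```python
import re


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_by_name(names, query):
    """Return the first name whose normalized form equals the normalized query."""
    wanted = normalize(query)
    for candidate in names:
        if normalize(candidate) == wanted:
            return candidate
    # Assumption: an unmatched query is reported as an error.
    raise ValueError(f"No matching item found for the query '{query}'.")


names = ["geoda.airbnb", "geoda.guerry"]

first = find_by_name(names, "GeoDa AirBnB")
second = find_by_name(names, "geoda_airbnb")
```

Both queries resolve to geoda.airbnb, because case, spaces, underscores and dots are discarded before the comparison. A partial query such as airbnb does not match, since the full sequence of letters has to agree.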