Contributing to geodatasets¶
Contributions to geodatasets are very welcome. They are likely to be accepted more quickly if they follow these guidelines.
There are two main groups of contributions: adding new data sources, and contributions to the codebase and documentation.
Data sources¶
If you want to add a new dataset, simply add its details to geodatasets/json/database.json. You can add a single Dataset or a Bunch of Datasets. Use the following schema to add a single dataset:
{
    "dataset_name": {
        "url": "https://your-site.com/direct-link-to/my_file.zip",
        "license": "CC-0",
        "attribution": "University of Github",
        "name": "dataset_name",
        "description": "Contents of my file",
        "geometry_type": "Polygon",
        "nrows": 77,
        "ncols": 20,
        "details": "https://your-site.com/link-to-explanation/",
        "hash": "a2ab1e3f938226d287dd76cde18c00e2d3a260640dd826da7131827d9e76c824",
        "filename": "my_file.zip"
    },
}
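Once such an entry is part of the database, the dataset becomes available through the regular API. A minimal sketch, assuming the hypothetical dataset_name entry above has been added:

import geodatasets

# Download (and cache) the file and return the local path to it
path = geodatasets.get_path("dataset_name")

# Or only look up the source URL without downloading anything
url = geodatasets.get_url("dataset_name")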
If you want to add a bunch of related datasets (e.g. different files from a single source), you can group them within a Bunch using the following schema:
{
    "provider_bunch_name": {
        "first_dataset_name": {
            "url": "https://your-site.com/direct-link-to/my_file.zip",
            "license": "CC-0",
            "attribution": "University of Github",
            "name": "dataset_name",
            "description": "Contents of my file",
            "geometry_type": "Polygon",
            "nrows": 77,
            "ncols": 20,
            "details": "https://your-site.com/link-to-explanation/",
            "hash": "a2ab1e3f938226d287dd76cde18c00e2d3a260640dd826da7131827d9e76c824",
            "filename": "my_file.zip"
        },
        "second_dataset_name": {
            "url": "https://your-site.com/direct-link-to/my_file.zip",
            "license": "CC-0",
            "attribution": "University of Github",
            "name": "dataset_name",
            "description": "Contents of my file",
            "geometry_type": "Point",
            "nrows": 77,
            "ncols": 20,
            "details": "https://your-site.com/link-to-explanation/",
            "hash": "a2ab1e3f938226d287dd76cde18c00e2d3a260640dd826da7131827d9e76c824",
            "filename": "my_file.zip",
            "members": ["use_only_this.geojson"]
        }
    },
}
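Datasets grouped within a Bunch are addressed by combining the bunch and dataset names, in the same way existing entries such as "geoda airbnb" are queried. A minimal sketch, assuming the hypothetical provider_bunch_name entry above:

import geodatasets

# get_path resolves the combined query to the dataset inside the bunch
path = geodatasets.get_path("provider_bunch_name first_dataset_name")

# The same metadata is also reachable via attribute access on the database
dataset = geodatasets.data.provider_bunch_name.first_dataset_name
print(dataset["url"])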
It is mandatory to always specify at least name, url, hash and filename. hash is the sha256 hash of the file, used to check that a user gets the expected file, and filename specifies the name under which the downloaded file will be saved. Ensure that it has the correct suffix. Don't forget to add any other custom attributes you'd like. The members attribute has a specific meaning: it specifies the file (or files, in the case of an ESRI Shapefile) that shall be extracted from the archive and used.
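One straightforward way to compute the hash value for a new entry is with Python's hashlib; this is a minimal sketch rather than part of the geodatasets tooling:

import hashlib

def sha256_of(path):
    """Return the sha256 hex digest of a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256_of("my_file.zip"))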
Code and documentation¶
At this stage of geodatasets
development, the priorities are to define a simple,
usable, and stable API and to have clean, maintainable, readable code.
In general, geodatasets
follows the conventions of the GeoPandas project where
applicable.
In particular, when submitting a pull request:
All existing tests should pass. Please make sure that the test suite passes, both locally and on GitHub Actions. Status on GHA will be visible on a pull request. GHA are automatically enabled on your own fork as well. To trigger a check, make a PR to your own fork.
Ensure that documentation has built correctly. It will be automatically built for each PR.
New functionality should include tests. Please write reasonable tests for your code and make sure that they pass on your pull request.
Classes, methods, functions, etc. should have docstrings and type hints. The first line of a docstring should be a standalone summary. Parameters and return values should be documented explicitly.
Follow PEP 8 when possible. We use Black and Flake8 to ensure a consistent code format throughout the project. For more details see the GeoPandas contributing guide.
Imports should be grouped with standard library imports first, 3rd-party libraries next, and geodatasets imports third. Within each grouping, imports should be alphabetized. Always use absolute imports when possible, and explicit relative imports for local imports when necessary in tests. A short sketch of this layout follows below.
geodatasets supports Python 3.7+ only. When possible, do not introduce additional dependencies. If that is necessary, make sure they can be treated as optional.
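The following is an illustrative sketch of the import grouping and docstring conventions described above; the read_dataset helper is hypothetical and only shows the structure:

# standard library
from pathlib import Path

# 3rd-party libraries
import geopandas

# geodatasets
from geodatasets import get_path


def read_dataset(name: str) -> geopandas.GeoDataFrame:
    """Read a dataset from the geodatasets database into a GeoDataFrame.

    Parameters
    ----------
    name : str
        Name of the dataset as listed in the database.

    Returns
    -------
    geopandas.GeoDataFrame
        The dataset loaded from the locally cached file.
    """
    path = Path(get_path(name))  # local path to the cached file
    return geopandas.read_file(path)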