Metadata-Version: 2.4
Name: pooch
Version: 1.9.0
Summary: A friend to fetch your data files
Author-email: The Pooch Developers <fatiandoaterra@protonmail.com>
Maintainer-email: Leonardo Uieda <leo@uieda.com>
License-Expression: BSD-3-Clause
Project-URL: Documentation, https://www.fatiando.org/pooch
Project-URL: Changelog, https://www.fatiando.org/pooch/latest/changes.html
Project-URL: Bug Tracker, https://github.com/fatiando/pooch/issues
Project-URL: Source Code, https://github.com/fatiando/pooch
Keywords: data,download,caching,http
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt
License-File: AUTHORS.md
Requires-Dist: platformdirs>=2.5.0
Requires-Dist: packaging>=20.0
Requires-Dist: requests>=2.19.0
Provides-Extra: progress
Requires-Dist: tqdm<5.0.0,>=4.41.0; extra == "progress"
Provides-Extra: sftp
Requires-Dist: paramiko>=2.7.0; extra == "sftp"
Provides-Extra: xxhash
Requires-Dist: xxhash>=1.4.3; extra == "xxhash"
Provides-Extra: test
Requires-Dist: pytest-httpserver; extra == "test"
Requires-Dist: pytest-localftpserver; extra == "test"
Dynamic: license-file

<img src="https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png" alt="Pooch: A friend to fetch your data files">

<p align="center">
<a href="https://www.fatiando.org/pooch"><strong>Documentation</strong> (latest)</a> •
<a href="https://www.fatiando.org/pooch/dev"><strong>Documentation</strong> (main branch)</a> •
<a href="https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md"><strong>Contributing</strong></a> •
<a href="https://www.fatiando.org/contact/"><strong>Contact</strong></a> •
<a href="https://github.com/orgs/fatiando/discussions"><strong>Ask a question</strong></a>
</p>

<p align="center">
Part of the <a href="https://www.fatiando.org"><strong>Fatiando a Terra</strong></a> project
</p>

<p align="center">
<a href="https://pypi.python.org/pypi/pooch"><img src="http://img.shields.io/pypi/v/pooch.svg?style=flat-square" alt="Latest version on PyPI"></a>
<a href="https://github.com/conda-forge/pooch-feedstock"><img src="https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square" alt="Latest version on conda-forge"></a>
<a href="https://codecov.io/gh/fatiando/pooch"><img src="https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square" alt="Test coverage status"></a>
<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square" alt="Compatible Python versions."></a>
<a href="https://doi.org/10.21105/joss.01943"><img src="https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue?style=flat-square" alt="DOI used to cite Pooch"></a>
</p>

## About

> Just want to download a file without messing with `requests` and `urllib`?
> Trying to add sample datasets to your Python package?
> **Pooch is here to help!**

*Pooch* is a **Python library** that can manage data by **downloading files**
from a server (only when needed) and storing them locally in a data **cache**
(a folder on your computer).

* Pure Python and minimal dependencies.
* Download files over HTTP, FTP, and from data repositories like Zenodo and figshare.
* Built-in post-processors to unzip/decompress the data after download.
* Designed to be extended: create custom downloaders and post-processors.
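
As a rough illustration of the extension point mentioned in the list above, here is a minimal sketch of a custom post-processor. It assumes the callable signature `processor(fname, action, pooch)` described in the Pooch documentation, where `action` is one of `"download"`, `"update"`, or `"fetch"`. The function name `uppercase_copy` and the `.upper` suffix are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def uppercase_copy(fname, action, pooch):
    """
    Hypothetical post-processor: write an upper-cased copy of a text file.

    Assumes Pooch calls processors as ``processor(fname, action, pooch)``
    and uses the returned path in place of the original file name.
    """
    source = Path(fname)
    target = source.with_name(source.name + ".upper")
    # Redo the work only when the source changed or the output is missing.
    if action in ("download", "update") or not target.exists():
        target.write_text(source.read_text().upper())
    return str(target)


# Exercise the function by hand, without downloading anything.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("hello pooch")
    processed = uppercase_copy(str(sample), "download", None)
    processed_text = Path(processed).read_text()
```

A function like this would typically be passed through the `processor` argument of `pooch.retrieve` or `Pooch.fetch`.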

Are you a **scientist** or researcher? Pooch can help you too!

* Host your data on a repository and download using the DOI.
* Automatically download data using code instead of telling colleagues to do it themselves.
* Make sure everyone running the code has the same version of the data files.

## Projects using Pooch

[SciPy](https://github.com/scipy/scipy),
[scikit-image](https://github.com/scikit-image/scikit-image),
[xarray](https://github.com/pydata/xarray),
[Ensaio](https://github.com/fatiando/ensaio),
[GemPy](https://github.com/cgre-aachen/gempy),
[MetPy](https://github.com/Unidata/MetPy),
[napari](https://github.com/napari/napari),
[Satpy](https://github.com/pytroll/satpy),
[yt](https://github.com/yt-project/yt),
[PyVista](https://github.com/pyvista/pyvista),
[icepack](https://github.com/icepack/icepack),
[histolab](https://github.com/histolab/histolab),
[seaborn-image](https://github.com/SarthakJariwala/seaborn-image),
[Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox),
[climlab](https://github.com/climlab/climlab),
[mne-python](https://github.com/mne-tools/mne-python),
[GemGIS](https://github.com/cgre-aachen/gemgis),
[SHTOOLS](https://github.com/SHTOOLS/SHTOOLS),
[MOABB](https://github.com/NeuroTechX/moabb),
[GeoViews](https://github.com/holoviz/geoviews),
[ScopeSim](https://github.com/AstarVienna/ScopeSim),
[Brainrender](https://github.com/brainglobe/brainrender),
[pyxem](https://github.com/pyxem/pyxem),
[cellfinder](https://github.com/brainglobe/cellfinder),
[PVGeo](https://github.com/OpenGeoVis/PVGeo),
[geosnap](https://github.com/oturns/geosnap),
[BioCypher](https://github.com/biocypher/biocypher),
[cf-xarray](https://github.com/xarray-contrib/cf-xarray),
[Scirpy](https://github.com/scverse/scirpy),
[rembg](https://github.com/danielgatis/rembg),
[DASCore](https://github.com/DASDAE/dascore),
[scikit-mobility](https://github.com/scikit-mobility/scikit-mobility),
[Py-ART](https://github.com/ARM-DOE/pyart),
[HyperSpy](https://github.com/hyperspy/hyperspy),
[RosettaSciIO](https://github.com/hyperspy/rosettasciio),
[eXSpy](https://github.com/hyperspy/exspy),
[SPLASH](https://github.com/Adam-Boesky/astro_SPLASH),
[xclim](https://github.com/Ouranosinc/xclim),
[CLISOPS](https://github.com/roocs/clisops),
[scXpand](https://github.com/yizhak-lab-ccg/scXpand)

> If you're using Pooch, **send us a pull request** adding your project to the list.

## Example

For a **scientist downloading a data file** for analysis:

```python
import pooch
import pandas as pd

# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```
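
The `known_hash` values in the example above have the form `"algorithm:hexdigest"`. As a minimal sketch, a string in that form could be computed for a local file using only the standard library. The helper name `hash_for_known_hash` and the sample file are hypothetical; Pooch provides its own `pooch.file_hash` function for the digest itself, and this is not a copy of its implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_for_known_hash(fname, alg="sha256", chunk_size=65536):
    """Return an "alg:hexdigest" string for the file at *fname*."""
    hasher = hashlib.new(alg)
    with open(fname, "rb") as fin:
        # Read in chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            hasher.update(chunk)
    return f"{alg}:{hasher.hexdigest()}"


# Try it on a small throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.csv"
    sample.write_bytes(b"x,y\n1,2\n")
    known_hash = hash_for_known_hash(sample)
    expected = "sha256:" + hashlib.sha256(b"x,y\n1,2\n").hexdigest()
```

The resulting string is what would be passed as `known_hash` so that a corrupted or outdated download is detected.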

For **package developers** including sample data in their projects:

```python
"""
Module mypackage/datasets.py
"""
from importlib import resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version

# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use a PEP 440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overrides the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes (the value below is a placeholder).
    # Files will be downloaded automatically when needed (see
    # fetch_gravity_data).
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"},
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    resources.open_text("mypackage", "registry.txt")
)


# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```
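
The registry file mentioned in the comments above holds one file name and its SHA256 hash per line, separated by a space. The following is a minimal standard-library sketch of how such a file might be generated from a folder of data files. The helper name `write_registry` and the sample folder layout are hypothetical; Pooch also provides `pooch.make_registry` for this job, which is likely the better choice in a real project.

```python
import hashlib
import tempfile
from pathlib import Path


def write_registry(data_dir, registry_path):
    """Write "<relative name> <sha256>" lines for every file in data_dir."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Use forward slashes so names match URL paths on every OS.
        lines.append(f"{path.relative_to(data_dir).as_posix()} {digest}")
    Path(registry_path).write_text("\n".join(lines) + "\n")
    return lines


# Build a registry for a throwaway data folder with a single file.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "gravity-data.csv").write_text("lon,lat,gravity\n")
    registry_file = Path(tmp) / "registry.txt"
    entries = write_registry(data, registry_file)
    registry_text = registry_file.read_text()
```

Keeping the registry in a separate text file means the hashes can be regenerated by a script whenever the data changes, instead of being edited by hand in the module.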

## Getting involved

🗨️ **Contact us:**
Find out more about how to reach us at
[fatiando.org/contact](https://www.fatiando.org/contact/).

👩🏾‍💻 **Contributing to project development:**
Please read our
[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
to see how you can help and give feedback.

🧑🏾‍🤝‍🧑🏼 **Code of conduct:**
This project is released with a
[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.

> **Imposter syndrome disclaimer:**
> We want your help. **No, really.** There may be a little voice inside your
> head that is telling you that you're not ready, that you aren't skilled
> enough to contribute. We assure you that the little voice in your head is
> wrong. Most importantly, **there are many valuable ways to contribute besides
> writing code**.
>
> *This disclaimer was adapted from the*
> [MetPy project](https://github.com/Unidata/MetPy).

## License

This is free software: you can redistribute it and/or modify it under the terms
of the **BSD 3-clause License**. A copy of this license is provided in
[`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).