A Python package to prepare (download, extract, process) input data for GEOCIF and related models.

We recommend that you use the conda package manager to install the `geoprepare` library and all its dependencies. If you do not have it installed already, you can get it from the Anaconda distribution.

If you intend to download AgERA5 data, you will need to install the CDS API. You can do this by following the instructions here.

`geoprepare` requires multiple Python GIS packages, including `gdal` and `rasterio`. These packages are not always easy to install, especially on Windows. To make the process easier, you can optionally create a new environment using the following commands. Specify the Python version you have on your machine (Python >= 3.9 is recommended). We use the `pygis` library to install multiple Python GIS packages, including `gdal` and `rasterio`.
conda create --name <name_of_environment> python=3.x
conda activate <name_of_environment>
conda install -c conda-forge mamba
mamba install -c conda-forge gdal
mamba install -c conda-forge rasterio
mamba install -c conda-forge xarray
mamba install -c conda-forge rioxarray
mamba install -c conda-forge pyresample
mamba install -c conda-forge cdsapi
mamba install -c conda-forge pygis
pip install wget
pip install pyl4c
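
Once the commands above have finished, a quick import check can confirm that the environment is usable. The snippet below is a minimal sketch and not part of `geoprepare`: the helper name `check_imports` is hypothetical, and the module list assumes the usual import names for these packages (the GDAL Python bindings are imported as `osgeo`).

```python
from importlib.util import find_spec

# Import names for the packages installed above.
# Note: the gdal package is imported as "osgeo".
REQUIRED_MODULES = [
    "osgeo", "rasterio", "xarray", "rioxarray",
    "pyresample", "cdsapi", "wget", "pyl4c",
]


def check_imports(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print one status line per package
    for name, found in check_imports().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

A package reported as `MISSING` can usually be installed by re-running the corresponding `mamba install` or `pip install` line inside the activated environment.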
Install the octvi package to download MODIS data
pip install git+https://github.com/ritviksahajpal/octvi.git
Downloading from the NASA distributed archives (DAACs) requires a personal app key. Users must configure the module using a new console script, `octviconfig`. After installation, run `octviconfig` in your command prompt; it will prompt you to enter your personal app key. Information on obtaining app keys can be found here.
pip install --upgrade geoprepare
pip install --upgrade --no-deps --force-reinstall git+https://github.com/ritviksahajpal/geoprepare.git
Navigate to the directory containing `setup.py` and run the following command:
pip install .
geoprepare.run([r"PATH_TO_geoprepare.txt"])
* Execute the following code to extract crop masks and EO data
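```python
from geoprepare import geoextract
from geoprepare import geomerge  # assumed to be importable in the same way as geoextract

# Extract crop masks and EO variables
geoextract.run([r"PATH_TO_geoprepare.txt", r"PATH_TO_geoextract.txt"])

# Run the merge step with the same two configuration files
geomerge.run([r"PATH_TO_geoprepare.txt", r"PATH_TO_geoextract.txt"])
```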
Before running the code above, we need to specify the two configuration files:
* `geoprepare.txt` contains configuration settings for downloading and processing the input data.
* `geoextract.txt` contains configuration settings for extracting crop masks and EO variables.
## Configuration files
### geoprepare.txt
* `datasets`: Specify which datasets need to be downloaded and processed
* `dir_base`: Path where to store the downloaded and processed files
* `start_year`, `end_year`: Specify time-period for which data should be downloaded and processed
* `logfile`: What directory name to use for the log files
* `level`: Which level to use for [logging](https://www.loggly.com/ultimate-guide/python-logging-basics/)
* `parallel_process`: Whether to use multiple CPUs
* `fraction_cpus`: What fraction of available CPUs to use
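
As an illustration of how settings like these can be read, the sketch below parses a trimmed-down stand-in for `geoprepare.txt` with Python's standard `configparser`. The `${dir_base}` syntax matches `configparser`'s extended interpolation, but whether `geoprepare` parses the file in exactly this way is an assumption; the `SAMPLE` string and the variable names are illustrative only.

```python
import ast
from configparser import ConfigParser, ExtendedInterpolation

# A trimmed-down stand-in for geoprepare.txt (illustrative, not the full file)
SAMPLE = """
[DATASETS]
datasets = ['CPC', 'CHIRPS']

[PATHS]
dir_base = /home/servir/GEOCIF
dir_input = ${dir_base}/input
dir_download = ${dir_input}/download

[CHIRPS]
fill_value = -2147483648

[DEFAULT]
parallel_process = False
fraction_cpus = 0.5
start_year = 2022
end_year = 2022
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(SAMPLE)

# ${...} references are resolved when a value is read
dir_download = parser.get("PATHS", "dir_download")

# List-valued settings are stored as text; literal_eval turns them into a list
datasets = ast.literal_eval(parser.get("DATASETS", "datasets"))

# Options missing from a section fall back to [DEFAULT]
start_year = parser.getint("CHIRPS", "start_year")
use_parallel = parser.getboolean("DEFAULT", "parallel_process")
fraction_cpus = parser.getfloat("DEFAULT", "fraction_cpus")

print(dir_download)  # /home/servir/GEOCIF/input/download
print(datasets, start_year, use_parallel, fraction_cpus)
```

Because every section inherits from `[DEFAULT]`, a dataset section only needs to list the values that differ, such as a dataset-specific `start_year`.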
```python
[DATASETS]
datasets = ['CPC', 'SOIL-MOISTURE', 'LST', 'AVHRR', 'AGERA5', 'CHIRPS', 'CHIRPS-GEFS']
[PATHS]
dir_base = /home/servir/GEOCIF
dir_input = ${dir_base}/input
dir_log = ${dir_base}/log
dir_interim = ${dir_input}/interim
dir_download = ${dir_input}/download
dir_output = ${dir_base}/output
dir_global_datasets = ${dir_input}/global_datasets
dir_masks = ${dir_global_datasets}/masks
dir_regions = ${dir_global_datasets}/regions
dir_regions_shp = ${dir_regions}/shps
dir_crop_masks = ${dir_input}/crop_masks
dir_models = ${dir_input}/models
[AGERA5]
start_year = 2022
[AVHRR]
data_dir = https://www.ncei.noaa.gov/data/avhrr-land-normalized-difference-vegetation-index/access
[CHIRPS]
fill_value = -2147483648
prelim = /pub/org/chc/products/CHIRPS-2.0/prelim/global_daily/tifs/p05/
final = /pub/org/chc/products/CHIRPS-2.0/global_daily/tifs/p05/
start_year = 2022
[CHIRPS-GEFS]
fill_value = -2147483648
data_dir = /pub/org/chc/products/EWX/data/forecasts/CHIRPS-GEFS_precip_v12/15day/precip_mean/
[CPC]
data_dir = ftp://ftp.cdc.noaa.gov/Datasets
[ESI]
data_dir = https://gis1.servirglobal.net//data//esi//
[FLDAS]
[LST]
num_update_days = 7
[NDVI]
product = MOD09CMG
vi = ndvi
scale_glam = False
scale_mark = True
print_missing = False
[SOIL-MOISTURE]
data_dir = https://gimms.gsfc.nasa.gov/SMOS/SMAP/L03/
[LOGGING]
level = ERROR
[DEFAULT]
logfile = log
parallel_process = False
fraction_cpus = 0.5
start_year = 2022
end_year = 2022
```
### geoextract.txt
* `countries`: List of countries to process
* `forecast_seasons`: List of seasons to process
* `mask`: Name of file to use as a mask for cropland/croptype
* `redo`: Redo the processing for all days (`True`) or only days with new data (`False`)
* `threshold`: Use a `threshold` value (`True`) or a `percentile` (`False`) on the cropland/croptype mask
* `floor`: Value below which to set the mask to 0
* `ceil`: Value above which to set the mask to 1
* `eo_model`: List of datasets to extract from
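
To show how per-country sections can override the defaults, the sketch below parses a shortened stand-in for `geoextract.txt` with the standard `configparser`. It is written under stated assumptions: the helper `country_settings` is hypothetical, the `kenya` section is modified to list two scales so the override is visible, and `inline_comment_prefixes` is enabled so that trailing `; ...` comments are dropped (the default parser keeps them as part of the value; how `geoprepare` itself handles them is not confirmed here).

```python
import ast
from configparser import ConfigParser

# A shortened stand-in for geoextract.txt (illustrative values)
SAMPLE = """
[kenya]
category = EWCM
scales = ['admin_1', 'admin_2'] ; admin_1 (state level), admin_2 (county level)
crops = ['mz', 'sr']

[rwanda]
category = EWCM

[DEFAULT]
redo = False
threshold = True
floor = 20
ceil = 90
scales = ['admin_1']
countries = ['kenya', 'rwanda']
"""

# inline_comment_prefixes makes the parser drop the trailing "; ..." comments
parser = ConfigParser(inline_comment_prefixes=(";",))
parser.read_string(SAMPLE)


def country_settings(country):
    """Collect typed settings for one country, falling back to [DEFAULT]."""
    section = parser[country]
    return {
        "scales": ast.literal_eval(section["scales"]),
        "redo": section.getboolean("redo"),
        "threshold": section.getboolean("threshold"),
        "floor": section.getint("floor"),
        "ceil": section.getint("ceil"),
    }


countries = ast.literal_eval(parser["DEFAULT"]["countries"])
settings = {name: country_settings(name) for name in countries}

print(settings["kenya"]["scales"])   # ['admin_1', 'admin_2']
print(settings["rwanda"]["scales"])  # ['admin_1']
```

Here `rwanda` defines no `scales` of its own, so it inherits the value from `[DEFAULT]`, while `kenya` uses its own list.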
```python
[kenya]
category = EWCM
scales = ['admin_1'] ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1] ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[rwanda]
category = EWCM
scales = ['admin_1'] ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1] ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[malawi]
category = EWCM
scales = ['admin_1'] ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1] ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[zambia]
category = EWCM
scales = ['admin_1'] ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1] ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[united_republic_of_tanzania]
category = EWCM
scales = ['admin_1'] ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1] ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[ww]
mask = cropland_v9.tif

[mz]
mask = cropland_v9.tif

[sb]
mask = cropland_v9.tif

[rc]
mask = cropland_v9.tif

[tf]
mask = cropland_v9.tif

[sr]
mask = cropland_v9.tif

[ml]
mask = cropland_v9.tif

[EWCM]
calendar_file = EWCM_2021-6-17.xlsx

[AMIS]
calendar_file = AMISCM_2021-6-17.xlsx

[DEFAULT]
redo = False
threshold = True
floor = 20
ceil = 90
scales = ['admin_1']
growing_seasons = [1]
countries = ['kenya']
forecast_seasons = [2022]
mask = cropland_v9.tif
shp_boundary = EWCM_Level_1.shp
statistics_file = statistics.csv
zone_file = countries.csv
calendar_file = crop_calendar.csv
eo_model = ['ndvi', 'cpc_tmax', 'cpc_tmin', 'chirps', 'chirps_gefs', 'esi_4wk', 'soil_moisture_as1', 'soil_moisture_as2']
```
## Accessing EO data using the earthaccess library
```python
import geopandas as gpd
from tqdm import tqdm
from pathlib import Path

from geoprepare.eoaccess import eoaccess

# PATH_TO_SHAPEFILE is a placeholder for the path to your shapefile;
# the shapefile is expected to have a "year" column (used below)
dg = gpd.read_file(PATH_TO_SHAPEFILE, engine="pyogrio")

# Convert to CRS 4326 if not already
if dg.crs != "EPSG:4326":
    dg = dg.to_crs("EPSG:4326")

# Iterate over each row of the shapefile
for index, row in tqdm(dg.iterrows(), desc="Iterating over shapefile", total=len(dg)):
    # Get bbox from geometry of the row
    bbox = row.geometry.bounds

    obj = eoaccess.NASAEarthAccess(
        dataset=["HLSL30", "HLSS30"],
        bbox=bbox,
        temporal=(f"{row['year']}-01-01", f"{row['year']}-12-31"),
        output_dir=".",
    )

    # Search for matching granules and download them if any are found
    obj.search_data()
    if obj.results:
        obj.download_parallel()

# Mosaic the downloaded files
obj = eoaccess.EarthAccessProcessor(
    dataset=["HLSL30", "HLSS30"],
    input_dir=".",
    shapefile=Path(PATH_TO_SHAPEFILE),
)
obj.mosaic()
```
Navigate to the directory containing `setup.py` and run the following command:
pip freeze > requirements.txt
python setup.py sdist
twine upload dist/geoprepare-A.B.C.tar.gz
This package was created with Cookiecutter and the giswqs/pypackage project template.