Download, Process, Extract#

We have developed a Python library called geoprepare to download and process the input data and then extract crop masks and EO variables needed for running a crop yield model or generating AgMet graphics.

To run this library, we use Python configuration files to keep user-configurable settings separate from the code. Here is a tutorial on how Python handles configuration files.
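As a quick illustration of that separation, the sketch below reads a few INI-style settings with Python's standard `configparser` module. It assumes the files are parsed roughly this way; the source does not show how geoprepare reads them internally, and the `SAMPLE` text is a shortened, hypothetical excerpt.

```python
import ast
import configparser

# Hypothetical excerpt, written in the same INI style as the files below.
SAMPLE = """
[DATASETS]
datasets = ['CPC', 'NDVI']

[DEFAULT]
parallel_process = False
fraction_cpus = 0.5
start_year = 2022
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# configparser stores every value as a string; typed getters convert on demand.
use_parallel = parser.getboolean("DEFAULT", "parallel_process")
fraction = parser.getfloat("DEFAULT", "fraction_cpus")
start_year = parser.getint("DEFAULT", "start_year")

# List-valued options are written as Python literals, so ast.literal_eval
# can turn them into real lists without executing arbitrary code.
datasets = ast.literal_eval(parser.get("DATASETS", "datasets"))

print(use_parallel, fraction, start_year, datasets)
```

Keeping settings in a text file like this means a run can be changed (different datasets, years, or CPU usage) without editing any Python code.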

Python code#

We will be using the geoprepare package that you installed here. Before running any code, consider using tmux or screen to retain access to the terminal even if you are accidentally logged out.

from geoprepare import geoprepare, geoextract, geomerge

# Provide full path to the configuration files
# Download and preprocess data
geoprepare.run(['PATH_TO_geoprepare.txt'])

# Extract crop masks and EO variables
geoextract.run(['PATH_TO_geoprepare.txt', 'PATH_TO_geoextract.txt'])

# Merge EO files into one; this is needed to create AgMet graphics and to run the crop yield model
geomerge.run(['PATH_TO_geoprepare.txt', 'PATH_TO_geoextract.txt'])

Before running the code above, we need to create the two configuration files it refers to. geoprepare.txt contains the settings for downloading and processing the input data. geoextract.txt contains the settings for extracting crop masks and EO variables.

Configuration files#

geoprepare.txt#

datasets: Specify which datasets need to be downloaded and processed
dir_base: Path where to store the downloaded and processed files
start_year, end_year: Specify time-period for which data should be downloaded and processed
logfile: What directory name to use for the log files
level: Which level to use for logging
parallel_process: Whether to use multiple CPUs
fraction_cpus: What fraction of available CPUs to use
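To make the interaction between parallel_process and fraction_cpus concrete, here is a minimal sketch of how a worker count could be derived from the two settings. The helper `workers_to_use` is hypothetical and is not part of geoprepare; the library may round or cap the value differently.

```python
import os


def workers_to_use(parallel_process: bool, fraction_cpus: float) -> int:
    """Hypothetical helper: number of worker processes for the given settings."""
    if not parallel_process:
        return 1  # serial run, regardless of fraction_cpus
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    # Assumption: round down, but always keep at least one worker.
    return max(1, int(available * fraction_cpus))


print(workers_to_use(False, 0.5))
print(workers_to_use(True, 0.5))
```

With the sample values used in this tutorial (parallel_process = False), such a helper would always return 1.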

[DATASETS]
datasets = ['CPC', 'SOIL-MOISTURE', 'NDVI', 'CHIRPS', 'CHIRPS-GEFS', 'ESI']

[PATHS]
dir_base = /home/servir/GEOCIF
dir_input = ${dir_base}/input
dir_log = ${dir_base}/log
dir_interim = ${dir_input}/interim
dir_download = ${dir_input}/download
dir_output = ${dir_base}/output
dir_global_datasets = ${dir_input}/global_datasets
dir_masks = ${dir_global_datasets}/masks
dir_regions = ${dir_global_datasets}/regions
dir_regions_shp = ${dir_regions}/shps
dir_crop_masks = ${dir_input}/crop_masks
dir_models = ${dir_input}/models

[AGERA5]
start_year = 2022

[AVHRR]
data_dir = https://www.ncei.noaa.gov/data/avhrr-land-normalized-difference-vegetation-index/access

[CHIRPS]
fill_value = -2147483648
prelim = /pub/org/chc/products/CHIRPS-2.0/prelim/global_daily/tifs/p05/
final = /pub/org/chc/products/CHIRPS-2.0/global_daily/tifs/p05/
start_year = 2022

[CHIRPS-GEFS]
fill_value = -2147483648
data_dir = /pub/org/chc/products/EWX/data/forecasts/CHIRPS-GEFS_precip_v12/15day/precip_mean/

[CPC]
data_dir = ftp://ftp.cdc.noaa.gov/Datasets

[ESI]
data_dir = https://gis1.servirglobal.net//data//esi//

[FLDAS]

[LST]
num_update_days = 7

[NDVI]
product = MOD09CMG
vi = ndvi
scale_glam = False
scale_mark = True
print_missing = False

[SOIL-MOISTURE]
data_dir = https://gimms.gsfc.nasa.gov/SMOS/SMAP/L03/

[LOGGING]
level = ERROR

[DEFAULT]
logfile = log
parallel_process = False
fraction_cpus = 0.5
start_year = 2022
end_year = 2022
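Two features of the file above are worth seeing in action: the `${...}` references in [PATHS] and the [DEFAULT] section. The sketch below assumes the file is read with `configparser` and its `ExtendedInterpolation` mode, which is the standard-library mechanism that understands `${...}`; the source does not confirm that geoprepare does exactly this. The excerpt is shortened, and the CHIRPS start year is changed to 2020 purely to show an override.

```python
import configparser

# Shortened, hypothetical excerpt of geoprepare.txt.
SAMPLE = """
[PATHS]
dir_base = /home/servir/GEOCIF
dir_input = ${dir_base}/input
dir_download = ${dir_input}/download

[CHIRPS]
start_year = 2020

[CPC]
data_dir = ftp://ftp.cdc.noaa.gov/Datasets

[DEFAULT]
start_year = 2022
end_year = 2022
"""

parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(SAMPLE)

# ${dir_input} is itself built from ${dir_base}; both are expanded on read.
dir_download = parser.get("PATHS", "dir_download")

# A section-level value overrides [DEFAULT]; sections without one inherit it.
chirps_start = parser.getint("CHIRPS", "start_year")
cpc_start = parser.getint("CPC", "start_year")

print(dir_download)   # /home/servir/GEOCIF/input/download
print(chirps_start)   # 2020
print(cpc_start)      # 2022
```

Under this reading, changing dir_base alone is enough to relocate every derived directory, and per-dataset sections only need to list the values that differ from [DEFAULT].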

geoextract.txt#

countries: List of countries to process
forecast_seasons: List of seasons to process
mask: Name of file to use as a mask for cropland/croptype
redo: Redo the processing for all days (True) or only days with new data (False)
threshold: Use a threshold value (True) or a percentile (False) on the cropland/croptype mask
floor: Value below which to set the mask to 0
ceil: Value above which to set the mask to 1
eo_model: List of datasets to extract from
calendar_file: File with crop calendar information
statistics_file: File with crop yield, production and area statistics
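The floor and ceil descriptions can be read as a simple rule applied to each pixel of the cropland mask when threshold is True. The sketch below follows that reading literally on a small grid of hypothetical percentage values. The helper `apply_threshold` is not part of geoprepare, and leaving values between floor and ceil unchanged is an assumption, since the source does not say how they are treated.

```python
def apply_threshold(mask_percent, floor=20, ceil=90):
    """Hypothetical helper: apply floor/ceil to a 2D grid of mask values."""
    result = []
    for row in mask_percent:
        new_row = []
        for value in row:
            if value < floor:
                new_row.append(0)      # below floor: treated as not cropland
            elif value > ceil:
                new_row.append(1)      # above ceil: treated as fully cropland
            else:
                new_row.append(value)  # assumption: in-between values are kept
        result.append(new_row)
    return result


# Hypothetical cropland percentages for a 2 x 3 grid of pixels.
grid = [[5, 20, 50], [90, 95, 100]]
print(apply_threshold(grid))  # [[0, 20, 50], [90, 1, 1]]
```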

[kenya]
category = EWCM
scale = ['admin1']  ; can be admin1 (state level) or admin2 (county level)
season = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[ww]
mask = cropland_v9.tif

[mz]
mask = cropland_v9.tif

[sb]
mask = cropland_v9.tif

[rc]
mask = cropland_v9.tif

[tf]
mask = cropland_v9.tif

[sr]
mask = cropland_v9.tif

[ml]
mask = cropland_v9.tif

[DEFAULT]
redo = False
threshold = True
floor = 20
ceil = 90
scale = ['admin1']
season = [1]
countries = ['kenya']
forecast_seasons = [2022]
mask = cropland_v9.tif
calendar_file = crop_calendar.xlsx
shp_boundary = EWCM_Level_1.shp
statistics_file = statistics.csv
eo_model = ['ndvi', 'cpc_tmax', 'cpc_tmin', 'cpc_precip', 'esi_4wk', 'soil_moisture_as1', 'soil_moisture_as2']
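The file above uses `;` for trailing comments and repeats the same mask for several crops. The sketch below shows one way such a file could be read so that the comments are ignored and a crop without its own section falls back to the [DEFAULT] mask. It assumes `configparser` with `inline_comment_prefixes` set, which is not confirmed by the source; the excerpt is shortened and the lookup logic is illustrative only.

```python
import ast
import configparser

# Shortened, hypothetical excerpt of geoextract.txt.
SAMPLE = """
[kenya]
scale = ['admin1']  ; can be admin1 (state level) or admin2 (county level)
crops = ['mz', 'sr']

[mz]
mask = cropland_v9.tif

[DEFAULT]
floor = 20
mask = cropland_v9.tif
"""

# Without inline_comment_prefixes, the text after ';' would become part of the value.
parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
parser.read_string(SAMPLE)

scale = ast.literal_eval(parser.get("kenya", "scale"))
crops = ast.literal_eval(parser.get("kenya", "crops"))

# Look up the mask per crop; crops with no section of their own use [DEFAULT].
masks = {}
for crop in crops:
    if parser.has_section(crop):
        masks[crop] = parser.get(crop, "mask")
    else:
        masks[crop] = parser.defaults()["mask"]

print(scale)
print(masks)
```

Because every section inherits from [DEFAULT], the per-crop sections are only needed when a crop should use a different mask from the default one.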

geocif.txt#

seasons: List of seasons to process: 1 (first/long season), 2 (second/short season)
region: Administrative level to use: admin0 (country), admin1 (state), admin2 (county)

[kenya]
crops = ['mz']
seasons = [1]
region = admin1  ; admin0: country, admin1: state, admin2: county

[AGMET]
eo_plot = ['ndvi', 'cpc_tmax', 'cpc_tmin', 'chirps', 'esi_4wk', 'soil_moisture_as1', 'soil_moisture_as2']

[ML]
models = ['merf']

[MLOPS]
neptune_username = ritvik
neptune_project = geocif

[DEFAULT]
countries = ['kenya']
forecast_seasons = [2022]
model = ['merf']
eo_model = ['ndvi', 'cpc_tmax', 'cpc_tmin', 'cpc_precip', 'esi_4wk']