The basics

Model structure

A model is composed of three main elements: bricks, processes, and fluxes. The bricks are any components that can contain water, such as a snowpack, a glacier, or a ground reservoir. They can contain one or more water containers; for example, the snowpack has a snow and a liquid water container. Processes are assigned to these bricks and can extract water from them; examples are snowmelt, ET, or outflow according to some behaviour. The water extracted from the bricks by the processes is then transferred to fluxes, which deliver it to other bricks, to the atmosphere, or to the outlet.

For now, only pre-built structures are available. One can create an instance of a pre-built model by using the provided class (to be considered as the blueprint) with some options. The options and the existing models are detailed in the models page.

socont = models.Socont(soil_storage_nb=2)

Spatial structure

The catchment is discretized into sub-units named hydro units. These hydro units can represent HRUs (hydrological response units), pixels, elevation bands, etc. Their properties are loaded from csv files containing, at minimum, the area and the mean elevation of each hydro unit. Loading such a file can be done as follows:

hydro_units = hb.HydroUnits()
hydro_units.load_from_csv(
   'path/to/file.csv', area_unit='m2', column_elevation='elevation',
   column_area='area')

The default land cover is named ground and has no specific behaviour. When there is more than one land cover, these can be specified. Each hydro unit is then assigned a fraction of the provided land covers. For example, for a catchment with a pure ice glacier and a debris-covered glacier, one needs to provide the area of each land cover type for each hydro unit (more information in the Python API):

land_cover_names = ['ground', 'glacier_ice', 'glacier_debris']
land_cover_types = ['ground', 'glacier', 'glacier']

hydro_units = hb.HydroUnits(land_cover_types, land_cover_names)
hydro_units.load_from_csv(
   'path/to/file.csv', area_unit='km', column_elevation='Elevation',
   columns_areas={'ground': 'Area Non Glacier',
                  'glacier_ice': 'Area Ice',
                  'glacier_debris': 'Area Debris'})

The csv file containing elevation bands data can look like the following example.

Elevation, Area Non Glacier, Area Ice, Area Debris
3986, 2.408, 0, 0
4022, 2.516, 0, 0
4058, 2.341, 0, 0.003
4094, 2.351, 0, 0.006
4130, 2.597, 0, 0.01
4166, 2.726, 0, 0.006
4202, 2.687, 0, 0.061
4238, 2.947, 0, 0.065
4274, 2.924, 0.013, 0.06
4310, 2.785, 0.019, 0.058
4346, 2.578, 0.052, 0.176
4382, 2.598, 0.072, 0.369
4418, 2.427, 0.129, 0.384
4454, 2.433, 0.252, 0.333
4490, 2.210, 0.288, 0.266
4526, 2.136, 0.341, 0.363
4562, 1.654, 0.613, 0.275
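Such a file can also be generated programmatically. A minimal sketch using pandas, where the column headers simply mirror the example above (the file name is arbitrary):

```python
import pandas as pd

# Build a small elevation-band table with the same column headers
# as in the example above (the names passed to load_from_csv).
bands = pd.DataFrame({
    'Elevation': [3986, 4022, 4058],
    'Area Non Glacier': [2.408, 2.516, 2.341],
    'Area Ice': [0, 0, 0],
    'Area Debris': [0, 0, 0.003],
})
bands.to_csv('elevation_bands.csv', index=False)
```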

Parameters

The parameters are managed as parameter sets in an object that is an instance of the ParameterSet class. This means that a single variable contains all the parameters for a model. Within it, different properties are defined for each parameter (more information in the Python API):

  • component: the component to which it refers (e.g., glacier, slow_reservoir)

  • name: the detailed name of the parameter (e.g., degree_day_factor)

  • unit: the parameter unit (e.g., mm/d/°C)

  • aliases: aliases for the parameter name; this is the short version of the parameter name (e.g., a_snow)

  • value: the value assigned to the parameter

  • min: the minimum value the parameter can accept

  • max: the maximum value the parameter can accept

  • default_value: the parameter default value; only a few parameters have default values, such as the melting temperature, and these usually do not need to be calibrated

  • mandatory: defines whether the parameter value needs to be provided by the user (i.e., it has no default value)

  • prior: the prior distribution to use for the calibration (see the calibration page)
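Conceptually, each parameter can be pictured as a record holding these properties. A purely illustrative sketch (this is not the actual hydrobricks implementation, just a mirror of the list above):

```python
from dataclasses import dataclass, field

@dataclass
class ParameterSketch:
    """Illustrative record mirroring the properties listed above."""
    component: str
    name: str
    unit: str
    aliases: list = field(default_factory=list)
    value: float = None
    min: float = None
    max: float = None
    default_value: float = None
    mandatory: bool = True
    prior: object = None

# Example: a degree-day factor for snow (values are illustrative).
a_snow = ParameterSketch(
    component='snowpack', name='degree_day_factor',
    unit='mm/d/°C', aliases=['a_snow'], min=2, max=12)
```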

Creating a parameter set

When using a pre-built model structure, the parameters for this structure can be generated using the model.generate_parameters() function. For example, the following code creates a definition of the Socont model structure and generates the parameter set for this structure, accounting for the options, such as the number of soil storages. Within this parameter set, the basic attributes are defined, such as the name, aliases, units, and min/max values.

socont = models.Socont(soil_storage_nb=2)
parameters = socont.generate_parameters()

Assigning the parameter values

To set parameter values, the set_values() function of the parameter set can be used with a dictionary as argument. The dictionary can use the full parameter names (e.g., snowpack:degree_day_factor, without spaces) or one of the aliases (e.g., a_snow):

parameters.set_values({'A': 100, 'k_slow': 0.01, 'a_snow': 5})

Parameter constraints

Some constraints can be added between parameters. Some of these are built in when the parameter set is generated and are described in the respective model description. For example, in GSM-Socont, the degree-day factor for snow must be smaller than the one for ice (a_snow < a_ice).

Constraints between parameters can be added by the user as follows:

parameters.define_constraint('k_slow_2', '<', 'k_slow_1')

The supported operators are: > (or gt), >= (or ge), < (or lt), <= (or le).

Conversely, pre-defined constraints can be removed:

parameters.remove_constraint('a_snow', '<', 'a_ice')

Parameter ranges

The parameters are usually generated with a pre-defined range. This range is used to ensure that a provided value falls within the authorized bounds and to define the boundaries for the calibration algorithm. The pre-defined ranges can be changed as follows:

parameters.change_range('a_snow', 2, 5)

Forcing data

The meteorological data is handled by the Forcing class, which manages the spatialization of the weather data to create per-unit time series. Therefore, when creating an instance of this class, the hydro units must be provided:

forcing = hb.Forcing(hydro_units)

The data, for example station time series, can then be loaded from csv files. Multiple files can be loaded successively, or a single file can contain different variables (as different columns). One needs to specify which column contains the dates, their format, and which column header represents which variable. For example (more information in the Python API):

forcing.load_from_csv(
    'path/to/forcing.csv', column_time='Date', time_format='%d/%m/%Y',
    content={'precipitation': 'precip(mm/day)', 'temperature': 'temp(C)',
             'pet': 'pet_sim(mm/day)'})

A csv file containing forcing data can look like the following example:

Date,precip(mm/day),temp(C),sunshine_dur(h),pet_sim(mm/day)
01/01/1981,8.24,-0.98,0.42,0.58
02/01/1981,4.02,-3.35,0.08,0
03/01/1981,22.27,0.96,0.44,0.95
04/01/1981,28.85,-2.11,0.08,0
05/01/1981,8.89,-5.62,0.07,0.06
06/01/1981,17.49,-4.72,0.09,0
07/01/1981,8.26,-8.58,0.14,0
08/01/1981,0.14,-11.47,81.73,0
09/01/1981,0.91,-7.37,0.1,0.05
10/01/1981,0.54,-3.23,0.09,0
11/01/1981,0.02,-4.57,1.94,0
12/01/1981,2.28,-4.01,69.95,0
13/01/1981,7.03,-6.39,0.04,0
14/01/1981,9.68,-7.54,73.98,0
15/01/1981,16.23,-3.95,0.23,0.01
16/01/1981,2.77,-7.28,0.18,0.19
17/01/1981,6.49,-1.57,1.29,0.19
18/01/1981,5.53,-3.7,0.07,0
...

Spatialization

The spatialization operation needs to be specified to generate per-unit time series. This definition requires the variable, the method to use, and its parameters:

forcing.define_spatialization(
    variable='temperature', method='additive_elevation_gradient',
    ref_elevation=1250, gradient=-0.6)

As we might also want to calibrate the parameters of such operations, they can be specified as a reference to a parameter instead of a fixed value. In such a case, one must add a data parameter, as in the following example:

forcing.define_spatialization(
    variable='temperature', method='additive_elevation_gradient',
    ref_elevation=1250, gradient='param:temp_gradients')

parameters.add_data_parameter('temp_gradients', -0.6, min_value=-1, max_value=0)

The variables supported so far are temperature, precipitation, and pet. The methods and their parameters are described in the Python API.

Running the model

Once the hydro units, parameters, and forcing are defined, the model can be set up and run:

socont.setup(spatial_structure=hydro_units, output_path='/path/to/dir',
             start_date='1981-01-01', end_date='2020-12-31')

socont.run(parameters=parameters, forcing=forcing)

Then, the outlet discharge (in mm/d) can be retrieved:

sim_ts = socont.get_outlet_discharge()
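The returned series can then be post-processed with standard tools. Assuming it behaves like a 1-D numpy array (an assumption; check the actual return type in the Python API), basic statistics are straightforward:

```python
import numpy as np

# Stand-in array for illustration; in practice, use the series
# returned by socont.get_outlet_discharge().
sim_ts = np.array([0.8, 1.2, 3.5, 2.1, 0.9])  # mm/d

mean_q = sim_ts.mean()              # mean daily discharge (mm/d)
total_q = sim_ts.sum()              # total discharge over the period (mm)
peak_day = int(np.argmax(sim_ts))   # index of the peak discharge
```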

More outputs can be extracted and saved to a netCDF file for further analysis:

socont.dump_outputs('/output/dir/')

The state variables can be initialized using the initialize_state_variables() function, called between setup() and run(). The initialization runs the model for the given period and saves the final state variables; these values are then used as initial state variables for the next run:

socont.initialize_state_variables(parameters=parameters, forcing=forcing)
socont.run(parameters=parameters, forcing=forcing)

When the model is executed multiple times in succession, it clears its previous states. If the initialization provided by initialize_state_variables() has been used, the model instead resets its state variables to these saved values.

Evaluation

Some metrics can be computed by providing the observation time series (in mm/d):

# Preparation of the obs data
obs = hb.Observations()
obs.load_from_csv('/path/to/obs.csv', column_time='Date', time_format='%d/%m/%Y',
                  content={'discharge': 'Discharge (mm/d)'})
obs_ts = obs.data_raw[0]

nse = socont.eval('nse', obs_ts)
kge_2012 = socont.eval('kge_2012', obs_ts)

The metrics are provided by the HydroErr package. All the metrics listed on its website can be used and are referred to by their function names.
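As an illustration of what such a metric computes, the Nash–Sutcliffe efficiency (NSE) can be written by hand with numpy; this is the standard formula, which HydroErr's nse function implements:

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit,
    0 means the simulation is no better than the mean of obs."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# A perfect simulation scores 1; simulating the mean of obs scores 0.
print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 1.0
print(nse([2.0, 2.0, 2.0], [1.0, 2.0, 3.0]))  # → 0.0
```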

Outputs

The results can be accessed in different ways and with different levels of detail:

  1. The direct outputs from the model instance.

  2. A dumped netCDF file containing more details for each hydro unit.

  3. Other outputs such as the spatialized forcing or the SPOTPY outputs.

Direct outputs

Some outputs from the model instance are available after a model run as long as the Python session is still alive. The first one is the discharge time series at the outlet, provided by get_outlet_discharge():

sim_ts = model.get_outlet_discharge()

Some outputs provide integrated values over the simulation period:

  • get_total_outlet_discharge(): Integrated discharge at the outlet

  • get_total_et(): Integrated ET

  • get_total_water_storage_changes(): Changes in all water reservoirs between the beginning and the end of the period

  • get_total_snow_storage_changes(): Changes in snow storage between the beginning and the end of the period

Dumped netCDF file

A detailed netCDF file can be dumped with model.dump_outputs('some/path'). The content of the file depends on the option record_all provided at model creation. When True, all fluxes and states are recorded, which slows down the model execution.

The file has the following dimensions:

  • time: The temporal dimension

  • hydro_units: The hydro units (e.g., elevation bands)

  • aggregated_values: Elements recorded at the catchment scale (lumped)

  • distributed_values: Elements recorded at each hydro unit ([semi-]distributed)

  • land_covers: The different land covers

It contains three important global attributes:

  • labels_aggregated: The labels of the lumped elements (fluxes and states)

  • labels_distributed: The labels of the distributed elements (fluxes and states)

  • labels_land_covers: The labels of the land covers

For example, the GSM-Socont model with two different glacier types provides the following attributes:

labels_aggregated =
   "glacier-area-rain-snowmelt-storage:content",
   "glacier-area-rain-snowmelt-storage:outflow:output",
   "glacier-area-icemelt-storage:content",
   "glacier-area-icemelt-storage:outflow:output",
   "outlet";

labels_distributed =
   "ground:content",
   "ground:infiltration:output",
   "ground:runoff:output",
   "glacier-ice:content",
   "glacier-ice:outflow-rain-snowmelt:output",
   "glacier-ice:melt:output",
   "glacier-debris:content",
   "glacier-debris:outflow-rain-snowmelt:output",
   "glacier-debris:melt:output",
   "ground-snowpack:content",
   "ground-snowpack:snow",
   "ground-snowpack:melt:output",
   "glacier-ice-snowpack:content",
   "glacier-ice-snowpack:snow",
   "glacier-ice-snowpack:melt:output",
   "glacier-debris-snowpack:content",
   "glacier-debris-snowpack:snow",
   "glacier-debris-snowpack:melt:output",
   "slow-reservoir:content",
   "slow-reservoir:et:output",
   "slow-reservoir:outflow:output",
   "slow-reservoir:percolation:output",
   "slow-reservoir:overflow:output",
   "slow-reservoir-2:content",
   "slow-reservoir-2:outflow:output",
   "surface-runoff:content",
   "surface-runoff:outflow:output";

labels_land_covers =
   "ground",
   "glacier-ice",
   "glacier-debris";

Then, it provides the following variables:

  • time (1D): The dates as Modified Julian Dates (days since 1858-11-17 00:00).

  • hydro_units_ids (1D): The IDs of the hydro units.

  • hydro_units_areas (1D): The area of the hydro units.

  • sub_basin_values (2D): The time series of the aggregated elements (cf. labels_aggregated)

  • hydro_units_values (2D): The time series of the distributed elements (cf. labels_distributed). Please note the difference between:

    • the fluxes (mm), i.e., output elements: these are already weighted by the land cover fraction and the relative hydro unit area. Thus, these elements can be directly summed over all hydro units to obtain the total contribution of a given component (e.g., ice melt), even when the hydro units have different areas.

    • the state variables (mm), such as content or snow elements: these represent the water stored in the respective reservoirs. These values are not weighted and cannot be directly summed over the catchment; they must first be weighted by the land cover fraction and the relative hydro unit area.

  • land_cover_fractions (2D, optional): the temporal evolution of the land cover fractions.
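The distinction between fluxes and state variables matters when aggregating to the catchment scale. A minimal numpy sketch of the weighting described above (all variable names and values are hypothetical):

```python
import numpy as np

# Hypothetical values for three hydro units at one time step.
unit_areas = np.array([2.0, 3.0, 5.0])     # hydro unit areas (km2)
ice_frac = np.array([0.0, 0.2, 0.5])       # glacier fraction per unit

# Fluxes are already weighted: sum them directly over the units.
melt_flux = np.array([0.0, 0.4, 1.1])      # mm
catchment_melt = melt_flux.sum()

# State variables are not weighted: weight by land cover fraction
# and relative hydro unit area before aggregating.
swe_state = np.array([120.0, 80.0, 30.0])  # mm stored on the ice fraction
weights = ice_frac * unit_areas / unit_areas.sum()
catchment_swe = (swe_state * weights).sum()
```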

Others

Some other outputs are available:

  • Dumped forcing: the forcing object can also be saved as a netCDF file using forcing.create_file(). The resulting file contains the spatialized forcing time series.

  • During the calibration procedure, SPOTPY saves all assessments in csv or sql tables.