Components

Project files

hydrobricks.load_project(source: str | Path | dict, base_dir: str | Path | None = None, setup: bool = True) → Project[source]

Build a ready-to-run model setup from a YAML project file (or dict).

The configuration is validated as a whole before anything is built: unknown keys (with ‘did you mean’ suggestions), missing files, missing CSV columns, wrong types and unknown model or parameter names are all reported together in a single ConfigurationError.

Parameters:

source – Path to a YAML project file, or an equivalent (already parsed) mapping.
base_dir – Directory used to resolve the relative paths in the configuration. Defaults to the project file directory (or the current working directory when source is a dict).
setup – Whether to call setup() on the model over the full simulation span (the default). Pass False when something must happen between model construction and setup (e.g. configuring recordings for auxiliary observations), then call Project.setup() yourself.

Returns:

The wired (model, forcing, parameters, observations, periods) bundle.

Return type:

Project
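The all-at-once validation described for load_project() can be illustrated with a minimal, standalone sketch. It does not reproduce the hydrobricks implementation: the set of allowed keys, the validate helper and the local ConfigurationError class are hypothetical stand-ins. Only the idea comes from the description above: collect every problem, suggest close matches for unknown keys, and raise once.

```python
import difflib


class ConfigurationError(Exception):
    """Hypothetical stand-in for the library's error type."""


# Assumed set of top-level keys, for illustration only.
ALLOWED_KEYS = {"model", "forcing", "parameters", "observations", "periods"}


def validate(config: dict) -> None:
    """Collect every problem first, then raise a single error."""
    problems = []
    for key in config:
        if key not in ALLOWED_KEYS:
            hint = difflib.get_close_matches(key, sorted(ALLOWED_KEYS), n=1)
            suffix = f" (did you mean '{hint[0]}'?)" if hint else ""
            problems.append(f"unknown key '{key}'{suffix}")
    # One example of a type check; a real validator would have many more.
    if not isinstance(config.get("parameters", {}), dict):
        problems.append("'parameters' must be a mapping")
    if problems:
        raise ConfigurationError("\n".join(problems))


try:
    validate({"modle": {}, "parameters": [1, 2]})
except ConfigurationError as err:
    report = str(err)  # both problems are reported together
```

Reporting everything in one pass spares the user a fix-and-rerun cycle for each individual mistake.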

class hydrobricks.Project(model: Model, forcing: Forcing, parameters: ParameterSet, observations: DischargeObservations | None, periods: Periods, config: dict, path: Path | None = None, output_dir: Path | None = None, hydro_units: HydroUnits | None = None, catchment: Any | None = None)[source]

Bases: object

The wired-up objects built from a project file by load_project().

model

The model instance, already set up with setup() over the full simulation span (with the project spin-up). Call run() or model.run(...).

Type:: hydrobricks.models.model.Model

forcing

The Forcing with its spatialization operations defined (applied lazily at run time).

Type:: hydrobricks.forcing.Forcing

parameters

The generated ParameterSet, with the values from the project file applied. If the file does not assign a value to every parameter, set the remaining ones before running.

Type:: hydrobricks.parameters.ParameterSet

observations

The loaded observed discharge, or None when the project file has no observations section.

Type:: hydrobricks.evaluation.discharge.DischargeObservations | None

periods

The Periods (calibration / validation / simulation and spin-up policy) declared in the project file.

Type:: hydrobricks.periods.Periods

config

The raw configuration mapping the project was built from.

Type:: dict

path

The project file path, or None when built from a dict.

Type:: pathlib.Path | None

output_dir

The resolved output directory the model writes to.

Type:: pathlib.Path | None

hydro_units

The HydroUnits (loaded or delineated).

Type:: hydrobricks.hydro_units.HydroUnits | None

catchment

The Catchment, when the project declares an outline/dem; otherwise None.

Type:: Any | None

run() → pandas.Series[source]

Run the model over the simulation span and return the discharge.

Return type:: The simulated outlet discharge as a date-indexed series.
Raises:: ConfigurationError – If some parameters still have no value (they are listed with their valid ranges).
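A possible way to combine load_project(), the setup=False option and Project.run() is sketched below. It only uses names documented in this section; the helper name run_project and the project file path passed to it are hypothetical, and the step between construction and setup is left as a placeholder.

```python
def run_project(project_file: str):
    """Load a project, finish its configuration, then set up and run it."""
    # Imported lazily so the sketch can be read without hydrobricks installed.
    import hydrobricks as hb

    # setup=False leaves room for extra configuration before the model setup.
    project = hb.load_project(project_file, setup=False)

    # ... configure auxiliary recordings here, if needed ...

    # Parameters left without a value by the project file must be set first.
    missing = project.parameters.get_undefined()
    if missing:
        raise ValueError(f"Parameters without a value: {missing}")

    project.setup()
    return project.run()  # date-indexed discharge series
```

Checking get_undefined() up front is optional, since run() is documented to raise a ConfigurationError listing the missing parameters.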

Catchment

Bases: object

Creation of catchment-related data

Parameters:

outline – Path to the outline of the catchment.
land_cover_types – The land cover types of the catchment.
land_cover_names – The land cover names of the catchment.
hydro_units_data – The hydro units data of the catchment.

area

The area of the catchment.

Type:: float

crs

The coordinate reference system (CRS) of the catchment outline.

Type:: str

outline

The outline of the catchment.

Type:: shapely.geometry.Polygon

dem

The DEM of the catchment [m].

Type:: rasterio.DatasetReader

dem_data

The masked DEM data of the catchment.

Type:: np.ndarray

slope

The slope map of the catchment [degrees].

Type:: np.ndarray

aspect

The aspect map of the catchment.

Type:: np.ndarray

map_unit_ids

The unit ids as a numpy array matching the DEM extent.

Type:: np.ndarray

hydro_units

The hydro units of the catchment.

Type:: HydroUnits

static calculate_cast_shadows(*args, **kwargs) → numpy.ndarray[source]: Call the calculate_cast_shadows method of the PotentialSolarRadiation class.

calculate_connectivity(*args, **kwargs) → pandas.DataFrame[source]: Call the calculate_connectivity method of the Connectivity class.

calculate_daily_potential_radiation(*args, **kwargs) → None[source]: Call the calculate_daily_potential_radiation method of the PotentialSolarRadiation class.

calculate_slope_aspect() → None[source]: Call the calculate_slope_aspect method of the Topography class.

close() → None[source]

Close all open resources (DEM dataset and MemoryFiles).

Called automatically when using the catchment as a context manager, or can be called manually to explicitly release resources.

Examples

Using as context manager (recommended):

>>> with Catchment(outline='boundary.shp') as catchment:
...     catchment.extract_dem('dem.tif')

Manual cleanup:

>>> catchment = Catchment(outline='boundary.shp')
>>> try:
...     catchment.extract_dem('dem.tif')
... finally:
...     catchment.close()

property connectivity: Any

Lazy-loaded connectivity module.

Returns:: Connectivity processor for the catchment, loaded on first access.
Return type:: CatchmentConnectivity

create_dem_pixel_geometry(i: int, j: int) → shapely.geometry.Polygon[source]

Create a shapely geometry of the DEM pixel.

Parameters:

i – The row of the pixel.
j – The column of the pixel.

Return type:

The shapely geometry of the pixel.
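The geometry of a pixel can be pictured with a small standalone sketch. It assumes a north-up grid described by its upper-left corner and resolution. The function name and arguments are hypothetical, and the corners are returned as plain tuples rather than a shapely Polygon.

```python
def pixel_corners(i, j, x_min, y_max, x_res, y_res):
    """Corner coordinates of pixel (row i, column j) on a north-up grid."""
    left = x_min + j * x_res   # columns grow eastwards
    top = y_max - i * y_res    # rows grow southwards
    right, bottom = left + x_res, top - y_res
    # Closed ring, counter-clockwise from the lower-left corner.
    return [(left, bottom), (right, bottom), (right, top), (left, top), (left, bottom)]


ring = pixel_corners(i=1, j=2, x_min=1000.0, y_max=2000.0, x_res=25.0, y_res=25.0)
```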

create_elevation_bands(*args, **kwargs) → None[source]: Call the create_elevation_bands method of the Discretization class.

property discretization: Any

Lazy-loaded discretization module.

Returns:: Discretization processor for the catchment, loaded on first access.
Return type:: CatchmentDiscretization

discretize_by(*args, **kwargs) → None[source]: Call the discretize_by method of the Discretization class.

extract_attribute_raster(raster_path: str | Path, attr_name: str, resample_to_dem_resolution: bool = True, resampling: str = 'average', replace_nans_by_zeros: bool = True, reproject_crs: bool = False) → bool[source]

Extract spatial attributes (raster) for the catchment.

Parameters:

raster_path – Path of the raster file containing the attribute data.
attr_name – Name of the attribute to store in self.attributes dictionary.
resample_to_dem_resolution – If True, resample the attribute raster to DEM resolution. Default: True
resampling – Resampling method to use when resample_to_dem_resolution is True. Options: ‘nearest’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘lanczos’, ‘average’, ‘mode’, ‘gauss’, ‘max’, ‘min’, ‘med’, ‘q1’, ‘q3’, ‘sum’, ‘rms’ Default: ‘average’
replace_nans_by_zeros – If True, replace NaN values with zero in the output raster. Default: True
reproject_crs – If True, the raster is warped onto the DEM grid even when its CRS differs from the catchment CRS (the raster is not required to match the catchment CRS and is not masked by the outline beforehand). Use this for datasets provided in a different CRS, e.g. ESA WorldCover (EPSG:4326) against a projected DEM. Implies resampling onto the DEM grid. Default: False

Returns:

True if extraction was successful, False otherwise.

Return type:

bool

Raises:

FileNotFoundError – If the raster file does not exist.
ValueError – If the resampling method is not recognized.

extract_dem(raster_path: str | Path) → bool[source]

Extract the DEM data for the catchment. Does not handle a change of coordinate reference system.

Parameters:: raster_path – Path of the DEM raster file.
Returns:: True if extraction was successful, False otherwise.
Return type:: bool
Raises:: FileNotFoundError – If the raster file does not exist.

extract_land_cover_from_raster(*args, **kwargs) → None[source]

Extract land-cover fractions from a categorical raster (e.g. ESA WorldCover).

Call the extract_from_raster method of the CatchmentLandCover class.

extract_land_cover_from_shapefile(*args, **kwargs) → None[source]

Extract land-cover fractions from a vector dataset (e.g. swissTLMRegio).

Call the extract_from_shapefile method of the CatchmentLandCover class.

extract_unit_mean_lat_lon(mask_unit: numpy.ndarray) → tuple[float, float][source]

Extract the mean latitude and longitude for a hydro unit.

Calculates the mean coordinates of pixels within a unit mask and converts them from the catchment CRS to latitude/longitude (EPSG:4326).

Parameters:: mask_unit – Boolean mask array identifying the cells of the hydro unit.
Returns:: Tuple of (mean_latitude, mean_longitude) in degrees.
Return type:: tuple[float, float]

get_attribute_raster_x_resolution(attr_name: str = 'dem') → float[source]

Get the given attribute raster x resolution.

Parameters:: attr_name – Name of the attribute.
Return type:: The attribute raster x resolution.

get_attribute_raster_y_resolution(attr_name: str = 'dem') → float[source]

Get the given attribute raster y resolution.

Parameters:: attr_name – Name of the attribute.
Return type:: The attribute raster y resolution.

get_dem_mean_lat_lon() → tuple[float, float][source]

Get the mean latitude and longitude of the DEM extent.

Calculates the central coordinates of the catchment DEM and converts them from the catchment CRS to latitude/longitude (EPSG:4326).

Returns:: Tuple of (mean_latitude, mean_longitude) in degrees.
Return type:: tuple[float, float]

get_dem_pixel_area() → float[source]

Get the DEM pixel area.

Return type:: The DEM pixel area.

get_dem_x_resolution() → float[source]

Get the DEM x resolution.

Return type:: The DEM x resolution.

get_dem_y_resolution() → float[source]

Get the DEM y resolution.

Return type:: The DEM y resolution.

get_hillshade(*args, **kwargs) → numpy.ndarray[source]: Call the get_hillshade method of the Topography class.

get_hydro_unit_count() → int[source]

Get the number of hydro units.

Return type:: The number of hydro units.

get_hydro_units_attributes() → HydroUnits[source]

Extract the hydro units attributes.

Return type:: The hydro units attributes.

get_hydro_units_elevations() → numpy.ndarray[source]

Get the elevation of the hydro units.

Return type:: The elevation of the hydro units.

get_mean_elevation() → float[source]: Call the get_mean_elevation method of the Topography class.

static get_solar_azimuth_to_north(*args, **kwargs) → float | np.ndarray[source]: Call the get_solar_azimuth_to_north method of the PotentialSolarRadiation class.

static get_solar_azimuth_to_south(*args, **kwargs) → numpy.ndarray[source]: Call the get_solar_azimuth_to_south method of the PotentialSolarRadiation class.

static get_solar_declination_rad(*args, **kwargs) → float[source]: Call the get_solar_declination_rad method of the PotentialSolarRadiation class.

static get_solar_hour_angle_limit(*args, **kwargs) → float | np.ndarray[source]: Call the get_solar_hour_angle_limit method of the PotentialSolarRadiation class.

static get_solar_zenith(*args, **kwargs) → float | np.ndarray[source]: Call the get_solar_zenith method of the PotentialSolarRadiation class.

initialize_area_from_land_cover_change(land_cover_name: str, land_cover_change: pandas.DataFrame) → None[source]

Initialize the HydroUnits cover area from a land cover change object.

Must be called before Model.setup(): it updates the land cover fractions that the model reads at build time. Calling it afterwards changes the hydro units’ settings but does not propagate to an already-built model.

Parameters:

land_cover_name – The name of the land cover to initialize.
land_cover_change – The land cover change dataframe.

initialize_land_cover_fractions() → None[source]

Initialize land cover fractions for all hydro units.

Sets up the initial fractional areas for each land cover type within each hydro unit based on the available land cover data.

property land_cover: Any

Lazy-loaded land-cover extraction module.

Returns:: Land-cover extraction processor for the catchment, loaded on first access.
Return type:: CatchmentLandCover

load_hydro_units_from_csv(path: str | Path) → None[source]

Load hydro units from a csv file.

Parameters:: path – Path to the csv file.

load_mean_annual_radiation_raster(*args, **kwargs) → None[source]: Call the load_mean_annual_radiation_raster method of the PotentialSolarRadiation class.

load_unit_ids_from_raster(path: str, filename: str = 'unit_ids.tif') → None[source]

Load hydro units from a raster file.

Parameters:

path – Path to the directory containing the raster file. If the path is a file, it will be used as the full path.
filename – Name of the raster file. Default is ‘unit_ids.tif’.

mask_dem(shapefile: gpd.GeoDataFrame, nodata: float = -9999, all_touched: bool = True) → np.ndarray[source]

Rasterize vector geometries onto the DEM grid.

Masks the catchment DEM with the geometries of shapefile (which must be in the catchment CRS): cells covered by a geometry keep their DEM value, cells outside get nodata. Used to derive per-cell presence masks from polygons (e.g. land cover, glacier extent, sub-catchments).

Parameters:

shapefile – GeoDataFrame of geometries, expressed in the catchment CRS.
nodata – Value assigned to cells outside the geometries. Default: -9999.
all_touched – If True (default), every cell touched by a geometry is included; if False, only cells whose centre falls within a geometry.

Returns:

2D array over the DEM grid holding the DEM value where a geometry is present and nodata elsewhere.

Return type:

np.ndarray
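The difference between the two all_touched settings can be shown on a toy grid. This is a simplified sketch, not the library's rasterization: cells are unit squares, the geometry is an axis-aligned box, and the helper mask_grid is hypothetical.

```python
NODATA = -9999


def mask_grid(dem, box, all_touched=True):
    """Keep DEM values under an axis-aligned box; fill the rest with NODATA.

    Cell (r, c) spans x in [c, c + 1] and y in [r, r + 1] (toy convention).
    """
    x0, y0, x1, y1 = box
    out = []
    for r, row in enumerate(dem):
        out_row = []
        for c, value in enumerate(row):
            if all_touched:
                # Any overlap between the cell and the box counts.
                inside = c < x1 and c + 1 > x0 and r < y1 and r + 1 > y0
            else:
                # Only the cell centre is tested.
                inside = x0 <= c + 0.5 <= x1 and y0 <= r + 0.5 <= y1
            out_row.append(value if inside else NODATA)
        out.append(out_row)
    return out


dem = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
box = (0.8, 0.8, 2.2, 2.2)
touched = mask_grid(dem, box, all_touched=True)
centres = mask_grid(dem, box, all_touched=False)
```

With this box, every cell is touched, but only the central cell has its centre inside the geometry.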

resample_dem_and_calculate_slope_aspect(*args, **kwargs) → tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]: Call the resample_dem_and_calculate_slope_aspect method of the Topography class.

save_hydro_units_to_csv(path: str | Path) → None[source]

Save the hydro units to a csv file.

Parameters:: path – Path to the output file.

save_unit_ids_raster(output_path: str | Path, output_filename: str = 'unit_ids.tif') → None[source]

Save the unit ids raster to a file.

Parameters:

output_path – Path to the output file.
output_filename – Name of the output file. Default is ‘unit_ids.tif’.

property solar_radiation: Any

Lazy-loaded solar radiation module.

Returns:: Solar radiation processor for the catchment, loaded on first access.
Return type:: PotentialSolarRadiation

property topography: Any

Lazy-loaded topography analysis module.

Returns:: Topography processor for the catchment, loaded on first access.
Return type:: CatchmentTopography

upscale_and_save_mean_annual_radiation_rasters(*args, **kwargs) → None[source]: Call the upscale_and_save_mean_annual_radiation_rasters method of the PotentialSolarRadiation class.

HydroUnits

class hydrobricks.HydroUnits(land_cover_types: list[str] | None = None, land_cover_names: list[str] | None = None, data: pd.DataFrame | None = None)[source]

Bases: object

Class for the hydro units

Parameters:

land_cover_types – List of land cover types. Default: [‘open’]
land_cover_names – List of land cover names. Default: [‘open’]
data – DataFrame containing the hydro units data.

land_cover_types

List of land cover types. Default: [‘open’]

Type:: list[str]

land_cover_names

List of land cover names. Default: [‘open’]

Type:: list[str]

hydro_units

DataFrame containing the hydro units data.

Type:: pd.DataFrame

FRACTION_PREFIX: ClassVar[str] = 'fraction-'

GENERIC_COVER_ALIASES = ('open', 'ground', 'generic', 'generic_land_cover'): Generic (soil-bearing) land cover aliases. ‘open’ is the canonical name; the others are accepted for backward compatibility. The generic cover absorbs the residual area when other land cover fractions change.

add_land_cover(name: str, cover_type: str = 'generic_land_cover') → None[source]

Register a new land cover on the hydro units.

Appends the land cover (name + type) and adds its fraction-<name> column, initialized to 0.0 for every hydro unit. Existing fractions are left untouched, so the generic soil cover keeps absorbing the residual area until the new cover’s fractions are set (e.g. via initialize_from_land_cover_change).

Parameters:

name – The name of the land cover to add.
cover_type – The land cover type identifier (e.g. ‘glacier’, ‘forest’, ‘open’). Default: ‘generic_land_cover’.

Raises:

DataError – If a land cover with the same name already exists.

add_property(column_tuple: tuple[str, str], values: numpy.ndarray, set_first: bool = False) → None[source]

Add a property to the hydro units.

Adds a new column to the hydro_units DataFrame with the specified property name, unit, and values. Can optionally insert as the first column.

Parameters:

column_tuple – Tuple containing (property_name, unit_string). Example: ('elevation', 'm')
values – Numpy array containing the property values for each hydro unit.
set_first – If True, the property is added as the first column. Default: False

Raises:

ValueError – If values length doesn’t match the number of hydro units.

check_land_cover_fractions_not_empty() → None[source]

Check that the land cover fractions are not empty.

Validates that all land cover fractions have been defined. If there is a single land cover type (e.g. ‘open’), automatically sets it to 1.0 for all hydro units.

Raises:: ValueError – If any land cover fraction contains NaN values (when multiple land cover types exist).

get_generic_cover_name() → str[source]

Return the generic soil land cover name (the one that absorbs residual area).

Picks the land cover whose name or type is a generic alias (‘open’, ‘ground’, …). Falls back to the first land cover if none is explicitly generic.
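The selection rule can be sketched in a few lines. This is an illustration of the described behaviour, not the library code. The alias tuple is copied from GENERIC_COVER_ALIASES above, and the function name is hypothetical.

```python
GENERIC_COVER_ALIASES = ("open", "ground", "generic", "generic_land_cover")


def generic_cover_name(names, types):
    """Return the first cover whose name or type is a generic alias."""
    for name, cover_type in zip(names, types):
        if name in GENERIC_COVER_ALIASES or cover_type in GENERIC_COVER_ALIASES:
            return name
    # Fallback: no explicitly generic cover, use the first one.
    return names[0]


by_alias = generic_cover_name(["glacier", "ground"], ["glacier", "ground"])
by_fallback = generic_cover_name(["glacier", "forest"], ["glacier", "forest"])
```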

get_hydro_unit_count() → int[source]

Get the number of hydro units.

Return type:: Number of hydro units.

get_ids() → pandas.Series[source]: Get the hydro unit ids.

has(prop: str) → bool[source]

Check if the hydro units have a given property.

Parameters:: prop – The property name to check. Should match a column name in the hydro_units DataFrame.
Returns:: True if the property is present, False otherwise.
Return type:: bool

initialize_from_land_cover_change(land_cover_name: str, land_cover_change: pandas.DataFrame) → None[source]

Initialize the hydro units from the first values of a land cover change dataframe.

Updates the land cover fractions for specified hydro units based on a land cover change dataframe. Automatically adjusts the generic soil land cover fraction (‘open’, or its ‘ground’ alias) to maintain conservation.

Must be called before Model.setup(): the fractions set here become the model’s initial extent, captured at build time. Calling it afterwards updates the settings but does not propagate to an already-built model.

Parameters:

land_cover_name – The name of the land cover to initialize.
land_cover_change – The land cover change dataframe with columns ‘hydro_unit’ and area values.

Raises:

ValueError – If computed land cover fraction is not in the range [0, 1].
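The conservation rule, where the generic cover absorbs the residual area, can be sketched as follows. The helper and its arguments are hypothetical and operate on a plain dictionary for a single hydro unit. The actual method works on the hydro units DataFrame.

```python
def set_cover_fraction(fractions, cover, cover_area, unit_area, generic="open"):
    """Set one cover's fraction and let the generic cover absorb the residual."""
    updated = dict(fractions)
    updated[cover] = cover_area / unit_area
    # The generic cover takes whatever the other covers leave.
    others = sum(v for k, v in updated.items() if k != generic)
    updated[generic] = 1.0 - others
    for name, value in updated.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"fraction of '{name}' out of range: {value}")
    return updated


unit = {"open": 1.0, "glacier": 0.0}
unit = set_cover_fraction(unit, "glacier", cover_area=250.0, unit_area=1000.0)
```

A cover area larger than the unit area would push the generic fraction below zero, which is the kind of situation the ValueError guards against.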

initialize_land_cover_fractions() → None[source]

Initialize land cover fractions with default values.

Sets the generic soil land cover fraction (‘open’, or its ‘ground’ alias) to 1.0 and all other land cover types to 0.0 for all hydro units. Used as a starting point before applying specific land cover changes.

Read hydro unit properties from a CSV file. The file must contain two header rows: the first holds the column names and the second the units. At a minimum, the file must contain the area of each unit.

Parameters:

path – Path to the CSV file containing hydro units data.
column_elevation – Column name containing the elevation values. If None, looks for ‘elevation’ column. Default: None
column_area – Column name containing the total area values. If None, looks for ‘area’ column. Default: None
columns_areas – Dictionary mapping land cover names to area column names. Cannot be used with column_area. Default: None
other_columns – Dictionary mapping property names to column names in the CSV file. Example: {'slope': 'Slope', 'aspect': 'Aspect'} Default: None

Raises:

FileNotFoundError – If the CSV file does not exist.
ValueError – If required columns are missing or are inconsistent.
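The two-header layout can be illustrated with the standard csv module. The sample content, unit strings and helper name below are made up for the example. The library's own reader may differ in how it parses and validates the file.

```python
import csv
import io

# Hypothetical file content: first row = column names, second row = units.
SAMPLE = """id,elevation,area
-,m,m2
1,1250,120000
2,1750,98000
"""


def read_two_header_csv(text):
    """Parse a CSV whose first two rows are names and units."""
    rows = list(csv.reader(io.StringIO(text)))
    names, units, records = rows[0], rows[1], rows[2:]
    if "area" not in names:
        raise ValueError("the file must contain at least the unit areas")
    columns = list(zip(names, units))  # (name, unit) pairs as column keys
    data = [dict(zip(columns, map(float, record))) for record in records]
    return columns, data


columns, data = read_two_header_csv(SAMPLE)
```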

populate_bounded_instance() → None[source]

Populate the SettingsBasin instance from current hydro units data.

Updates the internal SettingsBasin object with current hydro unit properties, sorted by elevation in descending order. Includes land cover fractions and all custom properties.

save_as(path: str) → None[source]

Create a file containing the hydro unit properties. Such a file can be used in the command-line version of hydrobricks.

Saves hydro units and land cover information to a netCDF4 file with proper dimensions and variable attributes.

Parameters:: path – Path of the file to create.
Raises:: ImportError – If netcdf4 is not installed.

save_to_csv(path: str | Path) → None[source]

Save the hydro units to a csv file.

Exports the hydro units DataFrame to a CSV file with multi-level header containing both property names and units.

Parameters:: path – Path to the output file.
Raises:: ValueError – If no hydro units data is available.

set_connectivity(connectivity: pd.DataFrame | Path | str) → None[source]

Set the connectivity of the hydro units.

Configures how water flows between hydro units by setting lateral connections with specified ratios. Can accept connectivity data as a DataFrame or load from file.

Parameters:

connectivity – File path or DataFrame containing the connectivity information as generated by catchment.calculate_connectivity().

Raises:

TypeError – If connectivity is not a DataFrame or valid file path.
ValueError – If connectivity DataFrame is missing required columns or has invalid values.

ParameterSet

class hydrobricks.ParameterSet[source]

Bases: object

Class for the parameter sets

add_aliases(parameter_name: str, aliases: list[str] | str) → None[source]

Add aliases to a parameter.

Parameters:

parameter_name – The name of the parameter with the related component (e.g., snowpack:degree_day_factor).
aliases – Aliases to the parameter name, such as names used in other implementations (e.g., kgl, an). Aliases must be unique.

Add a parameter related to the data.

Parameters:

name – The name of the parameter.
value – The parameter value.
min_val – Minimum value allowed for the parameter.
max_val – Maximum value allowed for the parameter.
unit – The unit of the parameter.

property allow_changing: list[str]

Get the list of parameters to assess during calibration.

Returns:: List of parameter names that are allowed to change.
Return type:: list[str]

are_valid() → bool[source]

Check if all the parameters are defined and have a value. Alias of is_valid.

Returns:: True if all parameters are defined and have a value, False otherwise.
Return type:: bool

change_range(parameter: str, min_val: float | None = None, max_val: float | None = None) → None[source]

Change the value range of a parameter.

Only the bounds that are provided are changed; passing None (the default) leaves that bound untouched. This allows raising a lower bound without having to restate the maximum.

Parameters:

parameter – Name (or alias) of the parameter
min_val – New minimum value, or None to keep the current minimum.
max_val – New maximum value, or None to keep the current maximum.
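The "only provided bounds change" behaviour can be sketched with a standalone function. The name matches the method for readability, but this is an illustration on a plain tuple, not the library code.

```python
def change_range(bounds, min_val=None, max_val=None):
    """Return new (min, max) bounds, keeping any bound passed as None."""
    low, high = bounds
    # Compare with None explicitly so that 0.0 is accepted as a new bound.
    if min_val is not None:
        low = min_val
    if max_val is not None:
        high = max_val
    return (low, high)


raised = change_range((0.0, 12.0), min_val=2.0)
lowered = change_range((1.0, 5.0), min_val=0.0)
```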

constraints_satisfied() → bool[source]

Check if the constraints between parameters are satisfied.

Returns:: True if constraints are satisfied, False otherwise.
Return type:: bool

define_constraint(parameter_1: str, operator: str, parameter_2: str) → None[source]

Define a constraint between two parameters (e.g., paramA > paramB).

Parameters:

parameter_1 – The name of the first parameter.
operator – The operator (e.g. ‘<=’).
parameter_2 – The name of the second parameter.

Examples

parameter_set.define_constraint('paramA', '>=', 'paramB')
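How such constraints might be evaluated is sketched below with the standard operator module. The set of supported operators and the helper name are assumptions made for the illustration.

```python
import operator

# Assumed operator set; the library may accept a different one.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def constraints_satisfied(values, constraints):
    """Check every (parameter_1, operator, parameter_2) triple."""
    return all(OPERATORS[op](values[a], values[b]) for a, op, b in constraints)


constraints = [("paramA", ">=", "paramB")]
ok = constraints_satisfied({"paramA": 3.0, "paramB": 1.0}, constraints)
bad = constraints_satisfied({"paramA": 0.5, "paramB": 1.0}, constraints)
```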

Define a parameter by setting its properties.

Parameters:

component – The component (brick) name to which the parameter refers (e.g., snowpack, glacier, surface_runoff). It can be a string or a list of components when the parameter is shared between components (e.g., melt_factor in the temperature index method).
name – The name of the parameter in the C++ code of hydrobricks (e.g., degree_day_factor, response_factor).
unit – The unit of the parameter.
aliases – Aliases to the parameter name, such as names used in other implementations (e.g., kgl, an). Aliases must be unique.
min_val – Minimum value allowed for the parameter.
max_val – Maximum value allowed for the parameter.
default – The parameter default value.
mandatory – If the parameter needs to be defined or if it can silently use the default value.

generate_parameters(land_cover_types: list[str], land_cover_names: list[str], options: dict, structure: dict)[source]

Generate a parameters object for the provided model options and structure.

Parameters:

land_cover_types – The land cover types.
land_cover_names – The land cover names.
options – The model options.
structure – The model structure.

get(name: str) → float[source]

Get the value of a parameter by name.

Parameters:: name – The name of the parameter.
Returns:: The parameter value.
Return type:: float

get_for_spotpy() → list[spotpy.parameter][source]

Get the parameters to assess ready to be used in spotpy.

Return type:: A list of the parameters as spotpy objects.

get_model_parameters() → pandas.DataFrame[source]

Get the model-only parameters (excluding data-related parameters).

Returns:: DataFrame containing model parameters only.
Return type:: pd.DataFrame

get_transform(name: str) → ParameterTransform | None[source]

Return the transform attached to a parameter, or None.

Parameters:: name – The name or one of the aliases of the parameter.
Returns:: The transform if one is set, otherwise None. Parameters without a transform (or unknown names) return None.
Return type:: ParameterTransform | None

get_transformed(name: str) → float[source]

Get the transformed value of a parameter by name.

Returns the parameter value mapped through its transform (if any). If the parameter has no transform, the real value is returned unchanged.

Parameters:: name – The name of the parameter.
Returns:: The transformed parameter value.
Return type:: float

get_undefined() → list[str][source]

Get the undefined parameters.

Returns:: List of the undefined parameter names.
Return type:: list[str]

has(name: str) → bool[source]

Check if a parameter exists.

Parameters:: name – The name of the parameter.
Returns:: True if found, False otherwise.
Return type:: bool

is_for_forcing(parameter_name: str) → bool[source]

Check if the parameter relates to forcing data.

Parameters:: parameter_name – The name of the parameter.
Returns:: True if relates to forcing data, False otherwise.
Return type:: bool

is_valid() → bool[source]

Check if all the parameters are defined and have a value.

Returns:: True if all parameters are defined and have a value, False otherwise.
Return type:: bool

list_constraints() → None[source]

List the constraints currently defined.

Prints all defined parameter constraints to the console.

needs_random_forcing() → bool[source]

Check if one of the parameters to assess involves the meteorological data.

Return type:: True if one of the parameters to assess involves the meteorological data.

range_satisfied() → bool[source]

Check if the parameter value ranges are satisfied.

Returns:: True if ranges are satisfied, False otherwise.
Return type:: bool

remove_constraint(parameter_1: str, operator: str, parameter_2: str) → None[source]

Remove a constraint between two parameters (e.g., paramA > paramB).

Parameters:

parameter_1 – The name of the first parameter.
operator – The operator (e.g. ‘<=’).
parameter_2 – The name of the second parameter.

Examples

parameter_set.remove_constraint('paramA', '>=', 'paramB')

save_as(directory: str, name: str, file_type: str = 'both')[source]

Create a configuration file containing the parameter values.

Such a file can be used when using the command-line version of hydrobricks. It contains the model parameter values.

Parameters:

directory – The directory to write the file.
name – The name of the generated file.
file_type – The type of file to generate: ‘json’, ‘yaml’, or ‘both’.

set_prior(parameter: str, prior: spotpy.parameter) → None[source]

Set a prior distribution for a parameter.

Assigns a prior probability distribution to a parameter for use in Bayesian calibration methods.

Parameters:

parameter – Name (or alias) of the parameter
prior – The prior distribution (instance of spotpy.parameter)

Raises:

ImportError – If spotpy is not installed.

set_random_values(parameters: list[str]) → pandas.DataFrame[source]

Set the provided parameters to random values.

Randomly assigns values to specified parameters within their defined ranges. Iterates until all constraints are satisfied.

Parameters:: parameters – The names or aliases of the parameters to set to random values. Example: ['kr', 'A']
Returns:: A dataframe with the assigned parameter values.
Return type:: pd.DataFrame
Raises:: ValueError – If parameter constraints cannot be satisfied after 1000 iterations.
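The draw-until-valid loop described above amounts to rejection sampling. The sketch below assumes uniform draws within each range and a single callable constraint. Both are simplifications, and the function name is hypothetical.

```python
import random


def draw_random_values(ranges, constraint, max_iter=1000, seed=None):
    """Draw values within their ranges until the constraint holds."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        values = {
            name: rng.uniform(low, high) for name, (low, high) in ranges.items()
        }
        if constraint(values):
            return values
    raise ValueError(f"constraints not satisfied after {max_iter} iterations")


ranges = {"a": (0.0, 1.0), "b": (0.0, 1.0)}
values = draw_random_values(ranges, lambda v: v["a"] >= v["b"], seed=42)
```

The iteration cap keeps mutually incompatible ranges and constraints from looping forever.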

set_transform(parameter: str, to_transformed: Callable[[float], float], to_real: Callable[[float], float]) → None[source]

Attach a transform between the real and transformed values of a parameter.

The real value (used by the C++ engine and stored in the parameter set) and the transformed value (used for optimisation or to express the parameter in its original formulation) are related by the two provided functions. Either representation can be set; the other is derived. Both functions must be monotonic.

Parameters:

parameter – Name (or alias) of the parameter.
to_transformed – Function mapping the real value to the transformed value.
to_real – Function mapping the transformed value back to the real value.

Raises:

ConfigurationError – If the parameter is list-valued (transforms are not supported for lists).

set_values(values: dict, check_range: bool = True, allow_adapt: bool = False, transformed: bool = False) → None[source]

Set the parameter values.

Parameters:

values – Dictionary of values keyed by the parameter name qualified with its component, or by one of its aliases. Example: {'k': 32, 'A': 300} or {'slow_reservoir:capacity': 300}
check_range – Check that the parameter value falls into the allowed range.
allow_adapt – Allow the parameter values to be adapted to enforce defined constraints (e.g.: min, max).
transformed – If True, the provided values are in transformed space and are mapped back to the real value (using each parameter’s transform) before being stored. Parameters without a transform are treated as real values. The range check and storage always operate on the real value.
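The relationship between real and transformed values, as used by set_transform() and by set_values(..., transformed=True), can be sketched with a base-10 logarithm. The choice of transform, the bounds and the helper store_value are assumptions for the illustration.

```python
import math

to_transformed = math.log10  # real value -> transformed value


def to_real(x):
    """Transformed value -> real value (inverse of log10)."""
    return 10.0 ** x


def store_value(value, low, high, transformed=False):
    """Map back to the real value if needed, then range-check the real value."""
    real = to_real(value) if transformed else value
    if not low <= real <= high:
        raise ValueError(f"value {real} outside [{low}, {high}]")
    return real


# A value given in log space is stored as its real counterpart.
stored = store_value(-2.0, low=0.001, high=1.0, transformed=True)
```

Because both functions are monotonic, ordering is preserved between the two representations.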

Forcing

class hydrobricks.Forcing(spatial_entity: HydroUnits | Catchment)[source]

Bases: object

Class for managing forcing (meteorological) data for hydrological models.

class Variable(value)[source]

Bases: StrEnum

Enumeration of supported meteorological variables.

P = 'p'

PET = 'pet'

PRESSURE = 'pressure'

RH = 'rh'

RH_MAX = 'rh_max'

RH_MIN = 'rh_min'

R_NET = 'r_net'

R_SOLAR = 'r_solar'

SD = 'sd'

T = 't'

T_DEW_POINT = 't_dew_point'

T_MAX = 't_max'

T_MIN = 't_min'

WIND = 'wind'

apply_operations(parameters: ParameterSet | None = None, apply_to_all: bool = True) → None[source]

Apply the pre-defined operations.

Executes all spatialization, correction, and PET computation operations that were previously defined. Operations are applied in a fixed order:

1. Prior corrections
2. Station data spatialization
3. Gridded data spatialization
4. PET computation

Parameters:

parameters – The parameter object instance. Required if operations reference parameters using the ‘param:’ prefix.
apply_to_all – If True, the operations will be applied to all variables. If False, the operations will only be applied to the variables related to parameters defined in the parameters.allow_changing list. This is useful to avoid re-applying, during the calibration phase, operations that have already been applied previously. Default: True

Raises:

ValueError – If operations reference parameters but no parameter object is provided.
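
The deferred-execution pattern described here (operations are stored when defined, then applied later in a fixed order) can be sketched as follows. The class and the operation kind names are hypothetical and only illustrate the ordering; they do not reproduce the internals of Forcing.

```python
# Fixed application order, following the apply_operations() description.
ORDER = [
    "prior_correction",
    "station_spatialization",
    "gridded_spatialization",
    "pet_computation",
]


class DeferredOperations:
    """Hypothetical container: operations are stored, then applied later."""

    def __init__(self):
        self._operations = []

    def define(self, kind, name):
        if kind not in ORDER:  # validated immediately, at the call site
            raise ValueError(f"Unknown operation kind: {kind}")
        self._operations.append((kind, name))

    def apply(self):
        # sorted() is stable: operations of the same kind keep definition order.
        ordered = sorted(self._operations, key=lambda op: ORDER.index(op[0]))
        return [name for _, name in ordered]


ops = DeferredOperations()
ops.define("pet_computation", "pet")
ops.define("station_spatialization", "t")
ops.define("prior_correction", "p")
print(ops.apply())  # ['p', 't', 'pet']
```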

compute_pet(method: str, use: list[str], lat: float | None = None, **kwargs: Any) → None[source]

Define a PET computation operation using the pyet library. The PET is computed for all hydro units.

The operation is stored and applied later (deferred execution). The method is validated immediately.

Parameters:

method – Name of the method to use. Possible values are those provided in the table from the pyet documentation: https://pypi.org/project/pyet/. The method name or the pyet function name can be used.
use – List of the meteorological variables to use to compute the PET. Only the variables listed here will be used. The variables must be named according to the pyet naming convention (see the pyet API documentation: https://pyet.readthedocs.io/en/latest/api/index.html) and must be available (loaded in the forcing) and spatialized. Example: use=[‘t’, ‘tmin’, ‘tmax’, ‘lat’, ‘elevation’]
lat – Latitude of the catchment (degrees). If not provided, the latitude computed for each hydro unit is used.
**kwargs – Additional function-specific options passed through to the pyet function (see the pyet documentation).

Raises:

DependencyError – If pyet is not installed.
ForcingError – If the PET method is not recognized.

correct_station_data(variable: str, method: str = 'multiplicative', correction_factor: float | str | None = None) → None[source]

Define a prior correction operation to apply to station forcing data.

The operation is stored and applied later (deferred execution). The variable and method are validated immediately so mistakes are reported at the call site.

Parameters:

variable – Name or alias of the variable to correct (e.g. ‘precipitation’, ‘p’).
method – Correction method: ‘multiplicative’ (default) or ‘additive’.
correction_factor – Value of the correction factor (to add or multiply). A 'param:<name>' string can be given to have the value taken from the parameter set at calibration time.

Raises:

ForcingError – If the variable/method is not recognized or correction_factor is missing.

Examples

>>> forcing.correct_station_data(
...     variable='temperature',
...     method='additive',
...     correction_factor=0.5  # Add 0.5 degrees
... )
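
A 'param:<name>' reference is resolved from the parameter set only when the operations are applied. The sketch below shows one way such a lookup could work; resolve_option and the plain dictionary standing in for the parameter set are assumptions made for the illustration, not hydrobricks functions.

```python
def resolve_option(value, parameters=None):
    """Resolve a numeric option or a 'param:<name>' reference (hypothetical helper)."""
    prefix = "param:"
    if isinstance(value, str) and value.startswith(prefix):
        if parameters is None:
            raise ValueError(f"A parameter set is required to resolve {value!r}")
        # Look the value up by the name that follows the prefix.
        return float(parameters[value[len(prefix):]])
    return float(value)


# A plain dict stands in for the parameter set in this sketch.
params = {"precip_corr_factor": 1.1}
print(resolve_option(0.5))                                 # 0.5
print(resolve_option("param:precip_corr_factor", params))  # 1.1
```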

get_total_precipitation() → float[source]

Calculate the catchment-average total precipitation.

Computes the weighted average precipitation across all hydro units based on their areas.

Returns: Total precipitation in mm (or original units if data is in different units).
Return type: float
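
As a rough illustration of the area weighting, the sketch below sums the precipitation of each hydro unit over time and then averages the unit totals using the unit areas as weights. The function name and the nested-list layout are assumptions, and so is the choice to sum over time before weighting.

```python
def catchment_total_precipitation(precip, areas):
    """precip[t][u]: precipitation (mm) at time step t for hydro unit u."""
    total_area = sum(areas)
    # Total over time for each hydro unit.
    unit_totals = [sum(step[u] for step in precip) for u in range(len(areas))]
    # Area-weighted average of the unit totals.
    return sum(tot * a for tot, a in zip(unit_totals, areas)) / total_area


precip = [[10.0, 20.0], [0.0, 10.0]]  # 2 time steps x 2 hydro units
areas = [3.0, 1.0]                    # hydro unit areas (any consistent unit)
print(catchment_total_precipitation(precip, areas))  # 15.0
```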

get_variable_enum(variable: str) → Variable[source]

Match a variable name string to the corresponding Variable enum value.

Parameters: variable – Variable name or alias (e.g., ‘precipitation’, ‘precip’, ‘p’, ‘P’).
Returns: The corresponding Variable enum value.
Return type: Variable
Raises: ValueError – If the variable name is not recognized.

Examples

>>> forcing = Forcing(hydro_units)
>>> var = forcing.get_variable_enum('precip')
>>> var == Forcing.Variable.P
True

is_initialized() → bool[source]

Check if the forcing has been initialized.

Returns: True if the forcing has been initialized, False otherwise.
Return type: bool

load_from(path: str | Path) → None[source]

Load data from a netCDF file created using save_as().

Reads a previously saved netCDF file containing spatialized forcing data and loads it into the Forcing object’s data2D structure.

Parameters:

path – Path of the file to read.

Raises:

ImportError – If netcdf4 is not installed.
ValueError – If the hydro units in the file don’t match the Forcing object’s hydro units.

Notes

The loaded data will have the same structure as if it was created through spatialization operations. The Forcing object will be marked as initialized after successful loading.

load_station_data_from_csv(path: str | Path, column_time: str, time_format: str, content: dict[str, str] | None = None) → None[source]

Read 1D time series data from a CSV file for a single station.

Parameters:

path – Path to the CSV file containing station data.
column_time – Column name containing the time values.
time_format – Format string for parsing time values (e.g., ‘%Y-%m-%d’).
content – Dictionary mapping variable names/aliases to CSV column names. Example: {‘precipitation’: ‘Precipitation (mm)’, ‘temperature’: ‘Temp (C)’} Default: None

Raises:

FileNotFoundError – If the CSV file does not exist.
KeyError – If required columns are not found in the CSV file.

Examples

>>> forcing.load_station_data_from_csv(
...     'weather.csv',
...     'Date',
...     '%Y-%m-%d',
...     {'precipitation': 'P (mm)', 'temperature': 'T (C)'}
... )

save_as(path: str | Path, max_compression: bool = False) → None[source]

Create a netCDF file with the forcing data.

Saves the 2D spatialized forcing data to a netCDF4 file with the structure suitable for later loading with load_from().

Parameters:

path – Path of the file to create.
max_compression – Option to allow maximum compression for data in file. When True, uses compression with least_significant_digit=3 for better storage efficiency. Default: False

Raises:

ImportError – If netcdf4 is not installed.

Notes

If apply_operations() has not been called, it will be called automatically before saving to ensure data is properly spatialized.

Define a spatialization operation from gridded data to all hydro units.

The operation is stored and applied later (deferred execution). The variable and method are validated immediately.

Parameters:

variable – Name or alias of the variable to spatialize.
method – Name of the method to use. Currently only ‘regrid_from_netcdf’ (the ‘default’).
path – Path to the file containing the data or to a folder containing multiple files.
file_pattern – Pattern of the files to read. If None, the path is considered a single file.
data_crs – CRS (as EPSG id) of the data file. If None, the CRS is read from the file.
var_name – Name of the variable to read in the netCDF file.
dim_time – Name of the time dimension (default ‘time’).
dim_x – Name of the x dimension (default ‘x’).
dim_y – Name of the y dimension (default ‘y’).
raster_hydro_units – Path to a raster containing the hydro unit ids used for the spatialization.
apply_data_gradient – If True, elevation-based gradients are retrieved from the data and applied to the hydro units (e.g. for temperature and precipitation). If None, the default depends on the variable (True for temperature and precipitation, False otherwise). Requires the Forcing to be built from a Catchment with a single DEM.
gradient_type – ‘additive’ or ‘multiplicative’. If None, a per-variable default is used.

Raises:

ForcingError – If the variable or method is not recognized.

Define a spatialization operation from station data to all hydro units.

The operation is stored and applied later (deferred execution). The variable and method are validated immediately. Any of the numeric options may be given as a 'param:<name>' string to be resolved from the parameter set at calibration time.

Parameters:

variable – Name or alias of the variable to spatialize (e.g. ‘temperature’, ‘t’).
method – Name of the method to use:
- ‘default’: pick a sensible method for the variable (additive gradient for temperature, multiplicative for precipitation, constant for PET).
- ‘constant’: the same value is used for every hydro unit.
- ‘additive_elevation_gradient’: additive elevation gradient, constant or one value per month. Uses ‘ref_elevation’ and ‘gradient’.
- ‘multiplicative_elevation_gradient’: multiplicative elevation gradient, constant or one value per month. Uses ‘ref_elevation’ and ‘gradient’.
- ‘multiplicative_elevation_threshold_gradients’: as above but with a gradient below and above an elevation threshold. Uses ‘ref_elevation’, ‘gradient’, ‘gradient_2’ and ‘elevation_threshold’.
ref_elevation – Reference (station) elevation. Required for the gradient methods.
gradient – Gradient of the variable per 100 m (e.g. °C/100 m). Either a single value or a list of 12 monthly values.
gradient_1 – Alias of gradient (used if gradient is not provided).
gradient_2 – Gradient per 100 m for the units above elevation_threshold (threshold method only).
elevation_threshold – Threshold elevation to switch from gradient to gradient_2.

Raises:

ForcingError – If the variable or method is not recognized.
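
The arithmetic behind the gradient methods can be sketched as below. The additive form follows directly from a gradient expressed per 100 m; the multiplicative form (a relative change per 100 m) is an assumption of this sketch and may differ from the formula used by hydrobricks. All function names are hypothetical.

```python
def additive_gradient(station_value, unit_elevation, ref_elevation, gradient):
    """Shift the station value by `gradient` per 100 m of elevation difference."""
    return station_value + gradient * (unit_elevation - ref_elevation) / 100.0


def multiplicative_gradient(station_value, unit_elevation, ref_elevation, gradient):
    """Scale the station value by a relative change of `gradient` per 100 m.

    Assumed form, for illustration only.
    """
    return station_value * (1.0 + gradient * (unit_elevation - ref_elevation) / 100.0)


def monthly(gradient, month):
    """Accept a single value or a list of 12 monthly values (month is 1-12)."""
    if isinstance(gradient, (list, tuple)):
        return gradient[month - 1]
    return gradient


# Temperature at a unit 500 m above the station, -0.5 degC per 100 m.
print(additive_gradient(10.0, 1500.0, 1000.0, -0.5))         # 7.5
# Precipitation at a unit 200 m above the station, +25 % per 100 m.
print(multiplicative_gradient(100.0, 1200.0, 1000.0, 0.25))  # 150.0
```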

Discharge observations

Bases: TimeSeries1D

Observed discharge time series (the primary calibration signal).

compute_reference_metric(metric: str, start_date: str | None = None, end_date: str | None = None, with_exclusion: bool = False, mean_discharge: bool = False, all_combinations: bool = False, n_evals: int = 100) → float[source]

Compute a reference value for the provided goodness-of-fit metric by block bootstrapping the observed series n_evals times (100 by default), evaluating each bootstrapped series against the observations with that metric, and averaging the results.

Parameters:

metric – The abbreviation of the function as defined in HydroErr (https://hydroerr.readthedocs.io/en/stable/list_of_metrics.html) Examples: ‘nse’, ‘kge_2012’, ‘rmse’, etc.
start_date – Start date string for period of interest (format: ‘YYYY-MM-DD’). If None, uses full time series. Default: None
end_date – End date string for period of interest (format: ‘YYYY-MM-DD’). If None, uses full time series. Default: None
with_exclusion – If True, avoid using the same year’s data for the same position in the bootstrapped sample, ensuring no self-selection for specific years. Default: False
mean_discharge – If True, computes the average on the discharge directly rather than on the result of the HydroErr function. Default: False
all_combinations – If True, uses all possible combinations for the bootstrapping. If False, randomly samples n_evals combinations. Default: False
n_evals – Number of random evaluations to perform (ignored if all_combinations=True). Default: 100

Returns:

The mean value of n_evals realizations of the selected metric.

Return type:

float

Raises:

DataError – If there is only one year of data (insufficient for block bootstrapping).
ValueError – If metric is not recognized or if time series setup is invalid.

Examples

>>> obs = DischargeObservations('2000-01-01', '2010-12-31')
>>> obs.load_from_csv('data.csv', 'date', '%Y-%m-%d', {'discharge': 'Q'})
>>> ref_metric = obs.compute_reference_metric('nse', n_evals=100)
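
The idea of a bootstrapped reference can be illustrated with the standalone sketch below: whole years are drawn at random to build a surrogate series, each surrogate is scored against the observations, and the scores are averaged. The sampling scheme (years drawn with replacement, equal-length years), the local nse function and the ValueError are assumptions of the sketch; the library relies on HydroErr metrics and raises DataError.

```python
import random


def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def reference_metric(yearly_blocks, metric=nse, n_evals=100, seed=0):
    """Mean metric of n_evals year-resampled series against the observations."""
    if len(yearly_blocks) < 2:
        raise ValueError("Block bootstrapping needs more than one year of data")
    rng = random.Random(seed)
    obs = [v for block in yearly_blocks for v in block]
    scores = []
    for _ in range(n_evals):
        # Draw one whole year (block) for each position, with replacement.
        sample = [rng.choice(yearly_blocks) for _ in yearly_blocks]
        sim = [v for block in sample for v in block]
        scores.append(metric(sim, obs))
    return sum(scores) / len(scores)


# Three short "years" of observed discharge (toy values).
years = [[1.0, 3.0, 2.0], [2.0, 5.0, 1.0], [1.5, 4.0, 2.5]]
print(reference_metric(years, n_evals=50) <= 1.0)  # True
```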

Read discharge observations from a CSV file.

Restricted by default to [start_date, end_date] as given to the constructor, so the loaded series already matches the simulation period (no manual pre-slicing needed). Pass start_date/end_date here to load a different range instead (e.g. to deliberately load a period wider than the constructor’s, for testing).

Parameters:

path – Path to the CSV file containing the discharge data.
column_time – Column name containing the time values.
time_format – Format string for parsing time values (e.g., ‘%Y-%m-%d’).
content – Dictionary mapping variable names/enums to column names in the CSV. Example: {‘discharge’: ‘Discharge (mm/d)’}
start_date – Overrides the constructor’s start_date for this load.
end_date – Overrides the constructor’s end_date for this load.

Raises:

FileNotFoundError – If the specified file does not exist.
KeyError – If required columns are not found in the CSV file.

Examples

>>> obs = DischargeObservations('2000-01-01', '2010-12-31')
>>> obs.load_from_csv('data.csv', 'date', '%Y-%m-%d', {'discharge': 'Q'})

Periods

Bases: object

A named, inclusive date range (e.g. the calibration period).

Parameters:

start – Start of the period (inclusive), as a 'YYYY-MM-DD' string, datetime or Timestamp.
end – End of the period (inclusive), as a 'YYYY-MM-DD' string, datetime or Timestamp.
name – Optional label (e.g. 'calibration'), used in tables and error messages.

property bounds: tuple[str, str]: The period bounds as ('YYYY-MM-DD', 'YYYY-MM-DD') strings.

classmethod coerce(value: Period | tuple | list, name: str | None = None) → Period[source]: Build a Period from a Period or a (start, end) pair.

date_range() → pandas.DatetimeIndex[source]: The daily date axis of the period.

mask(time: pd.DatetimeIndex | pd.Series) → np.ndarray[source]: Boolean mask selecting the period on the given time axis.

property n_days: int: The number of days in the (inclusive) period.
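
Because both bounds are inclusive, the day count is the date difference plus one, and the mask keeps dates equal to either bound. A minimal standard-library sketch of these two operations is given below; it uses plain date objects instead of pandas and is not the Period implementation.

```python
from datetime import date, timedelta


def n_days(start, end):
    """Number of days in the inclusive [start, end] range."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1


def mask(times, start, end):
    """Boolean list selecting the dates that fall inside the inclusive range."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [lo <= t <= hi for t in times]


axis = [date(2000, 12, 30) + timedelta(days=i) for i in range(4)]
print(n_days("2001-01-01", "2001-12-31"))      # 365
print(mask(axis, "2000-12-31", "2001-01-01"))  # [False, True, True, False]
```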

Bases: object

The canonical modelling periods and the spin-up policy.

Groups the calibration period, the validation period and the simulation span, so a single object can drive the model setup, the calibration and the per-period evaluation (split-sample testing).

Parameters:

calibration – The calibration period, as a Period or (start, end) pair.
validation – The validation period. Optional.
simulation – The simulation span. Defaults to the union span of the other periods (earliest start to latest end).
spinup – The spin-up policy applied to a model set up over one of these periods: the first years/days of the period are replayed (unlogged) to initialize the states before the run restarts at the period start. Either a number of days (int) or a string like '4y' (default: 4 years). A spin-up longer than a period is clamped to that period (i.e. the whole period is replayed once).

Examples

>>> periods = Periods(
...     calibration=('1981-01-01', '2000-12-31'),
...     validation=('2001-01-01', '2020-12-31'),
...     spinup='4y',
... )
>>> periods.calibration.bounds
('1981-01-01', '2000-12-31')

defined_periods() → dict[str, Period][source]: The defined periods, keyed by name (calibration/validation/simulation).

property full_span: Period: The simulation span (earliest start to latest end).

spinup_days_for(period: Period) → int[source]: The spin-up duration in days for the given period (clamped to it).
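
The clamping rule can be sketched as follows. The helper is hypothetical, and the conversion of a 'Ny' string to days with a fixed 365-day year is an assumption of the sketch; the library may count calendar years differently.

```python
def spinup_days(spinup, period_days, days_per_year=365):
    """Spin-up length in days, clamped to the period length.

    `days_per_year` is an assumption of this sketch.
    """
    if isinstance(spinup, str):
        if not spinup.endswith("y"):
            raise ValueError(f"Unsupported spin-up string: {spinup!r}")
        days = int(spinup[:-1]) * days_per_year
    else:
        days = int(spinup)
    # A spin-up longer than the period replays the whole period once.
    return min(days, period_days)


print(spinup_days("4y", 7305))  # 1460
print(spinup_days("4y", 365))   # 365
print(spinup_days(100, 365))    # 100
```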

hydrobricks.evaluate_periods(model: Model, observations: DischargeObservations | np.ndarray, periods: Periods, metrics: Iterable[str] = ('kge_2012',)) → pd.DataFrame[source]

Evaluate a simulation on each declared period (split-sample table).

The model must have been run over a span covering the periods (typically the full span, periods.simulation); each metric is then computed on the date slice of every defined period. This is the recommended validation workflow: calibrate on the calibration period, re-run the best parameters over the full span, and read the calibration/validation scores from this table.

Parameters:

model – A model that has been setup() and run().
observations – The observed discharge: a DischargeObservations (sliced by its own dates) or an array aligned with the simulated series.
periods – The periods to evaluate on.
metrics – HydroErr metric names (e.g. 'nse', 'kge_2012').

Returns:

A DataFrame with one row per period and one column per metric.
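
The split-sample table can be pictured with the following sketch, which slices one simulated series by each period and computes every metric on the slice. It uses nested dictionaries and a local rmse function rather than a DataFrame and HydroErr; all names are assumptions made for the illustration.

```python
import math
from datetime import date, timedelta


def rmse(sim, obs):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def evaluate_on_periods(times, sim, obs, periods, metrics):
    """Return {period name: {metric name: score}} on each inclusive date slice."""
    table = {}
    for name, (start, end) in periods.items():
        lo, hi = date.fromisoformat(start), date.fromisoformat(end)
        idx = [i for i, t in enumerate(times) if lo <= t <= hi]
        sim_slice = [sim[i] for i in idx]
        obs_slice = [obs[i] for i in idx]
        table[name] = {m: func(sim_slice, obs_slice) for m, func in metrics.items()}
    return table


times = [date(2000, 1, 1) + timedelta(days=i) for i in range(4)]
table = evaluate_on_periods(
    times,
    sim=[1.0, 2.0, 3.0, 4.0],
    obs=[1.0, 2.0, 5.0, 6.0],
    periods={
        "calibration": ("2000-01-01", "2000-01-02"),
        "validation": ("2000-01-03", "2000-01-04"),
    },
    metrics={"rmse": rmse},
)
print(table)  # {'calibration': {'rmse': 0.0}, 'validation': {'rmse': 2.0}}
```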

Results

class hydrobricks.Results(filename: str)[source]

Bases: object

Class for the detailed results of a model run. This class is used to read the results of a model run (from a netCDF file) and to provide methods to extract the results.

close() → None[source]: Close the netCDF dataset and release the file handle.

get_hydro_units_structure_ids() → numpy.ndarray[source]

Get the model-structure id used by each hydro unit.

Units sharing the same subsurface use the same structure; an exclusive land cover (e.g. a lake) places a unit on a different structure variant. Useful to identify which units a given (possibly NaN-omitted) component applies to.

Returns: Structure id per hydro unit (1D array, defaults to 1).
Return type: np.ndarray

get_hydro_units_values(component: str, start_date: str | None = None, end_date: str | None = None) → numpy.ndarray[source]

Get the values of a component at the hydro units.

Retrieves time series or snapshot data for a specific model component distributed across hydro units. Supports optional temporal slicing.

Parameters:

component – The name of the component (e.g., ‘snowpack’, ‘soil_moisture’). Use list_hydro_units_components() to see available options.
start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns full time series from the beginning. Default: None
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns up to end of time series. Default: None

Returns:

Values of the component at the hydro units. Shape: (n_timesteps, n_hydro_units) for a time series, or (n_hydro_units,) for a single date when only start_date is provided.

Return type:

np.ndarray

Raises:

ValueError – If the component is not found in the results.
KeyError – If date selection fails or dates are not in the time series.

get_land_cover_areas(land_cover: str, start_date: str | None = None, end_date: str | None = None) → numpy.ndarray[source]

Get the areas of a land cover across the hydro units.

Calculates the spatial distribution of a specific land cover type across hydro units over time by multiplying the land cover fractions with the hydro unit areas. Supports optional temporal slicing (matching the behaviour of get_hydro_units_values).

Parameters:

land_cover – The name of the land cover type (e.g., ‘glacier’, ‘ground’, ‘forest’).
start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns the full time series. Default: None
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’). If None (with start_date set), returns a single-date snapshot. Default: None

Returns:

Areas of the land cover across the hydro units (2D array: units × time), or (units,) for a single date. Units match the hydro unit area units (typically m² or km²).

Return type:

np.ndarray

Raises:

ValueError – If the land cover is not found in the results.
IndexError – If labels_land_cover is None or empty.

get_mean_hydro_units_values(land_cover: str, component: str, start_date: str | None = None, end_date: str | None = None) → numpy.ndarray[source]

Get the mean values of a component across the hydro units weighted by land cover area.

Computes area-weighted average of a component for a specific land cover type, accounting for spatial variation in land cover distribution across hydro units.

Parameters:

land_cover – The name of the land cover type to weight by (e.g., ‘glacier’, ‘ground’, ‘forest’).
component – The name of the component (e.g., ‘snowpack’, ‘soil_moisture’). Use list_hydro_units_components() to see available options.
start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns full time series. Default: None
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns up to end of time series. Default: None

Returns:

Weighted mean values of the component across the hydro units (1D time series). Weights are based on the land cover area in each hydro unit.

Return type:

np.ndarray

Raises:

ValueError – If the land cover or component is not found in the results.

get_mean_swe(start_date: str | None = None, end_date: str | None = None) → numpy.ndarray[source]

Get the mean snow water equivalent (SWE) across the hydro units weighted by land cover.

Computes the catchment-wide average snow water equivalent by aggregating SWE values across all land cover types and hydro units, weighted by their respective areas.

Parameters:

start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns full time series. Default: None
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns up to end of time series. Default: None

Returns:

Mean SWE across the hydro units (1D time series, units: mm water equivalent). A scalar for a single date.

Return type:

np.ndarray

Notes

Land covers without a snowpack (e.g. open water) contribute zero SWE over their area, diluting the average as expected.

get_time_array(start_date: str, end_date: str) → numpy.ndarray[source]

Get the time array.

Extracts the time coordinates from the results dataset for a specified date range. Useful for creating time-aligned arrays for plotting or analysis.

Parameters:

start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’).
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’).

Returns:

Array of time values (typically datetime64) for the specified period.

Return type:

np.ndarray

Raises:

KeyError – If dates are not found in the results time coordinates.

get_total_swe(start_date: str | None = None, end_date: str | None = None) → numpy.ndarray[source]

Get the total snow water equivalent (SWE) per hydro unit, aggregated across land covers.

SWE is stored per land cover. For each hydro unit this combines the per-land-cover snowpack into a single unit-average SWE depth, weighted by the land cover areas within the unit. Unlike get_mean_swe(), the hydro unit dimension is preserved (no catchment-wide averaging).

Parameters:

start_date – The start date of the period to extract (format: ‘YYYY-MM-DD’). If None, returns full time series. Default: None
end_date – The end date of the period to extract (format: ‘YYYY-MM-DD’). If None (with start_date set), returns a single-date snapshot. Default: None

Returns:

Total SWE per hydro unit (2D array: units × time, units: mm water equivalent), or (units,) for a single date.

Return type:

np.ndarray

Notes

Land covers without a snowpack (e.g. open water) contribute zero SWE over their area, diluting the per-unit average as expected.
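
The difference between the per-unit total and the catchment-wide mean can be sketched for a single time step as below. The dictionaries keyed by land cover name and both function names are assumptions; the sketch only shows the area weighting and the zero contribution of covers without a snowpack.

```python
def unit_swe(swe_by_cover, area_by_cover):
    """Unit-average SWE depth (mm) from per-land-cover SWE and areas.

    Covers missing from `swe_by_cover` (no snowpack) count as zero SWE.
    """
    total_area = sum(area_by_cover.values())
    weighted = sum(
        swe_by_cover.get(cover, 0.0) * area for cover, area in area_by_cover.items()
    )
    return weighted / total_area


def catchment_mean_swe(units):
    """Catchment-wide mean SWE: the unit SWE weighted by the unit areas."""
    unit_areas = [sum(areas.values()) for _, areas in units]
    unit_values = [unit_swe(swe, areas) for swe, areas in units]
    return sum(v * a for v, a in zip(unit_values, unit_areas)) / sum(unit_areas)


# One (SWE per cover, area per cover) pair per hydro unit.
units = [
    ({"ground": 100.0}, {"ground": 3.0, "lake": 1.0}),
    ({"ground": 200.0, "glacier": 400.0}, {"ground": 2.0, "glacier": 2.0}),
]
print([unit_swe(swe, areas) for swe, areas in units])  # [75.0, 300.0]
print(catchment_mean_swe(units))                       # 187.5
```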

list_hydro_units_components() → None[source]

Print a list of the distributed (hydro unit level) components.

Displays all component names that have values distributed across individual hydro units. These are typically state variables like snowpack, soil moisture, or groundwater storage.

list_sub_basin_components() → None[source]

Print a list of the aggregated (sub-basin level) components.

Displays all component names that have aggregated values at the sub-basin level. These are typically fluxes or flows that are summed across the entire catchment (e.g., total runoff, evapotranspiration).

TimeSeries

class hydrobricks.TimeSeries[source]

Bases: object

Class for generic time series data.

get_dates_as_mjd() → float | np.ndarray[source]

Convert time series dates to modified Julian dates.

Returns: Modified Julian dates. Returns float if single date, array if multiple dates.
Return type: float | np.ndarray
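
For reference, a modified Julian date counts days since 1858-11-17 00:00. The conversion can be sketched with the standard library as follows; to_mjd is a hypothetical helper, not the method above.

```python
from datetime import datetime

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 starts at 1858-11-17 00:00


def to_mjd(moment):
    """Convert a naive datetime to a modified Julian date (fractional days)."""
    delta = moment - MJD_EPOCH
    return delta.days + delta.seconds / 86400.0


print(to_mjd(datetime(2000, 1, 1)))      # 51544.0
print(to_mjd(datetime(2000, 1, 1, 12)))  # 51544.5
```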

StructureGraph

class hydrobricks.structure.StructureGraph(nodes: list[Node], edges: list[Edge], *, structure_id: int = 1, n_structures: int = 1, model_name: str | None = None, solver: str | None = None)[source]

Bases: object

The structure graph of one model structure variant.

classmethod from_settings(structures: list[dict], structure_id: int = 1, *, model_name: str | None = None, solver: str | None = None, with_forcing: bool = True) → StructureGraph[source]

Build the graph from the C++ SettingsModel.get_structure() export.

Parameters:

structures – The list of structure-variant dicts returned by get_structure().
structure_id – Which structure variant to build (default 1, the primary). Models with glacier/lake covers have several variants.
model_name – Optional metadata shown in the summary.
solver – Optional metadata shown in the summary.
with_forcing – Add forcing sources (precipitation, pet, …) as nodes feeding the components that consume them.

plot(path: str | None = None, fmt: str = 'png', view: bool = False, legend: bool = True, nodesep: float = 0.5, ranksep: float = 0.7, dpi: int = 200)[source]

Render the structure as a directed graph with Graphviz.

Parameters:

path – Output file path without extension (e.g. 'structure'). If None, the rendered graph object is returned without writing a file.
fmt – Output format (e.g. ‘png’, ‘pdf’, ‘svg’). Vector formats (‘pdf’, ‘svg’) are resolution-independent and give the sharpest result.
view – Open the rendered file with the default viewer.
legend – Add a legend describing the node and flux styles (default True).
nodesep – Graphviz spacing (inches) between nodes in the same rank; larger values give the diagram more breathing room.
ranksep – Graphviz spacing (inches) between ranks; larger values give the diagram more breathing room.
dpi – Raster (e.g. PNG) resolution in dots per inch; raise it for a crisper image. Ignored for vector formats.

Returns:

The graphviz.Digraph object.

Raises:

DependencyError – If the optional graphviz package is not installed.

to_dict() → dict[str, Any][source]: Return the graph as a plain dict (nodes, edges and metadata).

to_dot(legend: bool = True, nodesep: float = 0.5, ranksep: float = 0.7, dpi: int = 200) → str[source]

Return the graph as a Graphviz DOT string (no dependency required).

nodesep / ranksep set the Graphviz spacing (inches) between nodes in a rank and between ranks; larger values give the diagram more breathing room. dpi sets the raster (e.g. PNG) resolution; it is ignored for vector formats.
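
A DOT string is plain text, which is why no dependency is needed to produce it. The sketch below builds a minimal one from lists of node names and edges; it is not the output format of StructureGraph.to_dot(), and the function and node names are assumptions.

```python
def build_dot(nodes, edges, nodesep=0.5, ranksep=0.7, dpi=200):
    """Return a minimal Graphviz DOT description of a directed graph."""
    lines = [
        "digraph structure {",
        # Graph-level attributes controlling spacing and raster resolution.
        f"  graph [nodesep={nodesep}, ranksep={ranksep}, dpi={dpi}];",
    ]
    lines += [f'  "{name}";' for name in nodes]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


dot = build_dot(
    nodes=["precipitation", "snowpack", "outlet"],
    edges=[("precipitation", "snowpack"), ("snowpack", "outlet")],
)
print(dot)
```

The resulting text can be written to a file and rendered with the Graphviz command-line tools, or inspected directly.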

to_json(indent: int = 2) → str[source]: Return the graph as a JSON string.

to_text() → str[source]: Return a compact textual summary of the structure (TensorFlow-like).

to_yaml() → str[source]: Return the graph as a YAML string.

Trainer

class hydrobricks.trainer.SpotpySetup(model: Model | list[Model] | None = None, params: ParameterSet | None = None, forcing: Forcing | list[Forcing] | None = None, discharge: DischargeObservations | list[DischargeObservations] | None = None, warmup: int | None = None, obj_func: str | Callable[[np.ndarray, np.ndarray], float] | None = None, dump_outputs: bool = False, dump_forcing: bool = False, dump_dir: str = '', setup_factory: Callable[[], tuple] | None = None, extra_observations: list[AuxiliaryObservation] | None = None, combine: str = 'weighted', discharge_weight: float = 1.0, normalize: bool = True, periods: Periods | None = None)[source]

Bases: object

Setup class for SPOTPY optimization framework integration.

evaluation() → list[numpy.ndarray][source]

classmethod from_factory(setup_factory: Callable[[], tuple], params: ParameterSet, **kwargs: Any) → SpotpySetup[source]

Create a picklable setup for parallel calibration.

The setup_factory is called once in each worker process to (re)build the model, forcing, and observations; the result is cached and reused across that worker’s evaluations. This avoids pickling the C++-backed objects, which is required for SPOTPY parallel='mpc'/'mpi' (and in particular on Windows, where workers are spawned, not forked).

Parameters:

setup_factory – Picklable callable taking no arguments and returning a (model, forcing, obs) tuple (each a single instance or a list). Must be a top-level/module-level function (lambdas and closures cannot be pickled by the standard backends).
params – ParameterSet defining the model parameters to calibrate. Must itself be picklable (plain hydrobricks ParameterSet instances are).
**kwargs – Forwarded to SpotpySetup (warmup, obj_func, dump_outputs, dump_forcing, dump_dir).

Returns:

A setup that builds its models lazily and can be pickled to workers.

Return type:

SpotpySetup
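
The build-once-per-worker behaviour can be pictured with the sketch below: the factory is stored, the objects are built on first use and cached, and the cache is dropped from the pickled state so that only the factory travels to the workers. LazySetup and build_setup are hypothetical names, and the mechanism shown is an assumption about how such a setup could be written, not the SpotpySetup source.

```python
class LazySetup:
    """Hypothetical holder that builds its objects lazily from a factory."""

    def __init__(self, factory):
        self._factory = factory
        self._built = None  # per-process cache, never pickled

    def built(self):
        # Build on first use, then reuse the cached result.
        if self._built is None:
            self._built = self._factory()
        return self._built

    def __getstate__(self):
        # Only the factory is sent to workers; each worker rebuilds its own cache.
        state = self.__dict__.copy()
        state["_built"] = None
        return state


calls = []


def build_setup():
    """Stand-in factory returning a (model, forcing, obs) tuple of placeholders."""
    calls.append(1)
    return ("model", "forcing", "obs")


setup = LazySetup(build_setup)
setup.built()
setup.built()
print(len(calls))                      # 1
print(setup.__getstate__()["_built"])  # None
```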

objectivefunction(simulation: list[np.ndarray], evaluation: list[np.ndarray], params: spotpy.parameter | None = None) → float | list[float][source]

parameters() → Any[source]

Generate random parameter sets that satisfy constraints.

Returns: SPOTPY parameter object with valid random parameter values.
Return type: Any
Raises: RuntimeError – If unable to generate valid parameters after 1000 attempts.

simulation(x: spotpy.parameter) → list[np.ndarray] | None[source]

Run model simulation with given parameter set.

Parameters: x – SPOTPY parameter object containing parameter values.
Returns: List of simulated discharge time series (one per model), or None if simulation failed or constraints were violated.
Return type: list[np.ndarray] | None

hydrobricks.trainer.calibrate(spot_setup: SpotpySetup, algorithm: str, repetitions: int, dbname: str | None = None, dbformat: str = 'ram', parallel: str = 'seq', save_sim: bool = True, n_workers: int | None = None, sample_kwargs: dict[str, Any] | None = None, **algorithm_kwargs: Any) → Any[source]

Run a SPOTPY calibration, optionally across multiple processes.

Thin convenience wrapper around the SPOTPY algorithm classes that validates the requested parallel backend before sampling. For parallel='mpc' the spot_setup must have been built with SpotpySetup.from_factory() so that models can be rebuilt in each worker process.

Parameters:

spot_setup – The configured SpotpySetup.
algorithm – Name of the SPOTPY algorithm (e.g. 'mc', 'lhs', 'sceua', 'dream'), resolved from spotpy.algorithms.
repetitions – Number of repetitions passed to sampler.sample().
dbname – SPOTPY database name (passed through to the algorithm).
dbformat – SPOTPY database format ('ram', 'csv', …). Default: 'ram'.
parallel – SPOTPY parallel backend: 'seq' (default), 'mpc' (multiprocessing, requires the pathos package), 'mpi' (requires mpi4py), or 'umpc'. Note that not all algorithms parallelize well: independent samplers (mc, lhs, rope) and dream benefit most, whereas SCE-UA is largely sequential.
save_sim – Whether SPOTPY stores the simulated series. Default: True.
n_workers – Number of worker processes for parallel='mpc'/'umpc'. Defaults to None, i.e. all logical CPUs. Ignored for 'seq' and 'mpi' (the MPI process count is set by the MPI launcher, e.g. mpiexec -n).
sample_kwargs – Extra keyword arguments forwarded to sampler.sample(). Needed by some algorithms; e.g. the multi-objective NSGAII (used for the glacier mass-balance 'pareto' mode) requires {'n_obj': 2} and accepts 'n_pop'. Default: None.
**algorithm_kwargs – Extra keyword arguments forwarded to the SPOTPY algorithm constructor.

Returns:

The SPOTPY sampler instance (call sampler.getdata() for results).

hydrobricks.trainer.calibrate_from_factory(setup_factory: Callable[[], tuple], algorithm: str, repetitions: int, allow_changing: list[str] | None = None, warmup: int | None = None, obj_func: str | Callable[[np.ndarray, np.ndarray], float] | None = None, dump_outputs: bool = False, dump_forcing: bool = False, dump_dir: str = '', dbname: str | None = None, dbformat: str = 'ram', parallel: str = 'seq', save_sim: bool = True, n_workers: int | None = None, sample_kwargs: dict[str, Any] | None = None, extra_observations: list[AuxiliaryObservation] | None = None, combine: str = 'weighted', discharge_weight: float = 1.0, normalize: bool = True, periods: Periods | None = None, **algorithm_kwargs: Any) → Any[source]

Build a calibration setup from a single factory and run it (parallel-ready).

This is the simplest way to run a calibration, including in parallel: provide one factory that builds everything, and this function assembles the (picklable) SpotpySetup and runs the sampler. It removes the boilerplate of extracting the parameters, calling SpotpySetup.from_factory(), and then calibrate() separately.

For parallel='mpc'/'mpi' the factory is shipped to each worker process to rebuild the model there, so it must be a top-level (module-level) function — not a lambda or closure — and the call must be guarded by if __name__ == '__main__': (required on platforms that spawn workers, such as Windows).

Parameters:

setup_factory – Callable taking no arguments and returning a (model, parameters, forcing, discharge) tuple. Called once in the main process (to obtain the parameters and build the local setup) and once in each worker (to rebuild the model). Must be picklable for parallel runs.
algorithm – Name of the SPOTPY algorithm (e.g. 'mc', 'lhs', 'sceua').
repetitions – Number of repetitions passed to sampler.sample().
allow_changing – Optional list of parameter names/aliases to calibrate. If given, it overrides any parameters.allow_changing set inside the factory.
warmup, obj_func, dump_outputs, dump_forcing, dump_dir, periods – Forwarded to SpotpySetup. With periods, the factory must set the model up over periods.calibration (typically with spinup=periods.spinup) and warmup must be left unset.
extra_observations, combine, discharge_weight, normalize – Forwarded to SpotpySetup to also calibrate on auxiliary signals such as glacier mass balance. The extra_observations are light, picklable objects, so they are passed directly rather than rebuilt by the factory. normalize (default True) combines the weighted terms as benchmark skill scores, so that a discharge KGE/NSE and an auxiliary RMSE share a comparable range.
dbname – Forwarded to calibrate().
dbformat – Forwarded to calibrate().
parallel – Forwarded to calibrate().
save_sim – Forwarded to calibrate().
n_workers – Forwarded to calibrate().
sample_kwargs – Forwarded to calibrate().
**algorithm_kwargs – Extra sampler options, forwarded to calibrate() as well.

Returns:

The SPOTPY sampler instance (call sampler.getdata() for results).

hydrobricks.trainer.get_best(sampler: Any, parameters: ParameterSet | None = None) → dict[str, Any][source]

Return the best parameter set and its score (skill space, higher is better).

Convenience over get_results() for single-objective calibrations: it selects the run with the highest skill (after undoing any optimizer sign flip), so the reported score is the true metric value — a KGE of 0.7 is 0.7, not -0.7.

Parameters:

sampler – The sampler returned by calibrate() / calibrate_from_factory().
parameters – The ParameterSet that was calibrated. The stored parameter values are in the optimizer’s transformed space; the returned 'parameters' are mapped back to real values via each parameter’s transform, so they can be passed straight to ParameterSet.set_values(...). Parameters without a transform are unchanged. This argument is optional: calibrate() stashes the calibrated ParameterSet on the sampler, so the back-transform happens automatically. Pass it only to override that stashed set (e.g. an unpickled sampler that lost it).

Returns:

{'score': float, 'parameters': {name: value}, 'index': int} where index is the row in sampler.getdata().

Return type:

dict

Raises:

ConfigurationError – For a multi-objective (combine='pareto') calibration, where a single best run is not defined — use get_results() and select a point on the Pareto front instead.
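The selection rule can be illustrated without SPOTPY. The sketch below is not the library's implementation: pick_best, the layout of rows, the parameter names and the inverse-transform mapping are all hypothetical. It assumes the scores are already in skill space and that one parameter was sampled in log10 space.

```python
def pick_best(rows, inverse_transforms=None):
    """Select the row with the highest skill and map parameters back.

    rows: list of dicts with a 'score' (skill, higher is better) and a
    'parameters' dict in the optimizer's (possibly transformed) space.
    inverse_transforms: optional {name: callable} mapping to real values.
    """
    inverse_transforms = inverse_transforms or {}
    index = max(range(len(rows)), key=lambda i: rows[i]["score"])
    best = rows[index]
    real = {
        # Parameters without a transform are returned unchanged.
        name: inverse_transforms.get(name, lambda v: v)(value)
        for name, value in best["parameters"].items()
    }
    return {"score": best["score"], "parameters": real, "index": index}


rows = [
    {"score": 0.55, "parameters": {"a_snow": 3.0, "log_k": -2.0}},
    {"score": 0.70, "parameters": {"a_snow": 4.5, "log_k": -1.0}},
    {"score": 0.62, "parameters": {"a_snow": 5.0, "log_k": -3.0}},
]
best = pick_best(rows, {"log_k": lambda v: 10.0 ** v})
```

Here the second row wins, and its log_k value of -1.0 is reported as a real value close to 0.1.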

hydrobricks.trainer.get_results(sampler: Any, parameters: ParameterSet | None = None) → Any[source]

Return the calibration results as a DataFrame, scores in skill space.

SPOTPY stores the objective exactly as it was handed to the optimizer, which is the negated skill for minimizing algorithms (SCE-UA, NSGA-II, PADDS). This helper flips it back so every score column is a skill where higher is always better, regardless of the algorithm — e.g. a KGE of 0.7 reads as 0.7, never -0.7. (Error metrics such as rmse are negated by the skill convention, so a smaller error shows as a larger, less-negative score.)
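The sign convention can be sketched in a few lines. This is an illustration only: the MINIMIZING set and its algorithm-name spellings are assumptions made for the example, and to_skill is a hypothetical helper, not part of hydrobricks or SPOTPY.

```python
# Hypothetical set of algorithm names treated as minimizers in this sketch.
MINIMIZING = {"sceua", "nsgaii", "padds"}


def to_skill(stored_scores, algorithm):
    """Undo the optimizer sign flip so that higher is always better."""
    sign = -1.0 if algorithm.lower() in MINIMIZING else 1.0
    return [sign * s for s in stored_scores]


# A minimizer was handed -KGE, so a KGE of 0.7 is stored as -0.7.
kge_from_sceua = to_skill([-0.7, -0.4], "sceua")
# A non-minimizing sampler is assumed to store the skill unchanged.
kge_from_mc = to_skill([0.7, 0.4], "mc")
# Skill convention for an error metric: skill = -RMSE, so an RMSE of 80
# yields a larger (less negative) score than an RMSE of 120.
rmse_skill = [-rmse for rmse in (120.0, 80.0)]
```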

Parameters:

sampler – The sampler returned by calibrate() / calibrate_from_factory().
parameters – The ParameterSet that was calibrated. SPOTPY samples — and therefore the stored par columns — are in the optimizer’s transformed space; each parameter column is mapped back to its real value using that parameter’s transform, so the returned values match what the model actually used. Parameters without a transform are left unchanged. This argument is optional: calibrate() stashes the calibrated ParameterSet on the sampler, so the back-transform happens automatically. Pass it only to override that stashed set (e.g. an unpickled sampler that lost it).

Returns:

One row per evaluated parameter set, with the calibrated parameter columns (the SPOTPY par prefix stripped) and a score column — or score1, score2, … for a multi-objective (combine='pareto') calibration. The simulated series are not included.

Return type:

pandas.DataFrame

Evaluation (auxiliary observations)

Model-evaluation data: the observed signals a model is compared against.

This subpackage holds the reference series used to evaluate and calibrate a model (as opposed to the meteorological forcing, which is model input and lives in forcing.py):

DischargeObservations — the primary signal (observed discharge).
AuxiliaryObservation — base class for additional signals.
GlacierMassBalanceObservations — observed glacier mass balance.
SnowCoverObservations — observed snow cover fraction (e.g. MODIS).

It also exposes evaluate(), the HydroErr-based goodness-of-fit helper.

class hydrobricks.evaluation.AuxiliaryObservation[source]

Bases: object

Base class for an auxiliary calibration/evaluation signal.

Besides the primary discharge, a model can be evaluated against additional observed signals — glacier mass balance, snow cover, … Each such signal is represented by a subclass that knows how to provide the observed values and to compute the matching simulated values from a run model. The two are returned as aligned value vectors (observed() and simulated(model) must have the same length and ordering), which keeps the contract agnostic to whether the signal is a time series or a set of per-period / per-band / per-(date, unit) targets.

A signal also carries how it should be used during calibration:

mode='objective' contributes a weight-scaled goodness-of-fit (metric) term to the combined objective;
mode='constraint' acts as a behavioural pass/fail filter — a run is rejected when the mean absolute error exceeds tolerance (absolute, in the signal’s units) or, alternatively, relative_tolerance times the mean absolute observed value. Exactly one of the two must be set.
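The constraint rule above can be written out as a short check. This is a sketch under the stated definition (mean absolute error compared with an absolute or a relative limit); passes_constraint is a hypothetical helper, and the sample values are invented for illustration.

```python
def passes_constraint(observed, simulated, tolerance=None, relative_tolerance=None):
    """Return True when the mean absolute error is within the allowed limit."""
    if (tolerance is None) == (relative_tolerance is None):
        raise ValueError("Set exactly one of tolerance or relative_tolerance.")
    n = len(observed)
    mae = sum(abs(s - o) for o, s in zip(observed, simulated)) / n
    if tolerance is not None:
        limit = tolerance  # absolute limit, in the signal's units
    else:
        mean_abs_obs = sum(abs(o) for o in observed) / n
        limit = relative_tolerance * mean_abs_obs
    return mae <= limit


obs = [-800.0, -1200.0, -1000.0]  # invented annual balances, mm w.e.
sim = [-700.0, -1300.0, -1000.0]
# MAE = (100 + 100 + 0) / 3, about 66.7 mm w.e.; mean |obs| = 1000 mm w.e.
ok_absolute = passes_constraint(obs, sim, tolerance=100.0)
ok_relative = passes_constraint(obs, sim, relative_tolerance=0.05)
```

With these numbers the run passes the absolute limit of 100 mm w.e. but fails the 5% relative limit, which corresponds to 50 mm w.e.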

metric

HydroErr metric name used for the objective term (default 'rmse').

Type:: str

weight

Weight of this term in the combined 'weighted' score (default 1.0).

Type:: float

mode

'objective' or 'constraint' (default 'objective').

Type:: str

tolerance

Maximum allowed mean absolute error for 'constraint' mode, in the signal’s units. Mutually exclusive with relative_tolerance.

Type:: float or None

relative_tolerance

Maximum allowed mean absolute error for 'constraint' mode, expressed as a fraction of the mean absolute observed value (e.g. 0.1 for 10%). Mutually exclusive with tolerance.

Type:: float or None

requires_recording

Whether computing the simulated values needs recorded series, either via record_all=True or by recording the specific items returned by required_recordings() (default True).

Type:: bool

configure_recording(model: Model) → None[source]

Enable, on model, the recordings this signal needs.

Call this before model.setup() as a targeted alternative to creating the model with record_all=True.

observed() → numpy.ndarray[source]: Return the observed values as a 1D array.

required_recordings(model: Model) → RecordingRequest[source]

Return the specific stores/fluxes this signal needs recorded.

Default: an empty request. Subclasses that read recorded series should override this so the model can record only what is needed, instead of record_all=True. model is provided to resolve names (e.g. the glacier land covers) from the model configuration.

restrict_to_period(start: str | pd.Timestamp | None, end: str | pd.Timestamp | None) → None[source]: Restrict the observations to [start, end] (default: no-op).

simulated(model: Model) → np.ndarray[source]

Return the simulated values matching observed(), from a run model.

The model must already have been run (and recorded, if requires_recording). The returned array must align 1:1 with observed(); entries that cannot be evaluated should be NaN.
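One plausible way for a caller to compare the two aligned vectors is to skip the positions marked as NaN. The sketch below assumes that treatment; it is not taken from the hydrobricks source, and rmse_on_valid_pairs is a hypothetical helper.

```python
import math


def rmse_on_valid_pairs(observed, simulated):
    """RMSE over the positions where both values are finite."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must align 1:1.")
    pairs = [
        (o, s)
        for o, s in zip(observed, simulated)
        if math.isfinite(o) and math.isfinite(s)
    ]
    if not pairs:
        return math.nan
    return math.sqrt(sum((s - o) ** 2 for o, s in pairs) / len(pairs))


observed = [0.2, 0.8, 0.5]
simulated = [0.4, math.nan, 0.5]  # second target could not be evaluated
score = rmse_on_valid_pairs(observed, simulated)
```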

class hydrobricks.evaluation.RecordingRequest(brick_states: list[tuple[str, str]]=<factory>, process_outputs: list[tuple[str, str, str]]=<factory>, fractions: bool = False)[source]

Bases: object

The specific stores/fluxes an auxiliary observation needs recorded.

Used as a lightweight alternative to record_all: an observation declares exactly which series it reads from a run model, so only those are logged.

brick_states

(brick_name, item) pairs, e.g. ("glacier_snowpack", "snow_content"). Logged label: "{brick}:{item}".

Type:: list[tuple[str, str]]

process_outputs

(brick_name, process_name, item) triples, e.g. ("glacier", "melt", "output"). Logged label: "{brick}:{process}:{item}".

Type:: list[tuple[str, str, str]]

fractions

Whether the time-varying land-cover fractions must be recorded.

Type:: bool

value_labels() → list[str][source]: Return the hydro-unit value labels this request records.
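The label patterns documented for brick_states and process_outputs can be illustrated with a stand-in dataclass. RecordingRequestSketch and its labels() method are hypothetical and only mirror the documented fields; how the real class handles fractions in value_labels() is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingRequestSketch:
    """Stand-in mirroring the documented fields (not the library class)."""

    brick_states: list = field(default_factory=list)
    process_outputs: list = field(default_factory=list)
    fractions: bool = False

    def labels(self):
        """Build the logged labels from the documented patterns."""
        states = [f"{brick}:{item}" for brick, item in self.brick_states]
        outputs = [
            f"{brick}:{process}:{item}"
            for brick, process, item in self.process_outputs
        ]
        return states + outputs


request = RecordingRequestSketch(
    brick_states=[("glacier_snowpack", "snow_content")],
    process_outputs=[("glacier", "melt", "output")],
    fractions=True,
)
labels = request.labels()
```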

class hydrobricks.evaluation.GlacierMassBalanceObservations(metric: str = 'rmse', weight: float = 1.0, mode: str = 'objective', tolerance: float | None = None, relative_tolerance: float | None = None)[source]

Bases: AuxiliaryObservation

Observed glacier mass balance, used as an auxiliary calibration signal.

The observations are stored as a flat list of targets, each a single scalar value with its observation period and (optionally) its elevation band. The matching simulated values are produced by simulated(), in the same order, so the two can be compared directly.

Parameters:

metric – Calibration configuration, see AuxiliaryObservation.
weight – Calibration configuration, see AuxiliaryObservation.
mode – Calibration configuration, see AuxiliaryObservation.
tolerance – Calibration configuration, see AuxiliaryObservation.
relative_tolerance – Calibration configuration, see AuxiliaryObservation.

targets

One entry per scalar observation, with keys t0, t1 (period bounds, pd.Timestamp), value (mm w.e.), balance_type, and band_lo / band_hi (m a.s.l., or None for a whole-glacier observation).

Type:: list[dict]

granularity

'whole' or 'elevationbins'.

Type:: str

Load observed glacier mass balance from a generic CSV file.

Each row is one observed balance for balance_type. The observation period is given either by explicit date_start_col / date_end_col, or derived from a year_col using a hydro_year_start month (the period runs from the 1st of that month of the year to the day before it a year later — e.g. October → Oct 1 to Sep 30 of the next year; January → the calendar year). Provide band_* columns to load per-elevation-band balances. Call once per balance type for files with several (winter/summer/annual) value columns, or use from_glamos().
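The year-based period rule can be sketched with the standard library. hydro_year_period is a hypothetical helper that accepts the month as a number only (the loader also accepts a month name), and it returns plain dates rather than pd.Timestamp objects.

```python
from datetime import date, timedelta


def hydro_year_period(year, hydro_year_start):
    """Return (start, end) of the hydrological year starting in `year`.

    The period runs from the 1st of the start month to the day before
    the same date one year later.
    """
    start = date(year, hydro_year_start, 1)
    end = date(year + 1, hydro_year_start, 1) - timedelta(days=1)
    return start, end


october_year = hydro_year_period(2020, 10)
calendar_year = hydro_year_period(2020, 1)
```

With an October start, the 2020 row covers 2020-10-01 to 2021-09-30; with a January start it covers the calendar year 2020.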

Parameters:

path – Path to the CSV file.
value_col – Column (name or 0-based index) holding the mass-balance value.
balance_type – What value_col represents: 'annual', 'winter' or 'summer' (used only as a label).
date_start_col, date_end_col – Columns with the period start and end dates (explicit-period mode).
year_col, hydro_year_start – Alternative to explicit dates: a year column and the month in which the hydrological year starts (name or 1-12).
band_lo_col, band_hi_col, band_area_col – Columns with the elevation-band bounds [m] and area, for per-band data.
value_unit, area_unit – Units of the value ('mm_we' or 'm_we') and of the band area ('km2' or 'm2'); normalized to mm w.e. and m2.
date_format – Optional explicit date format; otherwise dates are inferred.
glacier_id_col, glacier_id – Optional column and value to filter a single glacier from a multi-glacier file.
start_date, end_date – Keep only observations whose period lies fully within this range.
skiprows – Rows to skip at the top of the file (metadata header).
metric – Calibration configuration (see the class docstring).
weight – Calibration configuration (see the class docstring).
mode – Calibration configuration (see the class docstring).
tolerance – Calibration configuration (see the class docstring).
relative_tolerance – Calibration configuration (see the class docstring).
**read_csv_kwargs – Extra keyword arguments forwarded to pandas.read_csv.

Returns:

The populated observations object.

classmethod from_glamos(path: str | Path, kind: str = 'whole', glacier_id: str | None = None, balance_types: tuple[str, ...] | list[str] = ('annual',), start_date: str | pd.Timestamp | None = None, end_date: str | pd.Timestamp | None = None, metric: str = 'rmse', weight: float = 1.0, mode: str = 'objective', tolerance: float | None = None, relative_tolerance: float | None = None) → GlacierMassBalanceObservations[source]

Load a GLAMOS “fixdate” mass-balance CSV file (preset over the CSV reader).

Handles the GLAMOS file layout (metadata/citation header rows, a date_start short-name header row, then a units row). The observation periods are taken from the per-row dates, so the hydrological year of the data is respected without any extra configuration.

Parameters:

path – Path to the GLAMOS CSV file.
kind – 'whole' for the whole-glacier file (one value per period) or 'elevationbins' for the per-elevation-bin file.
glacier_id – If given, keep only the rows of this glacier id (e.g. 'B43-03').
balance_types – Which balances to use among 'annual' (Ba), 'winter' (Bw) and 'summer' (Bs).
start_date, end_date – Keep only observations whose period lies fully within this range.
metric – Calibration configuration (see the class docstring).
weight – Calibration configuration (see the class docstring).
mode – Calibration configuration (see the class docstring).
tolerance – Calibration configuration (see the class docstring).
relative_tolerance – Calibration configuration (see the class docstring).

Returns:

The populated observations object.

observed() → numpy.ndarray[source]: The observed mass-balance values [mm w.e.], one per target.

required_recordings(model: Model) → RecordingRequest[source]

The glacier series needed by simulated() for each glacier cover.

For each glacier land cover this records the snowpack snow content, the ice melt output, and (once) the land-cover fractions — the exact inputs of the flux-based surface balance. A targeted alternative to record_all=True.

restrict_to_period(start: str | pd.Timestamp | None, end: str | pd.Timestamp | None) → None[source]: Keep only targets whose period lies fully within [start, end].
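The "fully within" rule can be sketched as a filter over the documented t0 / t1 keys. keep_fully_within is a hypothetical helper, the bounds are assumed to be inclusive, and plain dates stand in for pd.Timestamp values.

```python
from datetime import date


def keep_fully_within(targets, start=None, end=None):
    """Keep targets whose whole period [t0, t1] lies inside [start, end]."""
    return [
        t
        for t in targets
        if (start is None or t["t0"] >= start) and (end is None or t["t1"] <= end)
    ]


targets = [
    {"t0": date(2018, 10, 1), "t1": date(2019, 9, 30), "value": -950.0},
    {"t0": date(2019, 10, 1), "t1": date(2020, 9, 30), "value": -700.0},
    {"t0": date(2020, 10, 1), "t1": date(2021, 9, 30), "value": -1100.0},
]
# The first period starts before the range and the third ends after it.
kept = keep_fully_within(targets, date(2019, 1, 1), date(2020, 12, 31))
```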

simulated(model: Model) → np.ndarray[source]

Compute the simulated glacier mass balance matching each observation.

For each target the flux-based surface balance B_i = ΔS_i − Σ M_ice,i is evaluated per glacier hydro unit over the target’s period, then aggregated (area-weighted by the model’s time-varying glacier area) to the target’s granularity — over all glacierized units for a whole-glacier target, or over the units whose elevation lies in the band for an elevation-bin target.

The glacier snowpack, ice melt and land-cover fractions must have been recorded in memory — either with record_all=True or by recording the specific items (see required_recordings(), applied via configure_recording(model) before model.setup()).

Returns:: Simulated mass balance [mm w.e.], aligned 1:1 with observed(). Entries are NaN where the period falls outside the simulation or where no glacier area is available.
Return type:: np.ndarray
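The flux-based balance and its aggregation can be illustrated numerically. This sketch simplifies the method described above by using one constant glacier area per unit instead of the model's time-varying area; the function names, the dictionary keys and the sample numbers are all hypothetical.

```python
def unit_balance(snow_start, snow_end, ice_melt):
    """B_i = change in snow storage minus the summed ice melt [mm w.e.]."""
    return (snow_end - snow_start) - sum(ice_melt)


def glacier_balance(units):
    """Area-weighted mean of the per-unit balances [mm w.e.]."""
    total_area = sum(u["area"] for u in units)
    if total_area <= 0:
        return float("nan")  # no glacier area available
    weighted = sum(
        u["area"] * unit_balance(u["s0"], u["s1"], u["melt"]) for u in units
    )
    return weighted / total_area


units = [
    # Upper unit: snow accumulates (100 -> 400 mm), little ice melt.
    {"area": 2.0e6, "s0": 100.0, "s1": 400.0, "melt": [0.0, 50.0]},
    # Lower unit: snow disappears (50 -> 0 mm), strong ice melt.
    {"area": 1.0e6, "s0": 50.0, "s1": 0.0, "melt": [300.0, 550.0]},
]
balance = glacier_balance(units)
```

The per-unit balances are 250 and -900 mm w.e., and the area-weighted result is about -133 mm w.e.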

property values: numpy.ndarray: Alias of observed() (observed values [mm w.e.]).

class hydrobricks.evaluation.SnowCoverObservations(swe_full: float = 100.0, land_covers: list[str] | None = None, metric: str = 'rmse', weight: float = 1.0, mode: str = 'objective', tolerance: float | None = None, relative_tolerance: float | None = None)[source]

Bases: AuxiliaryObservation

Observed snow cover fraction, used as an auxiliary calibration signal.

The observations are stored as a flat list of targets, each a single per-(hydro unit, date) fraction. The matching simulated values are produced by simulated(), in the same order, so the two can be compared directly (RMSE by default).

Parameters:

swe_full – SWE [mm w.e.] at which a hydro unit is considered fully snow-covered, i.e. the threshold of the linear SWE→fraction depletion curve. Default: 100.0.
land_covers – Names of the land covers whose snowpacks contribute to the snow cover (e.g. ['ground'] to ignore glacier snow). Default: None (all land covers of the model).
metric – Calibration configuration, see AuxiliaryObservation.
weight – Calibration configuration, see AuxiliaryObservation.
mode – Calibration configuration, see AuxiliaryObservation.
tolerance – Calibration configuration, see AuxiliaryObservation.
relative_tolerance – Calibration configuration, see AuxiliaryObservation.

targets

One entry per observed fraction, with keys t (date, pd.Timestamp), unit_id (int) and value (fraction in [0, 1]).

Type:: list[dict]

Load pre-aggregated per-hydro-unit snow cover from a long-format CSV.

Each row is one observed snow cover fraction for a given hydro unit and date.

Parameters:

path – Path to the CSV file.
date_col, unit_col, value_col – Columns (name or 0-based index) holding the observation date, the hydro unit id, and the snow cover value, respectively.
value_scale – Factor applied to the value column to obtain a fraction in [0, 1] (e.g. 0.01 for a 0-100 % cover). Default: 1.0.
valid_min, valid_max – Keep only raw values within [valid_min, valid_max] (applied before value_scale); values outside are dropped. Use to filter quality/error codes (e.g. valid_max=100 for a 0-100 % product).
date_format – Optional explicit date format; otherwise dates are inferred.
start_date, end_date – Keep only observations whose date lies within this range.
skiprows – Rows to skip at the top of the file (metadata header).
swe_full – Configuration (see the class docstring).
land_covers – Configuration (see the class docstring).
metric – Configuration (see the class docstring).
weight – Configuration (see the class docstring).
mode – Configuration (see the class docstring).
tolerance – Configuration (see the class docstring).
relative_tolerance – Configuration (see the class docstring).
**read_csv_kwargs – Extra keyword arguments forwarded to pandas.read_csv.

Returns:

The populated observations object.
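The order of operations for valid_min / valid_max and value_scale can be sketched as follows. to_fractions is a hypothetical helper, not the loader itself; it filters the raw values first and scales the survivors afterwards, as the parameter descriptions state.

```python
def to_fractions(raw_values, value_scale=1.0, valid_min=None, valid_max=None):
    """Drop raw values outside the valid range, then scale to a fraction."""
    kept = [
        v
        for v in raw_values
        if (valid_min is None or v >= valid_min)
        and (valid_max is None or v <= valid_max)
    ]
    return [v * value_scale for v in kept]


# Raw 0-100 % cover values mixed with codes above 100 (here 200 and 250).
raw = [0, 50, 100, 200, 250]
fractions = to_fractions(raw, value_scale=0.01, valid_min=0, valid_max=100)
```

Filtering on the raw values matters: after scaling, a code of 200 would become 2.0 and could no longer be compared against valid_max=100.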

Load and aggregate a snow-cover HDF5 stack per hydro unit.

Same as from_netcdf() but reads HDF5 files: engine defaults to 'h5netcdf' when available, falling back to 'netcdf4' (which reads NetCDF4/HDF5). For data that stores its variable in an HDF5 group, pass group. Quality/error codes are filtered with valid_min / valid_max (e.g. valid_max=100 to drop MODIS codes above 100 %). See _from_stack() for the full parameter description.

classmethod from_modis(path: str | Path, raster_hydro_units: str | Path, hydro_units: Any | None = None, *, variable: str = 'NDSI_Snow_Cover', file_pattern: str = '*.hdf', date_regex: str = 'A(\\d{7})', date_format: str = '%Y%j', date_parser: Any | None = None, value_scale: float = 0.01, valid_min: float | None = 0.0, valid_max: float | None = 100.0, nodata: float | None = None, min_valid_ratio: float = 0.5, resampling: str = 'nearest', engine: str = 'netcdf4', start_date: str | pd.Timestamp | None = None, end_date: str | pd.Timestamp | None = None, swe_full: float = 100.0, land_covers: list[str] | None = None, metric: str = 'rmse', weight: float = 1.0, mode: str = 'objective', tolerance: float | None = None, relative_tolerance: float | None = None, cache_dir: str | Path | None = None) → SnowCoverObservations[source]

Load MODIS (HDF-EOS) daily snow-cover tiles, aggregated per hydro unit.

Reads HDF-EOS grid products such as MOD10A1 / MYD10A1 (NDSI snow cover). Each file holds one date’s tile; the date is parsed from the file name, tiles sharing a date are mosaicked, and the data are reprojected from the MODIS sinusoidal grid (read from the file’s StructMetadata) to the hydro-unit raster’s CRS before aggregating. The default valid_min=0 / valid_max=100 drop the product’s quality/error codes (200=missing, 250=cloud, 255=fill, …), and value_scale=0.01 converts the 0-100 % NDSI snow cover to a fraction.

Reading the HDF-EOS files uses xarray’s netcdf4 engine (the bundled netCDF4 reads HDF4-EOS); no separate HDF4/GDAL build is required.
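The file-name date parsing can be reproduced with the documented defaults for date_regex and date_format. This is a sketch: parse_tile_date is a hypothetical helper, the use of re.search is an assumption about how the pattern is applied, and the file name is invented to follow the usual MOD10A1 naming pattern.

```python
import re
from datetime import datetime


def parse_tile_date(filename, date_regex=r"A(\d{7})", date_format="%Y%j"):
    """Parse the acquisition date from the first regex group of a file name."""
    match = re.search(date_regex, filename)
    if match is None:
        raise ValueError(f"No date token found in {filename!r}")
    # %Y%j reads a 4-digit year followed by the day of the year.
    return datetime.strptime(match.group(1), date_format)


# Hypothetical file name; the token A2025361 is year 2025, day 361.
stamp = parse_tile_date("MOD10A1.A2025361.h18v04.061.hdf")
```

Day 361 of 2025 is 27 December 2025. The "A1" inside "MOD10A1" is not matched because the pattern requires seven digits after the "A".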

Parameters:

path – Folder of tiles (with file_pattern) or a single file.
raster_hydro_units, hydro_units – The hydro-unit id raster and (optionally) the units to aggregate; see _from_stack().
variable – Data field to read (default 'NDSI_Snow_Cover').
file_pattern – Glob of the tile files (default '*.hdf').
date_regex, date_format – Parse the date from the file name: the first group of date_regex is parsed with date_format (defaults match MODIS A%Y%j tokens, e.g. A2025361).
date_parser – Optional callable (filename) -> pd.Timestamp overriding the regex.
value_scale – Aggregation/filtering options; see _from_stack().
valid_min – Aggregation/filtering options; see _from_stack().
valid_max – Aggregation/filtering options; see _from_stack().
nodata – Aggregation/filtering options; see _from_stack().
min_valid_ratio – Aggregation/filtering options; see _from_stack().
resampling – Resampling for the reprojection to the hydro-unit grid (a rasterio.enums.Resampling name, default 'nearest').
engine – xarray engine used to read the files (default 'netcdf4').
start_date – Configuration; see _from_stack() and the class docstring.
end_date – Configuration; see _from_stack() and the class docstring.
swe_full – Configuration; see _from_stack() and the class docstring.
land_covers – Configuration; see _from_stack() and the class docstring.
metric – Calibration-signal configuration; see the class docstring.
weight – Calibration-signal configuration; see the class docstring.
mode – Calibration-signal configuration; see the class docstring.
tolerance – Calibration-signal configuration; see the class docstring.
relative_tolerance – Calibration-signal configuration; see the class docstring.
cache_dir – If given, the aggregated per-(unit, date) fractions are cached there as a CSV named snow_cover_<hash>.csv. The hash is built from the hydro-unit id raster (the discretization), the aggregation options and a signature of the input tiles, so caches never mix across discretizations or settings. On a later call with the same inputs the CSV is loaded directly (skipping the slow tile reading); otherwise it is written after aggregating. Default: None (no caching).

Returns:

The populated observations object.
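The idea behind the cache_dir file name can be sketched with hashlib. Everything specific here is an assumption made for illustration: the hash function, the truncation to 16 characters, the way the options are serialized and the form of the tile signature are not taken from the hydrobricks source.

```python
import hashlib
import json


def cache_file_name(raster_bytes, options, tile_signature):
    """Build a snow_cover_<hash>.csv name from the inputs that affect the result."""
    digest = hashlib.sha256()
    digest.update(raster_bytes)  # the hydro-unit id raster (discretization)
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    digest.update(tile_signature.encode("utf-8"))  # e.g. tile names and sizes
    return f"snow_cover_{digest.hexdigest()[:16]}.csv"


raster = b"placeholder raster content"
options = {"value_scale": 0.01, "valid_max": 100.0, "min_valid_ratio": 0.5}
name_a = cache_file_name(raster, options, "tile_1:1024;tile_2:2048")
name_b = cache_file_name(raster, options, "tile_1:1024;tile_2:2048")
name_c = cache_file_name(raster, {**options, "min_valid_ratio": 0.8}, "tile_1:1024;tile_2:2048")
```

Identical inputs give the same file name, so the cached CSV is found again, while changing any aggregation option gives a different name and forces a fresh aggregation.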

Load and aggregate a snow-cover NetCDF stack per hydro unit.

See _from_stack() for the full parameter description. This is the netCDF variant; for HDF5 inputs use from_hdf5().

observed() → numpy.ndarray[source]: The observed snow cover fractions [0, 1], one per target.

required_recordings(model: Model) → RecordingRequest[source]

The snowpack series needed by simulated().

For each contributing land cover this records the snowpack snow content and (once) the land-cover fractions, the inputs of the SWE-to-fraction transform. A targeted alternative to record_all=True.

restrict_to_period(start: str | pd.Timestamp | None, end: str | pd.Timestamp | None) → None[source]: Keep only targets whose date lies within [start, end].

simulated(model: Model) → np.ndarray[source]

Compute the simulated snow cover fraction matching each observation.

For each land cover the recorded snowpack SWE is turned into a per-unit cover fraction (linear depletion curve, see _swe_to_fraction()), then combined across land covers as a land-cover-fraction-weighted mean. The result is then read per target at its (hydro unit, date).

The snowpack snow content and the land-cover fractions must have been recorded in memory — either with record_all=True or by recording the specific items (see required_recordings(), applied via configure_recording(model) before model.setup()).

Returns:: Simulated snow cover fraction [0, 1], aligned 1:1 with observed(). Entries are NaN where the date falls outside the simulation or the unit is absent.
Return type:: np.ndarray
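The SWE-to-fraction step and the combination across land covers can be sketched for a single hydro unit and date. The linear curve capped at 1 follows the description of swe_full; the normalization by the sum of the contributing land-cover fractions is an assumption, and both function names are hypothetical.

```python
def swe_to_fraction(swe, swe_full=100.0):
    """Linear depletion curve: 0 at no snow, 1 at or above swe_full."""
    return min(max(swe / swe_full, 0.0), 1.0)


def unit_snow_cover(swe_by_cover, cover_fractions, swe_full=100.0):
    """Land-cover-fraction-weighted mean of the per-cover snow fractions."""
    total = sum(cover_fractions[name] for name in swe_by_cover)
    if total <= 0:
        return float("nan")
    weighted = sum(
        cover_fractions[name] * swe_to_fraction(swe, swe_full)
        for name, swe in swe_by_cover.items()
    )
    return weighted / total


# Invented values: 40 mm SWE on the ground cover, 250 mm on the glacier.
swe_by_cover = {"ground": 40.0, "glacier": 250.0}
cover_fractions = {"ground": 0.75, "glacier": 0.25}
cover = unit_snow_cover(swe_by_cover, cover_fractions)
```

The ground cover contributes a fraction of 0.4 and the glacier is fully covered, so the unit value is 0.75 × 0.4 + 0.25 × 1.0 = 0.55.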

to_csv(path: str | Path) → None[source]

Save the per-(hydro unit, date) observations to a long-format CSV.

The file has date, unit_id and value columns and can be read back with from_csv() (it is also the format used by the raster loaders’ cache).
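The long-format layout can be illustrated with the standard csv module. The column names come from the description above; the ISO date strings and the sample values are assumptions, and the round trip is done in memory rather than through to_csv() and from_csv().

```python
import csv
import io

rows = [
    {"date": "2020-01-15", "unit_id": 1, "value": 0.85},
    {"date": "2020-01-15", "unit_id": 2, "value": 0.4},
]

# Write one row per (hydro unit, date) observation.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "unit_id", "value"])
writer.writeheader()
writer.writerows(rows)

# Read the rows back, restoring the numeric types.
buffer.seek(0)
loaded = [
    {"date": r["date"], "unit_id": int(r["unit_id"]), "value": float(r["value"])}
    for r in csv.DictReader(buffer)
]
header = buffer.getvalue().splitlines()[0]
```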

property values: numpy.ndarray: Alias of observed() (observed fractions [0, 1]).