Tuning

Find the optimal n_clusters and n_segments for your data.

Function	What it does
`grid_search`	Full grid search — returns best RMSE with complete history
`find_pareto_front`	Same grid, filtered to Pareto-optimal points
`find_optimal_combination`	Boundary curve for a target data reduction — fastest

All three evaluate each candidate across all slices.

In [1]:

import numpy as np
import plotly.io as pio
import xarray_plotly  # noqa: F401

import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data

pio.renderers.default = "notebook_connected"

da = sample_energy_data(n_days=30)
print(f"Shape: {dict(da.sizes)}")

Shape: {'time': 720, 'variable': 3, 'region': 3, 'scenario': 2}

Grid search

Pass values generated with np.geomspace as timesteps to test a sparse, logarithmically spaced set of timestep counts. This covers the full range while testing far fewer candidates than a dense linear grid.

In [2]:

ts = np.unique(np.geomspace(2, 48, num=12, dtype=int))
print(f"Testing timestep counts: {ts.tolist()}")

grid = tsam_xarray.grid_search(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    timesteps=ts,
    show_progress=False,
)
print(f"Tested {len(grid.history)} combinations")
print(f"Best: {grid.n_clusters}c x {grid.n_segments}s (RMSE={grid.rmse:.4f})")

grid.summary_matrix["rmse"].plotly.imshow(
    x="n_segments",
    y="n_clusters",
    title="RMSE by (n_clusters, n_segments)",
)

Testing timestep counts: [2, 3, 4, 6, 8, 11, 15, 20, 26, 35, 48]

Tested 32 combinations
Best: 6c x 8s (RMSE=0.1068)
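To see how much the geometric spacing trims the candidate list, the following standard-library sketch rebuilds the same eleven timestep counts and compares them with a dense linear range. It only illustrates the spacing; it does not call tsam_xarray, and the names start, stop, num, geometric, and linear are chosen here for illustration.

```python
# Pure-Python sketch of the geometric spacing used above (standard library only).
start, stop, num = 2, 48, 12

# Geometric progression from start to stop, truncated to integers and deduplicated,
# which mirrors np.unique(np.geomspace(2, 48, num=12, dtype=int)).
ratio = stop / start
geometric = sorted({int(start * ratio ** (i / (num - 1))) for i in range(num)})

# Dense alternative: every integer timestep count in the same range.
linear = list(range(start, stop + 1))

print(geometric)
# [2, 3, 4, 6, 8, 11, 15, 20, 26, 35, 48]
print(f"{len(geometric)} geometric candidates vs {len(linear)} linear candidates")
# 11 geometric candidates vs 47 linear candidates
```

The small counts are sampled densely and the large ones sparsely, which is usually where the error curve changes fastest and slowest respectively.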

The best result's accuracy.weighted_rmse is a DataArray with one value per slice, so you can see which scenario/region drives the error.

In [3]:

grid.best_result.accuracy.weighted_rmse.plotly.bar(
    x="scenario",
    title="Weighted RMSE per scenario (best config)",
)

With save_all_results=True, the .accuracy property gives per-column RMSE as a DataArray with (n_clusters, n_segments) dims, so you can draw one heatmap per variable.

In [4]:

grid.accuracy["rmse"].sel(scenario="low", region="north").plotly.imshow(
    x="n_segments",
    y="n_clusters",
    facet_col="variable",
    title="Per-variable RMSE (north, low scenario)",
)

Pareto front

find_pareto_front runs the same grid search but keeps only the non-dominated configurations: a configuration stays on the front if no other combo is at least as good in both RMSE and timestep count and strictly better in one of them.

In [5]:

pareto = tsam_xarray.find_pareto_front(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    timesteps=ts,
    show_progress=False,
)
print(f"Pareto-optimal: {len(pareto.history)} of {len(grid.history)} tested")
pareto.plot()

Pareto-optimal: 14 of 32 tested
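The non-dominated filter described in this section can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not tsam_xarray's implementation: the function name pareto_front is hypothetical, and the sample (timesteps, rmse) pairs are copied from the find_optimal_combination summary table on this page purely as input data.

```python
def pareto_front(points):
    """Return the (timesteps, rmse) pairs that no other pair dominates.

    A pair is dominated if another pair has no more timesteps and no higher
    RMSE, and is strictly better in at least one of the two.
    """
    front = []
    best_rmse = float("inf")
    # Sort by timesteps, then RMSE, so cheaper configurations are seen first.
    for timesteps, rmse in sorted(points):
        # Keep a point only if it improves on every cheaper point seen so far.
        if rmse < best_rmse:
            front.append((timesteps, rmse))
            best_rmse = rmse
    return front


# Sample input: (timesteps, rmse) pairs taken from the summary table on this page.
candidates = [
    (36, 0.113352), (36, 0.113452), (35, 0.113875), (32, 0.114701),
    (33, 0.116327), (30, 0.116503), (28, 0.121747), (26, 0.122086),
]

for timesteps, rmse in pareto_front(candidates):
    print(f"{timesteps:>3} timesteps  RMSE={rmse:.6f}")
```

In this sample, the 33-timestep entry is dropped because the 32-timestep entry is both smaller and more accurate, and the second 36-timestep entry is dropped because the first one has a lower RMSE at the same size.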

Target data reduction

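find_optimal_combination tests only the boundary combos for a target reduction, which makes it the fastest of the three. In the run below, data_reduction=0.05 corresponds to a budget of 36 timesteps (5% of the original 720), and every tested combo has n_clusters x n_segments at or just under that budget.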

In [6]:

result_opt = tsam_xarray.find_optimal_combination(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    data_reduction=0.05,
    show_progress=False,
)
print(f"Best for 5% reduction: {result_opt.n_clusters}c x {result_opt.n_segments}s")
print(f"RMSE: {result_opt.rmse:.4f}")
result_opt.summary

Config (n_clusters=36, n_segments=1) failed: Cannot extract more clusters than samples: 36 clusters were given for a tree with 30 leaves.

Best for 5% reduction: 6c x 6s
RMSE: 0.1134

Out[6]:

	n_clusters	n_segments	rmse	timesteps
4	6	6	0.113352	36
7	4	9	0.113452	36
5	5	7	0.113875	35
6	4	8	0.114701	32
9	3	11	0.116327	33
8	3	10	0.116503	30
10	3	12	0.116896	36
3	7	5	0.117908	35
15	2	17	0.121194	34
14	2	16	0.121386	32
16	2	18	0.121407	36
13	2	15	0.121587	30
12	2	14	0.121747	28
11	2	13	0.122086	26
2	9	4	0.124039	36
1	12	3	0.138325	36
0	18	2	0.204624	36
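The combinations in the table above follow a simple pattern: for each n_segments, n_clusters is the largest integer that keeps n_clusters x n_segments within the 36-timestep budget. The sketch below reproduces that pattern in plain Python. It is inferred from the printed output only, so treat it as an assumption about how the boundary is formed rather than as the library's algorithm; the function name boundary_combinations and the min_clusters parameter are hypothetical.

```python
def boundary_combinations(n_timesteps, data_reduction, min_clusters=2):
    """List (n_clusters, n_segments) pairs that sit on the timestep budget.

    Assumption: the budget is n_timesteps * data_reduction, and for each
    n_segments the largest n_clusters that fits the budget is chosen.
    """
    budget = round(n_timesteps * data_reduction)
    combos = []
    n_segments = 1
    # Stop once the budget no longer allows at least min_clusters clusters.
    while budget // n_segments >= min_clusters:
        combos.append((budget // n_segments, n_segments))
        n_segments += 1
    return combos


combos = boundary_combinations(n_timesteps=720, data_reduction=0.05)
print(f"{len(combos)} boundary candidates")
for n_clusters, n_segments in combos:
    print(f"{n_clusters:>2}c x {n_segments:>2}s = {n_clusters * n_segments} timesteps")
```

This yields 18 candidates: the 17 rows of the table plus (n_clusters=36, n_segments=1), the configuration reported as failed above. That failure is consistent with the sample data, which spans 30 days and so presumably offers only 30 periods to cluster.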