Tuning
Find the optimal n_clusters and n_segments for your data.
| Function | What it does |
|---|---|
| grid_search | Full grid search; returns the best RMSE with complete history |
| find_pareto_front | Same grid, filtered to Pareto-optimal points |
| find_optimal_combination | Boundary curve for a target data reduction; the fastest option |
All three evaluate each candidate across all slices.
import numpy as np
import plotly.io as pio
import xarray_plotly # noqa: F401
import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data
pio.renderers.default = "notebook_connected"
da = sample_energy_data(n_days=30)
print(f"Shape: {dict(da.sizes)}")
Shape: {'time': 720, 'variable': 3, 'region': 3, 'scenario': 2}
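Grid search
Pass a timesteps array built with np.geomspace to test a sparse, logarithmically spaced set of timestep counts.
This covers the full range while testing far fewer candidates than an exhaustive sweep. Because dtype=int truncates the values, np.unique removes the resulting duplicates.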
ts = np.unique(np.geomspace(2, 48, num=12, dtype=int))
print(f"Testing timestep counts: {ts.tolist()}")
grid = tsam_xarray.grid_search(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    timesteps=ts,
    show_progress=False,
)
print(f"Tested {len(grid.history)} combinations")
print(f"Best: {grid.n_clusters}c x {grid.n_segments}s (RMSE={grid.rmse:.4f})")
grid.summary_matrix["rmse"].plotly.imshow(
    x="n_segments",
    y="n_clusters",
    title="RMSE by (n_clusters, n_segments)",
)
Testing timestep counts: [2, 3, 4, 6, 8, 11, 15, 20, 26, 35, 48]
Tested 32 combinations
Best: 6c x 8s (RMSE=0.1068)
The best result's accuracy.weighted_rmse is a DataArray with per-slice values, so you can see which scenario or region drives the error.
grid.best_result.accuracy.weighted_rmse.plotly.bar(
    x="scenario",
    title="Weighted RMSE per scenario (best config)",
)
With save_all_results=True, the .accuracy property gives per-column RMSE as a DataArray
with n_clusters and n_segments dimensions, which plots as one heatmap per variable.
grid.accuracy["rmse"].sel(scenario="low", region="north").plotly.imshow(
    x="n_segments",
    y="n_clusters",
    facet_col="variable",
    title="Per-variable RMSE (north, low scenario)",
)
Pareto front
find_pareto_front runs the same grid search but keeps only the non-dominated
configurations: those for which no other combination has both lower RMSE and fewer timesteps.
pareto = tsam_xarray.find_pareto_front(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    timesteps=ts,
    show_progress=False,
)
print(f"Pareto-optimal: {len(pareto.history)} of {len(grid.history)} tested")
pareto.plot()
Pareto-optimal: 14 of 32 tested
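The non-dominated filter described in this section can be sketched in a few lines of plain Python. This is only an illustration, not tsam_xarray's implementation: the pareto_front helper and the (timesteps, rmse) values are hypothetical, and the sketch assumes the usual dominance rule (no worse on both criteria and strictly better on at least one).

```python
def pareto_front(points):
    """Return the (timesteps, rmse) pairs that no other pair dominates.

    Hypothetical helper for illustration; both criteria are minimized.
    """
    front = []
    best_rmse = float("inf")
    # Sorting puts fewer timesteps first, and lower RMSE first among ties.
    for timesteps, rmse in sorted(points):
        # A point survives only if it beats every cheaper point on RMSE.
        if rmse < best_rmse:
            front.append((timesteps, rmse))
            best_rmse = rmse
    return front


# Illustrative values only, not results from the sample data.
candidates = [(4, 0.30), (8, 0.20), (8, 0.25), (16, 0.22), (32, 0.10)]
front = pareto_front(candidates)
print(front)
```

In this toy set, (8, 0.25) and (16, 0.22) are dropped because (8, 0.20) is at least as cheap and more accurate than both.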
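Target data reduction
find_optimal_combination tests only the boundary combinations for a target data reduction, which is faster than a full grid search. With data_reduction=0.05, every tested combination in the output below has at most 36 timesteps, which is 5% of the original 720.
result_opt = tsam_xarray.find_optimal_combination(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    data_reduction=0.05,
    show_progress=False,
)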
print(f"Best for 5% reduction: {result_opt.n_clusters}c x {result_opt.n_segments}s")
print(f"RMSE: {result_opt.rmse:.4f}")
result_opt.summary
Config (n_clusters=36, n_segments=1) failed: Cannot extract more clusters than samples: 36 clusters were given for a tree with 30 leaves.
Best for 5% reduction: 6c x 6s
RMSE: 0.1134
| | n_clusters | n_segments | rmse | timesteps |
|---|---|---|---|---|
| 4 | 6 | 6 | 0.113354 | 36 |
| 7 | 4 | 9 | 0.113454 | 36 |
| 5 | 5 | 7 | 0.113877 | 35 |
| 6 | 4 | 8 | 0.114703 | 32 |
| 9 | 3 | 11 | 0.116326 | 33 |
| 8 | 3 | 10 | 0.116501 | 30 |
| 10 | 3 | 12 | 0.116895 | 36 |
| 3 | 7 | 5 | 0.117911 | 35 |
| 15 | 2 | 17 | 0.121194 | 34 |
| 14 | 2 | 16 | 0.121386 | 32 |
| 16 | 2 | 18 | 0.121407 | 36 |
| 13 | 2 | 15 | 0.121587 | 30 |
| 12 | 2 | 14 | 0.121747 | 28 |
| 11 | 2 | 13 | 0.122086 | 26 |
| 2 | 9 | 4 | 0.124040 | 36 |
| 1 | 12 | 3 | 0.138876 | 36 |
| 0 | 18 | 2 | 0.204624 | 36 |
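The 17 rows above are consistent with one simple reading of "boundary": a budget of 720 × 0.05 = 36 timesteps and, for each n_segments, the largest n_clusters that still fits. The sketch below reproduces that set under this assumption. boundary_combos is a hypothetical helper, and the limits it applies (24 segments per period, at least 2 clusters, no more clusters than the 30 periods) are inferred from the table and the failure message, not taken from the library's documentation.

```python
def boundary_combos(n_timesteps, period_length, data_reduction, min_clusters=2):
    """Enumerate (n_clusters, n_segments) pairs on the budget boundary.

    Hypothetical helper: an assumed reading of the table, not library code.
    """
    n_periods = n_timesteps // period_length
    # Small epsilon guards against float rounding just below an integer.
    budget = int(n_timesteps * data_reduction + 1e-9)
    combos = []
    for n_segments in range(1, period_length + 1):
        # Largest cluster count whose product with n_segments fits the budget.
        n_clusters = budget // n_segments
        # Skip configs that cannot be clustered (more clusters than periods)
        # or that fall below the assumed minimum cluster count.
        if min_clusters <= n_clusters <= n_periods:
            combos.append((n_clusters, n_segments))
    return combos


combos = boundary_combos(n_timesteps=720, period_length=24, data_reduction=0.05)
print(len(combos))
print(combos)
```

The (36, 1) candidate is filtered out here because it asks for more clusters than there are periods, which matches the failure reported in the output above.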