Multi-Dimensional Data
tsam_xarray handles multi-dimensional DataArrays through two mechanisms:
- cluster_dim: dimensions clustered together (shared clustering)
- Auto-slicing: remaining dimensions get independent clusterings
This notebook covers stacking, slicing, and weights.
import plotly.io as pio
import xarray_plotly # noqa: F401
import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data
pio.renderers.default = "notebook_connected"
da = sample_energy_data(n_days=30)
print(f"Dims: {list(da.dims)}")
print(f"Shape: {dict(da.sizes)}")
Dims: ['time', 'variable', 'region', 'scenario']
Shape: {'time': 720, 'variable': 3, 'region': 3, 'scenario': 2}
Multiple cluster dims
Pass cluster_dim=["variable", "region"] to cluster all variable-region
combinations together. They are stacked internally and unstacked in the results.
da_single = da.sel(scenario="low")
result = tsam_xarray.aggregate(
    da_single,
    time_dim="time",
    cluster_dim=["variable", "region"],
    n_clusters=4,
)
print("Result dims:", result.cluster_representatives.dims)
result.cluster_representatives.to_dataframe("value").head(10)
Result dims: ('cluster', 'timestep', 'variable', 'region')
| cluster | timestep | variable | region | value |
|---|---|---|---|---|
| 0 | 0 | demand | east | 0.172796 |
| 0 | 0 | demand | north | 0.203079 |
| 0 | 0 | demand | south | 0.168561 |
| 0 | 0 | solar | east | 0.000283 |
| 0 | 0 | solar | north | 0.000000 |
| 0 | 0 | solar | south | 0.000000 |
| 0 | 0 | wind | east | 0.575650 |
| 0 | 0 | wind | north | 0.754306 |
| 0 | 0 | wind | south | 0.407363 |
| 0 | 1 | demand | east | 0.194493 |
result.cluster_representatives.sel(variable="solar").plotly.line(
    line_shape="hv",
    x="timestep",
    color="cluster",
    facet_col="region",
    title="Cluster representatives (solar, by region)",
)
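As a rough illustration of what stacking means here, the sketch below flattens every (variable, region) pair into a single column label and then rebuilds the nested layout. It uses plain Python with made-up values. The helper names `stack_columns` and `unstack_columns` are hypothetical and are not a description of tsam_xarray's internals.

```python
from itertools import product

variables = ["demand", "solar", "wind"]
regions = ["east", "north", "south"]


def stack_columns(nested):
    """Flatten {variable: {region: series}} into {(variable, region): series}."""
    return {(v, r): nested[v][r] for v, r in product(variables, regions)}


def unstack_columns(flat):
    """Rebuild the nested {variable: {region: series}} layout from flat keys."""
    nested = {}
    for (v, r), series in flat.items():
        nested.setdefault(v, {})[r] = series
    return nested


# Made-up two-step series for each (variable, region) pair
nested = {v: {r: [0.0, 1.0] for r in regions} for v in variables}

stacked = stack_columns(nested)
print(len(stacked))  # 3 variables x 3 regions = 9 stacked columns
print(unstack_columns(stacked) == nested)  # round trip restores the layout
```

All nine stacked columns enter one shared clustering, which is why the result above has a single `cluster` dimension covering both `variable` and `region`.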
Auto-slicing
Any dimension not in time_dim or cluster_dim is automatically sliced —
one independent aggregation per coordinate, with results concatenated into
coherent multi-dimensional arrays.
Here, scenario is auto-sliced. Each scenario gets its own clustering.
Cluster weights, accuracy metrics, and cluster representatives all have the scenario
dimension — no manual looping or concatenation needed.
Without tsam_xarray, you'd have to:
# Manual approach (what tsam_xarray replaces)
import tsam

results = {}
for scenario in da.scenario.values:
    da_slice = da.sel(scenario=scenario)
    df = ...  # flatten to DataFrame
    results[scenario] = tsam.aggregate(df, n_clusters=4)
# Then manually concat cluster_weights, accuracy, cluster_representatives...
result_sliced = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim=["variable", "region"],
    n_clusters=4,
)
print("Result dims:", result_sliced.cluster_representatives.dims)
result_sliced.cluster_weights.to_dataframe("weight")
Result dims: ('scenario', 'cluster', 'timestep', 'variable', 'region')
| scenario | cluster | weight |
|---|---|---|
| low | 0 | 8 |
| low | 1 | 13 |
| low | 2 | 5 |
| low | 3 | 4 |
| high | 0 | 11 |
| high | 1 | 13 |
| high | 2 | 4 |
| high | 3 | 2 |
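A quick sanity check on the weights above, written as a standalone sketch: assuming each cluster weight counts the original daily periods assigned to that cluster (an interpretation, not something the output states), the weights for each scenario should add up to the 30 sample days. The numbers are copied from the table.

```python
n_days = 30  # sample_energy_data(n_days=30): 720 hourly steps = 30 days

# Cluster weights copied from the table above, per scenario
cluster_weights = {
    "low": [8, 13, 5, 4],
    "high": [11, 13, 4, 2],
}

totals = {scenario: sum(w) for scenario, w in cluster_weights.items()}
print(totals)  # {'low': 30, 'high': 30}

# Every original day is accounted for in each independent clustering
assert all(total == n_days for total in totals.values())
```

The two scenarios distribute their days differently across clusters, which is consistent with each scenario getting its own clustering.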
result_sliced.accuracy.rmse.to_dataframe("RMSE")
| scenario | variable | region | RMSE |
|---|---|---|---|
| low | demand | east | 0.070919 |
| low | demand | north | 0.071710 |
| low | demand | south | 0.069703 |
| low | solar | east | 0.099097 |
| low | solar | north | 0.116967 |
| low | solar | south | 0.093217 |
| low | wind | east | 0.153082 |
| low | wind | north | 0.154634 |
| low | wind | south | 0.156379 |
| high | demand | east | 0.066410 |
| high | demand | north | 0.069501 |
| high | demand | south | 0.067962 |
| high | solar | east | 0.067889 |
| high | solar | north | 0.085266 |
| high | solar | south | 0.063731 |
| high | wind | east | 0.157093 |
| high | wind | north | 0.162105 |
| high | wind | south | 0.165263 |
Multiple slice dims
With only variable as cluster_dim, both region and scenario are auto-sliced,
giving one aggregation per (region, scenario) combination.
result_multi = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
)
print("Result dims:", result_multi.cluster_representatives.dims)
result_multi.cluster_representatives.sel(
    variable="solar",
    scenario="low",
).plotly.line(
    line_shape="hv",
    x="timestep",
    color="cluster",
    facet_col="region",
    title="Cluster representatives (solar, low, per region)",
)
Result dims: ('region', 'scenario', 'cluster', 'timestep', 'variable')
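To make the slicing rule concrete, here is a small plain-Python sketch that works out which dimensions are left over for auto-slicing and how many independent aggregations that implies. The `slice_dims` helper is hypothetical and only mirrors the rule described above. The sizes and coordinate labels are taken from the sample data.

```python
from itertools import product

# Dimension sizes and coordinate labels of the sample data
sizes = {"time": 720, "variable": 3, "region": 3, "scenario": 2}
coords = {
    "region": ["east", "north", "south"],
    "scenario": ["low", "high"],
}


def slice_dims(sizes, time_dim, cluster_dim):
    """Return the dims that are neither the time dim nor a cluster dim."""
    used = {time_dim, *cluster_dim}
    return [dim for dim in sizes if dim not in used]


dims = slice_dims(sizes, "time", ["variable"])
combos = list(product(*(coords[dim] for dim in dims)))

print(dims)         # ['region', 'scenario']
print(len(combos))  # 6 independent aggregations
print(combos[0])    # ('east', 'low')
```

The leftover dims match the leading dimensions of the result printed above, `('region', 'scenario', ...)`.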
Weights
Use a dict to weight certain coordinates higher during clustering.
For multiple cluster_dim, use a dict-of-dicts keyed by dimension name.
# Weight solar 2x — broadcasts across region (missing entries default to 1.0)
result_w = tsam_xarray.aggregate(
    da_single,
    time_dim="time",
    cluster_dim=["variable", "region"],
    n_clusters=4,
    weights={"variable": {"solar": 2.0}},
)
result_w.accuracy.rmse.to_dataframe("RMSE")
| variable | region | RMSE |
|---|---|---|
| demand | east | 0.075016 |
| demand | north | 0.076493 |
| demand | south | 0.071846 |
| solar | east | 0.075933 |
| solar | north | 0.112719 |
| solar | south | 0.073196 |
| wind | east | 0.160321 |
| wind | north | 0.159308 |
| wind | south | 0.162374 |
Weights can span multiple dimensions — they multiply across dims:
# Weight solar in north: solar=3.0 * north=2.0 = 6.0
result_w2 = tsam_xarray.aggregate(
    da_single,
    time_dim="time",
    cluster_dim=["variable", "region"],
    n_clusters=4,
    weights={"variable": {"solar": 3.0}, "region": {"north": 2.0}},
)
result_w2.accuracy.rmse.to_dataframe("RMSE")
| variable | region | RMSE |
|---|---|---|
| demand | east | 0.065388 |
| demand | north | 0.073632 |
| demand | south | 0.068182 |
| solar | east | 0.082727 |
| solar | north | 0.095599 |
| solar | south | 0.076938 |
| wind | east | 0.184076 |
| wind | north | 0.186040 |
| wind | south | 0.182107 |
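The multiplication rule can be sketched in plain Python. The `effective_weights` helper below is hypothetical and only reproduces the arithmetic described in this section (per-dimension weights multiply, missing entries default to 1.0). It is not the library's implementation.

```python
from itertools import product

coords = {
    "variable": ["demand", "solar", "wind"],
    "region": ["east", "north", "south"],
}


def effective_weights(weights, coords):
    """Multiply per-dimension weights for every coordinate combination."""
    result = {}
    for combo in product(*coords.values()):
        w = 1.0
        for dim, label in zip(coords, combo):
            # Missing dimensions or labels fall back to a weight of 1.0
            w *= weights.get(dim, {}).get(label, 1.0)
        result[combo] = w
    return result


w = effective_weights(
    {"variable": {"solar": 3.0}, "region": {"north": 2.0}},
    coords,
)
print(w[("solar", "north")])   # 6.0
print(w[("solar", "east")])    # 3.0
print(w[("demand", "north")])  # 2.0
print(w[("demand", "east")])   # 1.0
```

Only the solar/north column receives the full 6.0. The other solar columns and the other north columns are weighted 3.0 and 2.0 respectively.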