Clustering IO & Apply
Cluster on a subset of your data, save the clustering, and apply it to the full dataset later. A common workflow is to determine the clusters from renewable generation profiles only, then apply the same clustering to all variables, including demand.
In [1]:
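import plotly.io as pio
# Imported for its side effect: registers the .plotly accessor on xarray objects
import xarray_plotly # noqa: F401
import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data
# Render Plotly figures inline, loading plotly.js from a CDN
pio.renderers.default = "notebook_connected"
# 30 days of hourly sample data for a single region and scenario
da = sample_energy_data(n_days=30).sel(region="north", scenario="low")
print(f"Shape: {dict(da.sizes)}")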
Shape: {'time': 720, 'variable': 3}
Cluster on renewables only
In [2]:
# Cluster using only solar and wind
da_renewables = da.sel(variable=["solar", "wind"])
result_ren = tsam_xarray.aggregate(
da_renewables,
time_dim="time",
cluster_dim="variable",
n_clusters=4,
)
# Save the clustering to disk so it can be applied to other data later
result_ren.clustering.to_json("clustering.json")
print(f"Clustered on: {list(da_renewables.coords['variable'].values)}")
print(f"Saved {result_ren.n_clusters} clusters")
# Plot each cluster's representative profile, one panel per variable
result_ren.cluster_representatives.plotly.line(
line_shape="hv",
x="timestep",
color="cluster",
facet_col="variable",
title="Clustering based on renewables",
)
Clustered on: [np.str_('solar'), np.str_('wind')]
Saved 4 clusters
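Conceptually, a saved clustering only has to record which cluster each period belongs to, plus some metadata, rather than the data itself. The sketch below illustrates that idea with the standard json module. The dictionary layout, its keys and the can_apply helper are hypothetical; they are not the format that to_json() actually writes, and the "same number of periods" rule is an assumption made for illustration.

```python
import json

# Hypothetical, simplified stand-in for a saved clustering. It records which
# cluster each period (day) belongs to, not the time-series values.
# This is NOT the schema that to_json() actually writes.
saved_clustering = {
    "n_clusters": 4,
    "timesteps_per_period": 24,
    "cluster_assignments": [0, 1, 1, 2, 3, 0, 2],  # one label per period
}


def can_apply(saved, n_timesteps):
    """Hypothetical check: new data must span the same number of periods."""
    n_periods = len(saved["cluster_assignments"])
    return n_timesteps == n_periods * saved["timesteps_per_period"]


# Round-trip through JSON text (writing to a file works the same way)
text = json.dumps(saved_clustering)
loaded = json.loads(text)

assert loaded == saved_clustering
print(can_apply(loaded, n_timesteps=7 * 24))  # True
print(can_apply(loaded, n_timesteps=100))  # False
```

Because nothing data-specific is stored in this toy format, the same assignments can be reused for any variable that covers the same periods.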
Inspect the clustering
The ClusteringResult object exposes clustering metadata as xarray DataArrays, which is useful for understanding the cluster structure before saving the clustering or using it in an optimization.
In [3]:
cr = result_ren.clustering
cr
Out[3]:
ClusteringResult(n_clusters=4, n_periods=30, timesteps_per_period=24, time_dim='time', cluster_dim=['variable'])
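The repr shows how the time axis is organized: 30 periods of 24 timesteps each, i.e. 30 days of hourly data, which accounts for the 720 timesteps printed at the start. The clustering groups whole periods rather than individual timesteps. A minimal plain-Python sketch of splitting a flat series into such periods, using a made-up series in place of the sample data:

```python
# Made-up hourly series: 30 days x 24 hours = 720 timesteps, matching the
# sizes reported for the sample data and the ClusteringResult repr.
n_periods, timesteps_per_period = 30, 24
series = [float(t % 24) for t in range(n_periods * timesteps_per_period)]

# Split the flat time axis into consecutive periods (days)
periods = [
    series[p * timesteps_per_period:(p + 1) * timesteps_per_period]
    for p in range(n_periods)
]

print(len(series), len(periods), len(periods[0]))  # 720 30 24
```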
In [4]:
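# cluster_assignments holds one cluster label per period; add a length-1
# dimension so imshow can draw it as a one-row heatmap
cr.cluster_assignments.expand_dims(dim_0=[0]).plotly.imshow(x="period")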
In [5]:
cr.cluster_occurrences.plotly.bar(
x="cluster",
title="Cluster occurrences (periods per cluster)",
)
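Assuming cluster_occurrences counts the periods assigned to each cluster, as the chart title says, it can be derived from cluster_assignments by counting labels. A plain-Python sketch with hypothetical labels for 10 periods and 4 clusters:

```python
from collections import Counter

# Hypothetical assignments: the cluster label of each of 10 periods
cluster_assignments = [0, 2, 2, 1, 0, 3, 2, 1, 2, 0]

# Occurrences = number of periods assigned to each cluster
counts = Counter(cluster_assignments)
cluster_occurrences = [counts[c] for c in range(4)]

print(cluster_occurrences)  # [3, 2, 4, 1]
```

The counts always add up to the number of periods, which is why occurrences are typically used as weights for the representative periods.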
Apply to all variables
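Load the clustering and apply it to the full dataset, including demand. The cluster assignments (which days are grouped together) stay the same; only the representatives are recomputed from the data passed in, so demand gets its own representative profiles even though it played no part in finding the clusters.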
In [6]:
clustering = tsam_xarray.load_clustering("clustering.json")
# Apply renewable-based clustering to ALL variables
result_all = clustering.apply(da)
print(
f"Variables: {list(result_all.cluster_representatives.coords['variable'].values)}"
)
result_all.cluster_representatives.plotly.line(
line_shape="hv",
x="timestep",
color="cluster",
facet_col="variable",
title="Renewable-based clustering applied to all variables",
)
Variables: ['demand', 'solar', 'wind']
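The idea behind applying a fixed clustering can be sketched in plain Python: the assignments are held fixed and representatives are computed from whatever data is passed in. This sketch uses the per-cluster mean as the representation; tsam supports other representation methods (for example medoids), and nothing here is meant to reproduce what apply() does internally. The assignments, the demand values and the mean_representatives helper are all hypothetical.

```python
from statistics import mean

# Hypothetical fixed assignments for 4 periods (e.g. found from solar and wind)
assignments = [0, 1, 0, 1]


def mean_representatives(periods, assignments, n_clusters):
    """Per-cluster mean profile: a simplified stand-in for tsam's methods."""
    reps = []
    for c in range(n_clusters):
        members = [p for p, a in zip(periods, assignments) if a == c]
        # Average timestep by timestep across the member periods
        reps.append([mean(values) for values in zip(*members)])
    return reps


# Demand was not used to find the clusters, but it covers the same periods
# (2 timesteps per period here, to keep the example small)
demand_periods = [[1.0, 3.0], [5.0, 7.0], [3.0, 5.0], [7.0, 9.0]]
demand_reps = mean_representatives(demand_periods, assignments, n_clusters=2)
print(demand_reps)  # [[2.0, 4.0], [6.0, 8.0]]
```

Reusing the same assignments on the solar and wind data would give back the same representatives as before, which is consistent with their unchanged accuracy in the next comparison.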
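Compare accuracy
Compare the per-variable RMSE of both results. Solar and wind match exactly because both results share the same cluster assignments, and demand is NaN in the renewables-only column because it was not part of that clustering.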
In [7]:
import pandas as pd
comparison = pd.DataFrame(
{
"renewables only": result_ren.accuracy.rmse.to_series(),
"all variables (applied)": result_all.accuracy.rmse.to_series(),
}
)
comparison
Out[7]:
| variable | renewables only | all variables (applied) |
|---|---|---|
| demand | NaN | 0.073805 |
| solar | 0.112657 | 0.112657 |
| wind | 0.156366 | 0.156366 |
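For reference, the RMSE of one variable compares the original series with its reconstruction from the cluster representatives. A minimal sketch of the formula on made-up values; the exact definition behind result.accuracy.rmse (for example, whether values are normalized first) is not shown in this notebook and may differ.

```python
from math import sqrt


def rmse(original, reconstructed):
    """Root-mean-square error between two equally long series."""
    if len(original) != len(reconstructed):
        raise ValueError("series must have the same length")
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sqrt(sum(squared) / len(squared))


# Made-up values for one variable: original vs. reconstructed from clusters
original = [0.0, 0.5, 1.0, 0.5]
reconstructed = [0.0, 0.4, 0.8, 0.5]
print(round(rmse(original, reconstructed), 6))  # 0.111803
```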
Disaggregate
Expand cluster-representative data back to the original time axis. This works directly from the loaded clustering; the original data is not needed.
In [8]:
# Expand the representatives back to one value per original timestep
disaggregated = clustering.disaggregate(result_all.cluster_representatives)
disaggregated.sel(variable="solar").plotly.line(
x="time",
title="Disaggregated solar (cluster representatives expanded to full time axis)",
line_shape="hv",
)
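Disaggregation only needs the cluster assignments and the representative profiles, which is why the loaded clustering suffices. A plain-Python sketch of the idea with hypothetical representatives and assignments: each original period is replaced by the representative of its cluster, and the periods are concatenated in order. The real disaggregate() works on xarray objects and may handle details this sketch ignores.

```python
# Hypothetical inputs: 2 representative profiles of 3 timesteps each,
# and the cluster assigned to each of 4 original periods
reps = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
assignments = [1, 0, 0, 1]


def disaggregate(reps, assignments):
    """Rebuild a full-length series by repeating each period's representative."""
    series = []
    for cluster in assignments:
        series.extend(reps[cluster])
    return series


full = disaggregate(reps, assignments)
print(full)
# [10.0, 20.0, 30.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
```

The result has one value per original timestep (4 periods x 3 timesteps = 12 here), just as the disaggregated solar series spans the full 720-step time axis.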