Clustering IO & Apply

Cluster on a subset of your data, save the clustering, and apply it to the full dataset later. A common workflow: determine clusters from renewable generation profiles, then apply the same clustering to all variables including demand.

In [1]:

import plotly.io as pio
import xarray_plotly  # noqa: F401

import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data

pio.renderers.default = "notebook_connected"

da = sample_energy_data(n_days=30).sel(region="north", scenario="low")
print(f"Shape: {dict(da.sizes)}")

Shape: {'time': 720, 'variable': 3}

Cluster on renewables only

In [2]:

# Cluster using only solar and wind
da_renewables = da.sel(variable=["solar", "wind"])

result_ren = tsam_xarray.aggregate(
    da_renewables,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
)

result_ren.clustering.to_json("clustering.json")
print(f"Clustered on: {list(da_renewables.coords['variable'].values)}")
print(f"Saved {result_ren.n_clusters} clusters")
result_ren.cluster_representatives.plotly.line(
    line_shape="hv",
    x="timestep",
    color="cluster",
    facet_col="variable",
    title="Clustering based on renewables",
)

Clustered on: [np.str_('solar'), np.str_('wind')]
Saved 4 clusters

Inspect the clustering

The ClusteringResult object exposes clustering metadata as xarray DataArrays — useful for understanding cluster structure before saving or optimizing.

In [3]:

cr = result_ren.clustering
cr

Out[3]:

ClusteringResult(n_clusters=4, n_periods=30, timesteps_per_period=24, time_dim='time', cluster_dim=['variable'])

In [4]:

cr.cluster_assignments.expand_dims(dim_0=[0]).plotly.imshow(x="period")

In [5]:

cr.cluster_occurrences.plotly.bar(
    x="cluster",
    title="Cluster occurrences (periods per cluster)",
)
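To make the relationship between the two arrays concrete, here is a minimal NumPy sketch using made-up assignments, not the values from this notebook. It assumes that cluster occurrences are simply the number of periods assigned to each cluster, so they sum to the number of periods. The names `assignments` and `occurrences` are illustrative and not part of the tsam_xarray API.

```python
import numpy as np

# Hypothetical assignments: one cluster label per period (8 periods, 3 clusters).
assignments = np.array([0, 2, 1, 0, 0, 2, 1, 0])
n_clusters = 3

# Occurrences: how many periods fall into each cluster.
occurrences = np.bincount(assignments, minlength=n_clusters)

print(occurrences.tolist())      # [4, 2, 2]
print(int(occurrences.sum()))    # 8, the number of periods
```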

Apply to all variables

Load the clustering and apply it to the full dataset including demand. The cluster assignments (which days go together) stay the same — only the representatives are recomputed for the new variables.

In [6]:

clustering = tsam_xarray.load_clustering("clustering.json")

# Apply renewable-based clustering to ALL variables
result_all = clustering.apply(da)
print(
    f"Variables: {list(result_all.cluster_representatives.coords['variable'].values)}"
)
result_all.cluster_representatives.plotly.line(
    line_shape="hv",
    x="timestep",
    color="cluster",
    facet_col="variable",
    title="Renewable-based clustering applied to all variables",
)

Variables: ['demand', 'solar', 'wind']
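The idea of keeping assignments fixed while recomputing representatives can be sketched in plain NumPy. This is not the library's implementation: it assumes each representative is the mean of its member periods, whereas the actual representation method may differ (for example, a medoid). The data and the names `demand`, `assignments`, and `representatives` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_periods, steps = 6, 4
# Hypothetical data: a new variable reshaped to (period, timestep).
demand = rng.random((n_periods, steps))

# Assignments determined earlier from other variables; kept fixed here.
assignments = np.array([0, 1, 0, 2, 1, 0])
n_clusters = 3

# Assumption: each representative is the mean of its member periods.
representatives = np.stack(
    [demand[assignments == k].mean(axis=0) for k in range(n_clusters)]
)

print(representatives.shape)  # (3, 4)
```

Because the assignments never change, adding a variable only adds new rows of representatives; the grouping of days is untouched.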

Compare accuracy

In [7]:

import pandas as pd

comparison = pd.DataFrame(
    {
        "renewables only": result_ren.accuracy.rmse.to_series(),
        "all variables (applied)": result_all.accuracy.rmse.to_series(),
    }
)
comparison

Out[7]:

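variable  renewables only  all variables (applied)
demand                NaN                 0.073805
solar            0.112657                 0.112657
wind             0.156366                 0.156366

demand is NaN in the first column because it was not part of the renewables-only clustering. The solar and wind errors are identical in both columns because the cluster assignments are unchanged.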
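As a rough illustration of what a per-variable RMSE measures, the sketch below compares a small made-up series with its reconstruction from cluster representatives. It assumes mean representatives and a plain RMSE on the raw values; the library may normalize the data or use a different representation, so the numbers here are not comparable to the table. The helper `rmse` and all arrays are hypothetical.

```python
import numpy as np


def rmse(original, reconstructed):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((original - reconstructed) ** 2)))


# Hypothetical single-variable data, shape (period, timestep).
original = np.array([[0.0, 1.0], [0.2, 0.8], [1.0, 0.0]])
assignments = np.array([0, 0, 1])

# Assumption: mean representatives, expanded back by indexing.
reps = np.stack([original[assignments == k].mean(axis=0) for k in range(2)])
reconstructed = reps[assignments]

print(round(rmse(original, reconstructed), 4))  # 0.0816
```

The error for one variable depends only on that variable's data and the assignments, which is consistent with the unchanged solar and wind values.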

Disaggregate

Expand cluster-representative data back to the original time axis. This works directly from the loaded clustering; the original data is not needed.

In [8]:

disaggregated = clustering.disaggregate(result_all.cluster_representatives)
disaggregated.sel(variable="solar").plotly.line(
    x="time",
    title="Disaggregated solar (cluster representatives expanded to full time axis)",
    line_shape="hv",
)
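Conceptually, disaggregation only needs the representatives and the per-period assignments. A minimal NumPy sketch, assuming each original period is replaced by its cluster's representative and the periods are then concatenated, might look like this. The arrays are made up and the names are not part of the tsam_xarray API.

```python
import numpy as np

# Hypothetical representatives: 2 clusters, 3 timesteps per period.
representatives = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])

# Hypothetical assignments for 4 original periods.
assignments = np.array([1, 0, 0, 1])

# Each original period is replaced by its cluster's representative,
# then periods are concatenated back into one flat time axis.
disaggregated = representatives[assignments].reshape(-1)

print(disaggregated.shape)         # (12,)
print(disaggregated[:6].tolist())  # [10.0, 20.0, 30.0, 1.0, 2.0, 3.0]
```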