Segmentation
Segmentation reduces the temporal resolution within each cluster representative by grouping consecutive timesteps into segments. This is useful when optimizing over cluster representatives, because fewer timesteps mean faster solves.
tsam_xarray exposes segment_durations as a DataArray and provides
disaggregate() to map segmented results back to the original time axis.
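To make the idea of "grouping consecutive timesteps" concrete, here is a minimal NumPy sketch. It greedily merges adjacent timesteps until the requested number of segments remains. The function `segment_profile` and the toy `profile` are hypothetical and only illustrate the concept; this is not necessarily the algorithm tsam uses internally.

```python
import numpy as np


def segment_profile(values, n_segments):
    """Greedily merge adjacent timesteps until n_segments remain.

    Illustrative only: not necessarily the algorithm tsam uses.
    """
    values = np.asarray(values, dtype=float)
    # Start with one segment per timestep, stored as (start, stop) index pairs.
    segments = [(i, i + 1) for i in range(len(values))]
    while len(segments) > n_segments:
        # Cost of merging neighbours k and k+1: squared error of the
        # merged span around its own mean.
        costs = []
        for k in range(len(segments) - 1):
            chunk = values[segments[k][0] : segments[k + 1][1]]
            costs.append(float(((chunk - chunk.mean()) ** 2).sum()))
        k = int(np.argmin(costs))
        # Replace the two neighbouring segments with the merged one.
        segments[k : k + 2] = [(segments[k][0], segments[k + 1][1])]
    means = np.array([values[a:b].mean() for a, b in segments])
    durations = np.array([b - a for a, b in segments])
    return means, durations


# Hypothetical 12-step profile reduced to 4 segments.
profile = np.array([0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 0.9, 0.4, 0.1, 0.0, 0.0, 0.0])
means, durations = segment_profile(profile, n_segments=4)
print(means, durations)
```

Each segment is described by one value and one duration, and the durations always add up to the original number of timesteps. Because this sketch represents each segment by its mean, the duration-weighted sum of the segment values equals the sum of the original profile.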
import plotly.io as pio
import xarray as xr
import xarray_plotly # noqa: F401
from tsam import SegmentConfig
import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data
pio.renderers.default = "notebook_connected"
da = sample_energy_data(n_days=30).sel(region="north", scenario="low")
print(f"Input shape: {dict(da.sizes)}")
Input shape: {'time': 720, 'variable': 3}
Aggregate with segmentation
Pass segments=SegmentConfig(n_segments=6) to reduce each 24-hour period to 6 segments.
result = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
    segments=SegmentConfig(n_segments=6, representation="medoid"),
)
n_compact = result.n_clusters * result.n_timesteps_per_period
print(f"Cluster representatives: {dict(result.cluster_representatives.sizes)}")
print(f"Data reduction: {da.sizes['time']} -> {n_compact} values")
result.cluster_representatives.to_dataframe("value").head(10)
Cluster representatives: {'cluster': 4, 'timestep': 6, 'variable': 3}
Data reduction: 720 -> 24 values
| cluster | timestep | variable | value |
|---|---|---|---|
| 0 | 0 | demand | 0.259442 |
| 0 | 0 | solar | 0.002530 |
| 0 | 0 | wind | 0.341894 |
| 0 | 1 | demand | 0.568183 |
| 0 | 1 | solar | 0.000000 |
| 0 | 1 | wind | 0.479922 |
| 0 | 2 | demand | 0.750693 |
| 0 | 2 | solar | 0.055802 |
| 0 | 2 | wind | 0.523210 |
| 0 | 3 | demand | 0.823226 |
result.cluster_representatives.plotly.line(
    line_shape="hv",
    x="timestep",
    facet_col="variable",
    color="cluster",
    title="Segmented cluster representatives (6 segments per day)",
)
Segment durations
Segments can span different numbers of original timesteps.
segment_durations tells you how many hours each segment represents (the input data here is hourly).
result.segment_durations.to_dataframe("hours")
| cluster | timestep | hours |
|---|---|---|
| 0 | 0 | 4 |
| 0 | 1 | 4 |
| 0 | 2 | 2 |
| 0 | 3 | 5 |
| 0 | 4 | 2 |
| 0 | 5 | 7 |
| 1 | 0 | 3 |
| 1 | 1 | 4 |
| 1 | 2 | 3 |
| 1 | 3 | 6 |
| 1 | 4 | 3 |
| 1 | 5 | 5 |
| 2 | 0 | 4 |
| 2 | 1 | 3 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 3 |
| 2 | 5 | 7 |
| 3 | 0 | 8 |
| 3 | 1 | 3 |
| 3 | 2 | 4 |
| 3 | 3 | 1 |
| 3 | 4 | 5 |
| 3 | 5 | 3 |
Durations sum to 24 (hours per day) for each cluster:
result.segment_durations.sum(dim="timestep").to_dataframe("total_hours")
| cluster | total_hours |
|---|---|
| 0 | 24 |
| 1 | 24 |
| 2 | 24 |
| 3 | 24 |
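Because segments have unequal lengths, any quantity integrated over time should be weighted by the segment duration rather than summed directly. The NumPy sketch below shows the arithmetic on made-up numbers. The arrays `power`, `durations` and `occurrences` are hypothetical and are not taken from the result above.

```python
import numpy as np

# Hypothetical toy data: 2 clusters x 3 segments.
power = np.array([[1.0, 3.0, 2.0],
                  [0.5, 4.0, 1.0]])      # e.g. MW, one value per segment
durations = np.array([[8, 10, 6],
                      [12, 4, 8]])       # hours per segment, 24 per cluster

# Energy per representative day: rate x duration, summed over segments.
energy_per_cluster = (power * durations).sum(axis=1)   # MWh

# Hypothetical number of original days assigned to each cluster.
occurrences = np.array([18, 12])
total_energy = float((energy_per_cluster * occurrences).sum())

print(energy_per_cluster, total_energy)
```

With the objects from this page, the same weighting would presumably be written as `(result.cluster_representatives * result.segment_durations).sum("timestep")`, since both arrays share the `cluster` and `timestep` dimensions.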
Accuracy
result.accuracy.rmse.to_dataframe("RMSE")
| variable | RMSE |
|---|---|
| demand | 0.112650 |
| solar | 0.117669 |
| wind | 0.152176 |
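For readers who want to see what a per-variable RMSE measures, the following is a minimal sketch of the formula on toy arrays. The helper `rmse_per_column` and the data are hypothetical. Whether tsam_xarray normalizes the data before computing its accuracy metrics is not stated here, so the numbers in the table above should not be expected to follow from this exact computation.

```python
import numpy as np


def rmse_per_column(original, reconstructed):
    """Root-mean-square error along the time axis (axis 0), one value per column."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return np.sqrt((diff ** 2).mean(axis=0))


# Hypothetical data: 4 timesteps x 2 variables.
original = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 3.0], [3.0, 3.0]])
reconstructed = np.array([[0.5, 1.0], [0.5, 1.0], [2.5, 3.0], [2.5, 3.0]])

rmse = rmse_per_column(original, reconstructed)
print(rmse)
```

The first column is off by 0.5 at every timestep, so its RMSE is 0.5. The second column is reproduced exactly, so its RMSE is 0.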
Disaggregate
With segmentation, disaggregate() places each segment's value at the first timestep of that segment
and fills the remaining timesteps with NaN. You choose how to fill the gaps.
dis = result.disaggregate(result.cluster_representatives)
print(f"NaN values: {int(dis.isnull().sum())} / {dis.size}")
dis.sel(variable="solar").to_dataframe("value").head(30)
NaN values: 1620 / 2160
| time | variable | value |
|---|---|---|
| 2020-01-01 00:00:00 | solar | 0.012059 |
| 2020-01-01 01:00:00 | solar | NaN |
| 2020-01-01 02:00:00 | solar | NaN |
| 2020-01-01 03:00:00 | solar | 0.000000 |
| 2020-01-01 04:00:00 | solar | NaN |
| 2020-01-01 05:00:00 | solar | NaN |
| 2020-01-01 06:00:00 | solar | NaN |
| 2020-01-01 07:00:00 | solar | 0.049799 |
| 2020-01-01 08:00:00 | solar | NaN |
| 2020-01-01 09:00:00 | solar | NaN |
| 2020-01-01 10:00:00 | solar | 0.106954 |
| 2020-01-01 11:00:00 | solar | NaN |
| 2020-01-01 12:00:00 | solar | NaN |
| 2020-01-01 13:00:00 | solar | NaN |
| 2020-01-01 14:00:00 | solar | NaN |
| 2020-01-01 15:00:00 | solar | NaN |
| 2020-01-01 16:00:00 | solar | 0.028507 |
| 2020-01-01 17:00:00 | solar | NaN |
| 2020-01-01 18:00:00 | solar | NaN |
| 2020-01-01 19:00:00 | solar | 0.025895 |
| 2020-01-01 20:00:00 | solar | NaN |
| 2020-01-01 21:00:00 | solar | NaN |
| 2020-01-01 22:00:00 | solar | NaN |
| 2020-01-01 23:00:00 | solar | NaN |
| 2020-01-02 00:00:00 | solar | 0.012059 |
| 2020-01-02 01:00:00 | solar | NaN |
| 2020-01-02 02:00:00 | solar | NaN |
| 2020-01-02 03:00:00 | solar | 0.000000 |
| 2020-01-02 04:00:00 | solar | NaN |
| 2020-01-02 05:00:00 | solar | NaN |
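The placement pattern in this output can be reproduced by hand. On 2020-01-01 the non-NaN solar values sit at hours 0, 3, 7, 10, 16 and 19, which is consistent with segment durations of 3, 4, 3, 6, 3 and 5 hours (cluster 1 in the durations table). The NumPy sketch below rebuilds one such day from those values and durations. It illustrates the layout only and is not the library's implementation.

```python
import numpy as np

# Solar values and durations for one day, copied from the tables on this page.
values = np.array([0.012059, 0.000000, 0.049799, 0.106954, 0.028507, 0.025895])
durations = np.array([3, 4, 3, 6, 3, 5])   # hours per segment, sums to 24

# The first hour of each segment is the cumulative duration of the segments before it.
starts = np.concatenate(([0], np.cumsum(durations)[:-1]))

# Place each value at its segment start and leave the rest as NaN.
day = np.full(int(durations.sum()), np.nan)
day[starts] = values

# Holding each value for its full duration gives the same result as a forward-fill.
filled = np.repeat(values, durations)

print(starts)
print(int(np.isnan(day).sum()), "NaN of", day.size)
```

Each day holds 6 values and 18 NaN per variable. Over 30 days and 3 variables that gives 18 × 30 × 3 = 1620 NaN out of 2160 entries, matching the count printed above.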
Forward-fill for rate variables (power, temperature)
filled = dis.ffill(dim="time")
comparison = xr.concat(
    [da.sel(variable="solar"), filled.sel(variable="solar")],
    dim="source",
).assign_coords(source=["original", "disaggregated + ffill"])
comparison.plotly.line(
    x="time",
    color="source",
    title="Segmented disaggregation vs original (solar)",
)
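Forward-filling holds each segment value constant until the next segment starts, which suits rates. Other fill strategies are possible. As a point of comparison, the sketch below contrasts a forward-fill with linear interpolation on a toy series, using plain NumPy. The helpers `forward_fill` and `linear_fill` are hypothetical, and which strategy fits a given variable is a modelling decision that this page does not prescribe.

```python
import numpy as np


def forward_fill(sparse):
    """Replace each NaN with the most recent non-NaN value."""
    # Index of the last valid entry at or before each position.
    idx = np.where(~np.isnan(sparse), np.arange(len(sparse)), 0)
    np.maximum.accumulate(idx, out=idx)
    return sparse[idx]


def linear_fill(sparse):
    """Interpolate linearly between known values, holding the last one at the end."""
    known = ~np.isnan(sparse)
    x = np.arange(len(sparse))
    return np.interp(x, x[known], sparse[known])


# Hypothetical series with values only at segment starts (positions 0, 3 and 5).
sparse = np.array([1.0, np.nan, np.nan, 4.0, np.nan, 2.0, np.nan, np.nan])

print(forward_fill(sparse))
print(linear_fill(sparse))
```

On an xarray object, the counterparts are `ffill(dim="time")`, as used above, and `interpolate_na(dim="time")`.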
Comparison: with and without segmentation
result_noseg = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
)
comparison = xr.concat(
    [
        da.sel(variable="solar"),
        result_noseg.reconstructed.sel(variable="solar"),
        filled.sel(variable="solar"),
    ],
    dim="source",
).assign_coords(source=["original", "4 clusters (no seg)", "4 clusters x 6 segments"])
comparison.plotly.line(
    x="time",
    color="source",
    title="Effect of segmentation on reconstruction quality (solar)",
)