Segmentation¶

Segmentation reduces the temporal resolution within each cluster representative by grouping consecutive timesteps into segments. This is useful when optimizing over cluster representatives: fewer timesteps mean smaller models and faster solves.

tsam_xarray exposes segment_durations as a DataArray and provides disaggregate() to map segmented results back to the original time axis.
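To make the idea concrete, here is a toy, stdlib-only sketch of one way to segment a single daily profile. It starts with one segment per hour and repeatedly merges the adjacent pair whose merge adds the least squared error, until n_segments remain. This contiguity-constrained, Ward-style merge is an assumption chosen for illustration, not a claim about tsam's exact algorithm. The helper name segment_profile is hypothetical, and it represents each segment by its mean, whereas the aggregation below asks tsam for medoids.

```python
def segment_profile(values, n_segments):
    """Hypothetical helper: split a profile into n_segments contiguous segments.

    Returns (durations, means): the number of timesteps in each segment and
    the mean value of each segment.
    """

    def sse(seg):
        # Within-segment sum of squared deviations from the segment mean
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg)

    # Start with one segment per timestep
    segments = [[v] for v in values]
    while len(segments) > n_segments:
        # Extra squared error caused by merging segment i with segment i + 1
        costs = [
            sse(segments[i] + segments[i + 1]) - sse(segments[i]) - sse(segments[i + 1])
            for i in range(len(segments) - 1)
        ]
        i = costs.index(min(costs))
        # Replace the two neighbours by their union; time order is preserved
        segments[i : i + 2] = [segments[i] + segments[i + 1]]

    durations = [len(seg) for seg in segments]
    means = [sum(seg) / len(seg) for seg in segments]
    return durations, means


# A made-up 24-hour profile: flat night, ramp up, plateau, ramp down
profile = [0, 0, 0, 0, 0, 0, 1, 3, 5, 6, 6, 6, 6, 6, 5, 3, 1, 0, 0, 0, 0, 0, 0, 0]
durations, means = segment_profile(profile, n_segments=6)
print(durations, sum(durations))  # the durations always sum to 24
```

Because only neighbouring segments are ever merged, every segment stays a contiguous block of hours, which is what lets a single duration per segment describe the result.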

In [1]:

import plotly.io as pio
import xarray as xr
import xarray_plotly  # noqa: F401
from tsam import SegmentConfig

import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data

pio.renderers.default = "notebook_connected"

da = sample_energy_data(n_days=30).sel(region="north", scenario="low")
print(f"Input shape: {dict(da.sizes)}")

Input shape: {'time': 720, 'variable': 3}

Aggregate with segmentation¶

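Pass segments=SegmentConfig(n_segments=6) to reduce each 24-hour period (one day of hourly data) to 6 segments. The call below also sets representation="medoid", so each segment is represented by its medoid.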
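The size reduction printed below follows directly from the settings, assuming hourly data and daily periods as in this example. Per variable, 30 days hold 720 timesteps. Clustering alone into 4 full days would keep 96 representative timesteps, and splitting each day into 6 segments cuts that to 24:

```python
n_days, hours_per_day = 30, 24
n_clusters, n_segments = 4, 6

original = n_days * hours_per_day            # hourly timesteps per variable
clusters_only = n_clusters * hours_per_day   # representative timesteps without segmentation
segmented = n_clusters * n_segments          # representative timesteps with segmentation

print(f"{original} -> {clusters_only} (clusters only) -> {segmented} (clusters + segments)")
# 720 -> 96 (clusters only) -> 24 (clusters + segments)
```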

In [2]:

result = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
    segments=SegmentConfig(n_segments=6, representation="medoid"),
)
n_compact = result.n_clusters * result.n_timesteps_per_period
print(f"Cluster representatives: {dict(result.cluster_representatives.sizes)}")
print(f"Data reduction: {da.sizes['time']} -> {n_compact} values")
result.cluster_representatives.to_dataframe("value").head(10)

Cluster representatives: {'cluster': 4, 'timestep': 6, 'variable': 3}
Data reduction: 720 -> 24 values

Out[2]:

cluster	timestep	variable	value
0	0	demand	0.259442
		solar	0.002530
		wind	0.341894
	1	demand	0.568183
		solar	0.000000
		wind	0.479922
	2	demand	0.750693
		solar	0.055802
		wind	0.523210
	3	demand	0.823226

In [3]:

result.cluster_representatives.plotly.line(
    line_shape="hv",
    x="timestep",
    facet_col="variable",
    color="cluster",
    title="Segmented cluster representatives (6 segments per day)",
)

Segment durations¶

Segments need not be equal in length, and the segmentation differs between clusters: each segment spans its own number of original timesteps. segment_durations gives the number of hours each segment represents.

In [4]:

result.segment_durations.to_dataframe("hours")

Out[4]:

cluster	timestep	hours
0	0	4
	1	4
	2	2
	3	5
	4	2
	5	7
1	0	3
	1	4
	2	3
	3	6
	4	3
	5	5
2	0	4
	1	3
	2	3
	3	4
	4	3
	5	7
3	0	8
	1	3
	2	4
	3	1
	4	5
	5	3

Durations sum to 24 (hours per day) for each cluster:

In [5]:

result.segment_durations.sum(dim="timestep").to_dataframe("total_hours")

Out[5]:

cluster	total_hours
0	24
1	24
2	24
3	24
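Because segments are contiguous and together cover the whole day, a cumulative sum of the durations gives the hour at which each segment starts. The sketch below copies the durations from the segment_durations table into plain Python; segment_starts is a hypothetical helper, not part of tsam_xarray.

```python
from itertools import accumulate

# Segment durations in hours, copied from the segment_durations table
segment_durations = {
    0: [4, 4, 2, 5, 2, 7],
    1: [3, 4, 3, 6, 3, 5],
    2: [4, 3, 3, 4, 3, 7],
    3: [8, 3, 4, 1, 5, 3],
}


def segment_starts(durations):
    """Start offset of each segment: total duration of all earlier segments."""
    return [0, *accumulate(durations)][:-1]


for cluster, durations in segment_durations.items():
    assert sum(durations) == 24  # each cluster still covers a full day
    print(cluster, segment_starts(durations))
# 0 [0, 4, 8, 10, 15, 17]
# 1 [0, 3, 7, 10, 16, 19]
# 2 [0, 4, 7, 10, 14, 17]
# 3 [0, 8, 11, 15, 16, 21]
```

The starts for cluster 1 (hours 0, 3, 7, 10, 16 and 19) are exactly the hours at which disaggregate() places values for 2020-01-01 in the Disaggregate section, which suggests that day was assigned to cluster 1.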

Accuracy¶

In [6]:

result.accuracy.rmse.to_dataframe("RMSE")

Out[6]:

variable	RMSE
demand	0.112650
solar	0.117669
wind	0.152176
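RMSE is the square root of the mean squared difference between an original series and its reconstruction. The stdlib sketch below spells out the formula for one variable. The function name rmse is hypothetical, and tsam may normalize or otherwise preprocess the data before computing result.accuracy, so applying this to raw values is not guaranteed to reproduce the table above.

```python
import math


def rmse(original, reconstructed):
    """Root-mean-square error between two equally long sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.0, a perfect reconstruction
print(rmse([0.0, 0.0], [3.0, 4.0]))             # sqrt((9 + 16) / 2), about 3.536
```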

Disaggregate¶

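With segmentation, disaggregate() writes each segment's value at the first timestep of that segment and leaves the remaining timesteps as NaN. You choose how to fill the gaps, since the right method depends on what the variable represents.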

In [7]:

dis = result.disaggregate(result.cluster_representatives)
print(f"NaN values: {int(dis.isnull().sum())} / {dis.size}")
dis.sel(variable="solar").to_dataframe("value").head(30)

NaN values: 1620 / 2160

Out[7]:

time	variable	value
2020-01-01 00:00:00	solar	0.012059
2020-01-01 01:00:00	solar	NaN
2020-01-01 02:00:00	solar	NaN
2020-01-01 03:00:00	solar	0.000000
2020-01-01 04:00:00	solar	NaN
2020-01-01 05:00:00	solar	NaN
2020-01-01 06:00:00	solar	NaN
2020-01-01 07:00:00	solar	0.049799
2020-01-01 08:00:00	solar	NaN
2020-01-01 09:00:00	solar	NaN
2020-01-01 10:00:00	solar	0.106954
2020-01-01 11:00:00	solar	NaN
2020-01-01 12:00:00	solar	NaN
2020-01-01 13:00:00	solar	NaN
2020-01-01 14:00:00	solar	NaN
2020-01-01 15:00:00	solar	NaN
2020-01-01 16:00:00	solar	0.028507
2020-01-01 17:00:00	solar	NaN
2020-01-01 18:00:00	solar	NaN
2020-01-01 19:00:00	solar	0.025895
2020-01-01 20:00:00	solar	NaN
2020-01-01 21:00:00	solar	NaN
2020-01-01 22:00:00	solar	NaN
2020-01-01 23:00:00	solar	NaN
2020-01-02 00:00:00	solar	0.012059
2020-01-02 01:00:00	solar	NaN
2020-01-02 02:00:00	solar	NaN
2020-01-02 03:00:00	solar	0.000000
2020-01-02 04:00:00	solar	NaN
2020-01-02 05:00:00	solar	NaN
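The NaN pattern above can be mimicked with a small stdlib sketch: for every original day, look up its cluster, write each segment value at the segment's first hour, and leave the other hours as NaN. disaggregate_segments and its arguments are hypothetical and only illustrate the layout; the real disaggregate() operates on xarray objects.

```python
import math


def disaggregate_segments(period_clusters, durations, values, period_length=24):
    """Hypothetical helper: expand segment values onto the original time axis.

    period_clusters: cluster index of each original period (day), in order
    durations: {cluster: segment durations}, each list summing to period_length
    values: {cluster: one representative value per segment}
    """
    series = []
    for cluster in period_clusters:
        period = [math.nan] * period_length
        start = 0
        for duration, value in zip(durations[cluster], values[cluster]):
            period[start] = value  # the value sits at the segment's first timestep
            start += duration
        series.extend(period)
    return series


# Two days, both (hypothetically) assigned to a cluster with these durations
series = disaggregate_segments(
    [1, 1], {1: [3, 4, 3, 6, 3, 5]}, {1: [10, 20, 30, 40, 50, 60]}
)
print([hour for hour, v in enumerate(series) if not math.isnan(v)])
# [0, 3, 7, 10, 16, 19, 24, 27, 31, 34, 40, 43]

# NaN count for this notebook: 30 days x 3 variables x (24 - 6) empty hours
print(30 * 3 * (24 - 6))  # 1620, matching "NaN values: 1620 / 2160"
```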

Forward-fill for rate variables (power, temperature)¶

In [8]:

filled = dis.ffill(dim="time")

comparison = xr.concat(
    [da.sel(variable="solar"), filled.sel(variable="solar")],
    dim="source",
).assign_coords(source=["original", "disaggregated + ffill"])
comparison.plotly.line(
    x="time",
    color="source",
    title="Segmented disaggregation vs original (solar)",
)
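Since every disaggregated day begins with a segment value at hour 0, forward-filling along time never carries a value from one day into the next, and the result is a step function that holds each segment value for its duration. The stdlib sketch below checks that equivalence for one day; ffill and repeat_by_duration are hypothetical helpers standing in for xarray's DataArray.ffill and for an explicit expansion by segment_durations.

```python
import math


def ffill(series):
    """Replace each NaN with the most recent non-NaN value (leading NaNs stay NaN)."""
    filled, last = [], math.nan
    for value in series:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


def repeat_by_duration(durations, values):
    """Hold each segment value constant for its duration (a step function)."""
    return [value for duration, value in zip(durations, values) for _ in range(duration)]


durations = [3, 4, 3, 6, 3, 5]
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# One day laid out like the disaggregate() output: a value at each segment start, NaN elsewhere
sparse = [math.nan] * 24
start = 0
for duration, value in zip(durations, values):
    sparse[start] = value
    start += duration

print(ffill(sparse) == repeat_by_duration(durations, values))  # True
```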

Comparison: with and without segmentation¶

In [9]:

result_noseg = tsam_xarray.aggregate(
    da,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
)

comparison = xr.concat(
    [
        da.sel(variable="solar"),
        result_noseg.reconstructed.sel(variable="solar"),
        filled.sel(variable="solar"),
    ],
    dim="source",
).assign_coords(source=["original", "4 clusters (no seg)", "4 clusters x 6 segments"])
comparison.plotly.line(
    x="time",
    color="source",
    title="Effect of segmentation on reconstruction quality (solar)",
)