Getting Started

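tsam_xarray wraps tsam for xarray DataArrays. This notebook walks through the basic workflow: load sample data, aggregate it, inspect the result, compare it with the original, and disaggregate.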

Sample data

tsam_xarray includes sample energy data with realistic profiles for documentation.

In [1]:

import plotly.io as pio
import xarray_plotly  # noqa: F401 — registers .plotly accessor

import tsam_xarray
from tsam_xarray._sample_data import sample_energy_data

pio.renderers.default = "notebook_connected"

da = sample_energy_data(n_days=30)
print(f"Shape: {dict(da.sizes)}")
da.sel(region="north", scenario="low").plotly.line(
    x="time", color="variable", title="Input data (north, low)"
)

Shape: {'time': 720, 'variable': 3, 'region': 3, 'scenario': 2}

In [2]:

da.sel(region="north", scenario="low").to_dataframe("value").head()

Out[2]:

		region	scenario	value
time	variable
2020-01-01 00:00:00	solar	north	low	0.022417
	wind	north	low	0.997077
	demand	north	low	0.187212
2020-01-01 01:00:00	solar	north	low	0.001815
	wind	north	low	0.982863

Aggregate

For a (time, variable) array, cluster_dim is auto-detected. The call below still passes cluster_dim="variable" explicitly to make the choice visible.

In [3]:

da_simple = da.sel(region="north", scenario="low")

result = tsam_xarray.aggregate(
    da_simple,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
)
result.cluster_representatives.to_dataframe("value").head(10)

Out[3]:

			value
cluster	timestep	variable
0	0	demand	0.156564
		solar	0.000000
		wind	0.405613
	1	demand	0.259442
		solar	0.002530
		wind	0.341894
	2	demand	0.338327
		solar	0.012811
		wind	0.381879
	3	demand	0.306719
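
The (cluster, timestep) index above comes from cutting the time axis into equal-length periods before clustering. As a rough, library-free illustration of that layout (not tsam_xarray's implementation), the sketch below reshapes a flat hourly series into a (period, timestep) grid. The helper name split_into_periods is hypothetical.

```python
def split_into_periods(values, n_timesteps_per_period):
    """Cut a flat series into consecutive, equal-length periods."""
    if len(values) % n_timesteps_per_period != 0:
        raise ValueError("series length must be a multiple of the period length")
    return [
        values[start : start + n_timesteps_per_period]
        for start in range(0, len(values), n_timesteps_per_period)
    ]


# 720 hourly values (30 days), matching the sample data's time dimension.
hourly = list(range(720))
periods = split_into_periods(hourly, n_timesteps_per_period=24)

print(len(periods), len(periods[0]))  # 30 24
```

Clustering then groups these 30 daily profiles, and each cluster is summarised by one 24-step representative.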

In [4]:

result.cluster_representatives.plotly.line(
    line_shape="hv",
    x="timestep",
    facet_col="variable",
    color="cluster",
    title="Cluster representatives",
)

Inspect results

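The result exposes its fields as xarray objects, such as cluster_representatives, cluster_counts, and accuracy.rmse, alongside scalars such as n_clusters and n_timesteps_per_period.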

In [5]:

print(f"Clusters: {result.n_clusters}")
print(f"Timesteps per period: {result.n_timesteps_per_period}")
print("Cluster counts (days each represents):")
result.cluster_counts.to_dataframe("count")

Clusters: 4
Timesteps per period: 24
Cluster counts (days each represents):

Out[5]:

	count
cluster
0	13
1	8
2	5
3	4
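
The counts say how many original days each representative stands for, so they should add up to the 30 input days. One common use is to weight per-cluster quantities by these counts. The sketch below does this in plain Python; the counts are copied from the output above, while the per-cluster daily totals are made-up numbers for illustration only.

```python
# Cluster counts copied from Out[5]; they should add up to the 30 input days.
cluster_counts = {0: 13, 1: 8, 2: 5, 3: 4}
n_periods = sum(cluster_counts.values())

# Hypothetical per-cluster daily totals (made-up numbers, illustration only).
daily_total = {0: 10.0, 1: 12.0, 2: 8.0, 3: 20.0}

# Weight each representative day by the number of days it stands for.
weighted_total = sum(cluster_counts[c] * daily_total[c] for c in cluster_counts)

# The same counts expressed as fractions of the whole horizon.
weights = {c: n / n_periods for c, n in cluster_counts.items()}

print(n_periods)       # 30
print(weighted_total)  # 346.0
```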

In [6]:

result.accuracy.rmse.to_dataframe("RMSE")

Out[6]:

	RMSE
variable
demand	0.073805
solar	0.112657
wind	0.156366
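
For readers unfamiliar with the metric, the sketch below shows the textbook root-mean-square error between an original and a reconstructed series. It is an illustration only: whether tsam scales or normalises the data before computing accuracy.rmse is not covered in this notebook, so the numbers here are not meant to reproduce the table above. The input values are made up.

```python
import math


def rmse(original, reconstructed):
    """Root-mean-square error between two equal-length sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only.
original = [0.0, 0.5, 1.0, 0.5]
reconstructed = [0.1, 0.4, 0.9, 0.6]
error = rmse(original, reconstructed)
print(round(error, 3))  # 0.1
```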

Reconstructed vs original

result.compare() stacks the original and reconstructed data along a variant dimension on the original time axis, so the comparison plots directly with color="variant" and needs no manual concat or melt. Pass a coordinate selection (e.g. variable="solar") to focus on one column. result.original and result.reconstructed share the same dim order, so they always line up.

In [7]:

result.compare(variable="solar").plotly.line(
    x="time", color="variant", title="Original vs reconstructed (solar)"
)

In [8]:

result.residuals.plotly.line(x="time", facet_col="variable", title="Residuals")

Because to_dataframe() returns tidy long-form data, richer comparisons take only a few lines in the plotting library of your choice, with no bespoke helper needed. For example, the cell below draws load-duration curves: each series is sorted in descending order, with the original dotted and the reconstruction solid.

In [9]:

tidy = result.to_dataframe()
tidy["rank"] = (
    tidy.groupby(["variant", "variable"])["energy"]
    .rank(method="first", ascending=False)
    .astype(int)
)
duration = tidy.set_index(["rank", "variant", "variable"])["energy"].to_xarray()
duration.plotly.line(
    x="rank",
    color="variable",
    line_dash="variant",
    line_dash_map={"original": "dot", "reconstructed": "solid"},
    title="Load-duration curves: original vs reconstructed",
)
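
The ranking step is what turns a time series into a duration curve: each value gets its position in the descending ordering, and ties are broken by order of appearance so every rank is unique. The plain-Python sketch below mirrors that idea on made-up values. The helper name rank_first_descending is hypothetical and is not part of pandas or tsam_xarray.

```python
def rank_first_descending(values):
    """Rank values from largest (1) to smallest, ties by order of appearance."""
    # sorted() is stable, so equal values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


load = [0.3, 0.9, 0.3, 0.6]  # made-up values
ranks = rank_first_descending(load)
print(ranks)  # [3, 1, 4, 2]

# Sorting by rank gives the duration curve: values in descending order.
duration_curve = [value for _, value in sorted(zip(ranks, load))]
print(duration_curve)  # [0.9, 0.6, 0.3, 0.3]
```

Unique ranks matter here because the rank becomes an index level; duplicated ranks within a group would make the conversion back to an array ambiguous.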

Passing tsam parameters

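Any keyword argument accepted by tsam.aggregate() is passed through. The cell below passes cluster=ClusterConfig(method="kmeans") to select k-means clustering.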

In [10]:

from tsam import ClusterConfig

result_km = tsam_xarray.aggregate(
    da_simple,
    time_dim="time",
    cluster_dim="variable",
    n_clusters=4,
    cluster=ClusterConfig(method="kmeans"),
)
result_km.accuracy.rmse.to_dataframe("RMSE")

Out[10]:

	RMSE
variable
demand	0.055077
solar	0.081574
wind	0.127160
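
Keyword pass-through is a general Python pattern: the wrapper consumes the arguments it owns and forwards the rest with **kwargs. The sketch below shows the pattern with stand-in functions. Both inner_aggregate and wrapper_aggregate are hypothetical names with simplified signatures; they are not the tsam or tsam_xarray APIs, and how tsam_xarray forwards arguments internally is not shown in this notebook.

```python
def inner_aggregate(data, n_clusters, method="hierarchical"):
    """Stand-in for the wrapped function (hypothetical, not tsam's signature)."""
    return {"n_clusters": n_clusters, "method": method, "n_values": len(data)}


def wrapper_aggregate(data, time_dim, **inner_kwargs):
    """Consume wrapper-only arguments and forward everything else."""
    # time_dim belongs to the wrapper; the remaining keywords go to the inner call.
    return inner_aggregate(data, **inner_kwargs)


forwarded = wrapper_aggregate(
    [1, 2, 3], time_dim="time", n_clusters=2, method="kmeans"
)
print(forwarded)  # {'n_clusters': 2, 'method': 'kmeans', 'n_values': 3}
```

A consequence of this design is that an unknown keyword surfaces as an error from the inner function rather than from the wrapper.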

Disaggregate

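disaggregate() reverses the mapping of aggregate(): it maps any data on the (cluster, timestep) grid back to the original time axis. Aggregation is lossy, so disaggregating the cluster representatives gives an approximation of the original series, not the original itself.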

In [11]:

import xarray as xr

disaggregated = result.disaggregate(result.cluster_representatives)

comparison = xr.concat(
    [da_simple.sel(variable="solar"), disaggregated.sel(variable="solar")],
    dim="source",
).assign_coords(source=["original", "disaggregated"])
comparison.plotly.line(
    x="time", color="source", title="Disaggregated vs original (solar)"
)
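
Conceptually, mapping back to the time axis means replacing each original period with the profile of the cluster it was assigned to. The plain-Python sketch below illustrates that idea under simplifying assumptions (one variable, a known list of per-period cluster labels). The function name disaggregate_sketch and all values are made up, and this is not how tsam_xarray is implemented.

```python
def disaggregate_sketch(cluster_assignments, representatives):
    """Expand per-cluster profiles back onto the full time axis.

    cluster_assignments: cluster label for each original period, in time order.
    representatives: mapping from cluster label to that cluster's profile.
    """
    flat = []
    for cluster in cluster_assignments:
        flat.extend(representatives[cluster])
    return flat


# Made-up example: 4 periods of 3 timesteps, grouped into 2 clusters.
representatives = {0: [0.0, 0.5, 0.0], 1: [0.2, 1.0, 0.2]}
cluster_assignments = [0, 1, 1, 0]

series = disaggregate_sketch(cluster_assignments, representatives)
print(len(series))  # 12
print(series[3:6])  # [0.2, 1.0, 0.2]
```

Because the same mapping applies to any data on the (cluster, timestep) grid, values computed per cluster elsewhere can be expanded to the full time axis in the same way.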