API Reference

tsam_xarray

tsam_xarray: Lightweight xarray wrapper for tsam time series aggregation.

ClusteringInfo `module-attribute`

ClusteringInfo = ClusteringResult

Backwards-compatible alias for `ClusteringResult`.

ClusteringResult `dataclass`

Reusable clustering result with xarray dimension metadata.

Wraps one or more tsam ClusteringResult objects alongside the dimension names needed to apply the clustering to new data. Exposes clustering metadata as cached xarray DataArrays.

Attributes:

Name	Type	Description
`time_dim`	`str`	Name of the time dimension.
`cluster_dim`	`list[str]`	Dimension(s) clustered together.
`slice_dims`	`list[str]`	Dimension(s) aggregated independently.
`clusterings`	`dict[tuple[Hashable, ...], tsam.ClusteringResult]`	Per-slice tsam clustering, keyed by a tuple of slice-coordinate values. Single entry `{(): result}` when no slicing.
`n_clusters`	`int`	Number of clusters.
`n_original_periods`	`int`	Number of original periods.
`n_timesteps_per_period`	`int`	Timesteps per period.
`n_segments`	`int | None`	Segments per period, or `None`.
`cluster_assignments`	`DataArray`	Cluster ID per period. Dims: `(period, *slice_dims)`.
`cluster_occurrences`	`DataArray`	Periods per cluster. Dims: `(cluster, *slice_dims)`.
`cluster_centers`	`DataArray`	Representative period index per cluster. Dims: `(cluster, *slice_dims)`.
`segment_durations`	`DataArray | None`	Duration per segment, or `None`. Dims: `(cluster, timestep, *slice_dims)`.
`segment_assignments`	`DataArray | None`	Segment ID per timestep, or `None`. Dims: `(cluster, timestep, *slice_dims)`.
`segment_centers`	`DataArray | None`	Representative timestep per segment, or `None`. Dims: `(cluster, segment, *slice_dims)`.
`dim_names`	`DimNames`	Names of the structural output dimensions. See `DimNames`.
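The `clusterings` mapping holds one entry per independently aggregated slice, keyed by a tuple of slice-coordinate values, with `()` as the only key when `slice_dims` is empty. The sketch below mirrors the `_slice_coords` logic in the source listing: it recovers each slice dimension's coordinate values, in first-seen order, from those keys. Plain strings stand in for the per-slice tsam results, and the helper name `slice_coords_from_keys` and the dimension names `scenario`/`region` are hypothetical.

```python
from collections.abc import Hashable
from itertools import product


def slice_coords_from_keys(
    slice_dims: list[str], keys: list[tuple[Hashable, ...]]
) -> dict[str, list[Hashable]]:
    """Recover per-dimension coordinate values from tuple keys (hypothetical helper)."""
    if not slice_dims:
        return {}
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return {
        dim: list(dict.fromkeys(key[i] for key in keys))
        for i, dim in enumerate(slice_dims)
    }


# Two slice dims -> one key per (scenario, region) combination.
# Strings stand in for the per-slice tsam clustering objects.
clusterings = {
    (scenario, region): f"tsam result for {scenario}/{region}"
    for scenario, region in product(["low", "high"], ["north", "south"])
}
coords = slice_coords_from_keys(["scenario", "region"], list(clusterings))
print(coords)  # {'scenario': ['low', 'high'], 'region': ['north', 'south']}

# With no slicing there is exactly one entry, keyed by the empty tuple.
print(slice_coords_from_keys([], [()]))  # {}
```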

Source code in src/tsam_xarray/_clustering.py

@dataclass(frozen=True, repr=False)
class ClusteringResult:
    """Reusable clustering result with xarray dimension metadata.

    Wraps one or more tsam ``ClusteringResult`` objects alongside
    the dimension names needed to apply the clustering to new data.
    Exposes clustering metadata as cached xarray DataArrays.

    Attributes:
        time_dim: Name of the time dimension.
        cluster_dim: Dimension(s) clustered together.
        slice_dims: Dimension(s) aggregated independently.
        clusterings: Per-slice tsam clustering.
            Single entry ``{(): result}`` when no slicing.
        n_clusters: Number of clusters.
        n_original_periods: Number of original periods.
        n_timesteps_per_period: Timesteps per period.
        n_segments: Segments per period, or ``None``.
        cluster_assignments: Cluster ID per period.
            Dims: ``(period, *slice_dims)``.
        cluster_occurrences: Periods per cluster.
            Dims: ``(cluster, *slice_dims)``.
        cluster_centers: Representative period per cluster.
            Dims: ``(cluster, *slice_dims)``.
        segment_durations: Duration per segment, or ``None``.
            Dims: ``(cluster, timestep, *slice_dims)``.
        segment_assignments: Segment ID per timestep, or
            ``None``. Dims: ``(cluster, timestep,
            *slice_dims)``.
        segment_centers: Representative timestep per segment,
            or ``None``.
            Dims: ``(cluster, segment, *slice_dims)``.
        dim_names: Names of the structural output dimensions.
            See `DimNames`.
    """

    time_dim: str
    cluster_dim: list[str]
    slice_dims: list[str]
    clusterings: dict[tuple[Hashable, ...], tsam.ClusteringResult]
    dim_names: DimNames = field(default_factory=DimNames)
    _cache: dict[str, Any] = field(
        default_factory=dict, repr=False, init=False, compare=False
    )

    def __repr__(self) -> str:
        seg = f", n_segments={self.n_segments}" if self.n_segments else ""
        slices = f", slice_dims={self.slice_dims}" if self.slice_dims else ""
        return (
            f"ClusteringResult("
            f"n_clusters={self.n_clusters}, "
            f"n_periods={self.n_original_periods}, "
            f"timesteps_per_period={self.n_timesteps_per_period}, "
            f"time_dim={self.time_dim!r}, "
            f"cluster_dim={self.cluster_dim}"
            f"{slices}{seg})"
        )

    # -- scalar accessors (uniform across slices) --

    @property
    def n_clusters(self) -> int:
        """Number of clusters."""
        return next(iter(self.clusterings.values())).n_clusters

    @property
    def n_original_periods(self) -> int:
        """Number of original periods (e.g., days)."""
        return next(iter(self.clusterings.values())).n_original_periods

    @property
    def n_timesteps_per_period(self) -> int:
        """Number of timesteps per period (e.g., 24 for hourly with daily periods)."""
        return next(iter(self.clusterings.values())).n_timesteps_per_period

    @property
    def n_segments(self) -> int | None:
        """Number of segments per period, or None if no segmentation."""
        return next(iter(self.clusterings.values())).n_segments

    # -- DataArray properties (cached, concatenated across slices) --

    @property
    def _slice_coords(self) -> dict[str, Any]:
        """Reconstruct slice coordinates from clusterings keys."""
        if not self.slice_dims:
            return {}
        keys = list(self.clusterings.keys())
        return {
            dim: list(dict.fromkeys(k[i] for k in keys))
            for i, dim in enumerate(self.slice_dims)
        }

    @property
    def cluster_assignments(self) -> xr.DataArray:
        """Cluster assignment for each period, as DataArray.

        Dims: ``(period, *slice_dims)``.
        """
        if "cluster_assignments" not in self._cache:
            self._cache["cluster_assignments"] = self._build_assignments()
        result: xr.DataArray = self._cache["cluster_assignments"]
        return result

    def _build_assignments(self) -> xr.DataArray:
        if not self.slice_dims:
            cr = self.clusterings[()]
            return xr.DataArray(
                list(cr.cluster_assignments), dims=[self.dim_names.period]
            )

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        arrays = [
            xr.DataArray(
                list(self.clusterings[k].cluster_assignments),
                dims=[self.dim_names.period],
            )
            for k in keys
        ]
        return _concat_along_dims(arrays, self.slice_dims, sc)

    @property
    def cluster_occurrences(self) -> xr.DataArray:
        """Number of periods assigned to each cluster.

        Dims: ``(cluster, *slice_dims)``.
        """
        if "cluster_occurrences" not in self._cache:
            self._cache["cluster_occurrences"] = self._build_occurrences()
        result: xr.DataArray = self._cache["cluster_occurrences"]
        return result

    def _build_occurrences(self) -> xr.DataArray:
        def _single(cr: tsam.ClusteringResult) -> xr.DataArray:
            counts = np.bincount(cr.cluster_assignments, minlength=cr.n_clusters)
            return xr.DataArray(
                counts,
                dims=[self.dim_names.cluster],
                coords={self.dim_names.cluster: np.arange(cr.n_clusters)},
            )

        if not self.slice_dims:
            return _single(self.clusterings[()])

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        arrays = [_single(self.clusterings[k]) for k in keys]
        return _concat_along_dims(arrays, self.slice_dims, sc)

    @property
    def segment_durations(self) -> xr.DataArray | None:
        """Duration of each segment per cluster, or None if no segmentation.

        Dims: ``(cluster, timestep, *slice_dims)``.
        """
        if "segment_durations" not in self._cache:
            self._cache["segment_durations"] = self._build_segment_durations()
        result: xr.DataArray | None = self._cache["segment_durations"]
        return result

    def _build_segment_durations(self) -> xr.DataArray | None:
        if not self.slice_dims:
            return _segment_durations_to_da(
                self.clusterings[()].segment_durations, self.dim_names
            )

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        first = _segment_durations_to_da(
            self.clusterings[keys[0]].segment_durations, self.dim_names
        )
        if first is None:
            return None
        das: list[xr.DataArray] = [first]
        for k in keys[1:]:
            da = _segment_durations_to_da(
                self.clusterings[k].segment_durations, self.dim_names
            )
            if da is None:
                msg = (
                    f"Slice {k} has no segment durations but the first "
                    f"slice does. Segmentation must be uniform across slices."
                )
                raise ValueError(msg)
            das.append(da)
        return _concat_along_dims(das, self.slice_dims, sc)

    @property
    def cluster_centers(self) -> xr.DataArray:
        """Representative period index for each cluster.

        Dims: ``(cluster, *slice_dims)``.
        """
        if "cluster_centers" not in self._cache:
            self._cache["cluster_centers"] = self._build_cluster_centers()
        result: xr.DataArray = self._cache["cluster_centers"]
        return result

    def _build_cluster_centers(self) -> xr.DataArray:
        def _single(cr: tsam.ClusteringResult) -> xr.DataArray:
            centers = cr.cluster_centers
            if centers is None:
                msg = "No cluster centers available."
                raise ValueError(msg)
            return xr.DataArray(
                list(centers),
                dims=[self.dim_names.cluster],
                coords={self.dim_names.cluster: np.arange(cr.n_clusters)},
            )

        if not self.slice_dims:
            return _single(self.clusterings[()])

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        arrays = [_single(self.clusterings[k]) for k in keys]
        return _concat_along_dims(arrays, self.slice_dims, sc)

    @property
    def segment_assignments(self) -> xr.DataArray | None:
        """Segment assignment for each timestep per cluster, or None.

        Dims: ``(cluster, timestep, *slice_dims)``.
        """
        if "segment_assignments" not in self._cache:
            self._cache["segment_assignments"] = self._build_segment_assignments()
        result: xr.DataArray | None = self._cache["segment_assignments"]
        return result

    def _build_segment_assignments(self) -> xr.DataArray | None:
        def _single(cr: tsam.ClusteringResult) -> xr.DataArray | None:
            if cr.segment_assignments is None:
                return None
            return xr.DataArray(
                np.array(cr.segment_assignments),
                dims=[self.dim_names.cluster, self.dim_names.timestep],
                coords={
                    self.dim_names.cluster: np.arange(cr.n_clusters),
                    self.dim_names.timestep: np.arange(cr.n_timesteps_per_period),
                },
            )

        if not self.slice_dims:
            return _single(self.clusterings[()])

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        first = _single(self.clusterings[keys[0]])
        if first is None:
            return None
        das: list[xr.DataArray] = [first]
        for k in keys[1:]:
            da = _single(self.clusterings[k])
            if da is None:
                msg = (
                    f"Slice {k} has no segment assignments but the first "
                    f"slice does. Segmentation must be uniform across slices."
                )
                raise ValueError(msg)
            das.append(da)
        return _concat_along_dims(das, self.slice_dims, sc)

    @property
    def segment_centers(self) -> xr.DataArray | None:
        """Representative timestep index for each segment per cluster, or None.

        Dims: ``(cluster, segment, *slice_dims)``.
        """
        if "segment_centers" not in self._cache:
            self._cache["segment_centers"] = self._build_segment_centers()
        result: xr.DataArray | None = self._cache["segment_centers"]
        return result

    def _build_segment_centers(self) -> xr.DataArray | None:
        def _single(cr: tsam.ClusteringResult) -> xr.DataArray | None:
            if cr.segment_centers is None:
                return None
            n_segments = cr.n_segments or len(cr.segment_centers[0])
            return xr.DataArray(
                np.array(cr.segment_centers),
                dims=[self.dim_names.cluster, self.dim_names.segment],
                coords={
                    self.dim_names.cluster: np.arange(cr.n_clusters),
                    self.dim_names.segment: np.arange(n_segments),
                },
            )

        if not self.slice_dims:
            return _single(self.clusterings[()])

        import itertools

        sc = self._slice_coords
        keys = list(itertools.product(*(sc[d] for d in self.slice_dims)))
        first = _single(self.clusterings[keys[0]])
        if first is None:
            return None
        das: list[xr.DataArray] = [first]
        for k in keys[1:]:
            da = _single(self.clusterings[k])
            if da is None:
                msg = (
                    f"Slice {k} has no segment centers but the first "
                    f"slice does. Segmentation must be uniform across slices."
                )
                raise ValueError(msg)
            das.append(da)
        return _concat_along_dims(das, self.slice_dims, sc)

    def apply(
        self,
        da: xr.DataArray,
        *,
        time_dim: str | None = None,
        cluster_dim: Sequence[str] | str | None = None,
        **tsam_kwargs: Any,
    ) -> Any:
        """Apply this clustering to new data.

        Args:
            da: New data with compatible time dimension
                length.
            time_dim: Time dimension name. Defaults to the
                stored value.
            cluster_dim: Cluster dimension(s). Defaults to the
                stored value. Can differ from the original if
                the new data has different dimension names.
            **tsam_kwargs: Additional keyword arguments passed
                to ``ClusteringResult.apply()``.

        Returns:
            Aggregation result using the stored clustering.
        """
        from tsam_xarray._result import AggregationResult

        td = time_dim if time_dim is not None else self.time_dim
        cd = (
            _resolve_cluster_dim(cluster_dim)
            if cluster_dim is not None
            else self.cluster_dim
        )

        _validate_apply(da, td, cd, self.slice_dims, self.clusterings)

        # Use stored slice_dims for canonical ordering
        slice_dims = self.slice_dims

        if not slice_dims:
            cr = self.clusterings[()]
            return _apply_single(da, cr, td, cd, tsam_kwargs, self.dim_names)

        import itertools

        slice_coords: dict[str, Any] = {d: da.coords[d].values for d in slice_dims}
        slice_keys = list(itertools.product(*(slice_coords[d] for d in slice_dims)))

        results: list[AggregationResult] = []

        for key in slice_keys:
            sel = dict(zip(slice_dims, key, strict=True))
            da_slice = da.sel(sel)
            cr = _lookup_clustering(self.clusterings, key)
            r = _apply_single(da_slice, cr, td, cd, tsam_kwargs, self.dim_names)
            results.append(r)

        return _concat_results(results, slice_dims, slice_coords, slice_keys)

    def disaggregate(self, data: xr.DataArray) -> xr.DataArray:
        """Map data on ``(cluster, timestep)`` back to original time.

        This is the inverse of ``aggregate()``. Use it to expand
        data computed on the compact cluster-representative grid
        (e.g., optimization results) back to the full time axis.

        Unlike ``AggregationResult.disaggregate()``, this method
        works on a ``ClusteringInfo`` loaded from JSON — no
        original data needed.

        Args:
            data: Data with ``cluster`` and ``timestep`` dims,
                matching the shape of the original cluster
                representatives. Additional dims (including
                auto-sliced dims like scenario) are supported.

        Returns:
            Data with ``cluster`` and ``timestep`` replaced by
            the original ``time`` dimension.
        """
        slice_dims = self.slice_dims
        if not slice_dims:
            return _disaggregate_single(self.clusterings[()], data, self.dim_names)

        import itertools

        slice_coords = {d: data.coords[d].values for d in slice_dims}
        keys = list(itertools.product(*(slice_coords[d] for d in slice_dims)))
        results = []
        for key in keys:
            sel = dict(zip(slice_dims, key, strict=True))
            data_slice = data.sel(sel)
            cr = _lookup_clustering(self.clusterings, key)
            results.append(_disaggregate_single(cr, data_slice, self.dim_names))

        return _concat_along_dims(results, slice_dims, slice_coords)

    def to_dict(self) -> dict[str, Any]:
        """Serialize clustering to a dictionary.

        Returns:
            Plain dict suitable for ``json.dump()`` or
            storage in databases, APIs, etc.
        """
        entries = []
        for key, cr in self.clusterings.items():
            entries.append(
                {
                    "key": list(_native_key(key)),
                    "clustering": cr.to_dict(),
                }
            )
        return {
            "time_dim": self.time_dim,
            "cluster_dim": self.cluster_dim,
            "slice_dims": self.slice_dims,
            "dim_names": {
                "cluster": self.dim_names.cluster,
                "timestep": self.dim_names.timestep,
                "period": self.dim_names.period,
                "segment": self.dim_names.segment,
            },
            "clusterings": entries,
        }

    def to_json(self, path: str | Path, **json_kwargs: Any) -> None:
        """Save clustering to JSON file.

        Args:
            path: Output file path.
            **json_kwargs: Additional keyword arguments passed
                to ``json.dump()``. Default: ``indent=2``.
        """
        json_kwargs.setdefault("indent", 2)
        with Path(path).open("w") as f:
            json.dump(self.to_dict(), f, **json_kwargs)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ClusteringResult:
        """Load clustering from a dictionary.

        Args:
            data: Dict as returned by :meth:`to_dict`.

        Returns:
            The loaded ``ClusteringResult``.
        """
        # Backcompat: pre-0.6 wrappers stored the time index as an outer
        # ``time_coords`` key while the inner tsam blob (written by tsam<3.4)
        # had no ``time_index``. Forward it so disaggregate keeps datetimes.
        if "time_coords" in data:
            import warnings

            warnings.warn(
                "Loading a legacy tsam_xarray JSON with an outer 'time_coords' "
                "field; re-save with to_json() to silence this warning.",
                DeprecationWarning,
                stacklevel=2,
            )
            for entry in data["clusterings"]:
                entry["clustering"].setdefault("time_index", data["time_coords"])

        clusterings: dict[tuple[Hashable, ...], tsam.ClusteringResult] = {}
        for entry in data["clusterings"]:
            key = tuple(entry["key"])
            clusterings[key] = tsam.ClusteringResult.from_dict(entry["clustering"])

        dim_names_data = data.get("dim_names")
        dim_names = DimNames(**dim_names_data) if dim_names_data else DimNames()

        return cls(
            time_dim=data["time_dim"],
            cluster_dim=data["cluster_dim"],
            slice_dims=data.get("slice_dims", []),
            clusterings=clusterings,
            dim_names=dim_names,
        )

    @classmethod
    def from_json(cls, path: str | Path) -> ClusteringResult:
        """Load clustering from JSON file.

        Args:
            path: Input file path.

        Returns:
            The loaded ``ClusteringResult``.
        """
        with Path(path).open() as f:
            return cls.from_dict(json.load(f))

n_clusters `property`

n_clusters: int

Number of clusters.

n_original_periods `property`

n_original_periods: int

Number of original periods (e.g., days).

n_timesteps_per_period `property`

n_timesteps_per_period: int

Number of timesteps per period (e.g., 24 for hourly with daily periods).

n_segments `property`

n_segments: int | None

Number of segments per period, or None if no segmentation.

cluster_assignments `property`

cluster_assignments: DataArray

Cluster assignment for each period, as DataArray.

Dims: (period, *slice_dims).

cluster_occurrences `property`

cluster_occurrences: DataArray

Number of periods assigned to each cluster.

Dims: (cluster, *slice_dims).
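In the source, `cluster_occurrences` is built with `np.bincount(cluster_assignments, minlength=n_clusters)`, so a cluster that receives no periods still appears with a count of zero, and the counts for one slice sum to `n_original_periods`. A plain-Python sketch of the same count for a single slice, with made-up assignments and a hypothetical `occurrences` helper:

```python
from collections import Counter


def occurrences(assignments: list[int], n_clusters: int) -> list[int]:
    """Periods per cluster, like np.bincount(assignments, minlength=n_clusters)."""
    counts = Counter(assignments)
    # Clusters that never occur still get an explicit zero.
    return [counts.get(c, 0) for c in range(n_clusters)]


# Made-up assignments for 7 daily periods and 4 clusters (cluster 3 unused).
assignments = [0, 2, 2, 1, 0, 2, 1]
occ = occurrences(assignments, n_clusters=4)
print(occ)  # [2, 2, 3, 0]
assert sum(occ) == len(assignments)  # every period is counted exactly once
```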

segment_durations `property`

segment_durations: DataArray | None

Duration of each segment per cluster, or None if no segmentation.

Dims: (cluster, timestep, *slice_dims).

cluster_centers `property`

cluster_centers: DataArray

Representative period index for each cluster.

Dims: (cluster, *slice_dims).
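Each entry of `cluster_centers` is an index into the original periods (the source raises `ValueError` when tsam reports no cluster centers). Assuming the original series divides evenly into periods of `n_timesteps_per_period` steps, a cluster's representative profile is the matching slice of that series. The sketch uses plain lists and made-up numbers; `representative_profile` is a hypothetical helper, not part of tsam_xarray.

```python
def representative_profile(
    series: list[float], center_period: int, n_timesteps_per_period: int
) -> list[float]:
    """Return the original timesteps of one period (hypothetical helper)."""
    start = center_period * n_timesteps_per_period
    return series[start : start + n_timesteps_per_period]


# Three periods of four timesteps each (made-up values).
series = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0, 5.0, 6.0, 7.0, 8.0]
cluster_centers = [2, 1]  # cluster 0 is represented by period 2, cluster 1 by period 1
profiles = [representative_profile(series, p, 4) for p in cluster_centers]
print(profiles)  # [[5.0, 6.0, 7.0, 8.0], [10.0, 20.0, 30.0, 40.0]]
```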

segment_assignments `property`

segment_assignments: DataArray | None

Segment assignment for each timestep per cluster, or None.

Dims: (cluster, timestep, *slice_dims).

segment_centers `property`

segment_centers: DataArray | None

Representative timestep index for each segment per cluster, or None.

Dims: (cluster, segment, *slice_dims).
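With segmentation, the timesteps of each representative period are grouped into `n_segments` segments, and `segment_assignments` gives the segment ID of every timestep. Assuming one cluster's IDs run from 0 to `n_segments - 1`, the timesteps per segment can be counted from those IDs, and segment-level values can be broadcast back to timesteps by indexing. This is a conceptual sketch for one cluster with hypothetical helper names; it does not claim to reproduce how tsam computes `segment_durations`.

```python
from collections import Counter


def durations_from_assignments(segment_ids: list[int], n_segments: int) -> list[int]:
    """Number of timesteps that fall into each segment (conceptual sketch)."""
    counts = Counter(segment_ids)
    return [counts.get(s, 0) for s in range(n_segments)]


def expand_to_timesteps(segment_values: list[float], segment_ids: list[int]) -> list[float]:
    """Repeat each segment's value on every timestep assigned to it."""
    return [segment_values[s] for s in segment_ids]


# One cluster, 6 timesteps grouped into 3 contiguous segments (made-up data).
segment_ids = [0, 0, 0, 1, 2, 2]
print(durations_from_assignments(segment_ids, 3))  # [3, 1, 2]
print(expand_to_timesteps([1.5, 4.0, 2.5], segment_ids))  # [1.5, 1.5, 1.5, 4.0, 2.5, 2.5]
```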

apply

apply(
    da: DataArray,
    *,
    time_dim: str | None = None,
    cluster_dim: Sequence[str] | str | None = None,
    **tsam_kwargs: Any,
) -> Any

Apply this clustering to new data.

Parameters:

Name	Type	Description	Default
`da`	`DataArray`	New data with compatible time dimension length.	required
`time_dim`	`str | None`	Time dimension name. Defaults to the stored value.	`None`
`cluster_dim`	`Sequence[str] | str | None`	Cluster dimension(s). Defaults to the stored value. Can differ from the original if the new data has different dimension names.	`None`
`**tsam_kwargs`	`Any`	Additional keyword arguments passed to tsam's `ClusteringResult.apply()`.	`{}`

Returns:

Type	Description
`Any`	Aggregation result using the stored clustering.

disaggregate

disaggregate(data: DataArray) -> DataArray

Map data on the (cluster, timestep) grid back to the original time axis.

This is the inverse mapping of aggregate(): each original period receives the values of its assigned cluster. Use it to expand data computed on the compact cluster-representative grid (e.g., optimization results) back to the full time axis.

Unlike AggregationResult.disaggregate(), this method works on a ClusteringResult loaded from JSON; the original data is not needed.

Parameters:

Name	Type	Description	Default
`data`	`DataArray`	Data with `cluster` and `timestep` dims, matching the shape of the original cluster representatives. Additional dims (including auto-sliced dims like scenario) are supported.	required

Returns:

Type	Description
`DataArray`	Data with `cluster` and `timestep` replaced by the original `time` dimension.
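Conceptually, disaggregation walks the original periods in order and, for each one, copies the representative timesteps of its assigned cluster. The plain-list sketch below shows that mapping for a single slice without segmentation. It assumes `data[c][t]` holds the value for cluster `c` at timestep `t`; `disaggregate_lists` is a hypothetical name, not the library's `_disaggregate_single`.

```python
def disaggregate_lists(data: list[list[float]], cluster_assignments: list[int]) -> list[float]:
    """Expand (cluster, timestep) values back to the original time axis."""
    series: list[float] = []
    for cluster in cluster_assignments:  # one entry per original period
        series.extend(data[cluster])  # that cluster's representative period
    return series


# 2 clusters x 3 timesteps (e.g. optimization results), 4 original periods.
data = [[0.0, 1.0, 2.0], [9.0, 8.0, 7.0]]
print(disaggregate_lists(data, [1, 0, 0, 1]))
# [9.0, 8.0, 7.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 9.0, 8.0, 7.0]
```

The output length is `n_original_periods * n_timesteps_per_period` (here 4 x 3 = 12).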

to_dict

to_dict() -> dict[str, Any]

Serialize clustering to a dictionary.

Returns:

Type	Description
`dict[str, Any]`	Plain dict suitable for `json.dump()` or storage in databases, APIs, etc.
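`to_dict` stores `clusterings` as a list of `{"key": [...], "clustering": {...}}` entries rather than as a dict. That keeps the output JSON-compatible, since `json.dumps` rejects tuple dict keys with a `TypeError`; `from_dict` then turns each stored list back into a tuple. The stdlib sketch below shows that round trip with placeholder payloads (in the library each payload comes from tsam's `ClusteringResult.to_dict()`); the slice dimension names are made up.

```python
import json

# Placeholder payloads; in the library each value is a tsam clustering dict.
clusterings = {("low", 2030): {"n_clusters": 8}, ("high", 2030): {"n_clusters": 8}}

# Serialize: tuple keys become JSON lists inside a list of entries.
payload = {
    "slice_dims": ["scenario", "year"],
    "clusterings": [{"key": list(k), "clustering": v} for k, v in clusterings.items()],
}
text = json.dumps(payload)

# Deserialize: convert each stored list back into a tuple key.
restored = {
    tuple(entry["key"]): entry["clustering"]
    for entry in json.loads(text)["clusterings"]
}
assert restored == clusterings
```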

to_json

to_json(path: str | Path, **json_kwargs: Any) -> None

Save clustering to JSON file.

Parameters:

Name	Type	Description	Default
`path`	`str | Path`	Output file path.	required
`**json_kwargs`	`Any`	Additional keyword arguments passed to `json.dump()`. Default: `indent=2`.	`{}`
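The table promises `indent=2` when no `indent` is passed. One common way to provide such a default while still letting callers override it is `dict.setdefault` on the collected keyword arguments. The sketch below writes a placeholder dict to a temporary file that way; `dump_with_default_indent` is a hypothetical helper, not library API.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def dump_with_default_indent(obj: dict[str, Any], path: Path, **json_kwargs: Any) -> None:
    """Write JSON, defaulting to indent=2 unless the caller overrides it."""
    json_kwargs.setdefault("indent", 2)
    with path.open("w") as f:
        json.dump(obj, f, **json_kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clustering.json"
    dump_with_default_indent({"time_dim": "time"}, path)
    print(path.read_text())  # two-space indented JSON
    dump_with_default_indent({"time_dim": "time"}, path, indent=None)
    print(path.read_text())  # {"time_dim": "time"}
```

Since `**json_kwargs` is a fresh dict on every call, `setdefault` never leaks the default back to the caller.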

from_dict `classmethod`

from_dict(data: dict[str, Any]) -> ClusteringResult

Load clustering from a dictionary.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Any]`	Dict as returned by `to_dict()`.	required

Returns:

Type	Description
`ClusteringResult`	The loaded `ClusteringResult`.
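For JSON written by tsam_xarray before 0.6, `from_dict` copies the outer `time_coords` value into every inner clustering blob as `time_index` (via `setdefault`, so an existing `time_index` wins) and emits a `DeprecationWarning`. The sketch below reproduces that step on a made-up legacy dict; `forward_legacy_time_coords` is a hypothetical name. Unlike the source, which edits the passed dict in place, the sketch works on a deep copy; that is the sketch's design choice, not the library's.

```python
import copy
import warnings
from typing import Any


def forward_legacy_time_coords(data: dict[str, Any]) -> dict[str, Any]:
    """Move an outer 'time_coords' into each clustering blob (sketch)."""
    if "time_coords" not in data:
        return data
    warnings.warn("legacy 'time_coords' field", DeprecationWarning, stacklevel=2)
    data = copy.deepcopy(data)  # leave the caller's dict untouched
    for entry in data["clusterings"]:
        # setdefault keeps any time_index the inner blob already has.
        entry["clustering"].setdefault("time_index", data["time_coords"])
    return data


legacy = {
    "time_coords": ["2024-01-01T00:00", "2024-01-01T01:00"],
    "clusterings": [{"key": [], "clustering": {}}],
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    upgraded = forward_legacy_time_coords(legacy)
print(upgraded["clusterings"][0]["clustering"]["time_index"])
# ['2024-01-01T00:00', '2024-01-01T01:00']
print(caught[0].category.__name__)  # DeprecationWarning
assert legacy["clusterings"][0]["clustering"] == {}  # original not mutated
```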

from_json `classmethod`

from_json(path: str | Path) -> ClusteringResult

Load clustering from JSON file.

Parameters:

Name	Type	Description	Default
`path`	`str | Path`	Input file path.	required

Returns:

Type	Description
`ClusteringResult`	The loaded `ClusteringResult`.

DimNames `dataclass`

Names of the structural output dimensions produced by aggregation.

tsam_xarray adds four dimensions to its results that do not exist in the input: the cluster/representative axis, the intra-period timestep axis, the original-period axis (in cluster_assignments), and the segment axis (segmented runs). By default these are cluster, timestep, period, and segment; override them when they would collide with the caller's own dimension names.

Attributes:

Name	Type	Description
`cluster`	`str`	Cluster/representative axis.
`timestep`	`str`	Intra-period timestep axis.
`period`	`str`	Original-period axis (in `cluster_assignments`).
`segment`	`str`	Segment axis (segmented runs).
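`DimNames.__post_init__` only checks that the four names are distinct from one another. To decide whether an override is needed, you can compare the names against your input's own dimensions. The sketch below does that with `DimNamesSketch`, a stand-in dataclass with the same fields, defaults, and uniqueness check as `DimNames`, and a hypothetical `colliding_names` helper; the input dimension names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimNamesSketch:
    """Stand-in with the same fields, defaults, and uniqueness check as DimNames."""

    cluster: str = "cluster"
    timestep: str = "timestep"
    period: str = "period"
    segment: str = "segment"

    def __post_init__(self) -> None:
        names = (self.cluster, self.timestep, self.period, self.segment)
        if len(set(names)) != len(names):
            raise ValueError(f"DimNames must be unique, got {names}")


def colliding_names(names: DimNamesSketch, input_dims: list[str]) -> set[str]:
    """Structural names that already exist as input dimensions (hypothetical)."""
    structural = {names.cluster, names.timestep, names.period, names.segment}
    return structural & set(input_dims)


input_dims = ["time", "period", "region"]  # caller data already uses "period"
print(colliding_names(DimNamesSketch(), input_dims))  # {'period'}
print(colliding_names(DimNamesSketch(period="typical_period"), input_dims))  # set()
```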

Source code in src/tsam_xarray/_dim_names.py

@dataclass(frozen=True)
class DimNames:
    """Names of the structural output dimensions produced by aggregation.

    tsam_xarray adds four dimensions to its results that do not exist in the
    input: the cluster/representative axis, the intra-period timestep axis,
    the original-period axis (in ``cluster_assignments``), and the segment
    axis (segmented runs). By default these are ``cluster``, ``timestep``,
    ``period``, and ``segment``; override them when they would collide with
    the caller's own dimension names.

    Attributes:
        cluster: Cluster/representative axis.
        timestep: Intra-period timestep axis.
        period: Original-period axis (in ``cluster_assignments``).
        segment: Segment axis (segmented runs).
    """

    cluster: str = "cluster"
    timestep: str = "timestep"
    period: str = "period"
    segment: str = "segment"

    def __post_init__(self) -> None:
        names = self.as_tuple()
        if len(set(names)) != len(names):
            msg = f"DimNames must be unique, got {names}"
            raise ValueError(msg)

    def as_tuple(self) -> tuple[str, str, str, str]:
        """The four names as a tuple, in declaration order."""
        return (self.cluster, self.timestep, self.period, self.segment)

as_tuple ¶

as_tuple() -> tuple[str, str, str, str]

The four names as a tuple, in declaration order.

Source code in src/tsam_xarray/_dim_names.py

def as_tuple(self) -> tuple[str, str, str, str]:
    """The four names as a tuple, in declaration order."""
    return (self.cluster, self.timestep, self.period, self.segment)
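`DimNames.__post_init__` already rejects duplicate names, so an override only needs to avoid the caller's own dims. The sketch below uses a local stand-in that mirrors the dataclass shown above, so it runs without tsam_xarray. It adds a hypothetical helper, `non_colliding_dim_names`, that renames any default name the input already uses. The `"tsam_"` prefix is an arbitrary choice, not a library convention.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DimNames:
    """Local stand-in mirroring tsam_xarray.DimNames (defaults + uniqueness check)."""

    cluster: str = "cluster"
    timestep: str = "timestep"
    period: str = "period"
    segment: str = "segment"

    def __post_init__(self) -> None:
        names = self.as_tuple()
        if len(set(names)) != len(names):
            raise ValueError(f"DimNames must be unique, got {names}")

    def as_tuple(self) -> tuple[str, str, str, str]:
        return (self.cluster, self.timestep, self.period, self.segment)


def non_colliding_dim_names(input_dims: set[str], prefix: str = "tsam_") -> DimNames:
    """Hypothetical helper: prefix every default name the input already uses."""
    defaults = DimNames()
    overrides = {
        f.name: prefix + getattr(defaults, f.name)
        for f in fields(defaults)
        if getattr(defaults, f.name) in input_dims
    }
    # replace() builds a new instance, so __post_init__ validates it again.
    return replace(defaults, **overrides)


# Input data that already carries its own "period" dimension.
names = non_colliding_dim_names({"time", "variable", "period"})
print(names.as_tuple())  # ('cluster', 'timestep', 'tsam_period', 'segment')

try:
    DimNames(cluster="x", timestep="x")
except ValueError as err:
    print(err)  # DimNames must be unique, got ('x', 'x', 'period', 'segment')
```

With the real class, the resulting instance would be passed as `dim_names=` to `aggregate()`, whose signature accepts `DimNames | None`.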

AccuracyMetrics `dataclass` ¶

Accuracy metrics from time series aggregation.

Attributes:

Name	Type	Description
`rmse`	`DataArray`	Per-column RMSE. Dims: `(*cluster_dims, *slice_dims)`.
`mae`	`DataArray`	Per-column MAE. Dims: `(*cluster_dims, *slice_dims)`.
`rmse_duration`	`DataArray`	Per-column duration-curve RMSE. Dims: `(*cluster_dims, *slice_dims)`.
`weighted_rmse`	`DataArray`	RMSE weighted across columns. Dims: `(*slice_dims)` or scalar.
`weighted_mae`	`DataArray`	MAE weighted across columns. Dims: `(*slice_dims)` or scalar.
`weighted_rmse_duration`	`DataArray`	Duration-curve RMSE weighted across columns. Dims: `(*slice_dims)` or scalar.

Source code in src/tsam_xarray/_result.py

@dataclass(frozen=True, repr=False)
class AccuracyMetrics:
    """Accuracy metrics from time series aggregation.

    Attributes:
        rmse: Per-column RMSE.
            Dims: ``(*cluster_dims, *slice_dims)``.
        mae: Per-column MAE.
            Dims: ``(*cluster_dims, *slice_dims)``.
        rmse_duration: Per-column duration-curve RMSE.
            Dims: ``(*cluster_dims, *slice_dims)``.
        weighted_rmse: RMSE weighted across columns.
            Dims: ``(*slice_dims)`` or scalar.
        weighted_mae: MAE weighted across columns.
            Dims: ``(*slice_dims)`` or scalar.
        weighted_rmse_duration: Duration-curve RMSE weighted
            across columns.
            Dims: ``(*slice_dims)`` or scalar.
    """

    rmse: xr.DataArray
    mae: xr.DataArray
    rmse_duration: xr.DataArray
    weighted_rmse: xr.DataArray
    weighted_mae: xr.DataArray
    weighted_rmse_duration: xr.DataArray

    def __repr__(self) -> str:
        def _fmt(da: xr.DataArray) -> str:
            mean = float(da.mean())
            if da.size <= 1:
                return f"{mean:.4f}"
            return f"{mean:.4f} [{float(da.min()):.4f}-{float(da.max()):.4f}]"

        return (
            f"AccuracyMetrics("
            f"weighted_rmse={_fmt(self.weighted_rmse)}, "
            f"weighted_mae={_fmt(self.weighted_mae)}, "
            f"weighted_rmse_duration="
            f"{_fmt(self.weighted_rmse_duration)})"
        )

AggregationResult `dataclass` ¶

Result of tsam_xarray.aggregate().

Attributes:

Name	Type	Description
`cluster_representatives`	`DataArray`	Typical periods. Dims: `(cluster, timestep, *cluster_dims, *slice_dims)`.
`cluster_assignments`	`DataArray`	Which cluster each period belongs to. Dims: `(period, *slice_dims)`.
`cluster_counts`	`DataArray`	Periods per cluster. Dims: `(cluster, *slice_dims)`. Formerly `cluster_weights`, which remains as a deprecated alias (following tsam v4's rename).
`segment_durations`	`DataArray \| None`	Duration of each segment, or `None`. Dims: `(cluster, timestep, *slice_dims)`.
`accuracy`	`AccuracyMetrics`	Per-column and weighted accuracy metrics.
`reconstructed`	`DataArray`	Reconstructed time series (same shape and dim order as `original`).
`original`	`DataArray`	The input data.
`clustering`	`ClusteringResult`	Reusable clustering metadata. See `ClusteringResult`.
`is_transferred`	`bool`	`True` if this result came from `apply()`, `False` if it came from `aggregate()`.

Source code in src/tsam_xarray/_result.py

@dataclass(frozen=True, repr=False)
class AggregationResult:
    """Result of ``tsam_xarray.aggregate()``.

    Attributes:
        cluster_representatives: Typical periods.
            Dims: ``(cluster, timestep, *cluster_dims,
            *slice_dims)``.
        cluster_assignments: Which cluster each period
            belongs to. Dims: ``(period, *slice_dims)``.
        cluster_counts: Periods per cluster.
            Dims: ``(cluster, *slice_dims)``. Formerly
            ``cluster_weights``, which remains as a deprecated
            alias (following tsam v4's rename).
        segment_durations: Duration of each segment, or
            ``None``. Dims: ``(cluster, timestep,
            *slice_dims)``.
        accuracy: Per-column and weighted accuracy metrics.
        reconstructed: Reconstructed time series
            (same shape and dim order as ``original``).
        original: The input data.
        clustering: Reusable clustering metadata.
            See `ClusteringResult`.
        is_transferred: Whether this result came from
            ``apply()`` vs ``aggregate()``.
    """

    cluster_representatives: xr.DataArray
    cluster_assignments: xr.DataArray
    cluster_counts: xr.DataArray
    segment_durations: xr.DataArray | None
    accuracy: AccuracyMetrics
    reconstructed: xr.DataArray
    original: xr.DataArray
    clustering: ClusteringResult
    is_transferred: bool = False

    def __repr__(self) -> str:
        c = self.clustering
        slices = f", slice_dims={c.slice_dims}" if c.slice_dims else ""
        seg = f", n_segments={self.n_segments}" if self.n_segments else ""
        return (
            f"AggregationResult("
            f"n_clusters={self.n_clusters}, "
            f"n_periods={c.n_original_periods}, "
            f"cluster_dim={c.cluster_dim}"
            f"{slices}{seg}, "
            f"weighted_rmse={float(self.accuracy.weighted_rmse.mean()):.4f})"
        )

    @property
    def dim_names(self) -> DimNames:
        """Names of the structural output dimensions. See `DimNames`."""
        return self.clustering.dim_names

    @property
    def cluster_weights(self) -> xr.DataArray:
        """Deprecated alias for `cluster_counts`.

        Renamed to match tsam v4, where the values are occurrence counts
        rather than weights. Will be removed in a future release.
        """
        warnings.warn(
            "AggregationResult.cluster_weights is deprecated; use "
            "cluster_counts instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.cluster_counts

    @property
    def n_clusters(self) -> int:
        """Number of clusters (cluster representatives)."""
        return int(self.cluster_counts.sizes[self.dim_names.cluster])

    @property
    def n_timesteps_per_period(self) -> int:
        """Number of timesteps per cluster representative."""
        return int(self.cluster_representatives.sizes[self.dim_names.timestep])

    @property
    def n_segments(self) -> int | None:
        """Number of segments per period, if segmentation was used."""
        first_cr = next(iter(self.clustering.clusterings.values()))
        result: int | None = first_cr.n_segments
        return result

    @property
    def residuals(self) -> xr.DataArray:
        """Difference between original and reconstructed data.

        Shares the dim order of ``original`` and ``reconstructed``.
        """
        return self.original - self.reconstructed

    def compare(self, **sel: object) -> xr.DataArray:
        """Stack ``original`` and ``reconstructed`` along a ``variant`` dim.

        Returns a single DataArray on the original time axis with a new
        ``variant`` coordinate ``["original", "reconstructed"]``, ready to
        plot directly with a ``color=``/``hue="variant"`` grouping — no
        ``melt`` step. This is the canonical way to eyeball aggregation
        quality per column and per slice dim.

        Args:
            **sel: Optional label-based selection applied to both arrays
                before stacking, e.g. ``compare(variable="solar")`` to
                compare a single column.

        Returns:
            DataArray with dims ``("variant", *original.dims)``.

        Examples:
            >>> agg.compare(variable="solar").plotly.line(
            ...     x="time", color="variant"
            ... )
        """
        original = self.original
        reconstructed = self.reconstructed
        if sel:
            original = original.sel(sel)
            reconstructed = reconstructed.sel(sel)
        variant = pd.Index(["original", "reconstructed"], name="variant")
        combined = xr.concat([original, reconstructed], dim=variant)
        combined.name = self.original.name
        return combined

    def to_dataframe(self, **sel: object) -> pd.DataFrame:
        """Tidy/long-form ``original`` vs ``reconstructed`` DataFrame.

        A flat DataFrame with a ``variant`` column
        (``"original"``/``"reconstructed"``), the ``time`` axis, every
        cluster and slice dim, and a value column — ready to hand
        straight to a plotting library.

        Args:
            **sel: Optional label-based selection forwarded to
                `compare` (e.g. ``variable="solar"``).

        Returns:
            Long-form DataFrame with a ``variant`` column and one value
            column (named after the input DataArray, or ``"value"``).
        """
        combined = self.compare(**sel)
        name = combined.name
        if name is None or str(name) == "variant":
            name = "value"
        return combined.to_dataframe(name=str(name)).reset_index()

    def disaggregate(self, data: xr.DataArray) -> xr.DataArray:
        """Map data on ``(cluster, timestep)`` back to original time.

        This is the inverse of ``aggregate()``. Use it to expand
        external data computed on the compact cluster-representative
        grid (e.g., optimization results) back to the full time
        axis.

        Without segmentation, values are repeated for each timestep
        in the period. With segmentation, values are placed at
        segment boundaries and remaining timesteps are NaN — use
        ``.ffill(dim="time")``,
        ``.interpolate_na(dim="time")``, etc.

        Args:
            data: Data with ``cluster`` and ``timestep`` dims,
                matching the shape of
                ``result.cluster_representatives``. Additional
                dims (including auto-sliced dims like scenario)
                are supported.

        Returns:
            Data with ``cluster`` and ``timestep`` replaced by
            the original ``time`` dimension.
        """
        # Use stored slice_dims for canonical ordering
        slice_dims = self.clustering.slice_dims
        if not slice_dims:
            return self._disaggregate_single(data)

        import itertools

        from tsam_xarray._core import _concat_along_dims

        slice_coords = {d: data.coords[d].values for d in slice_dims}
        keys = list(itertools.product(*(slice_coords[d] for d in slice_dims)))
        results = []
        for key in keys:
            sel = dict(zip(slice_dims, key, strict=True))
            data_slice = data.sel(sel)
            result_slice = self._make_slice_view(sel)
            results.append(result_slice._disaggregate_single(data_slice))

        return _concat_along_dims(results, slice_dims, slice_coords)

    def _make_slice_view(self, sel: dict[str, object]) -> AggregationResult:
        """Create a view of this result for a single slice."""
        from tsam_xarray._clustering import (
            ClusteringResult as CR,
        )
        from tsam_xarray._clustering import (
            _lookup_clustering,
        )

        # Build key in stored slice_dims order
        key = tuple(sel[d] for d in self.clustering.slice_dims)
        cr = _lookup_clustering(self.clustering.clusterings, key)

        return AggregationResult(
            cluster_representatives=self.cluster_representatives.sel(sel),
            cluster_assignments=self.cluster_assignments.sel(sel),
            cluster_counts=self.cluster_counts.sel(sel),
            segment_durations=(
                self.segment_durations.sel(sel)
                if self.segment_durations is not None
                else None
            ),
            accuracy=AccuracyMetrics(
                rmse=self.accuracy.rmse.sel(sel),
                mae=self.accuracy.mae.sel(sel),
                rmse_duration=self.accuracy.rmse_duration.sel(sel),
                weighted_rmse=self.accuracy.weighted_rmse.sel(sel),
                weighted_mae=self.accuracy.weighted_mae.sel(sel),
                weighted_rmse_duration=self.accuracy.weighted_rmse_duration.sel(sel),
            ),
            reconstructed=self.reconstructed.sel(sel),
            original=self.original.sel(sel),
            clustering=CR(
                time_dim=self.clustering.time_dim,
                cluster_dim=self.clustering.cluster_dim,
                slice_dims=[],
                clusterings={(): cr},
                dim_names=self.clustering.dim_names,
            ),
        )

    def _disaggregate_single(self, data: xr.DataArray) -> xr.DataArray:
        """Disaggregate without slice dims."""
        from tsam_xarray._clustering import _disaggregate_single

        cr = self.clustering.clusterings[()]
        return _disaggregate_single(cr, data, self.clustering.dim_names)

dim_names `property` ¶

dim_names: DimNames

Names of the structural output dimensions. See DimNames.

cluster_weights `property` ¶

cluster_weights: DataArray

Deprecated alias for cluster_counts.

Renamed to match tsam v4, where the values are occurrence counts rather than weights. Will be removed in a future release.

n_clusters `property` ¶

n_clusters: int

Number of clusters (cluster representatives).

n_timesteps_per_period `property` ¶

n_timesteps_per_period: int

Number of timesteps per cluster representative.

n_segments `property` ¶

n_segments: int | None

Number of segments per period, if segmentation was used.

residuals `property` ¶

residuals: DataArray

Difference between original and reconstructed data.

Shares the dim order of original and reconstructed.

compare ¶

compare(**sel: object) -> xr.DataArray

Stack original and reconstructed along a variant dim.

Returns a single DataArray on the original time axis with a new variant coordinate ["original", "reconstructed"], ready to plot directly with a color=/hue="variant" grouping — no melt step. This is the canonical way to eyeball aggregation quality per column and per slice dim.

Parameters:

Name	Type	Description	Default
`**sel`	`object`	Optional label-based selection applied to both arrays before stacking, e.g. `compare(variable="solar")` to compare a single column.	`{}`

Returns:

Type	Description
`DataArray`	DataArray with dims `("variant", *original.dims)`.

Examples:

>>> agg.compare(variable="solar").plotly.line(
...     x="time", color="variant"
... )

Source code in src/tsam_xarray/_result.py

def compare(self, **sel: object) -> xr.DataArray:
    """Stack ``original`` and ``reconstructed`` along a ``variant`` dim.

    Returns a single DataArray on the original time axis with a new
    ``variant`` coordinate ``["original", "reconstructed"]``, ready to
    plot directly with a ``color=``/``hue="variant"`` grouping — no
    ``melt`` step. This is the canonical way to eyeball aggregation
    quality per column and per slice dim.

    Args:
        **sel: Optional label-based selection applied to both arrays
            before stacking, e.g. ``compare(variable="solar")`` to
            compare a single column.

    Returns:
        DataArray with dims ``("variant", *original.dims)``.

    Examples:
        >>> agg.compare(variable="solar").plotly.line(
        ...     x="time", color="variant"
        ... )
    """
    original = self.original
    reconstructed = self.reconstructed
    if sel:
        original = original.sel(sel)
        reconstructed = reconstructed.sel(sel)
    variant = pd.Index(["original", "reconstructed"], name="variant")
    combined = xr.concat([original, reconstructed], dim=variant)
    combined.name = self.original.name
    return combined

to_dataframe ¶

to_dataframe(**sel: object) -> pd.DataFrame

Tidy/long-form original vs reconstructed DataFrame.

A flat DataFrame with a variant column ("original"/"reconstructed"), the time axis, every cluster and slice dim, and a value column — ready to hand straight to a plotting library.

Parameters:

Name	Type	Description	Default
`**sel`	`object`	Optional label-based selection forwarded to `compare` (e.g. `variable="solar"`).	`{}`

Returns:

Type	Description
`DataFrame`	Long-form DataFrame with a `variant` column and one value column (named after the input DataArray, or `"value"`).

Source code in src/tsam_xarray/_result.py

def to_dataframe(self, **sel: object) -> pd.DataFrame:
    """Tidy/long-form ``original`` vs ``reconstructed`` DataFrame.

    A flat DataFrame with a ``variant`` column
    (``"original"``/``"reconstructed"``), the ``time`` axis, every
    cluster and slice dim, and a value column — ready to hand
    straight to a plotting library.

    Args:
        **sel: Optional label-based selection forwarded to
            `compare` (e.g. ``variable="solar"``).

    Returns:
        Long-form DataFrame with a ``variant`` column and one value
        column (named after the input DataArray, or ``"value"``).
    """
    combined = self.compare(**sel)
    name = combined.name
    if name is None or str(name) == "variant":
        name = "value"
    return combined.to_dataframe(name=str(name)).reset_index()
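`to_dataframe` is `compare` followed by a flatten to long form, plus one rule of its own. The value column takes the DataArray's name, unless that name is missing or would clash with the `variant` column; in either case the column is named `"value"`. The sketch below reproduces that shape with plain dicts instead of pandas. `long_form`, `value_column_name`, and the `variable`/`time` layout are hypothetical.

```python
def value_column_name(name: object) -> str:
    """Same fallback as to_dataframe: unnamed or 'variant'-named data becomes 'value'."""
    if name is None or str(name) == "variant":
        return "value"
    return str(name)


def long_form(
    original: dict[str, list[float]],
    reconstructed: dict[str, list[float]],
    name: object = None,
    **sel: str,
) -> list[dict[str, object]]:
    """Hypothetical: rows of (variant, variable, time, value), like compare(**sel) flattened."""
    col = value_column_name(name)
    rows: list[dict[str, object]] = []
    for variant, data in (("original", original), ("reconstructed", reconstructed)):
        for variable, series in data.items():
            if "variable" in sel and variable != sel["variable"]:
                continue  # label-based selection, applied to both variants
            for t, value in enumerate(series):
                rows.append({"variant": variant, "variable": variable, "time": t, col: value})
    return rows


original = {"solar": [0.0, 0.8], "wind": [0.5, 0.4]}
reconstructed = {"solar": [0.1, 0.7], "wind": [0.5, 0.5]}
rows = long_form(original, reconstructed, name="power", variable="solar")
for row in rows:
    print(row)  # 4 rows: 2 variants x 1 variable x 2 timesteps
print(value_column_name(None), value_column_name("variant"))  # value value
```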

disaggregate ¶

disaggregate(data: DataArray) -> xr.DataArray

Map data on (cluster, timestep) back to original time.

This is the inverse of aggregate(). Use it to expand external data computed on the compact cluster-representative grid (e.g., optimization results) back to the full time axis.

Without segmentation, values are repeated for each timestep in the period. With segmentation, values are placed at segment boundaries and remaining timesteps are NaN — use .ffill(dim="time"), .interpolate_na(dim="time"), etc.

Parameters:

Name	Type	Description	Default
`data`	`DataArray`	Data with `cluster` and `timestep` dims, matching the shape of `result.cluster_representatives`. Additional dims (including auto-sliced dims like scenario) are supported.	required

Returns:

Type	Description
`DataArray`	Data with `cluster` and `timestep` replaced by the original `time` dimension.

Source code in src/tsam_xarray/_result.py

def disaggregate(self, data: xr.DataArray) -> xr.DataArray:
    """Map data on ``(cluster, timestep)`` back to original time.

    This is the inverse of ``aggregate()``. Use it to expand
    external data computed on the compact cluster-representative
    grid (e.g., optimization results) back to the full time
    axis.

    Without segmentation, values are repeated for each timestep
    in the period. With segmentation, values are placed at
    segment boundaries and remaining timesteps are NaN — use
    ``.ffill(dim="time")``,
    ``.interpolate_na(dim="time")``, etc.

    Args:
        data: Data with ``cluster`` and ``timestep`` dims,
            matching the shape of
            ``result.cluster_representatives``. Additional
            dims (including auto-sliced dims like scenario)
            are supported.

    Returns:
        Data with ``cluster`` and ``timestep`` replaced by
        the original ``time`` dimension.
    """
    # Use stored slice_dims for canonical ordering
    slice_dims = self.clustering.slice_dims
    if not slice_dims:
        return self._disaggregate_single(data)

    import itertools

    from tsam_xarray._core import _concat_along_dims

    slice_coords = {d: data.coords[d].values for d in slice_dims}
    keys = list(itertools.product(*(slice_coords[d] for d in slice_dims)))
    results = []
    for key in keys:
        sel = dict(zip(slice_dims, key, strict=True))
        data_slice = data.sel(sel)
        result_slice = self._make_slice_view(sel)
        results.append(result_slice._disaggregate_single(data_slice))

    return _concat_along_dims(results, slice_dims, slice_coords)

TuningResult `dataclass` ¶

Result of hyperparameter tuning.

Attributes:

Name	Type	Description
`n_clusters`	`int`	Optimal number of typical periods.
`n_segments`	`int`	Optimal number of segments per period.
`rmse`	`float`	RMSE of the optimal configuration.
`best_result`	`AggregationResult`	The AggregationResult for the optimal configuration.
`history`	`list[dict[str, Any]]`	History of all tested configurations with their RMSE values.
`all_results`	`list[AggregationResult]`	All AggregationResults from tuning (when `save_all_results=True`).

Source code in src/tsam_xarray/_tuning.py

@dataclass
class TuningResult:
    """Result of hyperparameter tuning.

    Attributes:
        n_clusters: Optimal number of typical periods.
        n_segments: Optimal number of segments per period.
        rmse: RMSE of the optimal configuration.
        best_result: The AggregationResult for the optimal
            configuration.
        history: History of all tested configurations with
            their RMSE values.
        all_results: All AggregationResults from tuning
            (when ``save_all_results=True``).
    """

    n_clusters: int
    n_segments: int
    rmse: float
    best_result: AggregationResult
    history: list[dict[str, Any]] = field(repr=False)
    all_results: list[AggregationResult] = field(default_factory=list, repr=False)
    _cache: dict[str, Any] = field(
        default_factory=dict, repr=False, init=False, compare=False
    )

    @property
    def summary(self) -> pd.DataFrame:
        """Summary table of all tested configurations, sorted by RMSE."""
        import pandas as pd

        return pd.DataFrame(self.history).sort_values("rmse")

    @property
    def summary_matrix(self) -> xr.Dataset:
        """Metrics as Dataset with ``(n_clusters, n_segments)`` dims.

        Contains ``rmse`` and ``timesteps`` as variables.
        NaN where a combination was not tested.
        """
        import pandas as pd

        df = pd.DataFrame(self.history)
        return df.set_index(["n_clusters", "n_segments"]).to_xarray()

    def _require_all_results(self) -> None:
        if not self.all_results:
            msg = (
                "No results available. Use save_all_results=True "
                "in the tuning function."
            )
            raise ValueError(msg)
        if len(self.all_results) != len(self.history):
            msg = (
                f"Results/history mismatch: "
                f"{len(self.all_results)} results "
                f"vs {len(self.history)} history entries."
            )
            raise ValueError(msg)

    @property
    def reconstructed(self) -> xr.DataArray:
        """Reconstructed time series for each tested config.

        Lazy and cached.  Returns an xarray DataArray with the
        original dimensions plus ``(n_clusters, n_segments)``.
        NaN where a combination was not tested.

        Requires ``save_all_results=True``.
        """
        if "reconstructed" not in self._cache:
            self._require_all_results()
            import xarray as xr

            arrays = []
            for h, res in zip(self.history, self.all_results, strict=True):
                arr = res.reconstructed.expand_dims(
                    n_clusters=[h["n_clusters"]],
                    n_segments=[h["n_segments"]],
                )
                arrays.append(arr)
            self._cache["reconstructed"] = xr.combine_by_coords(arrays, join="outer")
        return self._cache["reconstructed"]  # type: ignore[no-any-return]

    @property
    def accuracy(self) -> xr.Dataset:
        """Per-column accuracy metrics for each tested config.

        Lazy and cached.  Returns an xarray Dataset with variables
        ``rmse``, ``mae``, and ``rmse_duration``, each with the
        cluster dimensions plus ``(n_clusters, n_segments)``.
        NaN where a combination was not tested.

        Requires ``save_all_results=True``.
        """
        if "accuracy" not in self._cache:
            self._require_all_results()
            import xarray as xr

            datasets = []
            for h, res in zip(self.history, self.all_results, strict=True):
                dims = {
                    "n_clusters": [h["n_clusters"]],
                    "n_segments": [h["n_segments"]],
                }
                ds = xr.Dataset(
                    {
                        "rmse": res.accuracy.rmse.expand_dims(dims),
                        "mae": res.accuracy.mae.expand_dims(dims),
                        "rmse_duration": res.accuracy.rmse_duration.expand_dims(dims),
                    }
                )
                datasets.append(ds)
            self._cache["accuracy"] = xr.combine_by_coords(datasets, join="outer")
        return self._cache["accuracy"]  # type: ignore[no-any-return]

    def find_by_timesteps(self, target: int) -> AggregationResult:
        """Find the result closest to a target timestep count.

        Requires ``save_all_results=True``.
        """
        self._require_all_results()
        best_idx = 0
        best_diff = float("inf")
        for i, h in enumerate(self.history):
            diff = abs(h["timesteps"] - target)
            if diff < best_diff:
                best_diff = diff
                best_idx = i
        return self.all_results[best_idx]

    def find_by_rmse(self, threshold: float) -> AggregationResult:
        """Find the smallest configuration that achieves a target RMSE.

        Returns the configuration with the fewest timesteps whose RMSE
        is at or below ``threshold``.

        Requires ``save_all_results=True``.
        """
        self._require_all_results()
        candidates: list[tuple[int, int]] = []  # (timesteps, index)
        for i, h in enumerate(self.history):
            if h["rmse"] <= threshold:
                candidates.append((h["timesteps"], i))

        if not candidates:
            best_available = min(h["rmse"] for h in self.history)
            msg = (
                f"No configuration achieves RMSE <= {threshold}. "
                f"Best available: {best_available:.4f}"
            )
            raise ValueError(msg)

        candidates.sort(key=lambda x: x[0])
        return self.all_results[candidates[0][1]]

    def plot(self, show_labels: bool = True, **kwargs: Any) -> go.Figure:
        """Plot RMSE vs timesteps.

        Requires ``plotly`` (``pip install plotly``).
        """
        try:
            import plotly.graph_objects as go
        except ImportError as exc:
            msg = "plotly is required for plot(): pip install plotly"
            raise ImportError(msg) from exc

        summary = self.summary
        hover_text = [
            f"{row['n_clusters']}x{row['n_segments']}<br>"
            f"Timesteps: {row['timesteps']}<br>"
            f"RMSE: {row['rmse']:.4f}"
            for _, row in summary.iterrows()
        ]

        fig = go.Figure()
        fig.add_trace(
            go.Scatter(
                x=summary["timesteps"],
                y=summary["rmse"],
                mode="lines+markers" if len(summary) > 1 else "markers",
                marker={"size": 10},
                hovertext=hover_text if show_labels else None,
                hoverinfo="text" if show_labels else "x+y",
                **kwargs,
            )
        )
        fig.update_layout(
            title="Tuning Results: Complexity vs Accuracy",
            xaxis_title="Timesteps (n_clusters x n_segments)",
            yaxis_title="RMSE",
            hovermode="closest",
        )
        return fig

    def __len__(self) -> int:
        return len(self.all_results)

    def __getitem__(self, index: int) -> AggregationResult:
        self._require_all_results()
        return self.all_results[index]

    def __iter__(self) -> Any:
        self._require_all_results()
        return iter(self.all_results)

summary `property` ¶

summary: DataFrame

Summary table of all tested configurations, sorted by RMSE.

summary_matrix `property` ¶

summary_matrix: Dataset

Metrics as Dataset with (n_clusters, n_segments) dims.

Contains rmse and timesteps as variables. NaN where a combination was not tested.
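`summary_matrix` pivots the flat `history` records onto an `(n_clusters, n_segments)` grid, leaving NaN wherever a pair was not tested. The sketch below performs the same pivot with the standard library for the `rmse` variable. `pivot` is a hypothetical helper, and the history entries are illustrative values, not measured results.

```python
import math
from typing import Any


def pivot(
    history: list[dict[str, Any]], value: str = "rmse"
) -> tuple[list[int], list[int], list[list[float]]]:
    """Grid of `value` indexed by (n_clusters, n_segments); NaN where untested."""
    rows = sorted({h["n_clusters"] for h in history})
    cols = sorted({h["n_segments"] for h in history})
    grid = [[math.nan] * len(cols) for _ in rows]
    for h in history:
        grid[rows.index(h["n_clusters"])][cols.index(h["n_segments"])] = h[value]
    return rows, cols, grid


# Illustrative history: three of the four (n_clusters, n_segments) pairs were tested.
history = [
    {"n_clusters": 4, "n_segments": 6, "rmse": 0.20, "timesteps": 24},
    {"n_clusters": 4, "n_segments": 12, "rmse": 0.15, "timesteps": 48},
    {"n_clusters": 8, "n_segments": 6, "rmse": 0.12, "timesteps": 48},
]
rows, cols, grid = pivot(history)
print(rows, cols)  # [4, 8] [6, 12]
print(grid)  # [[0.2, 0.15], [0.12, nan]]
```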

reconstructed `property` ¶

reconstructed: DataArray

Reconstructed time series for each tested config.

Lazy and cached. Returns an xarray DataArray with the original dimensions plus (n_clusters, n_segments). NaN where a combination was not tested.

Requires save_all_results=True.

accuracy `property` ¶

accuracy: Dataset

Per-column accuracy metrics for each tested config.

Lazy and cached. Returns an xarray Dataset with variables rmse, mae, and rmse_duration, each with the cluster dimensions plus (n_clusters, n_segments). NaN where a combination was not tested.

Requires save_all_results=True.

find_by_timesteps ¶

find_by_timesteps(target: int) -> AggregationResult

Find the result closest to a target timestep count.

Requires save_all_results=True.

Source code in src/tsam_xarray/_tuning.py

def find_by_timesteps(self, target: int) -> AggregationResult:
    """Find the result closest to a target timestep count.

    Requires ``save_all_results=True``.
    """
    self._require_all_results()
    best_idx = 0
    best_diff = float("inf")
    for i, h in enumerate(self.history):
        diff = abs(h["timesteps"] - target)
        if diff < best_diff:
            best_diff = diff
            best_idx = i
    return self.all_results[best_idx]
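`find_by_timesteps` is a linear scan for the smallest absolute difference between each entry's `timesteps` and `target`. The scan only replaces the current best on a strictly smaller difference, so ties go to the entry that appears first in `history`. The sketch below reuses that loop on an illustrative history and returns the index instead of the `AggregationResult`. `closest_index` is a hypothetical name.

```python
from typing import Any


def closest_index(history: list[dict[str, Any]], target: int) -> int:
    """Same scan as find_by_timesteps, returning the history index."""
    best_idx, best_diff = 0, float("inf")
    for i, h in enumerate(history):
        diff = abs(h["timesteps"] - target)
        if diff < best_diff:  # strict comparison: earlier entries win ties
            best_diff, best_idx = diff, i
    return best_idx


history = [
    {"n_clusters": 4, "n_segments": 6, "timesteps": 24},
    {"n_clusters": 8, "n_segments": 6, "timesteps": 48},
    {"n_clusters": 4, "n_segments": 12, "timesteps": 48},
    {"n_clusters": 12, "n_segments": 6, "timesteps": 72},
]
print(closest_index(history, 60))  # 1: entries 1, 2 and 3 are all 12 away; the first wins
print(closest_index(history, 70))  # 3
```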

find_by_rmse ¶

find_by_rmse(threshold: float) -> AggregationResult

Find the smallest configuration that achieves a target RMSE.

Returns the configuration with the fewest timesteps whose RMSE is at or below threshold.

Requires save_all_results=True.

Source code in src/tsam_xarray/_tuning.py

def find_by_rmse(self, threshold: float) -> AggregationResult:
    """Find the smallest configuration that achieves a target RMSE.

    Returns the configuration with the fewest timesteps whose RMSE
    is at or below ``threshold``.

    Requires ``save_all_results=True``.
    """
    self._require_all_results()
    candidates: list[tuple[int, int]] = []  # (timesteps, index)
    for i, h in enumerate(self.history):
        if h["rmse"] <= threshold:
            candidates.append((h["timesteps"], i))

    if not candidates:
        best_available = min(h["rmse"] for h in self.history)
        msg = (
            f"No configuration achieves RMSE <= {threshold}. "
            f"Best available: {best_available:.4f}"
        )
        raise ValueError(msg)

    candidates.sort(key=lambda x: x[0])
    return self.all_results[candidates[0][1]]
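`find_by_rmse` keeps every configuration at or below the threshold and returns the one with the fewest timesteps. It therefore favors compactness over accuracy rather than picking the lowest RMSE. Python's sort is stable, so among candidates with equal timesteps the earlier history entry wins. The sketch below repeats that selection on an illustrative history and returns an index. `smallest_meeting` is a hypothetical name.

```python
from typing import Any


def smallest_meeting(history: list[dict[str, Any]], threshold: float) -> int:
    """Same selection as find_by_rmse, returning the history index."""
    candidates = [(h["timesteps"], i) for i, h in enumerate(history) if h["rmse"] <= threshold]
    if not candidates:
        best = min(h["rmse"] for h in history)
        raise ValueError(
            f"No configuration achieves RMSE <= {threshold}. Best available: {best:.4f}"
        )
    candidates.sort(key=lambda c: c[0])  # stable: ties keep history order
    return candidates[0][1]


history = [
    {"n_clusters": 12, "n_segments": 6, "rmse": 0.05, "timesteps": 72},
    {"n_clusters": 8, "n_segments": 6, "rmse": 0.09, "timesteps": 48},
    {"n_clusters": 4, "n_segments": 6, "rmse": 0.20, "timesteps": 24},
]
print(smallest_meeting(history, 0.10))  # 1: 48 timesteps beats the more accurate 72
try:
    smallest_meeting(history, 0.01)
except ValueError as err:
    print(err)  # No configuration achieves RMSE <= 0.01. Best available: 0.0500
```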

plot ¶

plot(show_labels: bool = True, **kwargs: Any) -> go.Figure

Plot RMSE vs timesteps.

Requires plotly (pip install plotly).

Source code in src/tsam_xarray/_tuning.py

def plot(self, show_labels: bool = True, **kwargs: Any) -> go.Figure:
    """Plot RMSE vs timesteps.

    Requires ``plotly`` (``pip install plotly``).
    """
    try:
        import plotly.graph_objects as go
    except ImportError as exc:
        msg = "plotly is required for plot(): pip install plotly"
        raise ImportError(msg) from exc

    summary = self.summary
    hover_text = [
        f"{row['n_clusters']}x{row['n_segments']}<br>"
        f"Timesteps: {row['timesteps']}<br>"
        f"RMSE: {row['rmse']:.4f}"
        for _, row in summary.iterrows()
    ]

    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=summary["timesteps"],
            y=summary["rmse"],
            mode="lines+markers" if len(summary) > 1 else "markers",
            marker={"size": 10},
            hovertext=hover_text if show_labels else None,
            hoverinfo="text" if show_labels else "x+y",
            **kwargs,
        )
    )
    fig.update_layout(
        title="Tuning Results: Complexity vs Accuracy",
        xaxis_title="Timesteps (n_clusters x n_segments)",
        yaxis_title="RMSE",
        hovermode="closest",
    )
    return fig

aggregate

aggregate(
    da: DataArray,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    n_clusters: int,
    weights: Weights = None,
    cluster_on: ClusterOn = None,
    dim_names: DimNames | None = None,
    **tsam_kwargs: Any,
) -> AggregationResult

Aggregate an xarray DataArray using tsam.

Parameters:

Name	Type	Description	Default
`da`	`DataArray`	Input data with a time dimension and optional extra dimensions.	required
`time_dim`	`str`	Name of the time dimension.	required
`cluster_dim`	`Sequence[str] \| str`	Dimension(s) to cluster together. Multiple dims are stacked internally into a MultiIndex and unstacked in results. All remaining dims are sliced independently. Empty `()` for 1D time series with no column dimension.	required
`n_clusters`	`int`	Number of cluster representatives.	required
`weights`	`Weights`	Per-coordinate weights for clustering. Missing entries default to 1.0. Two formats are accepted. Use a simple dict for a single `cluster_dim`, e.g. `weights={"solar": 2.0, "wind": 1.0}`. Use a dict of dicts for multiple `cluster_dim`, e.g. `weights={"variable": {"solar": 2.0}, "region": {"north": 1.5}}`. Weights are multiplied across dimensions, e.g. `("solar", "north")` gets weight `2.0 * 1.5 = 3.0`.	`None`
`cluster_on`	`ClusterOn`	Restrict which coordinates drive the clustering. Coordinates not selected are still aggregated and reconstructed from the resulting clusters, but have no influence on how the clusters are formed. `None` (default) clusters on everything. Mirrors the `weights` formats. Use a list for a single `cluster_dim`, e.g. `cluster_on=["solar", "wind"]`. Use a dict for multiple `cluster_dim`, e.g. `cluster_on={"variable": ["solar", "wind"]}`. A dim omitted from the dict is unrestricted (all its coordinates are eligible). When several dims are listed, a column must match on all of them to be clustered on. At least one column must remain selected. Not compatible with `ExtremeConfig(method="replace")`: the carried columns are filled by transferring the clustering, which cannot reproduce the hybrid "replace" representative. To de-emphasise a column without excluding it, give it a low `weights` value instead. An `ExtremeConfig` also may not reference an excluded coordinate, since extreme periods are identified only on the clustered-on columns.	`None`
`dim_names`	`DimNames \| None`	Names for the structural output dimensions (`cluster`, `timestep`, `period`, `segment`). `None` (default) uses these default names. Override them to avoid collisions with the caller's own dimension names. See `DimNames`.	`None`
`**tsam_kwargs`	`Any`	Additional keyword arguments passed to `tsam.aggregate()`.	`{}`
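The following minimal sketch makes the dict-of-dicts `weights` rule concrete. It expands per-dimension weights into one weight per stacked column, defaults missing entries to 1.0, and multiplies weights across dimensions. `column_weights` is a hypothetical helper that illustrates the documented behaviour. It is not the library's internal implementation:

```python
from __future__ import annotations

from itertools import product


def column_weights(
    weights: dict[str, dict[str, float]],
    coords: dict[str, list[str]],
) -> dict[tuple[str, ...], float]:
    """Expand per-dimension weights into one weight per stacked column."""
    dims = list(coords)
    result: dict[tuple[str, ...], float] = {}
    # Every combination of coordinates along the cluster dims is one column.
    for column in product(*(coords[d] for d in dims)):
        w = 1.0
        for dim, label in zip(dims, column):
            # Missing dims or labels default to a weight of 1.0.
            w *= weights.get(dim, {}).get(label, 1.0)
        result[column] = w
    return result


coords = {"variable": ["solar", "wind"], "region": ["north", "south"]}
weights = {"variable": {"solar": 2.0}, "region": {"north": 1.5}}
for column, w in column_weights(weights, coords).items():
    print(column, w)
```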

Source code in src/tsam_xarray/_core.py

def aggregate(
    da: xr.DataArray,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    n_clusters: int,
    weights: Weights = None,
    cluster_on: ClusterOn = None,
    dim_names: DimNames | None = None,
    **tsam_kwargs: Any,
) -> AggregationResult:
    """Aggregate an xarray DataArray using tsam.

    Args:
        da: Input data with a time dimension and optional
            extra dimensions.
        time_dim: Name of the time dimension.
        cluster_dim: Dimension(s) to cluster together.
            Multiple dims are stacked internally into a
            MultiIndex and unstacked in results. All remaining
            dims are sliced independently. Empty ``()`` for 1D
            time series with no column dimension.
        n_clusters: Number of cluster representatives.
        weights: Per-coordinate weights for clustering.
            Missing entries default to 1.0. Two formats:

            - **Simple dict** (single ``cluster_dim``)::

                  weights={"solar": 2.0, "wind": 1.0}

            - **Dict-of-dicts** (multiple ``cluster_dim``)::

                  weights={
                      "variable": {"solar": 2.0},
                      "region": {"north": 1.5},
                  }

              Weights are multiplied across dimensions,
              e.g. ``("solar", "north")`` gets weight
              ``2.0 * 1.5 = 3.0``.

        cluster_on: Restrict which coordinates drive the
            clustering. Coordinates not selected are still
            aggregated and reconstructed from the resulting
            clusters, but have no influence on how the clusters
            are formed. ``None`` (default) clusters on
            everything. Mirrors the ``weights`` formats:

            - **List** (single ``cluster_dim``)::

                  cluster_on=["solar", "wind"]

            - **Dict** (multiple ``cluster_dim``)::

                  cluster_on={"variable": ["solar", "wind"]}

              A dim omitted from the dict is unrestricted (all
              its coordinates are eligible). When several dims
              are listed, a column must match on *all* of them
              to be clustered on. At least one column must
              remain selected.

              Not compatible with ``ExtremeConfig(method=
              "replace")`` — the carried columns are filled by
              transferring the clustering, which cannot
              reproduce the hybrid 'replace' representative. Use
              a low ``weights`` value to de-emphasise a column
              instead of excluding it. An ``ExtremeConfig`` also
              may not reference an excluded coordinate, since
              extreme periods are identified only on the
              clustered-on columns.

        dim_names: Names for the structural output dimensions
            (``cluster``, ``timestep``, ``period``, ``segment``).
            ``None`` (default) keeps today's names. Override to
            avoid collisions with the caller's own dimension
            names. See `DimNames`.

        **tsam_kwargs: Additional keyword arguments passed to
            ``tsam.aggregate()``.
    """
    resolved_dim_names = dim_names if dim_names is not None else DimNames()
    _validate_time_dim(da, time_dim)
    col_dims = _resolve_cluster_dim(cluster_dim)
    slice_dims = _infer_slice_dims(da, time_dim, col_dims)
    _validate(da, time_dim, col_dims, slice_dims)
    _validate_dim_names(resolved_dim_names, time_dim, col_dims, slice_dims)
    da = _validate_data(da, time_dim, col_dims, slice_dims)
    _validate_no_cluster_config_weights(tsam_kwargs)
    per_dim_weights = _normalize_weights(weights, da, col_dims)
    active_coords = _normalize_cluster_on(cluster_on, da, col_dims)
    _validate_extremes_with_cluster_on(tsam_kwargs, active_coords, da)

    if not slice_dims:
        return _aggregate_single(
            da,
            n_clusters,
            time_dim,
            col_dims,
            per_dim_weights,
            active_coords,
            tsam_kwargs,
            resolved_dim_names,
        )

    slice_coords = {d: da.coords[d].values for d in slice_dims}
    slice_keys = list(itertools.product(*(slice_coords[d] for d in slice_dims)))

    results: list[AggregationResult] = []

    for key in slice_keys:
        sel = dict(zip(slice_dims, key, strict=True))
        da_slice = da.sel(sel)
        r = _aggregate_single(
            da_slice,
            n_clusters,
            time_dim,
            col_dims,
            per_dim_weights,
            active_coords,
            tsam_kwargs,
            resolved_dim_names,
        )
        results.append(r)

    # Validate consistent cluster counts (can differ with extremes="append")
    _validate_consistent_cluster_counts(results, slice_keys)

    return _concat_results(results, slice_dims, slice_coords, slice_keys)
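The `cluster_on` matching rule can be sketched in the same way. An omitted dim is unrestricted, a column must match every listed dim, and an empty selection is an error. `clustered_columns` and the coordinates below are hypothetical. The sketch mirrors the documented rule only and is not the library's internal `_normalize_cluster_on`:

```python
from __future__ import annotations

from itertools import product


def clustered_columns(
    cluster_on: list[str] | dict[str, list[str]] | None,
    coords: dict[str, list[str]],
) -> list[tuple[str, ...]]:
    """Columns (coordinate tuples) that drive the clustering."""
    dims = list(coords)
    if cluster_on is None:
        cluster_on = {}  # no restriction: every column is clustered on
    elif isinstance(cluster_on, list):
        cluster_on = {dims[0]: cluster_on}  # list form: single cluster dim
    selected = [
        column
        for column in product(*(coords[d] for d in dims))
        # A column must match every listed dim; omitted dims are unrestricted.
        if all(column[k] in cluster_on[d] for k, d in enumerate(dims) if d in cluster_on)
    ]
    if not selected:
        raise ValueError("cluster_on must leave at least one column selected")
    return selected


coords = {"variable": ["solar", "wind", "demand"], "region": ["north", "south"]}
print(clustered_columns({"variable": ["solar", "wind"]}, coords))
print(clustered_columns({"variable": ["solar"], "region": ["south"]}, coords))
```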

find_best_combination

find_best_combination(
    *args: Any, **kwargs: Any
) -> TuningResult

Deprecated alias for `grid_search`. Calling it emits a `FutureWarning` and forwards all arguments unchanged.

Source code in src/tsam_xarray/_tuning.py

def find_best_combination(*args: Any, **kwargs: Any) -> TuningResult:
    """Deprecated alias for :func:`grid_search`."""
    import warnings

    warnings.warn(
        "find_best_combination is deprecated, use grid_search instead",
        FutureWarning,
        stacklevel=2,
    )
    return grid_search(*args, **kwargs)

find_optimal_combination

find_optimal_combination(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    data_reduction: float,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult

Find optimal n_clusters/n_segments for a target data reduction.

Tests all (n_clusters, n_segments) combinations that achieve the target data reduction, evaluating each across all slices.

Parameters:

Name	Type	Description	Default
`da`	`Any`	Input data.	required
`time_dim`	`str`	Name of the time dimension.	required
`cluster_dim`	`Sequence[str] \| str`	Dimension(s) to cluster together.	required
`data_reduction`	`float`	Target data reduction (e.g., 0.01 for 1% of original).	required
`weights`	`Weights`	Per-coordinate weights for clustering and RMSE evaluation.	`None`
`period_duration`	`int \| float \| str`	Hours per period (default: 24 for daily).	`24`
`show_progress`	`bool`	Show progress bar (requires tqdm).	`True`
`save_all_results`	`bool`	Keep all AggregationResults (memory-intensive).	`True`
`**tsam_kwargs`	`Any`	Additional keyword arguments passed to `tsam.aggregate()`.	`{}`

Returns:

Type	Description
`TuningResult`	Best combination with lowest overall RMSE.

Source code in src/tsam_xarray/_tuning.py

def find_optimal_combination(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    data_reduction: float,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult:
    """Find optimal n_clusters/n_segments for a target data reduction.

    Tests all (n_clusters, n_segments) combinations that achieve
    the target data reduction, evaluating each across all slices.

    Args:
        da: Input data.
        time_dim: Name of the time dimension.
        cluster_dim: Dimension(s) to cluster together.
        data_reduction: Target data reduction (e.g., 0.01 for
            1% of original).
        weights: Per-coordinate weights for clustering and
            RMSE evaluation.
        period_duration: Hours per period (default: 24 for
            daily).
        show_progress: Show progress bar (requires tqdm).
        save_all_results: Keep all AggregationResults
            (memory-intensive).
        **tsam_kwargs: Additional keyword arguments passed to
            ``tsam.aggregate()``.

    Returns:
        Best combination with lowest overall RMSE.
    """
    n_timesteps_per_period, _n_periods, n_timesteps = _infer_time_params(
        da, time_dim, period_duration
    )

    # Generate candidates: for each segment count, max clusters that fits
    seen: set[tuple[int, int]] = set()
    candidates: list[tuple[int, int]] = []
    for n_seg in range(1, n_timesteps_per_period + 1):
        n_clust = find_clusters_for_reduction(n_timesteps, n_seg, data_reduction)
        if n_clust >= 2 and (n_clust, n_seg) not in seen:
            candidates.append((n_clust, n_seg))
            seen.add((n_clust, n_seg))

    if not candidates:
        msg = (
            f"No valid (n_clusters, n_segments) combinations "
            f"for data_reduction={data_reduction}"
        )
        raise ValueError(msg)

    history, all_results, best_rmse, best_result, best_nc, best_ns = (
        _evaluate_candidates(
            candidates,
            da,
            time_dim=time_dim,
            cluster_dim=cluster_dim,
            weights=weights,
            period_duration=period_duration,
            show_progress=show_progress,
            progress_desc="Testing configurations",
            save_all_results=save_all_results,
            tsam_kwargs=tsam_kwargs,
        )
    )

    if best_result is None:
        msg = "All configurations failed"
        raise RuntimeError(msg)

    return TuningResult(
        n_clusters=best_nc,
        n_segments=best_ns,
        rmse=best_rmse,
        best_result=best_result,
        history=history,
        all_results=all_results,
    )
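For each segment count, candidate generation picks the largest cluster count that stays within the reduction budget. The sketch below assumes that `find_clusters_for_reduction` behaves like `floor(n_timesteps * data_reduction / n_segments)`. That formula and the helper name `reduction_candidates` are assumptions made for illustration, not tsam's documented behaviour. The example uses one year of hourly data in daily periods:

```python
from __future__ import annotations


def reduction_candidates(
    n_timesteps: int, n_timesteps_per_period: int, data_reduction: float
) -> list[tuple[int, int]]:
    """(n_clusters, n_segments) pairs that fit a data-reduction budget."""
    # Assumption: total representative timesteps may not exceed
    # floor(n_timesteps * data_reduction); tsam's own rounding may differ.
    budget = int(n_timesteps * data_reduction)
    candidates: list[tuple[int, int]] = []
    for n_seg in range(1, n_timesteps_per_period + 1):
        n_clust = budget // n_seg  # largest cluster count that fits
        if n_clust >= 2:  # at least two clusters, as in the code above
            candidates.append((n_clust, n_seg))
    return candidates


# One year of hourly data (8760 steps) in daily periods, keeping about 1%.
cands = reduction_candidates(8760, 24, 0.01)
print(cands[:3], cands[-1])
```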

find_pareto_front

find_pareto_front(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    max_timesteps: int | None = None,
    timesteps: Sequence[int] | None = None,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult

Find Pareto-optimal configs (RMSE vs complexity).

Runs the same grid search as `grid_search` but filters the results to the Pareto frontier: configurations for which no other tested combination has both lower RMSE and fewer timesteps.

Parameters:

Name	Type	Description	Default
`da`	`Any`	Input data.	required
`time_dim`	`str`	Name of the time dimension.	required
`cluster_dim`	`Sequence[str] \| str`	Dimension(s) to cluster together.	required
`max_timesteps`	`int \| None`	Maximum total timesteps to test (`n_clusters * n_segments`). Defaults to the total number of timesteps in the data.	`None`
`timesteps`	`Sequence[int] \| None`	Specific timestep counts to test. Only combinations where `n_clusters * n_segments` is in this list are evaluated. Mutually exclusive with `max_timesteps`.	`None`
`weights`	`Weights`	Per-coordinate weights for clustering and RMSE evaluation.	`None`
`period_duration`	`int \| float \| str`	Hours per period (default: 24).	`24`
`show_progress`	`bool`	Show progress bar.	`True`
`save_all_results`	`bool`	Keep all AggregationResults (memory-intensive).	`True`
`**tsam_kwargs`	`Any`	Additional keyword arguments passed to `tsam.aggregate()`.	`{}`

Returns:

Type	Description
`TuningResult`	Pareto-optimal result with the lowest RMSE on the frontier.

Source code in src/tsam_xarray/_tuning.py

def find_pareto_front(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    max_timesteps: int | None = None,
    timesteps: Sequence[int] | None = None,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult:
    """Find Pareto-optimal configs (RMSE vs complexity).

    Runs the same grid search as :func:`grid_search`
    but filters the results to the Pareto frontier --
    configurations where no other tested combo has both lower
    RMSE and fewer timesteps.

    Args:
        da: Input data.
        time_dim: Name of the time dimension.
        cluster_dim: Dimension(s) to cluster together.
        max_timesteps: Maximum total timesteps to test
            (n_clusters * n_segments). Defaults to total
            number of timesteps in the data.
        timesteps: Specific timestep counts to test. Only
            combinations where ``n_clusters * n_segments``
            is in this list are evaluated. Mutually exclusive
            with ``max_timesteps``.
        weights: Per-coordinate weights for clustering and
            RMSE evaluation.
        period_duration: Hours per period (default: 24).
        show_progress: Show progress bar.
        save_all_results: Keep all AggregationResults
            (memory-intensive).
        **tsam_kwargs: Additional keyword arguments passed to
            ``tsam.aggregate()``.

    Returns:
        Pareto-optimal result with lowest RMSE on the
        frontier.
    """
    grid = grid_search(
        da,
        time_dim=time_dim,
        cluster_dim=cluster_dim,
        max_timesteps=max_timesteps,
        timesteps=timesteps,
        weights=weights,
        period_duration=period_duration,
        show_progress=show_progress,
        save_all_results=save_all_results,
        **tsam_kwargs,
    )

    pareto_history, pareto_results = _pareto_filter(grid.history, grid.all_results)

    # Best on Pareto front = lowest RMSE (last entry when sorted
    # by ascending timesteps / descending RMSE).
    best_idx = min(
        range(len(pareto_history)),
        key=lambda i: pareto_history[i]["rmse"],
    )
    best_h = pareto_history[best_idx]

    # Reuse the best_result from the grid search when it matches.
    if (
        best_h["n_clusters"] == grid.n_clusters
        and best_h["n_segments"] == grid.n_segments
    ):
        best_result = grid.best_result
    elif pareto_results:
        best_result = pareto_results[best_idx]
    else:
        seg_config = SegmentConfig(n_segments=best_h["n_segments"])
        best_result = aggregate(
            da,
            time_dim=time_dim,
            cluster_dim=cluster_dim,
            n_clusters=best_h["n_clusters"],
            weights=weights,
            segments=seg_config,
            period_duration=period_duration,
            **tsam_kwargs,
        )

    return TuningResult(
        n_clusters=best_h["n_clusters"],
        n_segments=best_h["n_segments"],
        rmse=best_h["rmse"],
        best_result=best_result,
        history=pareto_history,
        all_results=pareto_results,
    )
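Here is a minimal sketch of a Pareto filter over the history dicts. It assumes each entry carries `"timesteps"` and `"rmse"` keys. The filter sorts entries by timesteps and keeps an entry only if it beats the lowest RMSE seen so far, so among equal timestep counts only the most accurate entry survives. This illustrates the frontier definition above. The private `_pareto_filter` may differ in details such as tie handling. `pareto_front` and the sample history are hypothetical:

```python
from __future__ import annotations

from typing import Any


def pareto_front(history: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep entries whose RMSE beats every entry with fewer or equal timesteps."""
    # Cheapest first; for equal timesteps, the most accurate comes first.
    ordered = sorted(history, key=lambda h: (h["timesteps"], h["rmse"]))
    front: list[dict[str, Any]] = []
    best_rmse = float("inf")
    for h in ordered:
        if h["rmse"] < best_rmse:  # strictly better than anything cheaper
            front.append(h)
            best_rmse = h["rmse"]
    return front


# Hypothetical grid-search history.
history = [
    {"n_clusters": 4, "n_segments": 1, "timesteps": 4, "rmse": 0.30},
    {"n_clusters": 2, "n_segments": 2, "timesteps": 4, "rmse": 0.25},
    {"n_clusters": 8, "n_segments": 1, "timesteps": 8, "rmse": 0.28},
    {"n_clusters": 4, "n_segments": 2, "timesteps": 8, "rmse": 0.18},
    {"n_clusters": 12, "n_segments": 1, "timesteps": 12, "rmse": 0.20},
]
front = pareto_front(history)
# As in find_pareto_front, the best point on the front is the one with lowest RMSE.
best = min(front, key=lambda h: h["rmse"])
print([(h["n_clusters"], h["n_segments"]) for h in front], best["rmse"])
```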

grid_search

grid_search(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    max_timesteps: int | None = None,
    timesteps: Sequence[int] | None = None,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult

Full grid search for best (n_clusters, n_segments).

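Tests every (n_clusters, n_segments) pair with `2 <= n_clusters <= n_periods - 1` and `n_clusters * n_segments <= max_timesteps` (restricted further to `timesteps` when given), and returns the pair with the lowest overall RMSE. The complete unfiltered history is preserved.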

Parameters:

Name	Type	Description	Default
`da`	`Any`	Input data.	required
`time_dim`	`str`	Name of the time dimension.	required
`cluster_dim`	`Sequence[str] \| str`	Dimension(s) to cluster together.	required
`max_timesteps`	`int \| None`	Maximum total timesteps to test (`n_clusters * n_segments`). Defaults to the total number of timesteps in the data.	`None`
`timesteps`	`Sequence[int] \| None`	Specific timestep counts to test. Only combinations where `n_clusters * n_segments` is in this list are evaluated. Mutually exclusive with `max_timesteps`.	`None`
`weights`	`Weights`	Per-coordinate weights for clustering and RMSE evaluation.	`None`
`period_duration`	`int \| float \| str`	Hours per period (default: 24).	`24`
`show_progress`	`bool`	Show progress bar.	`True`
`save_all_results`	`bool`	Keep all AggregationResults (memory-intensive).	`True`
`**tsam_kwargs`	`Any`	Additional keyword arguments passed to `tsam.aggregate()`.	`{}`

Returns:

Type	Description
`TuningResult`	Best combination with the lowest overall RMSE, plus the full history.

Source code in src/tsam_xarray/_tuning.py

def grid_search(
    da: Any,
    *,
    time_dim: str,
    cluster_dim: Sequence[str] | str,
    max_timesteps: int | None = None,
    timesteps: Sequence[int] | None = None,
    weights: Weights = None,
    period_duration: int | float | str = 24,
    show_progress: bool = True,
    save_all_results: bool = True,
    **tsam_kwargs: Any,
) -> TuningResult:
    """Full grid search for best (n_clusters, n_segments).

    Tests all valid (n_clusters, n_segments) pairs up to
    ``max_timesteps`` and returns the one with the lowest overall
    RMSE.  The complete unfiltered ``history`` is preserved.

    Args:
        da: Input data.
        time_dim: Name of the time dimension.
        cluster_dim: Dimension(s) to cluster together.
        max_timesteps: Maximum total timesteps to test
            (n_clusters * n_segments). Defaults to total
            number of timesteps in the data.
        timesteps: Specific timestep counts to test. Only
            combinations where ``n_clusters * n_segments``
            is in this list are evaluated. Mutually exclusive
            with ``max_timesteps``.
        weights: Per-coordinate weights for clustering and
            RMSE evaluation.
        period_duration: Hours per period (default: 24).
        show_progress: Show progress bar.
        save_all_results: Keep all AggregationResults
            (memory-intensive).
        **tsam_kwargs: Additional keyword arguments passed to
            ``tsam.aggregate()``.

    Returns:
        Best combination with lowest overall RMSE and full
        history.
    """
    n_timesteps_per_period, n_periods, n_timesteps = _infer_time_params(
        da, time_dim, period_duration
    )

    if timesteps is not None and max_timesteps is not None:
        msg = "Cannot specify both 'timesteps' and 'max_timesteps'"
        raise ValueError(msg)

    if max_timesteps is None and timesteps is None:
        max_timesteps = n_timesteps

    allowed = set(timesteps) if timesteps is not None else None
    if max_timesteps is None:
        max_timesteps = max(allowed) if allowed else n_timesteps

    # Generate grid of candidates
    # Cap n_clusters at n_periods - 1 (n_periods = trivial perfect fit)
    max_clusters = n_periods - 1
    candidates: list[tuple[int, int]] = []
    for n_seg in range(1, n_timesteps_per_period + 1):
        for n_clust in range(2, min(max_clusters, max_timesteps // n_seg) + 1):
            total = n_clust * n_seg
            if total <= max_timesteps and (allowed is None or total in allowed):
                candidates.append((n_clust, n_seg))

    if not candidates:
        msg = f"No valid combinations for max_timesteps={max_timesteps}"
        raise ValueError(msg)

    history, all_results, best_rmse, best_result, best_nc, best_ns = (
        _evaluate_candidates(
            candidates,
            da,
            time_dim=time_dim,
            cluster_dim=cluster_dim,
            weights=weights,
            period_duration=period_duration,
            show_progress=show_progress,
            progress_desc="Grid search",
            save_all_results=save_all_results,
            tsam_kwargs=tsam_kwargs,
        )
    )

    if best_result is None:
        msg = "All configurations failed"
        raise RuntimeError(msg)

    return TuningResult(
        n_clusters=best_nc,
        n_segments=best_ns,
        rmse=best_rmse,
        best_result=best_result,
        history=history,
        all_results=all_results,
    )
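The candidate grid above can be reproduced in isolation to check which pairs will be evaluated. The minimal sketch below uses hypothetical sizes: 5 periods of 4 timesteps each. Cluster counts are capped at `n_periods - 1`. The bound `max_timesteps // n_seg` on the inner range already guarantees `n_clust * n_seg <= max_timesteps`, so only the optional `allowed` filter remains. `grid_candidates` is a hypothetical helper:

```python
from __future__ import annotations


def grid_candidates(
    n_periods: int,
    n_timesteps_per_period: int,
    max_timesteps: int,
    allowed: set[int] | None = None,
) -> list[tuple[int, int]]:
    """All (n_clusters, n_segments) pairs a grid search would evaluate."""
    max_clusters = n_periods - 1  # n_periods clusters would be a trivial perfect fit
    candidates: list[tuple[int, int]] = []
    for n_seg in range(1, n_timesteps_per_period + 1):
        # The upper bound already ensures n_clust * n_seg <= max_timesteps.
        for n_clust in range(2, min(max_clusters, max_timesteps // n_seg) + 1):
            if allowed is None or n_clust * n_seg in allowed:
                candidates.append((n_clust, n_seg))
    return candidates


# Hypothetical data: 5 periods of 4 timesteps each, budget of 8 timesteps.
print(grid_candidates(5, 4, 8))
print(grid_candidates(5, 4, 8, allowed={8}))
```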
