dataeval.utils.dataset.split_dataset
====================================

.. py:function:: dataeval.utils.dataset.split_dataset(labels, num_folds = 1, stratify = False, split_on = None, metadata = None, test_frac = 0.0, val_frac = 0.0)

   Top-level splitting function. Returns a dataclass containing a list of train and validation indices
   for each fold. Indices for a test holdout may optionally be included.

   :param labels: Classification labels used to generate splits. The number of labels determines the size of the dataset.
   :type labels: list or NDArray of ints
   :param num_folds: Number of [train, val] folds. If equal to 1, val_frac must be greater than 0.0.
   :type num_folds: int, default 1
   :param stratify: If True, the dataset is split such that the class distribution of the entire dataset is
                    preserved within each [train, val] partition. This is generally recommended.
   :type stratify: bool, default False
   :param split_on: Keys of the metadata dictionary on which to group the dataset.
                    A grouped partition is divided such that no group is present in both the training and
                    validation sets. Choose split_on groups to mitigate validation bias.
   :type split_on: list or None, default None
   :param metadata: Dict containing data for potential dataset grouping. See split_on above.
   :type metadata: dict or None, default None
   :param test_frac: Fraction of data to optionally hold out for the test set.
   :type test_frac: float, default 0.0
   :param val_frac: Fraction of training data to be set aside for validation in the case where a single
                    [train, val] split is desired.
   :type val_frac: float, default 0.0

   :returns: **split_defs** -- Output class containing the training and validation indices
             for each fold, plus optional test indices.
   :rtype: SplitDatasetOutput

   :raises TypeError: If split_on is passed but metadata is None or empty.

   .. note::

      When groups and/or stratification are specified, the achieved test and validation ratios can
      deviate from test_frac and val_frac, because stratification and grouping take priority over
      the requested fractions.
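
The effect described in the note can be illustrated with a small, self-contained sketch. It does not call ``split_dataset`` and is not the library's implementation: ``grouped_train_val_split`` is a hypothetical helper written only for this example, and it assumes a simple strategy of moving whole groups into the validation set until the requested fraction is reached. It shows why a grouped split keeps every group on one side of the partition and why the achieved validation fraction can differ from the requested one.

```python
import random
from collections import defaultdict


def grouped_train_val_split(groups, val_frac, seed=0):
    """Hypothetical helper: split indices so that no group spans train and val."""
    # Collect the sample indices belonging to each group.
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)

    # Shuffle the group ids deterministically, then move whole groups into
    # the validation set until the requested size is reached or exceeded.
    group_ids = sorted(members)
    random.Random(seed).shuffle(group_ids)

    target = val_frac * len(groups)
    train, val = [], []
    for group in group_ids:
        if len(val) < target:
            val.extend(members[group])
        else:
            train.extend(members[group])
    return sorted(train), sorted(val)


# Ten samples in three groups of unequal size (for example, one group per sensor).
groups = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"]
train_idx, val_idx = grouped_train_val_split(groups, val_frac=0.2)

train_groups = {groups[i] for i in train_idx}
val_groups = {groups[i] for i in val_idx}
achieved_val_frac = len(val_idx) / len(groups)

print(train_groups & val_groups)  # set(): no group appears on both sides
print(achieved_val_frac)          # 0.3 or 0.4, depending on which group is drawn
```

Because groups are never divided, the smallest possible validation set here holds three samples, so a requested fraction of 0.2 cannot be met exactly. Stratification imposes a similar constraint per class, which is why the requested fractions are best read as targets rather than guarantees.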