Utility Functions
setfit.get_templated_dataset
( dataset: typing.Optional[datasets.arrow_dataset.Dataset] = None, candidate_labels: typing.Optional[typing.List[str]] = None, reference_dataset: typing.Optional[str] = None, template: str = 'This sentence is {}', sample_size: int = 2, text_column: str = 'text', label_column: str = 'label', multi_label: bool = False, label_names_column: str = 'label_text' ) → Dataset
Parameters
- dataset (Dataset, optional): A Dataset to add templated examples to.
- candidate_labels (List[str], optional): The list of candidate labels to be fed into the template to construct examples.
- reference_dataset (str, optional): The name of a dataset to load labels from, if candidate_labels is not supplied.
- template (str, optional, defaults to "This sentence is {}"): The template used to turn each label into a synthetic training example. The template must include a {} where the candidate label is inserted. For example, with the default template "This sentence is {}" and the candidate label "sports", the generated example is "This sentence is sports".
- sample_size (int, optional, defaults to 2): The number of examples to generate for each candidate label.
- text_column (str, optional, defaults to "text"): The name of the column containing the text of the examples.
- label_column (str, optional, defaults to "label"): The name of the column in dataset containing the labels of the examples.
- multi_label (bool, optional, defaults to False): Whether or not multiple candidate labels can be true.
- label_names_column (str, optional, defaults to "label_text"): The name of the label-name column in reference_dataset, used when the label column has no ClassLabel feature.
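When candidate_labels is not given, the label names have to be recovered from reference_dataset, either from a ClassLabel feature or from label_names_column. The sketch below illustrates that lookup on plain Python rows rather than a real Dataset. It is not SetFit's implementation: the helper name resolve_candidate_labels is hypothetical, class_label_names stands in for the names of a ClassLabel feature, and ordering the names by integer label id is an assumption.

```python
from typing import Dict, List, Optional, Sequence


def resolve_candidate_labels(
    rows: Sequence[dict],
    class_label_names: Optional[List[str]] = None,
    label_column: str = "label",
    label_names_column: str = "label_text",
) -> List[str]:
    """Hypothetical helper: recover label names from reference rows.

    class_label_names stands in for the names of a ClassLabel feature;
    pass None to simulate a label column without one.
    """
    # Prefer the ClassLabel names when the label column provides them.
    if class_label_names is not None:
        return list(class_label_names)

    # Otherwise map each label id to the first name seen in label_names_column.
    id2label: Dict[int, str] = {}
    for row in rows:
        id2label.setdefault(row[label_column], row[label_names_column])

    # Assumption: candidate labels are ordered by their integer label id.
    return [id2label[label_id] for label_id in sorted(id2label)]


reference_rows = [
    {"label": 1, "label_text": "sports"},
    {"label": 0, "label_text": "politics"},
    {"label": 1, "label_text": "sports"},
]
print(resolve_candidate_labels(reference_rows))  # ['politics', 'sports']
print(resolve_candidate_labels(reference_rows, class_label_names=["neg", "pos"]))  # ['neg', 'pos']
```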
Returns
Dataset: A copy of the input Dataset with templated examples added, or a new Dataset containing only the templated examples if no input Dataset was supplied.
Raises
ValueError: If the input Dataset is not empty and one or both of text_column and label_column are missing from it.
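The documented ValueError condition can be sketched as a small column check. This is a hypothetical helper, not SetFit's code, and it assumes that "empty" means the dataset has no columns yet, in which case a new dataset is built and no check is needed.

```python
from typing import Iterable


def check_required_columns(
    column_names: Iterable[str],
    text_column: str = "text",
    label_column: str = "label",
) -> None:
    """Hypothetical helper mirroring the documented ValueError condition."""
    present = set(column_names)
    # Assumption: a dataset with no columns counts as empty and is accepted.
    if not present:
        return
    missing = {text_column, label_column} - present
    if missing:
        raise ValueError(f"The input dataset is missing the column(s): {sorted(missing)}")


check_required_columns([])                # empty: accepted
check_required_columns(["text", "label"])  # both columns present: accepted
try:
    check_required_columns(["text"])
except ValueError as err:
    print(err)  # The input dataset is missing the column(s): ['label']
```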
Create templated examples for a reference dataset or reference labels.
If candidate_labels is supplied, use it for generating the templates. Otherwise, use the labels loaded from reference_dataset.
If an input Dataset is supplied, add the examples to it; otherwise, create a new Dataset.
The input Dataset is assumed to have a text column with the name text_column and a label column with the name label_column, which contains one-hot or multi-hot encoded label sequences.
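The generation step described above can be sketched on plain Python rows: for each of sample_size rounds, every candidate label is inserted into the template with str.format. This is a minimal sketch, not the library's implementation; the helper name make_templated_examples and the existing_rows parameter are hypothetical, and the label encoding (the integer label index when multi_label is False, a one-hot vector when it is True) is an assumption.

```python
from typing import List, Optional, Union

Label = Union[int, List[int]]


def make_templated_examples(
    candidate_labels: List[str],
    template: str = "This sentence is {}",
    sample_size: int = 2,
    multi_label: bool = False,
    text_column: str = "text",
    label_column: str = "label",
    existing_rows: Optional[List[dict]] = None,
) -> List[dict]:
    """Hypothetical sketch: add sample_size templated rows per candidate label."""
    # Copy the caller's list so the input rows are not modified in place.
    rows = list(existing_rows) if existing_rows else []
    for _ in range(sample_size):
        for label_id, label_name in enumerate(candidate_labels):
            # One-hot vector marking only this candidate label.
            one_hot = [0] * len(candidate_labels)
            one_hot[label_id] = 1
            # Assumption: single-label rows store the integer index instead.
            label: Label = one_hot if multi_label else label_id
            rows.append({text_column: template.format(label_name), label_column: label})
    return rows


print(make_templated_examples(["sports", "politics"], sample_size=1))
# [{'text': 'This sentence is sports', 'label': 0}, {'text': 'This sentence is politics', 'label': 1}]
```

With sample_size=2 and three candidate labels, the sketch produces six rows, which matches "the number of examples to generate for each candidate label" in the parameter list.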
setfit.sample_dataset
( dataset: Dataset, label_column: str = 'label', num_samples: int = 8, seed: int = 42 )
Samples a Dataset to create an equal number of samples per class (when possible).
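The per-class sampling can be sketched on plain Python rows with the standard random module. This is an illustration under stated assumptions, not SetFit's implementation: the helper name sample_per_class is hypothetical, num_samples is read as a per-class cap, classes with fewer rows are kept whole (one reading of "when possible"), and labels must be hashable (for example integer ids).

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence


def sample_per_class(
    rows: Sequence[dict],
    label_column: str = "label",
    num_samples: int = 8,
    seed: int = 42,
) -> List[dict]:
    """Hypothetical sketch: keep up to num_samples rows for each label value."""
    rng = random.Random(seed)  # seeded so repeated calls give the same sample
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # Group the shuffled rows by label, preserving the shuffled order.
    by_label: DefaultDict[object, List[dict]] = defaultdict(list)
    for row in shuffled:
        by_label[row[label_column]].append(row)

    # Classes with fewer than num_samples rows contribute all of their rows.
    sampled: List[dict] = []
    for group in by_label.values():
        sampled.extend(group[:num_samples])

    rng.shuffle(sampled)
    return sampled


rows = [{"text": f"a{i}", "label": 0} for i in range(10)]
rows += [{"text": f"b{i}", "label": 1} for i in range(3)]
subset = sample_per_class(rows, num_samples=4)
print(len(subset))  # 7: four rows of label 0, all three rows of label 1
```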