Hi there,
the last “Try it out!” task here asks us
- to “create an own dataset of GitHub issues” and
- to “fine-tune a multilabel classifier” (for bonus points).
I have created this dataset. It has 57 different labels, and an instance may be labelled with any combination of them. I would like to add the class label names ["bug", "benchmark", "performance", ...] to the dataset. Inspired by this forum post, I tried the following, without success:
features = transformers_issues_labels.features.copy()
features["arr_labels"] = ClassLabel(names=unique_labels)
transformers_issues_labels = transformers_issues_labels.map(
    lambda batch: batch, batched=False, features=features
)
TypeError: Couldn't cast array of type list<item: int64> to int64
=> Two questions:
- How to build a classifier for this task (e.g. “MultiLabelFromPretrainedClassifier” or something like this…)?
- How can I add the class label names to my dataset (specifically to the “arr_labels” features, assuming this makes sense)?
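Regarding the first question, one common setup (an assumption here, not something the course task prescribes) is to load AutoModelForSequenceClassification with problem_type="multi_label_classification" and num_labels set to the number of labels; that setup expects each example's labels as a float multi-hot vector rather than a list of ids. A minimal sketch of that conversion, with a hypothetical helper name and toy labels:

```python
def to_multi_hot(label_ids, num_labels):
    """Turn a list of label ids into a float multi-hot vector."""
    vector = [0.0] * num_labels
    for label_id in label_ids:
        vector[label_id] = 1.0
    return vector


# Toy stand-ins (assumptions): the real dataset has 57 labels.
unique_labels = ["bug", "benchmark", "performance"]
example_label_ids = [0, 2]  # an issue tagged "bug" and "performance"

multi_hot = to_multi_hot(example_label_ids, len(unique_labels))
```

Such a function could be applied per example with Dataset.map to produce a "labels" column before training.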
P.S. In any case: thanks a ton to all contributors of this course. I am learning a lot and am looking forward to part 3.