Hello,
For some experiments, I have already split my local private data into "train", "test", and "valid" and saved them in a single .json file with the following structure:
{
  "train": [
    {
      "text": "this is the first example",
      "label": 2
    },
    {
      "text": "this is the second example",
      "label": 1
    }
  ],
  "test": …,
  "valid": …
}
but I am struggling to create a Dataset out of it. This tutorial (the in-memory section):
https://huggingface.co/docs/datasets/loading_datasets.html#from-a-python-dictionary
doesn't make it clear how to proceed. I also looked at the source code of the `from_dict` method in arrow_dataset.py, but I couldn't work out a solution to my problem from it.
Does anyone have an idea (other than splitting the original file into several files and reformatting them)?