Hi,
I have just tried to create my first dataset on the Hugging Face website.
After uploading a few data files (3 different xlsx and 1 csv), I see the following message:
but when I click on “files” tab, I see all my files.
When I try to access the dataset from a Python Jupyter Notebook, I receive the following error message:
FileNotFoundError: Couldn't find a dataset script at /content/Lord-Goku/testing_1/testing_1.py or any data file in the same directory. Couldn't find 'Lord-Goku/testing_1' on the Hugging Face Hub either: FileNotFoundError: Unable to resolve any data file that matches ['train[-._ 0-9/]', '[-._ 0-9/]train[-._ 0-9/]', 'training[-._ 0-9/]', '[-._ 0-9/]training[-._ 0-9/]'] in dataset repository Lord-Goku/testing_1 with any supported extension
Even though the dataset is public, I logged into my Hugging Face account from the Notebook to see if that makes any difference, but no luck there.
Here is the code I have tried:
dataset = load_dataset("Lord-Goku/testing_1")
and:
data_files = {"test": "test.xlsx"}
dataset = load_dataset("Lord-Goku/testing_1", data_files=data_files)
Neither of them works.
Hi!
For the csv files, you can do:
from datasets import load_dataset
ds = load_dataset("Lord-Goku/testing_1", data_files="nyse-listed.csv")
I don’t think we have a BuilderConfig for xlsx files, so you can do this instead:
from datasets import Dataset
import pandas as pd
# pd.read_excel already returns a DataFrame (reading .xlsx files requires the openpyxl package)
df = pd.read_excel("https://huggingface.co/datasets/Lord-Goku/testing_1/resolve/main/test.xlsx")
dataset = Dataset.from_pandas(df)
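When a repository mixes formats, it can help to first group the file names by which of the two approaches above applies. The following is a minimal sketch using only the standard library; the helper name group_by_loader and the extension sets are assumptions made for illustration, not part of the datasets library, and README.md is just a placeholder file name.

```python
from pathlib import PurePosixPath

# Extensions handled by each approach described above.
# These sets are an assumption for illustration, not taken from the datasets library.
CSV_EXTENSIONS = {".csv"}
EXCEL_EXTENSIONS = {".xlsx"}


def group_by_loader(filenames):
    """Group repository file names into csv files, Excel files, and everything else."""
    groups = {"csv": [], "excel": [], "other": []}
    for name in filenames:
        # Compare extensions case-insensitively so "Data.XLSX" is also recognized.
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in CSV_EXTENSIONS:
            groups["csv"].append(name)
        elif suffix in EXCEL_EXTENSIONS:
            groups["excel"].append(name)
        else:
            groups["other"].append(name)
    return groups


groups = group_by_loader(["nyse-listed.csv", "test.xlsx", "README.md"])
print(groups)
```

The names under "csv" can be passed to load_dataset through data_files, while those under "excel" would go through pd.read_excel and Dataset.from_pandas.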
Thank you so much for your prompt reply.
You are totally right: I re-read the documentation and realized that Hugging Face does not support .xlsx files.
I was able to load my csv file correctly using your code.
Appreciate your support.
The only issue is that I still see the following message:
"The dataset is currently empty. [Upload or create new data files]. Then, you will be able to explore them in the Dataset Viewer."
Any idea why this is?
I think this might be because your repository isn’t structured properly (you’ve got a mix of xlsx and csv files). Can you try organizing your csv files as shown here?
Also, you can create a dataset with xlsx files, but you’ll need to write a loading script, which is a bit more involved than just uploading your dataset files.
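As a simpler alternative to a loading script (a swapped-in technique, not what the reply above describes), the spreadsheets could be converted to csv locally and uploaded in place of the xlsx files. Below is a minimal sketch, assuming pandas is installed; the helper name write_splits_as_csv, the one-file-per-split naming, and the placeholder data are all assumptions, and the resulting layout should still be checked against the repository-structure documentation.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_splits_as_csv(frames, out_dir):
    """Write each DataFrame to <out_dir>/<split>.csv and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for split, df in frames.items():
        path = out_dir / f"{split}.csv"
        # index=False keeps the pandas row index out of the csv columns.
        df.to_csv(path, index=False)
        paths[split] = path
    return paths


if __name__ == "__main__":
    # Placeholder data; in practice each frame would come from pd.read_excel(...).
    demo = {"test": pd.DataFrame({"symbol": ["A", "B"], "price": [1.0, 2.0]})}
    with tempfile.TemporaryDirectory() as tmp:
        print(write_splits_as_csv(demo, tmp))
```

For a real spreadsheet, the dictionary would look like {"test": pd.read_excel("test.xlsx")}, and the generated csv files would then be uploaded to the dataset repository.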
Thanks,
You were right. It seems to be in order now.
Appreciate your feedback.