Dataset viewer documentation

Check dataset validity


Before you download a dataset from the Hub, it is helpful to know if a specific dataset you’re interested in is available. The dataset viewer provides the /is-valid endpoint to check if a specific dataset works without any errors.

The API endpoint will return an error for datasets that cannot be loaded with the 🤗 Datasets library, for example, because the data hasn’t been uploaded or the format is not supported.

The largest datasets are partially supported by the dataset viewer. If they are streamable, the dataset viewer can extract the first 100 rows without downloading the whole dataset. This is especially useful for previewing large datasets where downloading the whole dataset may take hours! See the preview field in the response of /is-valid to check if a dataset is partially supported.

This guide shows you how to check dataset validity programmatically, but feel free to try it out with Postman, RapidAPI, or ReDoc.

Check if a dataset is valid

/is-valid checks whether a specific dataset loads without any error. The endpoint requires one query parameter, dataset, set to the name of the dataset:

Python
import requests

# API_TOKEN must hold your Hugging Face user token (needed for gated datasets)
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=cornell-movie-review-data/rotten_tomatoes"

def query():
    # Send a GET request to /is-valid and return the parsed JSON response
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()
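
To check datasets other than the one hard-coded above, the request can be wrapped in a small helper. The following is a minimal sketch, not part of the official client: the names build_is_valid_url and check_validity are hypothetical, and it assumes the endpoint reads the dataset name from the dataset query parameter, as in the example URL. It uses only the Python standard library.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://datasets-server.huggingface.co/is-valid"


def build_is_valid_url(dataset: str) -> str:
    """Return the /is-valid URL for a dataset name such as 'user/name'."""
    # urlencode percent-escapes the value, so the '/' in the name becomes '%2F'.
    query = urllib.parse.urlencode({"dataset": dataset})
    return f"{BASE_URL}?{query}"


def check_validity(dataset: str, timeout: float = 10.0) -> dict:
    """Query /is-valid and return the parsed JSON body (hypothetical helper).

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    url = build_is_valid_url(dataset)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling check_validity("cornell-movie-review-data/rotten_tomatoes") would send the same request as the example above, without an Authorization header.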

The response looks like this if a dataset is valid:

{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": true
}

The response looks like this if a dataset is valid but /search is not available for it:

{
  "viewer": true,
  "preview": true,
  "search": false,
  "filter": true,
  "statistics": true
}

The response looks like this if a dataset is valid but /filter is not available for it:

{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": false,
  "statistics": true
}

Similarly, if the statistics are not available:

{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": false
}
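
Since each field in the responses above is an independent boolean, one way to find out which features are missing is to collect the keys whose value is false. This is an illustrative sketch: the helper name unavailable_features is hypothetical, and the list of keys is taken from the example responses rather than from a formal schema.

```python
# Keys that appear in the example /is-valid responses.
FEATURES = ("viewer", "preview", "search", "filter", "statistics")


def unavailable_features(response: dict) -> list[str]:
    """Return the features reported as false (or absent) in the response."""
    return [name for name in FEATURES if not response.get(name, False)]


# Response for a dataset where /search is not available.
example = {
    "viewer": True,
    "preview": True,
    "search": False,
    "filter": True,
    "statistics": True,
}
missing = unavailable_features(example)
```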

If only the first rows of a dataset are available, then the response looks like:

{
  "viewer": false,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": true
}

Finally, if the dataset is not valid at all, then the response is:

{
  "viewer": false,
  "preview": false,
  "search": false,
  "filter": false,
  "statistics": false
}
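
The cases above can be collapsed into a single summary by looking at the viewer and preview fields. The sketch below is one possible interpretation, not an official rule: the function name support_level and its three labels are hypothetical, and treating viewer as "more than the first rows is available" is an assumption drawn from the example responses.

```python
def support_level(response: dict) -> str:
    """Summarize an /is-valid response by how much of the data can be browsed."""
    if response.get("viewer"):
        # The viewer is available for the dataset.
        return "viewer"
    if response.get("preview"):
        # Only the first rows are available.
        return "preview-only"
    # Neither the viewer nor the preview is available.
    return "not-valid"
```

A caller could, for example, skip a download when the result is "not-valid" and fall back to previewing the first rows when it is "preview-only".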

Some cases where a dataset is not valid are:

  • the dataset viewer is disabled
  • the dataset is gated but access has not been granted: no token is passed, or the passed token is not authorized
  • the dataset is private but the owner is not a PRO user or an Enterprise Hub org
  • the dataset contains no data or the data format is not supported

Remember, if a dataset is gated, you'll need to provide your user token to submit a successful query!
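
Because a token is only needed for gated datasets, it can be convenient to build the Authorization header conditionally. The following is a small sketch under stated assumptions: the helper name auth_headers is hypothetical, and reading the token from an HF_TOKEN environment variable is a choice made for this example, not something the endpoint requires.

```python
import os
from typing import Optional


def auth_headers(token: Optional[str] = None) -> dict:
    """Build request headers, adding a Bearer token only when one is available."""
    # Fall back to the HF_TOKEN environment variable (an assumption of this sketch).
    token = token or os.environ.get("HF_TOKEN")
    if token:
        return {"Authorization": f"Bearer {token}"}
    # No token: send the request unauthenticated.
    return {}
```

The result can be passed directly as the headers argument, for example requests.get(API_URL, headers=auth_headers()).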