Hello, I am loading images into a Dataset by casting their URLs to datasets.Image
objects.
from datasets import Dataset, Image

# DBClient, BUCKET and FOLDER are defined elsewhere in my project.
def load_dataset(db_client: DBClient) -> Dataset:
    """Loads the dataset from the given bucket."""
    paths = list(db_client.missing_image_paths())

    def url_from_path(path: str) -> str:
        return f'gs://{BUCKET}/{FOLDER}{path}'

    return Dataset.from_dict({
        'image': [url_from_path(path) for path in paths],
        'filename': paths,
    }).cast_column('image', Image())
Now, some of these images don’t exist anymore. With print(dataset[0]), I get:
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1280 at 0x132D99190>, 'filename': 'b57ed2793e6a8ae06382c78a87863b8d.jpg'}
But if I try to load more, at some point I get an error similar to this:
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x13592fab0>
Is there some way to specify that we want to ignore those issues and discard the images when that happens?
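To make the question concrete, here is a minimal sketch of the behaviour I am after, written as a pre-filtering step rather than as a datasets option (I don't know whether the library has one). The names pil_can_decode, split_loadable and read_bytes are hypothetical helpers of mine, not part of datasets. The sketch assumes Pillow is installed and that some function can fetch the raw bytes for a path.

```python
import io
from typing import Callable, Iterable, List, Tuple


def pil_can_decode(data: bytes) -> bool:
    """Return True if Pillow can identify and verify the given image bytes."""
    from PIL import Image as PILImage  # lazy import; requires Pillow

    try:
        with PILImage.open(io.BytesIO(data)) as img:
            img.verify()  # integrity check without a full decode
        return True
    except Exception:  # e.g. UnidentifiedImageError or a corrupt file
        return False


def split_loadable(
    paths: Iterable[str],
    read_bytes: Callable[[str], bytes],
    can_decode: Callable[[bytes], bool] = pil_can_decode,
) -> Tuple[List[str], List[str]]:
    """Partition paths into (loadable, skipped).

    read_bytes is a hypothetical callable that returns the raw bytes for
    a path and raises OSError (e.g. FileNotFoundError) if it is missing.
    """
    loadable: List[str] = []
    skipped: List[str] = []
    for path in paths:
        try:
            ok = can_decode(read_bytes(path))
        except OSError:  # missing or unreadable file
            ok = False
        (loadable if ok else skipped).append(path)
    return loadable, skipped
```

With something like this, only the loadable paths would be passed to Dataset.from_dict. The drawback is that every image gets read once up front, which is why I am hoping there is a way to skip bad images lazily inside the Dataset instead.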