Filesystem API
The HfFileSystem class provides a pythonic file interface to the Hugging Face Hub based on fsspec.
HfFileSystem
HfFileSystem is based on fsspec, so it is compatible with most of the APIs that it offers. For more details, check out our guide and fsspec’s API Reference.
class huggingface_hub.HfFileSystem
( *args, **kwargs )
Parameters
- token (str or bool, optional) — A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see https://huggingface.co/docs/huggingface_hub/quick-start#authentication). To disable authentication, pass False.
- endpoint (str, optional) — Endpoint of the Hub. Defaults to https://huggingface.co.
Access a remote Hugging Face Hub repository as if it were a local file system.
HfFileSystem provides fsspec compatibility, which is useful for libraries that require it (e.g., reading Hugging Face datasets directly with pandas). However, it introduces additional overhead due to this compatibility layer. For better performance and reliability, it’s recommended to use HfApi methods when possible.
Usage:
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()
>>> # List files
>>> fs.glob("my-username/my-model/*.bin")
['my-username/my-model/pytorch_model.bin']
>>> fs.ls("datasets/my-username/my-dataset", detail=False)
['datasets/my-username/my-dataset/.gitattributes', 'datasets/my-username/my-dataset/README.md', 'datasets/my-username/my-dataset/data.json']
>>> # Read/write files
>>> with fs.open("my-username/my-model/pytorch_model.bin") as f:
...     data = f.read()
>>> with fs.open("my-username/my-model/pytorch_model.bin", "wb") as f:
...     f.write(data)
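The pandas use case mentioned above works through fsspec URL resolution. The following is a minimal sketch, assuming pandas is installed and that huggingface_hub has registered the `hf://` protocol with fsspec. The `hub_uri` and `read_hub_csv` helpers and the `data.csv` file name are hypothetical and are not part of the library.

```python
def hub_uri(repo_id: str, path_in_repo: str, repo_type_prefix: str = "datasets") -> str:
    """Build an hf:// URI such as hf://datasets/my-username/my-dataset/data.csv.

    Hypothetical helper: the prefix mirrors the "datasets/..." paths used in
    the usage example; model repositories take no prefix (pass "").
    """
    parts = [repo_type_prefix, repo_id, path_in_repo.lstrip("/")]
    return "hf://" + "/".join(p for p in parts if p)


def read_hub_csv(repo_id: str, path_in_repo: str):
    """Read a CSV file from a dataset repository (requires network access)."""
    import pandas as pd  # imported lazily so hub_uri works without pandas

    # pandas hands the hf:// URI to fsspec, which resolves it via HfFileSystem.
    return pd.read_csv(hub_uri(repo_id, path_in_repo))
```

For example, `read_hub_csv("my-username/my-dataset", "data.csv")` would load the file into a DataFrame, provided that file exists in the repository.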
__init__
( *args, endpoint: typing.Optional[str] = None, token: typing.Union[bool, str, NoneType] = None, **storage_options )
cp_file
( path1: str, path2: str, revision: typing.Optional[str] = None, **kwargs )
Copy a file within or between repositories.
Note: When possible, use HfApi.upload_file() for better performance.
exists
( path, **kwargs ) → bool
Check if a file exists.
For more details, refer to fsspec documentation.
Note: When possible, use HfApi.file_exists() for better performance.
find
( path: str, maxdepth: typing.Optional[int] = None, withdirs: bool = False, detail: bool = False, refresh: bool = False, revision: typing.Optional[str] = None, **kwargs ) → Union[List[str], Dict[str, Dict[str, Any]]]
Parameters
- path (str) — Root path to list files from.
- maxdepth (int, optional) — Maximum depth to descend into subdirectories.
- withdirs (bool, optional) — Include directory paths in the output. Defaults to False.
- detail (bool, optional) — If True, returns a dict mapping paths to file information. Defaults to False.
- refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
- revision (str, optional) — The git revision to list from.
Returns
Union[List[str], Dict[str, Dict[str, Any]]]
List of paths or dict of file information.
List all files below path.
For more details, refer to fsspec documentation.
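As an illustration of `detail=True`, the sketch below sums the sizes of all files under a path. The `total_size` helper is hypothetical, and it assumes that each file information dict carries a `size` entry, as described for `info`.

```python
from typing import Any, Optional


def total_size(fs: Any, root: str, revision: Optional[str] = None) -> int:
    """Sum the sizes (in bytes) of all files below `root`.

    `fs` is expected to be an HfFileSystem instance; any object with a
    compatible `find` method works as well.
    """
    # With detail=True, find returns a dict mapping each path to its info dict.
    infos = fs.find(root, detail=True, revision=revision)
    # Assumption: a missing or None "size" is counted as 0 bytes.
    return sum(info.get("size") or 0 for info in infos.values())


# Usage (requires network access and the huggingface_hub package):
# from huggingface_hub import HfFileSystem
# print(total_size(HfFileSystem(), "my-username/my-model"))
```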
get_file
( rpath, lpath, callback = <fsspec.callbacks.NoOpCallback object>, outfile = None, **kwargs )
Copy single remote file to local.
Note: When possible, use HfApi.hf_hub_download() for better performance.
glob
( path: str, **kwargs ) → List[str]
Find files by glob-matching.
For more details, refer to fsspec documentation.
info
( path: str, refresh: bool = False, revision: typing.Optional[str] = None, **kwargs ) → Dict[str, Any]
Parameters
- path (str) — Path to get info for.
- refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
- revision (str, optional) — The git revision to get info from.
Returns
Dict[str, Any]
Dictionary containing file information (type, size, commit info, etc.).
Get information about a file or directory.
For more details, refer to fsspec documentation.
Note: When possible, use HfApi.get_paths_info() or HfApi.repo_info() for better performance.
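A short sketch of reading the returned dictionary follows. The `describe` helper is hypothetical, and it assumes the fsspec convention that an info dict contains `name`, `type` (`"file"` or `"directory"`) and `size` keys.

```python
from typing import Any, Optional


def describe(fs: Any, path: str, revision: Optional[str] = None) -> str:
    """Return a one-line description of a file or directory.

    `fs` is expected to be an HfFileSystem instance.
    """
    info = fs.info(path, revision=revision)
    # Assumption: "type" is either "file" or "directory" (fsspec convention).
    if info["type"] == "directory":
        return f"{info['name']}: directory"
    return f"{info['name']}: file, {info['size']} bytes"


# Usage (requires network access and the huggingface_hub package):
# from huggingface_hub import HfFileSystem
# print(describe(HfFileSystem(), "datasets/my-username/my-dataset/README.md"))
```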
invalidate_cache
( path: typing.Optional[str] = None )
Clear the cache for a given path.
For more details, refer to fsspec documentation.
isdir
( path ) → bool
Check if a path is a directory.
For more details, refer to fsspec documentation.
isfile
( path ) → bool
Check if a path is a file.
For more details, refer to fsspec documentation.
ls
( path: str, detail: bool = True, refresh: bool = False, revision: typing.Optional[str] = None, **kwargs ) → List[Union[str, Dict[str, Any]]]
Parameters
- path (str) — Path to the directory.
- detail (bool, optional) — If True, returns a list of dictionaries containing file information. If False, returns a list of file paths. Defaults to True.
- refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
- revision (str, optional) — The git revision to list from.
Returns
List[Union[str, Dict[str, Any]]]
List of file paths (if detail=False) or list of file information dictionaries (if detail=True).
List the contents of a directory.
For more details, refer to fsspec documentation.
Note: When possible, use HfApi.list_repo_tree() for better performance.
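The difference between the two `detail` modes can be sketched as follows. The `split_listing` helper is hypothetical, and it assumes that each entry returned with `detail=True` has `name` and `type` keys, following the fsspec convention.

```python
from typing import Any, List, Optional, Tuple


def split_listing(fs: Any, path: str, revision: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Return (files, directories) found directly under `path`.

    `fs` is expected to be an HfFileSystem instance.
    """
    # detail=True yields one info dict per entry instead of a bare path.
    entries = fs.ls(path, detail=True, revision=revision)
    files = sorted(e["name"] for e in entries if e["type"] == "file")
    dirs = sorted(e["name"] for e in entries if e["type"] == "directory")
    return files, dirs


# Usage (requires network access and the huggingface_hub package):
# from huggingface_hub import HfFileSystem
# files, dirs = split_listing(HfFileSystem(), "datasets/my-username/my-dataset")
```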
modified
( path: str, **kwargs ) → datetime
Get the last modified time of a file.
For more details, refer to fsspec documentation.
resolve_path
( path: str, revision: typing.Optional[str] = None ) → HfFileSystemResolvedPath
Parameters
- path (str) — Path to resolve.
- revision (str, optional) — The revision of the repo to resolve. Defaults to the revision specified in the path.
Returns
HfFileSystemResolvedPath
Resolved path information containing repo_type, repo_id, revision and path_in_repo.
Raises
ValueError or NotImplementedError
- ValueError — If path contains conflicting revision information.
- NotImplementedError — If trying to list repositories.
Resolve a Hugging Face file system path into its components.
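The four components listed under Returns can be collected into a plain dictionary, as in this sketch. The `path_components` helper is hypothetical; it only reads the attributes named above and lets any `ValueError` or `NotImplementedError` propagate to the caller.

```python
from typing import Any, Dict


def path_components(fs: Any, path: str) -> Dict[str, Any]:
    """Resolve `path` and return its components as a dict.

    `fs` is expected to be an HfFileSystem instance.
    """
    resolved = fs.resolve_path(path)
    return {
        "repo_type": resolved.repo_type,
        "repo_id": resolved.repo_id,
        "revision": resolved.revision,
        "path_in_repo": resolved.path_in_repo,
    }


# Usage (requires network access and the huggingface_hub package):
# from huggingface_hub import HfFileSystem
# print(path_components(HfFileSystem(), "datasets/my-username/my-dataset/data.json"))
```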
rm
( path: str, recursive: bool = False, maxdepth: typing.Optional[int] = None, revision: typing.Optional[str] = None, **kwargs )
Delete files from a repository.
For more details, refer to fsspec documentation.
Note: When possible, use HfApi.delete_file() for better performance.
url
( path: str ) → str
Get the HTTP URL of the given path.
walk
( path: str, *args, **kwargs ) → Iterator[Tuple[str, List[str], List[str]]]
Return all files below the given path.
For more details, refer to fsspec documentation.
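Since the iterator yields `(dirpath, dirnames, filenames)` tuples, in the style of `os.walk`, a traversal can be sketched as below. The `files_per_directory` helper is hypothetical and only relies on the return type shown in the signature.

```python
from typing import Any, Dict


def files_per_directory(fs: Any, root: str) -> Dict[str, int]:
    """Count the files directly contained in each directory below `root`.

    `fs` is expected to be an HfFileSystem instance.
    """
    counts: Dict[str, int] = {}
    # Each step yields the current directory, its subdirectories and its files.
    for dirpath, _dirnames, filenames in fs.walk(root):
        counts[dirpath] = len(filenames)
    return counts


# Usage (requires network access and the huggingface_hub package):
# from huggingface_hub import HfFileSystem
# print(files_per_directory(HfFileSystem(), "datasets/my-username/my-dataset"))
```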