Post: The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places) and here's all of them on a plot.
Post: The Lichess database of games, puzzles, and engine evaluations is now on the Hub: https://huggingface.co/Lichess
Billions of chess data points to download, query, and stream and we're excited to see what you'll build with it! ♟️ 🤗
- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 69
Post: TIL: EleutherAI/pile is on Wikipedia: https://en.wikipedia.org/wiki/The_Pile_(dataset)
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face Paper • 2302.14534 • Published Feb 28, 2023
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition Paper • 2210.12391 • Published Oct 22, 2022
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages Paper • 2210.09984 • Published Oct 18, 2022 • 2
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages Paper • 2305.06897 • Published May 11, 2023 • 8
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration Paper • 2306.01481 • Published Jun 2, 2023 • 1
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Paper • 2312.11361 • Published Dec 18, 2023 • 1
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 13
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Paper • 2005.11401 • Published May 22, 2020 • 11