I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.
The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.
I know how difficult this stuff can be, so I'm super proud of the impact it's had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.
Coming back to Paris Friday to open our new Hugging Face office!
We're at capacity for the party, but add your name to the waiting list: we're trying to book out the passage du Caire for extra space for robots 🤖🦾🦿
This model can handle tasks ranging from OCR to semantic segmentation.
What sets it apart from previous models is the data: the authors compiled a dataset of 126M images with 5.4B annotations, pseudo-labelled by their own data engine using smaller specialized models and APIs.
The model's architecture is similar to previous models: an image encoder followed by a multi-modality encoder-decoder. The authors compiled the multitask dataset with a prompt for each task.
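Here's a minimal sketch of that prompt-per-task interface; the checkpoint name and image path are my assumptions, and the loading follows the standard transformers remote-code pattern:

```python
# Minimal inference sketch: checkpoint name and image path are assumptions.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # any local image
prompt = "<OCR>"  # task prompt token, e.g. "<OCR>", "<CAPTION>", "<OD>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# Parse the raw generation into task-specific structured output.
result = processor.post_process_generation(
    text, task=prompt, image_size=(image.width, image.height)
)
print(result)
```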
You can also fine-tune this model on any task of your choice. The authors report results on downstream tasks with the vision encoder frozen and unfrozen, and they have released fine-tuned models too: you can find them in the collection above 🤗
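If you want to try the frozen-encoder setup yourself, here's a hedged sketch; it assumes the remote-code model class exposes the image backbone as `vision_tower` (check the attributes of the model you load):

```python
import torch

# `model` loaded as in the previous snippet. Freeze the image backbone so
# only the multi-modality encoder-decoder is updated during fine-tuning;
# `vision_tower` is an assumed attribute name.
for param in model.vision_tower.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-6
)
```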
Just landed on the Hugging Face Hub: a community-led computer vision course 🤗 Learn everything from the fundamentals to the details of bleeding-edge vision transformers!
Released a new version of vikhyatk/moondream2 today! It primarily focuses on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but we're also seeing general improvements across all benchmarks.
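A quick usage sketch, assuming the `encode_image`/`answer_question` remote-code API moondream2 has shipped with (the interface has changed across revisions, so pinning one with `revision=` is a good idea):

```python
# Usage sketch: assumes the encode_image/answer_question remote-code API.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("example.jpg")      # any local image
enc_image = model.encode_image(image)  # encode once, ask many questions

# The captioning prompts this release focused on:
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
print(model.answer_question(enc_image, "Describe this image in one sentence.", tokenizer))
```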