Omar Sanseviero

osanseviero

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙

Recent Activity

upvoted a paper about 20 hours ago

OpenAI o1 System Card

upvoted a paper about 20 hours ago

In Case You Missed It: ARC 'Challenge' Is Not That Challenging

upvoted a paper about 20 hours ago

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Articles

Llama can now see and run on your device - welcome Llama 3.2

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

WWDC 24: Running Mistral 7B with Core ML

How we leveraged distilabel to create an Argilla 2.0 Chatbot

Welcome Gemma 2 - Google's new open LLM

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

🪆 Introduction to Matryoshka Embedding Models

Welcome Gemma - Google's new open LLM

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Inference for PROs

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Results of the Open Source AI Game Jam

Llama 2 is here - get it on Hugging Face

The Falcon has landed in the Hugging Face ecosystem

Hugging Face Machine Learning Demos on arXiv

What's new in Diffusers? 🎨

Announcing Evaluation on the Hub

An Introduction to Deep Reinforcement Learning

Welcome spaCy to the 🤗 Hub

Sentence Transformers in the 🤗 Hub

Organizations

Posts 19

Post

10118

Diaries of Open Source. Part 15 🤗

🕵️‍♀️Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖CodeQwen1.5-7B base and chat models, trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀Cerule - a tiny Vision LM model Tensoic/Cerule-v0.1
ChemLLM - an LLM for chemistry and molecule science ⚗️ https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥Gretel AI high quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql

Post

9571

Diaries of Open Source. Part 14 🤗

🔥CohereForAI releases Command R+, an open 104B model with:
- Tool usage capabilities
- Specialized in RAGs
- Multilingual
It's one of the first models to surpass GPT-4 in the LMSYS arena. Check it out!
Model: CohereForAI/c4ai-command-r-plus
Official demo: https://hf.co/spaces/CohereForAI/c4ai-command-r-plus
Quantized: CohereForAI/c4ai-command-r-plus-4bit

🎉Google releases a new version of their Gemma instruct models, with improved quality, a nicer conversational style, and a fancier RL algorithm. The model ranks similarly to Llama 2 70B in the Chat Arena!
Models: google/gemma-release-65d5efbccdbb8c4202ec078b
Try it out in HuggingChat: https://hf.co/chat/models/google/gemma-1.1-7b-it

🪄VoiceCraft, a speech editing and TTS SOTA open model
Paper: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2403.16973)
Model: pyp1/VoiceCraft

💻Google released CodeGemma, a family of code generation, completion, and chat models
Blog post: https://hf.co/blog/codegemma
Models: google/codegemma-release-66152ac7b683e2667abdee11
Report: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

Misc models:
🦖T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
👀 CT-RATE: a 3D dataset paired with text reports ibrahimhamamci/CT-RATE
🐙Octopus v2: a Gemma-based model trained for Android API calls - extremely fast, better than Llama+RAG, great results NexaAIDev/Octopus-v2

Collections 13

Papers 4

arxiv:2310.16944

arxiv:2303.12582

arxiv:2211.05100

arxiv:2210.01970

Spaces 179

Gemini Coder

How Much Do I Cost

Distilabel Dataset Generator

Mistral Super Fast

Non Streaming Example

Build your Whisper demo

Models 301

osanseviero/qwen2.5_0.5b-instruct-q2_K_test

osanseviero/qwen2.5-0.5b-instruct-q2_K

Updated Oct 10 • 7

osanseviero/o-blob-3.2

Updated Oct 10 • 12

osanseviero/test-in-go7

osanseviero/test-in-go6

osanseviero/test-in-go5

osanseviero/Reflection-Llama-3.1-70B-GGUF

Text Generation • Updated Sep 16 • 53

osanseviero/test-in-go4

osanseviero/test-in-go3

osanseviero/test-in-go

Datasets 38

osanseviero/super-fun-llamas

Viewer • Updated Sep 13 • 10 • 52 • 1

osanseviero/fun_llamas

Viewer • Updated Sep 12 • 50 • 67

osanseviero/my-llamas

Viewer • Updated Sep 11 • 100 • 44

osanseviero/bill_summary_us_chunks-similarity

Viewer • Updated Jul 12 • 2k • 35

osanseviero/bill_summary_us_chunks

Viewer • Updated Jul 12 • 3.45M • 56

osanseviero/testing_geospatial

Updated Jul 8 • 34

osanseviero/ag_misclassifications

Viewer • Updated Oct 8, 2023 • 200 • 7

osanseviero/test_hacks

Updated Apr 28, 2023 • 4

osanseviero/example_ola

Viewer • Updated Mar 24, 2023 • 2 • 3

osanseviero/langchain_hub_test

Viewer • Updated Jan 30, 2023 • 1 • 6