Adam Molnar (lunarflu)
AI & ML interests: join the Hugging Face discord! hf.co/discord/join
Recent Activity
- liked a Space (1 day ago): khouraisan/fumo-classifier
- upvoted an article (3 days ago): Fine-tune ModernBERT for text classification using synthetic data
lunarflu's activity
reacted to Xenova's post (7 days ago)
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (with WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
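For readers who want to try Moonshine outside the browser, here is a minimal Python sketch using the transformers ASR pipeline. The checkpoint id "UsefulSensors/moonshine-tiny" and the audio file name are assumptions; the web demo itself runs on ONNX weights via Transformers.js rather than this Python stack.

```python
# Hedged sketch: Moonshine speech recognition via the transformers pipeline.
# The checkpoint id below is an assumption, not the demo's exact setup.
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="UsefulSensors/moonshine-tiny",  # assumed checkpoint id
)

# Transcribe a local audio file (16 kHz mono WAV is the safest input).
result = transcriber("sample.wav")
print(result["text"])
```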
reacted to ginipick's post (9 days ago)
Revolutionize Your Video Creation
Dokdo Multimodal AI: transform a single image into a stunning video with perfect audio harmony!
Superior Technology
- Advanced Flow Matching: Smoother video transitions, surpassing Kling and Sora
- Intelligent Sound System: Automatically generates perfect audio by analyzing the video's mood
- Multimodal Framework: Advanced AI integrating image, text, and audio analysis
Outstanding Performance
- Ultra-High Resolution: 4K video quality with bfloat16 acceleration
- Real-Time Optimization: 3x faster processing with PyTorch GPU acceleration
- Smart Sound Matching: Real-time audio effects based on scene transitions and motion
Exceptional Features
- Custom Audio Creation: A natural soundtrack matching the video's tempo and rhythm
- Intelligent Watermarking: An adaptive watermark that adjusts to video characteristics
- Multilingual Support: A precise translation engine powered by Helsinki-NLP
Versatile Applications
- Social Media Marketing: Create engaging shorts for Instagram and YouTube
- Product Promotion: Dynamic promotional videos highlighting product features
- Educational Content: Interactive learning materials with enhanced engagement
- Portfolio Enhancement: Professional-grade videos showcasing your work
Experience the video revolution with Dokdo Multimodal, where anyone can create professional-quality content from a single image. Elevate your content with perfectly synchronized video and audio that captivates your audience!
Start creating stunning videos that stand out from the crowd, whether you're a marketer, educator, content creator, or business owner. Join the future of AI-powered video creation today!
ginipick/Dokdo-multimodal
#VideoInnovation #AITechnology #PremiumContent #MarketingSolution
Please turn on your sound for the best viewing experience!
reacted to vincentg64's post (9 days ago)
LLM 2.0, RAG & Non-Standard Gen AI on GitHub https://mltblog.com/3DsyZSq
In this article, I share my latest Gen AI and LLM advances, featuring innovative approaches radically different from both standard AI and classical ML/NLP. The focus is on doing better with less, using efficient architectures, new algorithms, and new evaluation metrics. It originates from research I started long ago and that has gained significant momentum in the last two years. See background and history at https://mltblog.com/4g2sKTv.
OpenAI, Perplexity, Anthropic, Llama, and others typically follow the trend and implement solutions very similar to mine within 3 to 6 months after I publish new milestones: for instance, multi-tokens, knowledge-graph tokens, multi-indexes, real-time fine-tuning, mixtures of experts, LLM routers, small enterprise sub-LLMs, prompt distillation, a relevancy scoring engine, deep contextual retrieval, optimum agentic chunking, and a modern UI instead of the basic prompt box. I keep adding new features all the time, staying ahead of the competition.
Read the full article, with links to GitHub, at https://mltblog.com/3DsyZSq
reacted to merve's post (11 days ago)
Aya by Cohere For AI can now see!
The C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B. It works in 8 languages!
The authors extended the 558k-example LLaVA dataset using Aya's translation capabilities.
Try it here: kkr5155/maya_demo
Dataset: maya-multimodal/pretrain
Model: maya-multimodal/maya
Kudos to @nahidalam and team!
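The dataset-extension idea generalizes well: use a multilingual LLM to translate the text side of a LLaVA-style instruction set. Below is a hedged sketch; the model id and prompt format are assumptions for illustration, not the Maya team's exact recipe.

```python
# Hedged sketch: extend English instruction text into other languages with
# an Aya-style multilingual LLM. Model id and prompting are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed; any strong multilingual LLM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def translate(text: str, language: str) -> str:
    """Translate one instruction/caption into the target language."""
    messages = [{"role": "user", "content": f"Translate to {language}: {text}"}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)

# e.g. turn one English caption into Hindi for a multilingual pretrain set
print(translate("What is shown in this image?", "Hindi"))
```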
reacted to lewtun's post (18 days ago)
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute!
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think".
We're open-sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
- Compute-optimal scaling: how we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
- Diverse Verifier Tree Search (DVTS): an unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
- Search and Learn: a lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
Here are the links:
- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn
Enjoy!
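To make the core idea concrete, here is a minimal best-of-N sketch (not the full DVTS/beam-search recipe from the blog post): sample N candidate solutions with vLLM, score each with a reward function, and keep the best one. The model id is an assumption, and the reward function is a trivial stand-in for a real process reward model.

```python
# Hedged sketch of test-time scaling via best-of-N sampling with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")  # assumed model id
params = SamplingParams(n=16, temperature=0.8, max_tokens=1024)

prompt = "Solve step by step: what is the sum of the first 50 odd integers?"
outputs = llm.generate([prompt], params)
candidates = [c.text for c in outputs[0].outputs]

def reward(solution: str) -> float:
    # Placeholder: swap in a process reward model (PRM) that scores each
    # reasoning step; this trivial stand-in just lets the sketch run.
    return float(len(solution))

# Keep the highest-scoring candidate as the final answer.
best = max(candidates, key=reward)
print(best)
```

The blog post's stronger methods (weighted best-of-N, beam search, DVTS) replace this single max with verifier-guided selection over partial solutions.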
reacted to lorraine2's post (18 days ago)
New NVIDIA paper: LLaMA-Mesh
We enable large language models to generate and understand 3D meshes by representing meshes as plain text and fine-tuning on them. This unifies the 3D and text modalities in a single model and preserves language abilities, unlocking conversational 3D creation with mesh understanding.
Project Page: https://research.nvidia.com/labs/toronto-ai/LLaMA-Mesh/
Interactive Demo: Zhengyi/LLaMA-Mesh (courtesy of Hugging Face and Gradio)
Full Paper: https://arxiv.org/abs/2411.09595
Code: https://github.com/nv-tlabs/LLaMa-Mesh
Model Checkpoint: Zhengyi/LLaMA-Mesh
Blender Addon: https://github.com/huggingface/meshgen (courtesy of Dylan Ebert)
5-min Overview Video: https://youtu.be/eZNazN-1lPo?si=-idQa5aaceVw0Bbj (courtesy of AI Papers Academy)
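Because the mesh is just text, inference is ordinary causal generation plus a small parser. Here is a hedged sketch: the prompt wording, decoding settings, and OBJ-style output format ("v x y z" vertices, "f a b c" faces) are assumptions based on the paper's description, not a verified recipe.

```python
# Hedged sketch: conversational mesh generation with the LLaMA-Mesh checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zhengyi/LLaMA-Mesh"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Create a 3D model of a simple chair."  # illustrative prompt
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048)
text = tok.decode(out[0], skip_special_tokens=True)

# Keep only OBJ-style lines: vertices ("v x y z") and faces ("f a b c").
obj_lines = [l for l in text.splitlines() if l.startswith(("v ", "f "))]
with open("chair.obj", "w") as f:
    f.write("\n".join(obj_lines))
```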
reacted to YerbaPage's post (18 days ago)
Curated list of **Repository-level Code Generation** papers & benchmarks!
Stay ahead with the latest in:
- Repo-level Issue Resolution
- Repo-level Code Completion
- Datasets & Benchmarks
Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation
reacted to wenhuach's post (18 days ago)
AutoRound has demonstrated strong results even at 2-bit precision for VLMs like Qwen2-VL-72B. Check it out here:
OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc
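For context, here is a hedged sketch of how one might produce a 2-bit symmetric checkpoint with the auto-round library on a small stand-in model. The exact recipe behind the 72B VLM checkpoint above likely differs (calibration data, group size, VLM-specific handling), and the API shown reflects the auto-round documentation as I recall it.

```python
# Hedged sketch: 2-bit symmetric weight quantization with AutoRound.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in for the sketch
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# bits=2 with symmetric quantization mirrors the "int2-sym" naming above;
# group_size=64 is an assumed setting, not the published recipe.
autoround = AutoRound(model, tokenizer, bits=2, group_size=64, sym=True)
autoround.quantize()
autoround.save_quantized("./Qwen2.5-0.5B-int2-sym", format="auto_round")
```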