Collections

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024

allenai/OLMo-2-1124-13B-Instruct

allenai/OLMo-2-1124-7B-Instruct

allenai/OLMo-2-1124-13B-DPO

allenai/OLMo-2-1124-7B-DPO

Open LLM Leaderboard

MTEB Leaderboard

Chatbot Arena Leaderboard

LLM-Perf Leaderboard

Scaling test-time compute

meta-llama/Llama-3.2-1B-Instruct

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

HuggingFaceH4/MATH-500

Qwen2-VL-72B

Qwen/Qwen2-VL-2B

Qwen/Qwen2-VL-2B-Instruct

Qwen/Qwen2-VL-7B

knowledgator/gliner-multitask-v1.0

knowledgator/gliner-multitask-large-v0.5

GLiNER HandyLab

GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks

microsoft/Phi-3.5-mini-instruct

microsoft/Phi-3.5-MoE-instruct

microsoft/Phi-3.5-vision-instruct

microsoft/Phi-3-mini-4k-instruct

DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

DavidAU/MPT-7b-WizardLM_Uncensored-Storywriter-Merge-Q6_K-GGUF

DavidAU/Buttocks-7B-v1.0-Q6_K-GGUF

DavidAU/llama-2-16b-nastychat-Q6_K-GGUF

OS-Copilot/OS-Genesis-7B-AC

OS-Copilot/OS-Genesis-8B-AC

OS-Copilot/OS-Genesis-4B-AC

deepseek-ai/deepseek-vl2-tiny

deepseek-ai/deepseek-vl2-small

deepseek-ai/deepseek-vl2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding