ldwang · ftgreat

AI & ML interests

None yet

Recent Activity

liked a dataset about 3 hours ago

OpenCoder-LLM/RefineCode-code-corpus-meta

ldwang's activity

upvoted a paper about 3 hours ago

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 113

upvoted a collection about 3 hours ago

OpenCoder Model

OpenCoder Models • 9 items • Updated Nov 19, 2024 • 10

upvoted a collection about 20 hours ago

MiscModels

2 items • Updated about 20 hours ago • 1

upvoted a paper 2 days ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 88

upvoted an article 3 days ago

Low Latency CPU Based Educational Value Classifier With Generic Educational Value

Article • Published Jun 12, 2024 • 9

upvoted a collection 7 days ago

Qwen2.5

Qwen2.5 language models, with pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 453

upvoted an article 11 days ago

LLM Data Engineering 3: Data Collection Magic: Methods for Obtaining Top-Tier Training Data

Article • Published Jun 4, 2024 • 14

upvoted 2 collections 11 days ago

Datasets built with ⚗️ distilabel

This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 8 items • Updated 26 days ago • 12

Synthetic Data Generator

A collection of tools and datasets related to no-code Synthetic Data Generation. • 16 items • Updated 3 days ago • 5

upvoted a collection 14 days ago

Scaling Test-Time Compute with Open Models

Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated about 1 hour ago • 19

upvoted a paper 14 days ago

Solving math word problems with process- and outcome-based feedback

Paper • 2211.14275 • Published Nov 25, 2022 • 7

upvoted 3 collections 14 days ago

MiscBlogs

1 item • Updated 14 days ago • 1

MiscTools

Misc tools for llm & vlm. • 6 items • Updated 14 days ago • 1

MiscDatasets

4 items • Updated 1 day ago • 1

upvoted a collection 19 days ago

BGE

23 items • Updated 16 days ago • 72

upvoted a paper 23 days ago

POINTS1.5: Building a Vision-Language Model towards Real World Applications

Paper • 2412.08443 • Published 26 days ago • 38

upvoted 3 collections about 1 month ago

NeMo Curator - Classifier Models

Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 9 items • Updated about 10 hours ago • 10

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Nov 27, 2024 • 291

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated Nov 27, 2024 • 64

upvoted a collection about 2 months ago

The Big Benchmarks Collection

Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 183