arxiv:2411.19870
Jeffrey Quesnelle PRO
emozilla
AI & ML interests
None yet
Recent Activity
authored a paper 30 days ago
DeMo: Decoupled Momentum Optimization
updated a model 30 days ago
bug-free-chainsaw/15b-100bt
commented a paper about 1 month ago
DeMo: Decoupled Momentum Optimization
Organizations
Papers: 3
Models: 87
emozilla/llama2-15b-gqa-init • Text Generation • 315
emozilla/llama2-1.1b-gqa-init • Text Generation • 27
emozilla/llama2-15b-init • Text Generation • 24
emozilla/llama2-1.2b-nanotron-init • 44
emozilla/llama2-1.2b-init-6 • Text Generation • 26
emozilla/smol-3b-init • Text Generation • 10
emozilla/smol-7b-init • Text Generation • 9
emozilla/smol-15b-init • Text Generation • 16
emozilla/llama2-215m-init • Text Generation • 47
emozilla/llama3-1.4b-init-2 • Text Generation • 22 • 1
Datasets: 50
emozilla/dolma-v1_7-30B-tokenized-llama2-nanoset • 47
emozilla/fineweb-10bt-tokenized-datatrove-llama2 • 66 • 1
emozilla/fineweb-350bt-tokenized-datatrove-llama2 • 56
emozilla/dolma-v1_7-305B-tokenized-llama2-nanoset • 83
emozilla/proofpile-test-tokenized-llama3 • 46.3k • 30
emozilla/PaulGrahamEssays • 49 • 30
emozilla/dolma-v1_7-cc_en_head • 475M • 41
emozilla/dolma-v1_7-c4 • 250M • 131 • 1
emozilla/dolma-v1_7-305B-tokenized-llama3-nanoset • 233
emozilla/dolma-v1_7-books • 56k • 6 • 1