Alan Tseng

agentlans

AI & ML interests

Small data, boring AI

Recent Activity

updated a model about 14 hours ago

agentlans/Llama-3.2-1B-Instruct-CrashCourse12K

updated a model about 24 hours ago

agentlans/deberta-v3-base-zyda-2-v2

updated a model 1 day ago

agentlans/multilingual-e5-small-aligned-v2

Organizations

None yet

agentlans's activity

updated a model about 14 hours ago

agentlans/Llama-3.2-1B-Instruct-CrashCourse12K

Updated about 14 hours ago

updated a model about 24 hours ago

agentlans/deberta-v3-base-zyda-2-v2

updated a model 1 day ago

agentlans/multilingual-e5-small-aligned-v2

Sentence Similarity • Updated 1 day ago

updated a model 2 days ago

agentlans/multilingual-e5-small-aligned

updated a model 3 days ago

agentlans/Gemma2-9B-AdvancedFuse

Text Generation • Updated 3 days ago • 30

reacted to grimjim's post with 👍 3 days ago

I'm (finally) releasing a Python script that trims excess weights from Gemma2 full-weight models that were bloated by ~1B parameters due to an early mergekit bug.
https://github.com/jim-plus/Gemma2-mergekit-remediation

I'd noticed something was off when merges of Gemma2 9B models ended up having ~10B parameters. The current mergekit package is fine, but there are still bloated models on HF that could stand to be fixed.

The script assumes it is run from the same directory as the model weights. It trims the unnecessary lm_head.weight tensor and the corresponding index entry.
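
As a rough illustration of the index half of that cleanup, here is a minimal sketch. It is not the linked script. It assumes the usual sharded-checkpoint index layout (a `weight_map` from tensor name to shard file, plus `metadata.total_size`). The helper name `drop_tensor_from_index` and the toy index are hypothetical. Removing the tensor from the shard file itself is not shown.

```python
import copy


def drop_tensor_from_index(index, tensor_name="lm_head.weight", tensor_bytes=None):
    """Return (new_index, shard_file) with tensor_name removed from weight_map.

    Hypothetical helper: `index` is the parsed JSON of a sharded-checkpoint
    index file. The input dict is left unmodified. shard_file is None when
    the tensor is not listed.
    """
    new_index = copy.deepcopy(index)
    shard_file = new_index.get("weight_map", {}).pop(tensor_name, None)

    # Only adjust the recorded size if the tensor was present and the caller
    # knows how many bytes it occupied.
    if shard_file is not None and tensor_bytes is not None:
        metadata = new_index.get("metadata", {})
        if "total_size" in metadata:
            metadata["total_size"] -= tensor_bytes

    return new_index, shard_file


# Toy index with made-up shard names and sizes, for illustration only.
example_index = {
    "metadata": {"total_size": 1000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
}

trimmed_index, shard_file = drop_tensor_from_index(example_index, tensor_bytes=64)
print(shard_file)
print(sorted(trimmed_index["weight_map"]))
```

The returned shard file name tells you which weights file still holds the redundant tensor, so that file can be rewritten without it in a separate step.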