|
---
license: apache-2.0
language:
- en
---
|
|
|
A modified GPT-2 architecture with 25M non-embedding parameters: no bias terms, layer normalization applied to the embeddings, scaled sinusoidal position embeddings, and a recurrence modification that runs the transformer stack over the sequence four times before passing the result to the language modelling head.
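The recurrence idea can be sketched in a few lines: the same transformer weights are applied to the hidden states four times before the LM head. The sketch below is a toy, framework-free illustration with made-up names and shapes (`transformer_pass`, `d_model`, etc. are assumptions, not the released model's internals); it only demonstrates the weight-shared multi-pass loop and the embedding layer norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the model pieces; shapes and names are illustrative only.
d_model, vocab, seq = 8, 16, 5
W_mix = rng.normal(size=(d_model, d_model)) * 0.1   # shared "transformer" weights
W_head = rng.normal(size=(d_model, vocab)) * 0.1    # LM head, no bias

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_pass(x):
    # One pass of a (vastly simplified) transformer block with a residual.
    return x + layer_norm(x @ W_mix)

tokens = rng.integers(0, vocab, size=seq)
emb = rng.normal(size=(vocab, d_model))
x = layer_norm(emb[tokens])          # embedding-ln from the description

for _ in range(4):                   # same weights reused on every pass
    x = transformer_pass(x)

logits = x @ W_head                  # (seq, vocab) next-token logits
```

Because the passes share weights, the extra sequential compute comes at no additional parameter cost, which is the appeal of this modification at the 25M scale.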
|
|
|
| model | avg | ARC | HellaSwag | MMLU | TruthfulQA |
| --- | --- | --- | --- | --- | --- |
| horizon-25m-v0 | 30.625 | 20.22 | 26.23 | 25.9 | 50.15 |
| cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
| gpt2 | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
| pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
| pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
| pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
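The avg column is the plain arithmetic mean of the four task scores. For example, for horizon-25m-v0:

```python
# Scores taken from the horizon-25m-v0 row of the table above.
scores = {"arc": 20.22, "hellaswag": 26.23, "mmlu": 25.9, "truthfulqa": 50.15}
avg = sum(scores.values()) / len(scores)
print(round(avg, 3))  # 30.625, matching the avg column
```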
|
|
|
Trained on the Horizon-v0 dataset mix:
|
|
|
| Source | Documents |
| --- | --- |
| arxiv | 8.78k |
| github | 8.82k |
| books | 10k |
| wiki | 14.67k |
| openwebtext v2 | 30.73k |
|
|