Update README.md
README.md
@@ -426,12 +426,25 @@ datasets:
- allenai/MADLAD-400
---

This model has the safetensors weights for the [Madlad-400](https://github.com/google-research/google-research/tree/master/madlad_400) 8B param **language model**.

The Python code to run inference is not ready yet.

The model architecture is the same as [PaLM 8B](https://arxiv.org/pdf/2204.02311.pdf).

It's a decoder-only T5 with 32 layers, 16 query heads, 1 KV head, and an embedding size of 4096.

These are the main differences relative to the original T5 architecture:

- SwiGLU Activation
- Parallel Layers
- Multi-Query Attention
- RoPE Embeddings
- Shared Input-Output Embeddings
- No biases
- Bidirectional attention
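To make the multi-query attention point concrete, here is a NumPy sketch (scaled-down dimensions; the function and variable names are illustrative, not the model's actual code). With 16 query heads sharing a single K/V head, the K/V projections and the KV cache are 16x smaller than in standard multi-head attention:

```python
# Minimal multi-query attention sketch (not the model's actual implementation).
# All 16 query heads attend over one shared K/V head, so the KV cache holds a
# single head's worth of keys/values instead of 16.
import numpy as np

def multi_query_attention(x, wq, wk, wv, n_heads=16):
    seq, d_model = x.shape
    head_dim = d_model // n_heads

    q = (x @ wq).reshape(seq, n_heads, head_dim)  # (seq, n_heads, head_dim)
    k = x @ wk                                    # (seq, head_dim): one shared head
    v = x @ wv                                    # (seq, head_dim)

    # scores[h, s, t]: head h's query at position s against the shared key at t
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(head_dim)
    # causal mask: position s may only attend to positions t <= s
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.einsum("hst,td->shd", probs, v)      # every head reads the same V
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_heads = 64, 16
head_dim = d_model // n_heads
x = rng.standard_normal((10, d_model))
wq = rng.standard_normal((d_model, d_model)) * 0.1
wk = rng.standard_normal((d_model, head_dim)) * 0.1  # 16x smaller than multi-head
wv = rng.standard_normal((d_model, head_dim)) * 0.1

y = multi_query_attention(x, wq, wk, wv, n_heads)
print(y.shape)  # (10, 64)
```

At inference time this is what makes the difference: the cache grows as `seq * head_dim` per layer rather than `seq * n_heads * head_dim`.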

If you are looking for the machine translation models, here are the available versions:
- [3B](https://huggingface.co/jbochi/madlad400-3b-mt)
- [7B](https://huggingface.co/jbochi/madlad400-7b-mt)
- [7B-BT](https://huggingface.co/jbochi/madlad400-7b-mt-bt)