jbochi committed ce2c917 (parent: 9244048)

Update README.md

Files changed (1): README.md (+16 -3)
README.md CHANGED
```diff
@@ -426,12 +426,25 @@ datasets:
 - allenai/MADLAD-400
 ---
 
-This model has the safetensors weights for [Madlad-400](https://github.com/google-research/google-research/tree/master/madlad_400) 8B param language model.
+This model has the safetensors weights for the [Madlad-400](https://github.com/google-research/google-research/tree/master/madlad_400) 8B param **language model**.
 
-There's currently no Python code to run inference. See details in this [thread](https://huggingface.co/jbochi/madlad400-3b-mt/discussions/2#6548f33cc6cd77f0a7c9798a)
+The Python code to run inference is not ready yet.
 
-Available language models models:
+The model architecture is the same as [PaLM 8B](https://arxiv.org/pdf/2204.02311.pdf).
+
+It's a decoder-only T5 with 32 layers, 16 query heads, 1 KV head, and 4096 embedding size.
+
+These are the main differences relative to the original T5 architecture:
+
+- SwiGLU Activation
+- Parallel Layers
+- Multi-Query Attention
+- RoPE Embeddings
+- Shared Input-Output Embeddings
+- No biases
+- Bidirectional attention
+
+If you are looking for the language models, here are the available versions:
 - [3B](https://huggingface.co/jbochi/madlad400-3b-mt)
 - [7B](https://huggingface.co/jbochi/madlad400-7b-mt)
 - [7B-BT](https://huggingface.co/jbochi/madlad400-7b-mt-bt)
```
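Until inference code lands, the shipped checkpoint can at least be inspected with the `safetensors` library. A minimal sketch; the file name `model.safetensors` is an assumption about what the repo ships:

```python
# Inspect the checkpoint without loading all tensors into memory.
# "model.safetensors" is a hypothetical file name; point this at whichever
# .safetensors file the repo actually contains.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())
```

To make the architecture description in the diff concrete, here is a minimal PyTorch sketch of a decoder matching the card's description (32 parallel layers, 16 query heads sharing 1 KV head, 4096 embedding size, SwiGLU, RoPE, shared input-output embeddings, no biases, bidirectional attention). The FFN width (16384), vocabulary size (32000), RoPE base (10000), and the use of RMSNorm are assumptions for illustration, and the module and parameter names are made up, so this will not load the shipped weights as-is:

```python
# A rough sketch of the described architecture, NOT the official implementation.
# Assumed (not from the model card): d_ff=16384, vocab_size=32000, RoPE base
# 10000, RMSNorm. From the card: 32 layers, 16 query heads, 1 KV head, 4096
# embedding size, SwiGLU, parallel layers, MQA, RoPE, shared embeddings,
# no biases, bidirectional attention.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Bias-free RMS normalization (T5-style norms also carry no bias)."""
    def __init__(self, d: int, eps: float = 1e-6):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d))
        self.eps = eps

    def forward(self, x):
        return self.scale * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


def rope(x: torch.Tensor) -> torch.Tensor:
    """Rotary position embeddings over (batch, heads, seq, head_dim)."""
    *_, seq, dim = x.shape
    half = dim // 2
    freqs = 10000.0 ** (-torch.arange(0, half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(seq, dtype=x.dtype, device=x.device)[:, None] * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * angles.cos() - x2 * angles.sin(),
                      x1 * angles.sin() + x2 * angles.cos()], dim=-1)


class ParallelBlock(nn.Module):
    """PaLM-style parallel layer: attention and FFN read the same normed
    input, and both outputs are added to the residual together."""
    def __init__(self, d_model: int = 4096, n_heads: int = 16, d_ff: int = 16384):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.norm = RMSNorm(d_model)
        # Multi-query attention: full-width queries, one shared K/V head.
        self.wq = nn.Linear(d_model, d_model, bias=False)
        self.wkv = nn.Linear(d_model, 2 * self.head_dim, bias=False)
        self.wo = nn.Linear(d_model, d_model, bias=False)
        # SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), projected down.
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        h = self.norm(x)
        q = self.wq(h).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.wkv(h).view(b, s, 1, 2 * self.head_dim).transpose(1, 2).chunk(2, dim=-1)
        q, k = rope(q), rope(k)
        # The single KV head broadcasts across all 16 query heads. No causal
        # mask, since the card lists bidirectional attention.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        attn = self.wo((scores.softmax(-1) @ v).transpose(1, 2).reshape(b, s, -1))
        ffn = self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
        return x + attn + ffn  # parallel residual


class DecoderLM(nn.Module):
    def __init__(self, vocab_size: int = 32000, d_model: int = 4096, n_layers: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList([ParallelBlock(d_model) for _ in range(n_layers)])
        self.norm = RMSNorm(d_model)

    def forward(self, ids):
        x = self.embed(ids)
        for block in self.blocks:
            x = block(x)
        # Shared input-output embeddings: the embedding matrix doubles as the LM head.
        return self.norm(x) @ self.embed.weight.T
```

Under these assumed widths the sketch comes out to roughly 7.7B parameters (32 × (2·4096² + 4096·512 + 3·4096·16384) plus a 32000 × 4096 embedding), which is at least consistent with the "8B param" description above.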