Commit 107b50b
1 Parent(s): 845517d

Update README.md

README.md CHANGED
@@ -1,3 +1,34 @@
---
license: apache-2.0
tags:
- endpoints-template
---

# FORK of [google/flan-ul2](https://huggingface.co/google/flan-ul2)

> This is a fork of google/flan-ul2 20B that implements a custom `handler.py` for deploying the model to Inference Endpoints on 4x NVIDIA T4 GPUs.
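
For readers unfamiliar with custom handlers, the following is a minimal sketch of what a `handler.py` for a seq2seq model might look like. It is not the file shipped in this repository. It assumes the usual Inference Endpoints convention of an `EndpointHandler` class with `__init__(path)` and `__call__(data)`, a request body with `inputs` and optional `parameters`, and that `torch`, `transformers`, and `accelerate` are installed on the endpoint. The `parse_payload` helper is hypothetical and exists only for illustration.

```python
from typing import Any, Dict, List, Tuple


def parse_payload(data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Split a request body into the prompt and the generation parameters.

    Hypothetical helper: assumes a body shaped like
    {"inputs": "...", "parameters": {...}}.
    """
    inputs = data.get("inputs", "")
    parameters = data.get("parameters") or {}
    return inputs, parameters


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # Imported here so the module can be inspected without GPU libraries.
        import torch
        from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # device_map="auto" (requires `accelerate`) spreads the layers over
        # all visible GPUs; float16 halves the memory needed versus float32.
        self.model = AutoModelForSeq2SeqLM.from_pretrained(
            path, device_map="auto", torch_dtype=torch.float16
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        inputs, parameters = parse_payload(data)
        input_ids = self.tokenizer(inputs, return_tensors="pt").input_ids
        input_ids = input_ids.to(self.model.device)
        # Generation parameters (e.g. max_new_tokens) are passed through as-is.
        outputs = self.model.generate(input_ids, **parameters)
        text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

Sharding is what makes the hardware choice workable: a 20B-parameter model needs roughly 40 GB for its weights in fp16, which does not fit on a single 16 GB T4 but can be split across four of them.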

You can deploy flan-ul2 with a [1-click deployment](https://ui.endpoints.huggingface.co/new?repository=philschmid/flan-ul2-20b-fp16).

![createEndpoint](createEndpoint.png)
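
Once an endpoint is running, it can be queried over HTTP. The sketch below uses only the Python standard library and assumes the endpoint accepts a JSON body with `inputs` and optional `parameters` and authenticates with a bearer token. The endpoint URL and token shown in the comments are placeholders, and `build_request` and `query` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(
    endpoint_url: str, token: str, prompt: str, **parameters: Any
) -> urllib.request.Request:
    """Build a POST request carrying the prompt and generation parameters."""
    payload: Dict[str, Any] = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(endpoint_url: str, token: str, prompt: str, **parameters: Any) -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_request(endpoint_url, token, prompt, **parameters)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (placeholder URL and token, replace with your own):
# result = query(
#     "https://<your-endpoint>.endpoints.huggingface.cloud",
#     "<your-token>",
#     "Translate to German: How old are you?",
#     max_new_tokens=32,
# )
```

The shape of the response depends on the handler deployed with the endpoint, so inspect the returned JSON before relying on specific keys.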

# TL;DR

Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released last year. It was fine-tuned using the "Flan" prompt tuning and dataset collection.

According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b), here are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning.
- The original UL2 model also had mode switch tokens that were more or less mandatory for good performance. However, they were a little cumbersome, as they often required changes during inference or fine-tuning. In this update, we continue training UL2 20B for an additional 100k steps (with a small batch) to forget the “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint no longer requires mode tokens.
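
To make the receptive-field point concrete, here is a small sketch that counts how many few-shot examples fit into a prompt under a token budget. It is an illustration only: token counts are approximated by whitespace-separated words, which is an assumption (a real check would use the model's tokenizer), and `shots_that_fit` is a hypothetical helper.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def shots_that_fit(
    shots: List[str],
    query: str,
    budget: int,
    count_tokens: Callable[[str], int] = count_words,
) -> int:
    """Return how many leading shots fit in the budget alongside the query."""
    used = count_tokens(query)
    fitted = 0
    for shot in shots:
        cost = count_tokens(shot)
        if used + cost > budget:
            break
        used += cost
        fitted += 1
    return fitted


# Synthetic data: each shot is 100 "tokens", the query is 50 "tokens".
shot = " ".join(["word"] * 100)
shots = [shot] * 30
query = " ".join(["word"] * 50)

print(shots_that_fit(shots, query, budget=512))   # 4
print(shots_that_fit(shots, query, budget=2048))  # 19
```

With these synthetic lengths, the larger budget leaves room for roughly five times as many in-context examples.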

**Important**: For more details, please see sections 5.2.1 and 5.2.2 of the [paper](https://arxiv.org/pdf/2205.05131v1.pdf).

# Contribution

This model was originally contributed by [Yi Tay](https://www.yitay.net/?author=636616684c5e64780328eece), and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).

# Citation

If you want to cite this work, please consider citing the [blogpost](https://www.yitay.net/blog/flan-ul2-20b) announcing the release of `Flan-UL2`.