power-greg
committed on
Commit · 45e79fd
1 Parent(s): e4d644f
Update README.md
README.md
CHANGED
@@ -47,7 +47,7 @@ Now that you know a bit about where we’re going, today, we’re excited to rel
|
|
47 |
Steps to a ChatGPT-like LLM for your use case 1️⃣2️⃣3️⃣
|
48 |
Here are the steps to get an instruction-following LLM like ChatGPT to handle your use case:
|
49 |
|
50 |
-
(Show me the code: Play with our dataset generator for creating ChatGPT-like datasets.)
|
51 |
|
52 |
1. Try prompt-tuning ChatGPT or another model. You can use Lamini library’s APIs to quickly prompt-tune across different models, swapping between OpenAI and open-source models in just one line of code. We optimize the right prompt for you, so you can take advantage of different models without worrying about the right prompt template for each model.
|
53 |
2. Build a large dataset of input-output pairs. These will show your model how it should respond to its inputs, whether that's following instructions given in English, or responding in JSON. Today, we’re releasing a repo with just a few lines of code using the Lamini library to generate 50k data points from as few as 100 data points. We include an open-source 50k dataset in the repo. (More details below on how you can do this!)
|
@@ -66,7 +66,7 @@ You'll need a dataset of ~50k instruction-following examples to start. Don't pan
|
|
66 |
|
67 |
You can customize the initial 100+ instructions so that the LLM follows instructions in your own vertical. Once you have those, submit them to the Lamini dataset generator, and voilà: you get a large instruction-following dataset on your use case as a result!
|
68 |
|
69 |
-
How the dataset generator works
|
70 |
|
71 |
The Lamini dataset generator is a pipeline of LLMs that takes your original small set of 100+ instructions, paired with the expected responses, to generate 50k+ new pairs, inspired by Stanford Alpaca. This generation pipeline uses the Lamini library to define and call LLMs to generate different, yet similar, pairs of instructions and responses. Trained on this data, your LLM will improve to follow these instructions.
|
72 |
|
@@ -76,7 +76,7 @@ The Lamini library allows you to swap our defaults for other open-source or Open
|
|
76 |
|
77 |
If you’re interested in more details on how our dataset generator works, read more or run it here.
|
78 |
|
79 |
-
|
80 |
We have used the above pipeline to generate a filtered dataset having around 37k questions and responses samples. But that's not all! We've also fine-tuned a language model based on EleutherAI’s pythia model. It is hosted on Hugging-Face website as lamini/instruct-tuned-2.8b and is available for use under CC-BY license here. This model is optimized for generating accurate and relevant responses to instruction-based tasks, making it perfect for tasks like question answering, code autocomplete, and chatbots. Feel free to run queries by yourself on our playground!!
|
81 |
|
82 |
# Pushing the boundaries of fast & usable generative AI
|
|
|
47 |
Steps to a ChatGPT-like LLM for your use case 1️⃣2️⃣3️⃣
|
48 |
Here are the steps to get an instruction-following LLM like ChatGPT to handle your use case:
|
49 |
|
50 |
+
(Show me the [code](https://github.com/lamini-ai/lamini): Play with our dataset generator for creating ChatGPT-like datasets.)
|
51 |
|
52 |
1. Try prompt-tuning ChatGPT or another model. You can use Lamini library’s APIs to quickly prompt-tune across different models, swapping between OpenAI and open-source models in just one line of code. We optimize the right prompt for you, so you can take advantage of different models without worrying about the right prompt template for each model.
|
53 |
2. Build a large dataset of input-output pairs. These will show your model how it should respond to its inputs, whether that's following instructions given in English, or responding in JSON. Today, we’re releasing a repo with just a few lines of code using the Lamini library to generate 50k data points from as few as 100 data points. We include an open-source 50k dataset in the repo. (More details below on how you can do this!)
|
|
|
66 |
|
67 |
You can customize the initial 100+ instructions so that the LLM follows instructions in your own vertical. Once you have those, submit them to the Lamini dataset generator, and voilà: you get a large instruction-following dataset on your use case as a result!
|
68 |
|
69 |
+
### How the dataset generator works
|
70 |
|
71 |
The Lamini dataset generator is a pipeline of LLMs that takes your original small set of 100+ instructions, paired with the expected responses, to generate 50k+ new pairs, inspired by Stanford Alpaca. This generation pipeline uses the Lamini library to define and call LLMs to generate different, yet similar, pairs of instructions and responses. Trained on this data, your LLM will improve to follow these instructions.
|
72 |
|
|
|
76 |
|
77 |
If you’re interested in more details on how our dataset generator works, read more or run it here.
|
78 |
|
79 |
+
# We Fine-tuned a custom model and hosted it 🎉
|
80 |
We have used the above pipeline to generate a filtered dataset having around 37k questions and responses samples. But that's not all! We've also fine-tuned a language model based on EleutherAI’s pythia model. It is hosted on Hugging-Face website as lamini/instruct-tuned-2.8b and is available for use under CC-BY license here. This model is optimized for generating accurate and relevant responses to instruction-based tasks, making it perfect for tasks like question answering, code autocomplete, and chatbots. Feel free to run queries by yourself on our playground!!
|
81 |
|
82 |
# Pushing the boundaries of fast & usable generative AI
|