yashwardhan20417
committed
Update README.md
README.md CHANGED

Previous version:

---
library_name: transformers
tags:
- code
license: mit
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

## Model Details

### Model Description

**Selective State Spaces (SSMs):** At its core, Mamba uses SSMs, a type of recurrent model. Unlike traditional recurrent models that process everything, SSMs focus on the most relevant information within the current input. This selective approach can lead to faster and more efficient processing, especially for long sequences.

**Simplified Architecture:** Mamba drops the transformer's multi-layered structure of separate attention and MLP blocks. Instead, it employs a single, unified block built on SSMs. This streamlined design aims to reduce computational complexity, making Mamba faster for tasks such as text generation or audio analysis.
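
For intuition, here is a minimal, hedged sketch of the kind of input-dependent ("selective") state-space recurrence described above. The shapes, names, and the explicit Python loop are illustrative only and are not the actual mamba-hf kernels.

```python
import torch

def selective_ssm_scan(x, A, B, C, dt):
    """Toy diagonal selective-SSM recurrence (illustration only).

    x  : (T, D) input sequence
    A  : (D, N) state decay parameters (diagonal state per channel)
    B  : (T, N) input-dependent input projection (the "selective" part)
    C  : (T, N) input-dependent output projection
    dt : (T, D) input-dependent step sizes (cf. the dt_proj module)
    """
    T, D = x.shape
    h = torch.zeros(D, A.shape[1])                    # hidden state per channel
    ys = []
    for t in range(T):
        dA = torch.exp(dt[t].unsqueeze(-1) * A)       # discretised decay, (D, N)
        dB = dt[t].unsqueeze(-1) * B[t].unsqueeze(0)  # input gate, (D, N)
        h = dA * h + dB * x[t].unsqueeze(-1)          # state update
        ys.append(h @ C[t])                           # readout, (D,)
    return torch.stack(ys)                            # (T, D)

# Tiny smoke test with random tensors (negative A gives a stable decay).
T, D, N = 16, 8, 4
y = selective_ssm_scan(torch.randn(T, D), -torch.rand(D, N),
                       torch.randn(T, N), torch.randn(T, N), torch.rand(T, D))
print(y.shape)  # torch.Size([16, 8])
```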

- **Developed by:** MLsquare
- **Model type:**
- **Language(s) (NLP):**
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

### Direct Use

The following adapter is configured for the pico_seshu model by the MLsquare community on Hugging Face.

### Recommendations

Training a Mamba model on a next-character generation task using multiple Indic language datasets is a promising approach for several reasons:

- **Multilinguality:** Mamba's SSMs might prove adept at handling the unique characteristics of the various Indic languages, including complex scripts and potentially shared grammatical structures. This could lead to a model that generalizes well across these languages.
- **Data Efficiency:** With multiple datasets, Mamba can potentially learn more effective representations of characters and their relationships within a word. This might enable the model to perform well even with limited data for individual languages, a common challenge in Indic NLP.
- **Improved Generalizability:** Training on multiple languages could foster a more robust understanding of language in general. This could benefit tasks beyond next-character generation, such as machine translation or text summarization, especially when dealing with code-switching or multilingual content.
- **Utility and Further Exploration:** The ability to generate text effectively in multiple Indic languages has numerous applications:
  - **Language Learning Tools:** Mamba-based models could power interactive language-learning applications that provide personalized feedback and suggestions based on the user's input.
  - **Content Creation:** The model could be used to generate different creative text formats, such as poems, scripts, or even code snippets, in various Indic languages, aiding productivity and artistic exploration.
  - **Multilingual Chatbots:** Mamba's fluency in multiple Indic languages could power chatbots that communicate effectively with users across different regions, enhancing customer-service reach and accessibility.

## How to Get Started with the Model

[More Information Needed]

## Training Details

### Training Data

#### Preprocessing [optional]

Uses the merged server samantar dataset.

## Model Card Contact

MLsquare

Updated version:

---
library_name: transformers
license: mit
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

Adapter for mlsquare/pico_seshu_test, trained with LoRA on `model.layers.3.dt_proj`. This is a standard use of PEFT with a Mamba-hf model.
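
As a rough illustration, the adapter configuration described above might look like the following with PEFT. The target module string and the base checkpoint id come from this card; the rank, alpha, and dropout values are assumptions.

```python
# Hedged sketch: attach a LoRA adapter to a single Mamba projection with PEFT.
# Assumes the mamba-hf base model loads via AutoModelForCausalLM with trust_remote_code;
# r / lora_alpha / lora_dropout below are illustrative, not the values used for this adapter.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mlsquare/pico_seshu_test", trust_remote_code=True)

lora_cfg = LoraConfig(
    r=8,                                          # assumed rank
    lora_alpha=16,                                # assumed scaling
    lora_dropout=0.05,                            # assumed dropout
    target_modules=["model.layers.3.dt_proj"],    # module named in this card
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()                # only the dt_proj LoRA weights are trainable
```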

## Model Details

### Model Description

- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All languages in ai4bharat/samanantar dataset
- **License:** MIT

### Model Sources [optional]

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

## Uses

Refer to the GitHub repository for more information.

### Direct Use

Refer to the GitHub repository for more information.

## How to Get Started with the Model

Refer to the GitHub repository: https://github.com/mlsquare/fedem
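
For orientation, a minimal sketch of loading the base model and applying this adapter with transformers and PEFT is shown below. The base checkpoint id and the ByT5 tokenizer follow this card; the adapter id is a placeholder, and the fedem repository's own scripts may load things differently.

```python
# Hedged sketch: load the base Mamba-hf checkpoint, apply a PEFT adapter, and generate.
# "mlsquare/pico_seshu_test" comes from this card; ADAPTER_ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mlsquare/pico_seshu_test"
ADAPTER_ID = "path/or/hub-id-of-this-adapter"    # placeholder, not a real repo id

tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")  # byte-level tokenizer (see Preprocessing)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER_ID)

inputs = tokenizer("ఇది ఒక ఉదాహరణ", return_tensors="pt")        # any UTF-8 text
with torch.no_grad():
    out = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```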

## Training Details

### Training Data

Individual target and source sentences from the AI4Bharat Samanantar dataset. Sentences from all 11 languages, together with their translations, were stacked and used for the next-character generation task.
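
A hedged sketch of assembling that training text from the dataset listed in this card's metadata follows; the split and column names ("train", "src", "tgt") are assumptions and may not match the actual schema.

```python
# Hedged sketch: load the dataset named in the card metadata and stack source/target
# sentences into a single text field for next-character (byte-level) modelling.
from datasets import load_dataset

ds = load_dataset("mlsquare/CLIENT_samantar_mixed_train_val", split="train")  # split name assumed

def stack_pair(example):
    # Column names are assumptions; adjust to the dataset's real schema.
    return {"text": example["src"] + "\n" + example["tgt"]}

text_ds = ds.map(stack_pair, remove_columns=ds.column_names)
print(text_ds[0]["text"][:80])
```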

### Training Procedure

Trained on the next-character generation task using cross-entropy loss.

#### Preprocessing [optional]

Text was converted to raw UTF-8 bytes before training using the ByT5-large tokenizer.
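
A small sketch of this byte-level preprocessing step is shown below; the checkpoint id google/byt5-large is an assumption about where the tokenizer is loaded from, and only its byte vocabulary is used.

```python
# Hedged sketch: ByT5's tokenizer maps text to UTF-8 bytes plus a handful of special tokens,
# which is what makes "next-character" generation effectively byte-level.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/byt5-large")

ids = tok("నమస్కారం hello", add_special_tokens=False)["input_ids"]
# ByT5 reserves ids 0-2 for pad/eos/unk, so each byte b is stored as b + 3.
raw = bytes(i - 3 for i in ids)
print(raw.decode("utf-8"))   # round-trips back to the original string
```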

#### Training Hyperparameters

- **Training regime:**
  - output_dir="mamba"
  - per_device_train_batch_size=1
  - per_device_eval_batch_size=1
  - num_train_epochs=4
  - weight_decay=0.1
  - lr_scheduler_type="cosine"
  - learning_rate=5e-4
  - fp16=False
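
These values correspond to Hugging Face TrainingArguments fields. A hedged sketch of passing them to the Trainer follows; the model, tokenized datasets, and data collator wiring are assumptions carried over from the sketches above.

```python
# Hedged sketch: the hyperparameters above expressed as transformers TrainingArguments.
# `model`, `tok`, `tokenized_train`, and `tokenized_eval` are assumed to exist
# (e.g. the PEFT-wrapped model and byte-tokenized Samanantar text from earlier sketches).
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

args = TrainingArguments(
    output_dir="mamba",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal-LM labels
)
trainer.train()
```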

## Evaluation

A simple cross-entropy loss was used to check that the pipeline and the model work as intended.
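
If the Trainer sketch above is used, the same cross-entropy figure can be read back from an evaluation pass (shown here purely as an illustration):

```python
# Hedged sketch: report eval cross-entropy (and the corresponding perplexity)
# from the `trainer` defined in the previous sketch.
import math

metrics = trainer.evaluate()
print(metrics["eval_loss"], math.exp(metrics["eval_loss"]))
```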

## Model Card Contact

MLsquare