Text Generation
Transformers
Safetensors
English
Inference Endpoints
yashwardhan20417 committed
Commit 743becf · verified · 1 Parent(s): 702611a

Update README.md

Files changed (1)
  1. README.md +35 -48
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  library_name: transformers
- tags:
- - code
  license: mit
  language:
  - en
  pipeline_tag: text-generation
@@ -10,88 +10,75 @@ pipeline_tag: text-generation

  # Model Card for Model ID

- The following adapter is used for training a particular section of the architecture, as specified in the adapter name, using the LoRA method.

  ## Model Details

  ### Model Description

- Mamba is a novel deep learning architecture designed for sequence modeling tasks, particularly those involving long sequences of data like text or audio. It tackles a key challenge faced by transformers, the current powerhouse in this field: computational inefficiency for lengthy inputs.
-
- Mamba stands out in the following ways:
-
- Selective State Spaces (SSMs): At its core, Mamba utilizes SSMs, a type of recurrent model. Unlike traditional recurrent models that process everything, SSMs focus on the most relevant information within the current input. This selective approach potentially leads to faster and more efficient processing, especially for long sequences ⏱️.
-
- Simplified Architecture: Mamba ditches the complex multi-layered structure of transformers with separate attention and MLP blocks. Instead, it employs a single, unified block built upon SSMs. This streamlined design aims to reduce computational complexity, making Mamba faster for tasks like generating text or analyzing audio.

- Performance and Potential: Studies suggest that Mamba can achieve state-of-the-art performance on various sequence modeling tasks, including language modeling, while offering significant speed improvements compared to transformers of similar size. This opens doors for applications where processing lengthy data sequences is crucial.

- This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  - **Developed by:** MLsquare
- - **Model type:** Text generation
- - **Language(s) (NLP):** Indic Dataset
  - **License:** MIT

- ### Model Sources

  - **Repository:** https://github.com/LegallyCoder/mamba-hf
  - **Paper:** https://arxiv.org/abs/2312.00752

  ### Direct Use

- The following adapter is configured for the pico_seshu model by the MLsquare community on Hugging Face.
-
- ### Recommendations
-
- Training a Mamba model on a next-character generation task using multiple Indic language datasets is a fascinating approach. Here's why:
-
- Multilinguality: Mamba's SSMs might prove adept at handling the unique characteristics of various Indic languages, including complex scripts and potentially shared grammatical structures. This could lead to a model that generalizes well across these languages.
-
- Data Efficiency: With multiple datasets, Mamba can potentially learn more effective representations of characters and their relationships within a word. This might enable the model to perform well even with limited data for individual languages, a common challenge in Indic NLP.
-
- Improved Generalizability: Training on multiple languages could foster a more robust understanding of language in general. This could benefit tasks beyond next-character generation, such as machine translation or text summarization, especially when dealing with code-switching or multilingual content.
-
- Utility and Further Exploration: The ability to generate text effectively in multiple Indic languages has numerous applications:
-
- Language Learning Tools: Mamba-based models could power interactive language learning applications that provide personalized feedback and suggestions based on the user's input.
-
- Content Creation: The model could be used to generate different creative text formats like poems, scripts, or even code snippets in various Indic languages, aiding productivity and artistic exploration.
-
- Multilingual Chatbots: Mamba's fluency in multiple Indic languages could power chatbots that can effectively communicate with users across different regions. This can enhance customer service reach and accessibility.

  ## How to Get Started with the Model

- Use the code below to get started with the model.
-
- [More Information Needed]

  ## Training Details

  ### Training Data

- The following model uses the client-side merged Samanantar dataset.

  #### Preprocessing [optional]

- Data converted to raw UTF-8 character numbers and fed into the model using the ByT5-large tokenizer.

- ## Evaluation

- Using average cross-entropy loss to evaluate performance.

- ### Testing Data, Factors & Metrics

- #### Testing Data

- Using the server-side merged Samanantar dataset.

  ## Model Card Contact

  MLsquare
 
  ---
  library_name: transformers
  license: mit
+ datasets:
+ - mlsquare/CLIENT_samantar_mixed_train_val
  language:
  - en
  pipeline_tag: text-generation

  # Model Card for Model ID

+ Adapter for mlsquare/pico_seshu_test, trained with LoRA on "model.layers.3.dt_proj". A standard application of PEFT to a Mamba-hf model.
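
A rough sketch of how such an adapter can be attached with PEFT is shown below. The rank, alpha, and model-loading details are illustrative assumptions, not values taken from this card.

```python
# Minimal sketch: attaching a LoRA adapter to the dt_proj of layer 3.
# Assumptions (not from this card): the base model loads via
# AutoModelForCausalLM with trust_remote_code=True, and PEFT can wrap it
# as a generic causal LM. Adjust to the actual mamba-hf class if needed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "mlsquare/pico_seshu_test", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=8,                                         # illustrative rank
    lora_alpha=16,                               # illustrative scaling
    target_modules=["model.layers.3.dt_proj"],   # the module named in this card
    lora_dropout=0.0,
    bias="none",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the layer-3 dt_proj LoRA weights train
```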
 
  ## Model Details

  ### Model Description

+ - **Developed by:** MLsquare
+ - **Model type:** Next Character Generation
+ - **Language(s) (NLP):** All languages in the ai4bharat/samanantar dataset
+ - **License:** MIT
 
+ ### Model Sources [optional]

  - **Repository:** https://github.com/LegallyCoder/mamba-hf
  - **Paper:** https://arxiv.org/abs/2312.00752
+ ## Uses

+ Refer to the GitHub repository for more information.

  ### Direct Use

+ Refer to the GitHub repository for more information.

  ## How to Get Started with the Model

+ Refer to the GitHub repository: https://github.com/mlsquare/fedem
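
For a pipeline-level sketch (not taken from the repository), loading the base model, attaching this adapter, and generating a few characters could look like the following; the adapter repo id is a placeholder, and the base model is assumed to load as a remote-code causal LM.

```python
# Minimal inference sketch. Assumptions: the adapter weights live in this repo
# (placeholder "<this-adapter-repo-id>"), the base model loads through
# AutoModelForCausalLM with trust_remote_code=True, and the ByT5-large
# tokenizer reproduces the byte-level preprocessing described below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")
base = AutoModelForCausalLM.from_pretrained(
    "mlsquare/pico_seshu_test", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "<this-adapter-repo-id>")  # placeholder id
model.eval()

inputs = tokenizer("ನಮಸ್ಕಾರ", return_tensors="pt")  # arbitrary example string
with torch.no_grad():
    out = model.generate(inputs["input_ids"], max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```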
 
 
  ## Training Details

  ### Training Data

+ Individual target and source sentences from the AI4Bharat Samanantar dataset. Sentences from all 11 languages and their translations have been stacked and used for the next-character generation task.
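
A sketch of that stacking step is shown here; the split name and the "src"/"tgt" column names are hypothetical and should be replaced with the actual fields of the dataset.

```python
# Illustrative sketch of stacking source and target sentences into one stream.
# Assumption: the dataset exposes text columns for source and target sentences;
# "src" and "tgt" are placeholder column names.
from datasets import load_dataset

ds = load_dataset("mlsquare/CLIENT_samantar_mixed_train_val", split="train")

def stack_sentences(batch):
    # Interleave source and target strings into a single "text" column.
    return {"text": [s for pair in zip(batch["src"], batch["tgt"]) for s in pair]}

text_ds = ds.map(stack_sentences, batched=True, remove_columns=ds.column_names)
```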
+ ### Training Procedure

+ Trained on the next-character generation task using cross-entropy loss.

  #### Preprocessing [optional]

+ Converted to raw UTF-8 characters before training using the ByT5-large tokenizer.
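
As a small illustration of that byte-level preprocessing (the example string is arbitrary):

```python
# Byte-level preprocessing sketch with the ByT5-large tokenizer: every UTF-8
# byte of the input becomes one token id, so "characters" here are raw bytes.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")

text = "ನಮಸ್ಕಾರ"                      # any Indic-script string
ids = tokenizer(text)["input_ids"]     # one id per UTF-8 byte, plus </s>
print(len(text.encode("utf-8")), len(ids))
print(tokenizer.decode(ids, skip_special_tokens=True))  # round-trips the text
```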

+ #### Training Hyperparameters

+ - **Training regime:**
+   - output_dir="mamba"
+   - per_device_train_batch_size=1
+   - per_device_eval_batch_size=1
+   - num_train_epochs=4
+   - weight_decay=0.1
+   - lr_scheduler_type="cosine"
+   - learning_rate=5e-4
+   - fp16=False
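
For reference, those values correspond to a 🤗 `TrainingArguments` configuration like the sketch below; the surrounding `Trainer` setup (model, tokenizer, datasets, collator) is not shown in this card and would follow the fedem repository scripts.

```python
# The hyperparameters listed above, dropped into a TrainingArguments object.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mamba",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=False,
)
```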

+ ## Evaluation

+ A simple cross-entropy loss has been used to test the pipeline and the basic working of the model.
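
A minimal version of that check, assuming the model returns a loss in the standard causal-LM way when labels are supplied, could be:

```python
# Sketch of the evaluation idea: average next-character cross-entropy on a
# held-out list of strings. Assumes the model follows the usual causal-LM
# convention of returning .loss when labels are passed (labels == input_ids).
import torch

def eval_cross_entropy(model, tokenizer, texts):
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            batch = tokenizer(text, return_tensors="pt")
            out = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)
```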

  ## Model Card Contact

  MLsquare