yashwardhan20417
committed
Update README.md
README.md CHANGED

Previous version:

---
library_name: transformers
tags:
- code
license: mit
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

## Model Details

### Model Description

**Selective State Spaces (SSMs):** At its core, Mamba uses SSMs, a type of recurrent model. Unlike traditional recurrent models that process everything, SSMs focus on the most relevant information within the current input. This selective approach can lead to faster and more efficient processing, especially for long sequences.

**Simplified Architecture:** Mamba drops the transformer's multi-layered structure of separate attention and MLP blocks. Instead, it employs a single, unified block built on SSMs. This streamlined design aims to reduce computational complexity, making Mamba faster for tasks such as text generation or audio analysis.
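
For intuition, here is a minimal, hedged sketch of the kind of input-dependent ("selective") state-space recurrence described above. The shapes, names, and the explicit Python loop are illustrative only and are not the actual mamba-hf kernels.

```python
import torch

def selective_ssm_scan(x, A, B, C, dt):
    """Toy diagonal selective-SSM recurrence (illustration only).

    x  : (T, D) input sequence
    A  : (D, N) state decay parameters (diagonal state per channel)
    B  : (T, N) input-dependent input projection (the "selective" part)
    C  : (T, N) input-dependent output projection
    dt : (T, D) input-dependent step sizes (cf. the dt_proj module)
    """
    T, D = x.shape
    h = torch.zeros(D, A.shape[1])                    # hidden state per channel
    ys = []
    for t in range(T):
        dA = torch.exp(dt[t].unsqueeze(-1) * A)       # discretised decay, (D, N)
        dB = dt[t].unsqueeze(-1) * B[t].unsqueeze(0)  # input gate, (D, N)
        h = dA * h + dB * x[t].unsqueeze(-1)          # state update
        ys.append(h @ C[t])                           # readout, (D,)
    return torch.stack(ys)                            # (T, D)

# Tiny smoke test with random tensors (negative A gives a stable decay).
T, D, N = 16, 8, 4
y = selective_ssm_scan(torch.randn(T, D), -torch.rand(D, N),
                       torch.randn(T, N), torch.randn(T, N), torch.rand(T, D))
print(y.shape)  # torch.Size([16, 8])
```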

- **Developed by:** MLsquare
- **Model type:**
- **Language(s) (NLP):**
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

### Direct Use

The following adapter is configured for the pico_seshu model by the MLsquare community on Hugging Face.

### Recommendations

Training a Mamba model on a next-character generation task using multiple Indic language datasets is a promising approach for several reasons:

- **Multilinguality:** Mamba's SSMs might prove adept at handling the unique characteristics of the various Indic languages, including complex scripts and potentially shared grammatical structures. This could lead to a model that generalizes well across these languages.
- **Data Efficiency:** With multiple datasets, Mamba can potentially learn more effective representations of characters and their relationships within a word. This might enable the model to perform well even with limited data for individual languages, a common challenge in Indic NLP.
- **Improved Generalizability:** Training on multiple languages could foster a more robust understanding of language in general. This could benefit tasks beyond next-character generation, such as machine translation or text summarization, especially when dealing with code-switching or multilingual content.
- **Utility and Further Exploration:** The ability to generate text effectively in multiple Indic languages has numerous applications:
  - **Language Learning Tools:** Mamba-based models could power interactive language-learning applications that provide personalized feedback and suggestions based on the user's input.
  - **Content Creation:** The model could be used to generate different creative text formats, such as poems, scripts, or even code snippets, in various Indic languages, aiding productivity and artistic exploration.
  - **Multilingual Chatbots:** Mamba's fluency in multiple Indic languages could power chatbots that communicate effectively with users across different regions, enhancing customer-service reach and accessibility.

## How to Get Started with the Model

[More Information Needed]

## Training Details

### Training Data

#### Preprocessing [optional]

Uses the merged server samantar dataset.

## Model Card Contact

MLsquare

Updated version:

---
library_name: transformers
license: mit
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

Adapter for mlsquare/pico_seshu_test, trained with LoRA on `model.layers.3.dt_proj`. This is a standard use of PEFT with a Mamba-hf model.
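
As a rough illustration, the adapter configuration described above might look like the following with PEFT. The target module string and the base checkpoint id come from this card; the rank, alpha, and dropout values are assumptions.

```python
# Hedged sketch: attach a LoRA adapter to a single Mamba projection with PEFT.
# Assumes the mamba-hf base model loads via AutoModelForCausalLM with trust_remote_code;
# r / lora_alpha / lora_dropout below are illustrative, not the values used for this adapter.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mlsquare/pico_seshu_test", trust_remote_code=True)

lora_cfg = LoraConfig(
    r=8,                                          # assumed rank
    lora_alpha=16,                                # assumed scaling
    lora_dropout=0.05,                            # assumed dropout
    target_modules=["model.layers.3.dt_proj"],    # module named in this card
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()                # only the dt_proj LoRA weights are trainable
```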

## Model Details

### Model Description

- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All languages in ai4bharat/samanantar dataset
- **License:** MIT

### Model Sources [optional]

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

## Uses

Refer to the GitHub repository for more information.

### Direct Use

Refer to the GitHub repository for more information.

## How to Get Started with the Model

Refer to the GitHub repository: https://github.com/mlsquare/fedem
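
For orientation, a minimal sketch of loading the base model and applying this adapter with transformers and PEFT is shown below. The base checkpoint id and the ByT5 tokenizer follow this card; the adapter id is a placeholder, and the fedem repository's own scripts may load things differently.

```python
# Hedged sketch: load the base Mamba-hf checkpoint, apply a PEFT adapter, and generate.
# "mlsquare/pico_seshu_test" comes from this card; ADAPTER_ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mlsquare/pico_seshu_test"
ADAPTER_ID = "path/or/hub-id-of-this-adapter"    # placeholder, not a real repo id

tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")  # byte-level tokenizer (see Preprocessing)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER_ID)

inputs = tokenizer("ఇది ఒక ఉదాహరణ", return_tensors="pt")        # any UTF-8 text
with torch.no_grad():
    out = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```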

## Training Details

### Training Data

Individual target and source sentences from the AI4Bharat Samanantar dataset. Sentences from all 11 languages, together with their translations, were stacked and used for the next-character generation task.
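
A hedged sketch of assembling that training text from the dataset listed in this card's metadata follows; the split and column names ("train", "src", "tgt") are assumptions and may not match the actual schema.

```python
# Hedged sketch: load the dataset named in the card metadata and stack source/target
# sentences into a single text field for next-character (byte-level) modelling.
from datasets import load_dataset

ds = load_dataset("mlsquare/CLIENT_samantar_mixed_train_val", split="train")  # split name assumed

def stack_pair(example):
    # Column names are assumptions; adjust to the dataset's real schema.
    return {"text": example["src"] + "\n" + example["tgt"]}

text_ds = ds.map(stack_pair, remove_columns=ds.column_names)
print(text_ds[0]["text"][:80])
```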

### Training Procedure

Trained on the next-character generation task using cross-entropy loss.

#### Preprocessing [optional]

Text was converted to raw UTF-8 bytes before training using the ByT5-large tokenizer.
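
A small sketch of this byte-level preprocessing step is shown below; the checkpoint id google/byt5-large is an assumption about where the tokenizer is loaded from, and only its byte vocabulary is used.

```python
# Hedged sketch: ByT5's tokenizer maps text to UTF-8 bytes plus a handful of special tokens,
# which is what makes "next-character" generation effectively byte-level.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/byt5-large")

ids = tok("నమస్కారం hello", add_special_tokens=False)["input_ids"]
# ByT5 reserves ids 0-2 for pad/eos/unk, so each byte b is stored as b + 3.
raw = bytes(i - 3 for i in ids)
print(raw.decode("utf-8"))   # round-trips back to the original string
```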

#### Training Hyperparameters

- **Training regime:**
  - output_dir="mamba"
  - per_device_train_batch_size=1
  - per_device_eval_batch_size=1
  - num_train_epochs=4
  - weight_decay=0.1
  - lr_scheduler_type="cosine"
  - learning_rate=5e-4
  - fp16=False
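
These values correspond to Hugging Face TrainingArguments fields. A hedged sketch of passing them to the Trainer follows; the model, tokenized datasets, and data collator wiring are assumptions carried over from the sketches above.

```python
# Hedged sketch: the hyperparameters above expressed as transformers TrainingArguments.
# `model`, `tok`, `tokenized_train`, and `tokenized_eval` are assumed to exist
# (e.g. the PEFT-wrapped model and byte-tokenized Samanantar text from earlier sketches).
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

args = TrainingArguments(
    output_dir="mamba",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal-LM labels
)
trainer.train()
```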

## Evaluation

A simple cross-entropy loss was used to check that the pipeline and the model work as intended.
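
If the Trainer sketch above is used, the same cross-entropy figure can be read back from an evaluation pass (shown here purely as an illustration):

```python
# Hedged sketch: report eval cross-entropy (and the corresponding perplexity)
# from the `trainer` defined in the previous sketch.
import math

metrics = trainer.evaluate()
print(metrics["eval_loss"], math.exp(metrics["eval_loss"]))
```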

## Model Card Contact

MLsquare