---
library_name: transformers
license: mit
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

Adapter for mlsquare/pico_seshu_test, trained with LoRA on the modules "model.layers.3.dt_proj", "model.layers.3.x_proj", and "model.layers.3.out_proj". This is a standard use of PEFT on a Mamba-hf model.

## Model Details

### Model Description

- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All languages in ai4bharat/samanantar dataset
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

## Uses

Refer to the GitHub repository for more information.

### Direct Use

Refer to the GitHub repository for more information.

## How to Get Started with the Model

Refer to the GitHub repository: https://github.com/mlsquare/fedem

## Training Details

### Training Data

Individual source and target sentences from the AI4Bharat Samanantar dataset. Sentences in all 11 languages, together with their translations, were stacked and used for the next character generation task.

### Training Procedure

Trained on the next character generation task using cross-entropy loss.

#### Preprocessing

Text was converted to raw UTF-8 characters with the ByT5-large tokenizer before training.

#### Training Hyperparameters

- **Training regime:** output_dir="mamba", per_device_train_batch_size=1, per_device_eval_batch_size=1, num_train_epochs=4, weight_decay=0.1, lr_scheduler_type="cosine", learning_rate=5e-4, fp16=False

## Evaluation

Cross-entropy loss alone was used to check that the pipeline and the model work.

## Model Card Contact

MLsquare
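
## Example: Byte-Level Next-Token Setup (Sketch)

The preprocessing and training objective described in this card can be illustrated with a minimal, standard-library-only sketch. It assumes ByT5's byte-level scheme, in which each UTF-8 byte maps to token id `byte + 3` and ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown. The helper names `encode_bytes`, `next_token_pairs`, and `cross_entropy` are hypothetical and are not taken from the fedem repository; the actual pipeline uses the ByT5-large tokenizer from `transformers`.

```python
import math

# Assumption: ByT5 reserves ids 0 (pad), 1 (eos), 2 (unk),
# so a raw byte value b is mapped to token id b + 3.
BYTE_OFFSET = 3
EOS_ID = 1


def encode_bytes(text):
    """Map text to byte-level token ids and append the EOS id."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def next_token_pairs(ids):
    """Shift the sequence by one position to get (inputs, targets)."""
    return ids[:-1], ids[1:]


def cross_entropy(logits, target):
    """Cross-entropy at one position, computed from unnormalized logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# Each Devanagari character below occupies three UTF-8 bytes.
ids = encode_bytes("नम")
inputs, targets = next_token_pairs(ids)
```

Because non-ASCII characters span several bytes, the model in this sketch predicts the next byte, not the next whole character. During training, the per-position losses would be averaged over the sequence.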