---
language:
- en
license: mit
library_name: transformers
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
pipeline_tag: text-generation
---
# Model Card for pico_seshu_test
A test model for the Seshu pipeline.
## Model Details
### Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All 11 languages in the ai4bharat/samanantar dataset
- **License:** MIT
### Model Sources
- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752
## Uses
### Direct Use
Refer to the GitHub repository for more information.
## How to Get Started with the Model
Refer to the GitHub repository: https://github.com/mlsquare/fedem
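A minimal loading sketch using the 🤗 transformers Auto classes. The checkpoint id `mlsquare/pico_seshu_test` and the use of the ByT5-large tokenizer are assumptions based on this card; adjust both to the actual Hub ids:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical ids; replace with the actual checkpoint and tokenizer on the Hub.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-large")
model = AutoModelForCausalLM.from_pretrained(
    "mlsquare/pico_seshu_test", trust_remote_code=True
)

# Byte-level tokenization means any script works as input.
inputs = tokenizer("नमस्ते", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`trust_remote_code=True` may be needed if the checkpoint ships a custom `MambaForCausalLM` implementation rather than the built-in one.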
## Training Details
### Training Data
Individual source and target sentences from the AI4Bharat Samanantar dataset. Sentences from all 11 languages and their translations were stacked into a single corpus and used for the next-character generation task.
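The stacking step can be sketched as follows; the `src`/`tgt` field names and example rows are illustrative, not the dataset's actual schema:

```python
# Illustrative translation pairs (field names are hypothetical).
rows = [
    {"src": "How are you?", "tgt": "आप कैसे हैं?"},
    {"src": "Good morning", "tgt": "सुप्रभात"},
]

# Stack source and target sentences into one flat corpus so the model
# sees every sentence, in every language, as plain character data.
corpus = [sentence for row in rows for sentence in (row["src"], row["tgt"])]
print(corpus)
```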
### Training Procedure
Trained on next character generation task using cross-entropy loss.
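In a standard causal-LM setup this amounts to shifting the sequence by one position and computing cross-entropy between the predicted logits and the next token. A minimal PyTorch sketch with random stand-in logits; the vocabulary size of 259 assumes ByT5's 256 byte values plus 3 special tokens:

```python
import torch
import torch.nn.functional as F

vocab_size = 259                      # 256 bytes + 3 special tokens (ByT5 assumption)
batch, seq_len = 2, 16
logits = torch.randn(batch, seq_len, vocab_size)    # stand-in for model output
input_ids = torch.randint(3, vocab_size, (batch, seq_len))

# Predict token t+1 from positions up to t: drop the last logit,
# drop the first label, and flatten for cross-entropy.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = input_ids[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels)
print(loss.item())
```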
#### Preprocessing
Text was converted to raw UTF-8 bytes before training using the ByT5-large tokenizer.
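ByT5 operates directly on UTF-8 bytes, offsetting each byte value by 3 to make room for the pad/eos/unk special tokens. A dependency-free sketch of that mapping (an illustration, not the tokenizer's actual implementation):

```python
def byt5_like_ids(text: str) -> list[int]:
    # Each UTF-8 byte b maps to id b + 3; ids 0-2 are reserved
    # for the pad, eos, and unk special tokens.
    return [b + 3 for b in text.encode("utf-8")]

print(byt5_like_ids("hi"))   # ASCII 'h' (104) -> 107, 'i' (105) -> 108
```

Because tokenization is byte-level, a single non-ASCII character can expand to several ids, e.g. a Devanagari character occupies three UTF-8 bytes.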
#### Training Hyperparameters
- **Training regime:**
  - `output_dir="mamba"`
  - `per_device_train_batch_size=1`
  - `per_device_eval_batch_size=1`
  - `num_train_epochs=4`
  - `weight_decay=0.1`
  - `lr_scheduler_type="cosine"`
  - `learning_rate=5e-4`
  - `fp16=False`
## Evaluation
A simple cross-entropy loss was used to verify the pipeline and the basic functioning of the model.
## Model Card Contact
MLsquare