---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
  - text-generation
  - mbart
  - nlp
  - transformers
  - text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
widget:
  - text: Enter your paragraph here
transformers_info:
  auto_class: MBartForConditionalGeneration
  processor: AutoTokenizer
language:
  - en
  - ur
---

# Model Card

MBart-Urdu-Text-Summarization is a fine-tuned MBart model designed for summarizing Urdu text. It leverages the multilingual capabilities of MBart to generate concise and accurate summaries of Urdu paragraphs.

## Model Details

### Model Description

This model is based on the MBart architecture, a sequence-to-sequence model pre-trained on multilingual data, and has been fine-tuned specifically for Urdu text summarization. Because MBart understands and generates text in both English and Urdu, the model is also suitable for multilingual applications.

### Model Sources

- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)

## Uses

### Direct Use

The model can be used directly for Urdu text summarization. It is suitable for applications such as news summarization, document summarization, and content generation.

### Downstream Use

The model can be further fine-tuned for downstream tasks such as sentiment analysis, question answering, or machine translation between Urdu and English; a minimal fine-tuning sketch is included at the end of this card.

### Out-of-Scope Use

This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks other than text summarization without proper fine-tuning and evaluation.

## Bias, Risks, and Limitations

- The model may reproduce biases or inappropriate content present in the input text.
- It was trained on a specific dataset and may not generalize well to other domains or languages.
- Performance may degrade on very long input texts.

### Recommendations

Users should evaluate the model's outputs for bias and appropriateness. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and its tokenizer
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text
input_text = "Enter your Urdu paragraph here."

# Tokenize the input (truncating very long paragraphs) and generate a summary with beam search
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(inputs["input_ids"], max_length=50, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("Summary:", summary)
```
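Alternatively, the checkpoint can be loaded through the high-level `pipeline` API. This is a minimal sketch: the `summarization` pipeline and the `summary_text` output key are standard `transformers` behavior, and the generation settings simply mirror the explicit `generate()` call above.

```python
from transformers import pipeline

# Load the checkpoint via the high-level summarization pipeline
summarizer = pipeline("summarization", model="ihatenlp/MBart-Urdu-Text-Summarization")

urdu_text = "Enter your Urdu paragraph here."

# Generation kwargs are forwarded to model.generate()
result = summarizer(urdu_text, max_length=50, num_beams=4)

print("Summary:", result[0]["summary_text"])
```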
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## Citation

**BibTeX:**

```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  year={2020},
  eprint={2001.08210},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2001.08210},
}
```

## Model Card Authors

- **Wali Muhammad Ahmad**
- **Muhammad Labeeb Tariq**

## Model Card Contact

- **Email:** wali.muhammad.ahmad@gmail.com
- **Hugging Face Profile:** [Wali Muhammad Ahmad](https://huggingface.co/ihatenlp)
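## Fine-tuning Sketch

As noted under Downstream Use, the checkpoint can be fine-tuned further. The sketch below shows a minimal supervised fine-tuning loop with `Seq2SeqTrainer`; the dataset columns (`text`, `summary`), sequence lengths, and hyperparameters are illustrative assumptions, not the settings used to train this model.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Hypothetical toy dataset; replace with your own "text"/"summary" pairs
train_data = Dataset.from_dict({
    "text": ["Enter your Urdu paragraph here."],
    "summary": ["Enter its reference summary here."],
})

def preprocess(batch):
    # Tokenize source paragraphs; text_target tokenizes the reference summaries
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = train_data.map(preprocess, batched=True, remove_columns=["text", "summary"])

# Illustrative hyperparameters only; tune for your data
training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-urdu-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

trainer.train()
```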