|
--- |
|
tags: |
|
- autotrain |
|
- text-generation-inference |
|
- text-generation |
|
- peft |
|
library_name: transformers |
|
base_model: allenai/Llama-3.1-Tulu-3-8B |
|
widget: |
|
- messages: |
|
- role: user |
|
content: What are the requirements for cross-examination according to Indian law? |
|
license: other |
|
--- |
|
# InLawMate-peft: Indian Legal Domain PEFT Model |
|
|
|
## Model Description |
|
InLawMate-peft is a parameter-efficient fine-tune (PEFT) of allenai/Llama-3.1-Tulu-3-8B, optimized for understanding and reasoning about Indian legal documentation. It was trained on a curated dataset of nearly 7,000 question-answer pairs derived from Indian criminal law documentation, making it well suited to legal comprehension and explanation tasks.
|
|
|
## Training Data |
|
The training data consists of nearly 7,000 legal Q&A pairs generated with a two-stage process (an illustrative sketch of the pipeline follows the lists below):
|
1. **Question Generation**: Questions were generated from the source documents to cover key legal concepts, definitions, procedures, and roles, ensuring comprehensive coverage of:
|
- Legal terminology and definitions |
|
- Procedural rules and steps |
|
- Rights and penalties |
|
- Jurisdictional aspects |
|
- Roles of legal entities (judges, lawyers, law enforcement) |
|
|
|
2. **Answer Generation**: Answers were crafted following a structured legal reasoning approach, ensuring: |
|
- Legal precision and accuracy |
|
- Comprehensive coverage of relevant points |
|
- Clear explanation of legal concepts |
|
- Professional legal discourse style |
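In code, the two stages can be pictured roughly as follows. This is a purely illustrative sketch: the function names, prompts, and record schema are assumptions, not the actual pipeline, which is documented in the GitHub repository linked at the end of this card.

```python
# Illustrative sketch of the two-stage Q&A generation described above;
# `ask` and `answer` stand in for whatever generation models were used.
def generate_pairs(documents, ask, answer):
    """Stage 1 generates questions per document; stage 2 answers each one."""
    pairs = []
    for doc in documents:
        # Stage 1: questions covering definitions, procedures, rights and
        # penalties, jurisdiction, and the roles of legal entities.
        questions = ask(f"Generate legal questions covering this text:\n{doc}")
        for q in questions:
            # Stage 2: a structured, legally precise answer grounded in
            # the source document.
            pairs.append({
                "question": q,
                "answer": answer(f"Context:\n{doc}\n\nQuestion: {q}"),
            })
    return pairs
```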
|
|
|
## Training Details |
|
- **Base Model**: allenai/Llama-3.1-Tulu-3-8B |
|
- **Fine-Tuning Method**: PEFT (Parameter-Efficient Fine-Tuning)
|
- **Training Epochs**: 3 |
|
- **Batch Size**: 2 (with gradient accumulation steps of 4) |
|
- **Learning Rate**: 3e-05 with cosine scheduler |
|
- **Sequence Length**: 1024 tokens |
|
- **Mixed Precision**: BF16 |
|
- **Optimization**: AdamW with β1=0.9, β2=0.999 |
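The listed hyperparameters correspond to a configuration along the following lines. This is a minimal sketch assuming a LoRA adapter trained with `transformers` and `peft`; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, as the card does not report them.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model and BF16 mixed precision, as listed above.
base = AutoModelForCausalLM.from_pretrained(
    "allenai/Llama-3.1-Tulu-3-8B", torch_dtype=torch.bfloat16
)

# LoRA settings (r, alpha, dropout, target modules) are assumptions.
model = get_peft_model(
    base,
    LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

# Values below come directly from the Training Details list; the
# 1024-token sequence length is applied at tokenization time.
args = TrainingArguments(
    output_dir="inlawmate-peft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    bf16=True,
    adam_beta1=0.9,
    adam_beta2=0.999,
)
```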
|
|
|
## Use Cases |
|
This model is particularly suited for: |
|
- Legal document analysis and comprehension |
|
- Answering questions about Indian criminal law |
|
- Understanding legal procedures and requirements |
|
- Explaining legal concepts and terminology |
|
- Assisting in legal research and education |
|
|
|
## Limitations |
|
- The model is trained specifically on Indian criminal law documentation and may not generalize to other jurisdictions or legal domains
|
- Responses should be verified by legal professionals for critical applications |
|
- The model should not be used as a substitute for professional legal advice |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; device_map="auto" places it on available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()

tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")

# Example legal query
messages = [
    {"role": "user", "content": "What are the requirements for cross-examination according to Indian law?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

# Send inputs to the same device the model was loaded on, and allow
# enough new tokens for a complete legal explanation.
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
|
``` |
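
If the repository hosts only the adapter weights rather than a merged checkpoint, the model can instead be loaded through the `peft` library, which fetches the base model named in the adapter config and applies the adapter on top of it. A minimal sketch, assuming the adapter was published against the base model listed above:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Downloads the base model referenced in the adapter config and
# applies the PEFT adapter in one step.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()
tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")
```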
|
|
|
## Citation |
|
If you use this model in your research, please cite: |
|
```bibtex |
|
@misc{legalpara-lm, |
|
title={InLawMate: A PEFT Model for Indian Legal Domain Understanding}, |
|
year={2024}, |
|
publisher={Aryaman}, |
|
note={Model trained on Indian legal documentation} |
|
} |
|
``` |
|
Our training data and the procedure for synthetic data creation are outlined at https://github.com/DarryCrucian/law-llm.