|
--- |
|
tags: |
|
- autotrain |
|
- text-generation-inference |
|
- text-generation |
|
- peft |
|
library_name: transformers |
|
base_model: allenai/Llama-3.1-Tulu-3-8B |
|
widget: |
|
- messages: |
|
- role: user |
|
content: What are the requirements for cross-examination according to Indian law? |
|
license: other |
|
--- |
|
# InLawMate-peft: Indian Legal Domain PEFT Model |
|
|
|
## Model Description |
|
InLawMate-peft is a parameter-efficient fine-tune (PEFT) of allenai/Llama-3.1-Tulu-3-8B, optimized for understanding and reasoning about Indian legal documentation. It was trained on a curated dataset of nearly 7,000 question-answer pairs derived from Indian criminal law documentation, making it well suited to legal comprehension and explanation tasks.
|
|
|
## Training Data |
|
The training data consists of nearly 7,000 legal Q&A pairs generated with a two-stage process (an illustrative sketch of the pipeline follows the lists below):
|
1. **Question Generation**: Questions were generated from the source documents to cover key legal concepts, definitions, procedures, and roles, ensuring comprehensive coverage of:
|
- Legal terminology and definitions |
|
- Procedural rules and steps |
|
- Rights and penalties |
|
- Jurisdictional aspects |
|
- Roles of legal entities (judges, lawyers, law enforcement) |
|
|
|
2. **Answer Generation**: Answers were crafted following a structured legal reasoning approach, ensuring: |
|
- Legal precision and accuracy |
|
- Comprehensive coverage of relevant points |
|
- Clear explanation of legal concepts |
|
- Professional legal discourse style |
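In code, the two stages can be pictured roughly as follows. This is a purely illustrative sketch: the function names, prompts, and record schema are assumptions, not the actual pipeline, which is documented in the GitHub repository linked at the end of this card.

```python
# Illustrative sketch of the two-stage Q&A generation described above;
# `ask` and `answer` stand in for whatever generation models were used.
def generate_pairs(documents, ask, answer):
    """Stage 1 generates questions per document; stage 2 answers each one."""
    pairs = []
    for doc in documents:
        # Stage 1: questions covering definitions, procedures, rights and
        # penalties, jurisdiction, and the roles of legal entities.
        questions = ask(f"Generate legal questions covering this text:\n{doc}")
        for q in questions:
            # Stage 2: a structured, legally precise answer grounded in
            # the source document.
            pairs.append({
                "question": q,
                "answer": answer(f"Context:\n{doc}\n\nQuestion: {q}"),
            })
    return pairs
```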
|
|
|
## Training Details |
|
- **Base Model**: allenai/Llama-3.1-Tulu-3-8B |
|
- **Fine-Tuning Method**: PEFT (Parameter-Efficient Fine-Tuning)
|
- **Training Epochs**: 3 |
|
- **Batch Size**: 2 (with gradient accumulation steps of 4) |
|
- **Learning Rate**: 3e-05 with cosine scheduler |
|
- **Sequence Length**: 1024 tokens |
|
- **Mixed Precision**: BF16 |
|
- **Optimization**: AdamW with β1=0.9, β2=0.999 |
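The listed hyperparameters correspond to a configuration along the following lines. This is a minimal sketch assuming a LoRA adapter trained with `transformers` and `peft`; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, as the card does not report them.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model and BF16 mixed precision, as listed above.
base = AutoModelForCausalLM.from_pretrained(
    "allenai/Llama-3.1-Tulu-3-8B", torch_dtype=torch.bfloat16
)

# LoRA settings (r, alpha, dropout, target modules) are assumptions.
model = get_peft_model(
    base,
    LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

# Values below come directly from the Training Details list; the
# 1024-token sequence length is applied at tokenization time.
args = TrainingArguments(
    output_dir="inlawmate-peft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    bf16=True,
    adam_beta1=0.9,
    adam_beta2=0.999,
)
```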
|
|
|
## Use Cases |
|
This model is particularly suited for: |
|
- Legal document analysis and comprehension |
|
- Answering questions about Indian criminal law |
|
- Understanding legal procedures and requirements |
|
- Explaining legal concepts and terminology |
|
- Assisting in legal research and education |
|
|
|
## Limitations |
|
- The model is trained specifically on Indian criminal law documentation and may not generalize to other jurisdictions or legal domains
|
- Responses should be verified by legal professionals for critical applications |
|
- The model should not be used as a substitute for professional legal advice |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; device_map="auto" places it on available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()

tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")

# Example legal query
messages = [
    {"role": "user", "content": "What are the requirements for cross-examination according to Indian law?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

# Send inputs to the same device the model was loaded on, and allow
# enough new tokens for a complete legal explanation.
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
|
``` |
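
If the repository hosts only the adapter weights rather than a merged checkpoint, the model can instead be loaded through the `peft` library, which fetches the base model named in the adapter config and applies the adapter on top of it. A minimal sketch, assuming the adapter was published against the base model listed above:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Downloads the base model referenced in the adapter config and
# applies the PEFT adapter in one step.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",
    device_map="auto",
    torch_dtype="auto",
).eval()
tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")
```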
|
|
|
## Citation |
|
If you use this model in your research, please cite: |
|
```bibtex |
|
@misc{legalpara-lm, |
|
title={InLawMate: A PEFT Model for Indian Legal Domain Understanding}, |
|
year={2024}, |
|
publisher={Aryaman}, |
|
note={Model trained on Indian legal documentation} |
|
} |
|
``` |
|
Our training data and the procedure for synthetic data creation are outlined at https://github.com/DarryCrucian/law-llm.