metadata
tags:
  - autotrain
  - text-generation-inference
  - text-generation
  - peft
library_name: transformers
base_model: allenai/Llama-3.1-Tulu-3-8B
widget:
  - messages:
      - role: user
        content: >-
          What are the requirements for cross-examination according to Indian
          law?
license: other

InLawMate-peft: Indian Legal Domain PEFT Model

Model Description

InLawMate-peft is a language model fine-tuned from allenai/Llama-3.1-Tulu-3-8B with Parameter-Efficient Fine-Tuning (PEFT) for understanding and reasoning about Indian legal documentation. It was trained on a curated dataset of nearly 7,000 question-answer pairs derived from Indian criminal law documentation and targets legal comprehension and explanation tasks.

Training Data

The training data consists of nearly 7,000 legal Q&A pairs generated in a two-stage process:

  1. Question Generation: Questions were extracted to cover key legal concepts, definitions, procedures, and roles, ensuring comprehensive coverage of:

    • Legal terminology and definitions
    • Procedural rules and steps
    • Rights and penalties
    • Jurisdictional aspects
    • Roles of legal entities (judges, lawyers, law enforcement)
  2. Answer Generation: Answers were crafted following a structured legal reasoning approach, ensuring:

    • Legal precision and accuracy
    • Comprehensive coverage of relevant points
    • Clear explanation of legal concepts
    • Professional legal discourse style
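As an illustration of how one generated pair could be prepared for chat-style fine-tuning, the sketch below turns a question and its answer into the `messages` format used in the Usage section. The names `LegalQAPair` and `to_chat_messages` are hypothetical, and the answer text is a placeholder; the actual storage format is defined in the linked data-generation repository and may differ.

```python
from dataclasses import dataclass


@dataclass
class LegalQAPair:
    """One synthetic training example (hypothetical container, not the authors' schema)."""
    question: str  # produced in stage 1 (question generation)
    answer: str    # produced in stage 2 (answer generation)


def to_chat_messages(pair: LegalQAPair) -> list[dict[str, str]]:
    """Convert a Q&A pair into a two-turn chat conversation."""
    return [
        {"role": "user", "content": pair.question.strip()},
        {"role": "assistant", "content": pair.answer.strip()},
    ]


example = LegalQAPair(
    question="What are the requirements for cross-examination according to Indian law?",
    answer="<answer text produced in stage 2>",  # placeholder, not real model output
)
messages = to_chat_messages(example)
print(messages)
```

A list in this shape can be passed to `tokenizer.apply_chat_template`, which is why the sketch mirrors the roles used in the inference example.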

Training Details

  • Base Model: allenai/Llama-3.1-Tulu-3-8B
  • Fine-Tuning Method: PEFT (Parameter-Efficient Fine-Tuning)
  • Training Epochs: 3
  • Batch Size: 2, with 4 gradient accumulation steps (effective batch size of 8 per device)
  • Learning Rate: 3e-05 with cosine scheduler
  • Sequence Length: 1024 tokens
  • Mixed Precision: BF16
  • Optimization: AdamW with β1=0.9, β2=0.999
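To make the interaction of these hyperparameters concrete, here is a rough back-of-the-envelope sketch of the effective batch size, the number of optimizer steps, and the shape of a cosine learning-rate schedule. It assumes a single device, exactly 7,000 examples, and no warmup; none of these are stated in this card, so the resulting numbers are estimates only.

```python
import math

# Values taken from the Training Details list above.
PER_DEVICE_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
BASE_LR = 3e-5

# Assumptions (not stated in the card): one device, exactly 7,000 examples, no warmup.
NUM_DEVICES = 1
NUM_EXAMPLES = 7_000


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def total_optimizer_steps(num_examples: int, batch: int, epochs: int) -> int:
    """Approximate number of optimizer updates over the whole run."""
    return math.ceil(num_examples / batch) * epochs


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr at step 0 to 0 at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(PER_DEVICE_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
steps = total_optimizer_steps(NUM_EXAMPLES, batch, EPOCHS)

print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {steps}")
for s in (0, steps // 2, steps):
    print(f"step {s}: lr = {cosine_lr(s, steps, BASE_LR):.2e}")
```

Under these assumptions the learning rate starts at 3e-05, falls to roughly half of that midway through training, and reaches zero at the final step.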

Use Cases

This model is particularly suited for:

  • Legal document analysis and comprehension
  • Answering questions about Indian criminal law
  • Understanding legal procedures and requirements
  • Explaining legal concepts and terminology
  • Assisting in legal research and education

Limitations

  • The model is trained only on Indian legal documentation, primarily criminal law, so it may be unreliable for other jurisdictions or areas of law
  • Responses should be verified by legal professionals for critical applications
  • The model should not be used as a substitute for professional legal advice

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Aryaman02/InLawMate-peft",  # same repo ID as the tokenizer below
    device_map="auto",
    torch_dtype='auto'
).eval()

tokenizer = AutoTokenizer.from_pretrained("Aryaman02/InLawMate-peft")

# Example legal query
messages = [
    {"role": "user", "content": "What are the requirements for cross-examination according to Indian law?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors='pt'
)
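# Move inputs to the model's device instead of hard-coding 'cuda';
# max_new_tokens=512 is an illustrative cap, adjust as needed
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=512)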
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Citation

If you use this model in your research, please cite:

@misc{legalpara-lm,
  title={InLawMate: A PEFT Model for Indian Legal Domain Understanding},
  year={2024},
  publisher={Aryaman},
  note={Model trained on Indian legal documentation}
}

Our training data and the procedure for synthetic data creation are outlined at https://github.com/DarryCrucian/law-llm