VERY IMPORTANT

  • This model is in alpha phase and is NOT yet recommended for use.
  • This model is now obsolete; see the recommended model here.

RAGPT-2 (unfunctional): Fine-tuned GPT-2 for Context-Based Question Answering

Model Description

RAGPT-2 is a fine-tuned version of GPT-2 small, specifically adapted for context-based question answering tasks. This model has been trained to generate relevant answers based on a given context and question, similar to a Retrieval-Augmented Generation (RAG) system.

Key Features

  • Based on the GPT-2 small architecture (124M parameters)
  • Fine-tuned on "neural-bridge/rag-dataset-12000" and other datasets from Hugging Face
  • Capable of generating answers based on provided context and questions
  • Suitable for various question-answering applications

Training Data

The model was fine-tuned using the "neural-bridge/rag-dataset-12000" and "neural-bridge/rag-dataset-1200" datasets, which contain:

  • Context passages
  • Questions related to the context
  • Corresponding answers

Fine-tuning Process

The fine-tuning process involved:

  1. Loading the pre-trained GPT-2 small model
  2. Preprocessing the dataset to combine context, question, and answer into a single text
  3. Training the model to predict the next token given the context and question
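
Step 2 can be sketched as follows. This is a minimal illustration, not the exact preprocessing used for this model: the function name build_training_text is hypothetical, the prompt template is assumed to match the one shown in the Usage section, and appending GPT-2's end-of-text token after the answer is an assumption.

```python
def build_training_text(
    context: str,
    question: str,
    answer: str,
    eos_token: str = "<|endoftext|>",  # GPT-2's end-of-text token
) -> str:
    """Combine one dataset record into a single training string.

    Assumption: the template mirrors the prompt in the Usage section.
    """
    prompt = f"Context: {context}\nquestion: {question}\nanswer:"
    # The answer follows the prompt; the EOS token marks where generation should stop.
    return f"{prompt} {answer}{eos_token}"


example = build_training_text(
    context="Mount Everest is the highest mountain in the world, with a height of 8,848 meters.",
    question="What is the height of Mount Everest?",
    answer="8,848 meters.",
)
print(example)
```

Because the whole string is trained with the ordinary next-token objective, the model learns to continue text after "answer:" with the answer.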

Hyperparameters

  • Base model: GPT-2 small
  • Number of training epochs: 8
  • Batch size: 4
  • Learning rate: Default AdamW optimizer settings
  • Max sequence length: 512 tokens
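
To give a feel for what these settings imply, the sketch below stores them in a small config object and estimates the number of optimizer steps. The names FineTuneConfig and total_optimizer_steps are hypothetical, and the estimate assumes a single device, no gradient accumulation, and that the last partial batch is kept. The example count of 12,000 is only a placeholder taken from the dataset name; the actual number of training examples is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed above (hypothetical container)."""
    base_model: str = "gpt2"  # Hub id of GPT-2 small
    num_epochs: int = 8
    batch_size: int = 4
    max_seq_length: int = 512


def total_optimizer_steps(num_examples: int, config: FineTuneConfig) -> int:
    # One optimizer step per batch; the final partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# Placeholder example count, not a confirmed figure.
print(total_optimizer_steps(12_000, config))  # 24000
```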

Usage

To use the model:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BueormLLC/RAGPT-2_unfunctional")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/RAGPT-2_unfunctional")

context = "Mount Everest is the highest mountain in the world, with a height of 8,848 meters."
question = "What is the height of Mount Everest?"
input_text = f"Context: {context}\nquestion: {question}\nanswer:"

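input_ids = tokenizer.encode(input_text, return_tensors="pt")
# GPT-2 has no pad token, so reuse the end-of-sequence token during generation
output = model.generate(input_ids, max_length=150, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(f"Generated answer: {answer}")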

Limitations

  • The model's knowledge is limited to its training data and the base GPT-2 model.
  • It may sometimes generate irrelevant or incorrect answers, especially for topics outside its training domain.
  • The model does not have access to external information or real-time data.

Ethical Considerations

Users should be aware that this model, like all language models, may reflect biases present in its training data. It should not be used as a sole source of information for critical decisions.

Future Improvements

  • Fine-tuning on a larger and more diverse dataset
  • Experimenting with larger base models (e.g., GPT-2 medium or large)
  • Implementing techniques to improve factual accuracy and reduce hallucinations

Support us

We appreciate your support. Without you, we could not do what we do.

Citation

If you use this model in your research, please cite:

@misc{RAGPT,
  author = {Bueorm},
  title = {RAGPT-2: Fine-tuned GPT-2 for Context-Based Question Answering},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BueormLLC/RAGPT-2_unfunctional}}
}