---
library_name: transformers
tags:
- unsloth
- trl
- sft
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Critical Thinker

## Model Details

### Model Description

The **Critical Thinker** model is a fine-tuned version of **meta-llama/Llama-3.1-8B-Instruct**, optimized for critical thinking and investigative reasoning. It is trained on the **Critical Thinking Synthetic Dataset**, which focuses on logical reasoning, forensic investigation, and multi-layered decision-making scenarios.

- **Developed by:** Theeseus AI
- **Funded by:** Independent Research Grant
- **Shared by:** [Theeseus AI](https://www.linkedin.com/in/theeseus/)
- **Model type:** Transformer-based language model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [Critical Thinker on Hugging Face](https://huggingface.co/theeseus-ai/CriticalThinker)
- **Dataset:** [Critical Thinking Dataset](https://huggingface.co/datasets/theeseus-ai/CriticalThinker)

---

## Uses

### Direct Use

- **Critical thinking assessments:** evaluating logical reasoning and problem-solving capabilities.
- **Digital forensics investigations:** analyzing logs, metadata, and cybersecurity incidents.
- **AI research:** studying and benchmarking multi-step reasoning and decision-making models.

### Downstream Use

- **Cybersecurity training programs:** training models to detect vulnerabilities, analyze logs, and identify attack patterns.
- **Question-answering applications:** building reasoning-focused QA systems for educational and research tools.
- **AI decision support systems:** assisting forensic investigations and cybersecurity monitoring.

### Out-of-Scope Use

- Tasks requiring **real-time decision-making** under strict time constraints.
- Applications involving **medical diagnosis** or **legal interpretation** without human oversight.

---

## Bias, Risks, and Limitations

### Known Limitations

- May **misinterpret ambiguous evidence** or scenarios that lack sufficient context.
- Performance may degrade on **multilingual inputs**, as the training data is primarily in **English**.
- Outputs can include **false positives** when assessing evidence in forensic cases.

### Recommendations

- Treat outputs as **supporting evidence**, not definitive conclusions.
- Perform **manual validation** for high-stakes decisions.
- Implement **bias checks** when deploying in production environments.

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("theeseus-ai/CriticalThinker")
model = AutoModelForCausalLM.from_pretrained("theeseus-ai/CriticalThinker")

input_text = "Investigate unusual logins from multiple IP addresses in a network."
inputs = tokenizer(input_text, return_tensors="pt")

# Cap generation length and decode without special tokens for readable output.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data

The model is fine-tuned on the **Critical Thinking Synthetic Dataset**, available on [Hugging Face](https://huggingface.co/datasets/theeseus-ai/CriticalThinker). The dataset simulates digital forensics, cybersecurity incidents, and logical deduction scenarios.

### Training Procedure

#### Preprocessing

- Cleaned and validated JSONL format.
- Schema enforcement to ensure consistency.

#### Hyperparameters

- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Precision:** bfloat16 (bf16) mixed precision

#### Compute Resources

- **Hardware:** NVIDIA A100 (80 GB) GPU
- **Training time:** ~24 hours

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The dataset was split into **80% training**, **10% validation**, and **10% testing** sets.
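The 80/10/10 split above can be sketched as follows. This is a minimal illustration on a toy list of example IDs, not the released preprocessing code; `split_dataset` is a hypothetical helper name.

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle examples and split them into train/validation/test partitions."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        shuffled[:n_train],                 # 80% training
        shuffled[n_train:n_train + n_val],  # 10% validation
        shuffled[n_train + n_val:],         # 10% testing (remainder)
    )

# Toy stand-in for the dataset: 1000 example IDs.
train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```

The same proportions can be obtained on an actual `datasets.Dataset` via two chained `train_test_split` calls.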
#### Metrics

- **Accuracy:** correctness of predictions.
- **F1 score:** balance between precision and recall.
- **Log-likelihood loss:** model confidence and calibration.

### Results

- **Accuracy:** 89.4%
- **F1 score:** 88.7%
- **Log-likelihood loss:** 0.21

#### Summary

The model performs well on **logical deduction** and **multiple-choice reasoning** tasks, and is particularly effective at identifying **patterns in digital forensics scenarios**.

---

## Environmental Impact

Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware type:** NVIDIA A100 GPU
- **Hours used:** 24
- **Cloud provider:** AWS
- **Compute region:** US-East
- **Carbon emitted:** ~30 kg CO2eq

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer-based autoregressive model (decoder-only).
- **Objective:** minimize cross-entropy loss for next-token prediction.

### Compute Infrastructure

- **Hardware:** NVIDIA A100 (80 GB) GPUs.
- **Frameworks:** PyTorch and Hugging Face Transformers.

---

## Citation

If you use this model, please cite it as follows:

```
@misc{critical_thinker,
  author    = {Theeseus AI},
  title     = {Critical Thinker Model},
  year      = {2024},
  version   = {1.0},
  publisher = {Hugging Face Models},
  url       = {https://huggingface.co/theeseus-ai/CriticalThinker}
}
```

---

## Contact

For questions or contributions, contact:

- **Email:** theeseus@protonmail.com
- **LinkedIn:** [Theeseus](https://www.linkedin.com/in/theeseus/)