---
library_name: transformers
tags:
- unsloth
- trl
- sft
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Critical Thinker

## Model Details

### Model Description

The **Critical Thinker** model is a fine-tuned version of **meta-llama/Llama-3.1-8B-Instruct**, optimized for critical thinking and investigative reasoning. It is trained on the **Critical Thinking Synthetic Dataset**, which focuses on logical reasoning, forensic investigation, and multi-layered decision-making scenarios.

- **Developed by:** Theeseus AI
- **Funded by:** Independent Research Grant
- **Shared by:** [Theeseus AI](https://www.linkedin.com/in/theeseus/)
- **Model type:** Transformer-based language model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [Critical Thinker on Hugging Face](https://huggingface.co/theeseus-ai/CriticalThinker)
- **Dataset:** [Critical Thinking Dataset](https://huggingface.co/datasets/theeseus-ai/CriticalThinker)

---

## Uses

### Direct Use

- **Critical thinking assessments:** evaluating logical reasoning and problem-solving capabilities.
- **Digital forensics investigations:** analyzing logs, metadata, and cybersecurity incidents.
- **AI research:** studying and benchmarking multi-step reasoning and decision-making models.

### Downstream Use

- **Cybersecurity training programs:** training models to detect vulnerabilities, analyze logs, and identify attack patterns.
- **Question-answering applications:** building reasoning-focused QA systems for educational and research tools.
- **AI decision support systems:** assisting forensic investigations and cybersecurity monitoring.

### Out-of-Scope Use

- Tasks requiring **real-time decision-making** under strict time constraints.
- Applications involving **medical diagnosis** or **legal interpretation** without human oversight.

---

## Bias, Risks, and Limitations

### Known Limitations

- May **misinterpret ambiguous evidence** or scenarios that lack sufficient context.
- Performance may degrade on **multilingual inputs**, as the training data is primarily in **English**.
- Outputs can include **false positives** when assessing evidence in forensic cases.

### Recommendations

- Treat outputs as **supporting evidence**, not definitive conclusions.
- Perform **manual validation** for high-stakes decisions.
- Implement **bias checks** when deploying in production environments.

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("theeseus-ai/CriticalThinker")
model = AutoModelForCausalLM.from_pretrained("theeseus-ai/CriticalThinker")

input_text = "Investigate unusual logins from multiple IP addresses in a network."
inputs = tokenizer(input_text, return_tensors="pt")

# Cap generation length and decode without special tokens for readable output.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data

The model is fine-tuned on the **Critical Thinking Synthetic Dataset**, available on [Hugging Face](https://huggingface.co/datasets/theeseus-ai/CriticalThinker). The dataset simulates digital forensics, cybersecurity incidents, and logical deduction scenarios.

### Training Procedure

#### Preprocessing

- Cleaned and validated JSONL format.
- Schema enforcement to ensure consistency.

#### Hyperparameters

- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Precision:** bfloat16 (bf16) mixed precision

#### Compute Resources

- **Hardware:** NVIDIA A100 (80 GB) GPU
- **Training time:** ~24 hours

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The dataset was split into **80% training**, **10% validation**, and **10% testing** sets.
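The 80/10/10 split above can be sketched as follows. This is a minimal illustration on a toy list of example IDs, not the released preprocessing code; `split_dataset` is a hypothetical helper name.

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle examples and split them into train/validation/test partitions."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        shuffled[:n_train],                 # 80% training
        shuffled[n_train:n_train + n_val],  # 10% validation
        shuffled[n_train + n_val:],         # 10% testing (remainder)
    )

# Toy stand-in for the dataset: 1000 example IDs.
train, val, test = split_dataset(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```

The same proportions can be obtained on an actual `datasets.Dataset` via two chained `train_test_split` calls.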
#### Metrics

- **Accuracy:** correctness of predictions.
- **F1 score:** balance between precision and recall.
- **Log-likelihood loss:** model confidence and calibration.

### Results

- **Accuracy:** 89.4%
- **F1 score:** 88.7%
- **Log-likelihood loss:** 0.21

#### Summary

The model performs well on **logical deduction** and **multiple-choice reasoning** tasks, and is particularly effective at identifying **patterns in digital forensics scenarios**.

---

## Environmental Impact

Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware type:** NVIDIA A100 GPU
- **Hours used:** 24
- **Cloud provider:** AWS
- **Compute region:** US-East
- **Carbon emitted:** ~30 kg CO2eq

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer-based autoregressive model (decoder-only).
- **Objective:** minimize cross-entropy loss for next-token prediction.

### Compute Infrastructure

- **Hardware:** NVIDIA A100 (80 GB) GPUs.
- **Frameworks:** PyTorch and Hugging Face Transformers.

---

## Citation

If you use this model, please cite it as follows:

```
@misc{critical_thinker,
  author    = {Theeseus AI},
  title     = {Critical Thinker Model},
  year      = {2024},
  version   = {1.0},
  publisher = {Hugging Face Models},
  url       = {https://huggingface.co/theeseus-ai/CriticalThinker}
}
```

---

## Contact

For questions or contributions, contact:

- **Email:** theeseus@protonmail.com
- **LinkedIn:** [Theeseus](https://www.linkedin.com/in/theeseus/)