---
license: mit
datasets:
- Canstralian/Wordlists
- Canstralian/CyberExploitDB
- Canstralian/pentesting_dataset
- Canstralian/ShellCommands
language:
- en
metrics:
- accuracy
- code_eval
- bertscore
base_model:
- replit/replit-code-v1_5-3b
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
library_name: transformers
tags:
- code
- text-generation-inference
---
# Model Card for RabbitRedux

RabbitRedux is a code classification model for cybersecurity applications, built on the `replit/replit-code-v1_5-3b` model. It categorizes code snippets by function, covering both general-purpose and cybersecurity-specific code.

## Model Details

### Overview

**RabbitRedux** extends the `replit/replit-code-v1_5-3b` model with specialized support for areas such as penetration testing and ransomware analysis. It uses adapter transformers for modular training, so it can be adapted to new contexts without extensive retraining.

- **Developer:** [Canstralian](https://github.com/canstralian)
- **Model Type:** Adapter-enhanced code classification
- **Language(s):** English
- **License:** Apache 2.0
- **Base Model:** `replit/replit-code-v1_5-3b`
- **Library:** Adapter Transformers

## Key Features

- **Penetration Testing Support:** Assists with reconnaissance, enumeration, and task automation in cybersecurity.
- **Ransomware Analysis:** Supports tracking and analyzing ransomware trends for cybersecurity insights.
- **Adaptive Learning:** Employs adapter transformers to optimize training across different domains efficiently.

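
To give a rough sense of why adapter-based training avoids extensive retraining, the sketch below counts the parameters a bottleneck adapter would add on top of a frozen base model. This card does not state the adapter type or size, so the bottleneck design, the hidden size of 3072, the 32 layers, the bottleneck width of 64, and the 3-billion base parameter count are all illustrative assumptions, not RabbitRedux's actual configuration.

```python
def bottleneck_adapter_params(hidden_size: int, bottleneck: int) -> int:
    """Parameters in one bottleneck adapter: down- and up-projection, with biases."""
    down = hidden_size * bottleneck + bottleneck  # weights + bias
    up = bottleneck * hidden_size + hidden_size   # weights + bias
    return down + up


# Hypothetical values chosen for illustration only.
HIDDEN_SIZE = 3072
NUM_LAYERS = 32
BOTTLENECK = 64
BASE_MODEL_PARAMS = 3_000_000_000

per_layer = bottleneck_adapter_params(HIDDEN_SIZE, BOTTLENECK)
total = per_layer * NUM_LAYERS  # assumes one adapter per layer
share = total / BASE_MODEL_PARAMS

print(f"adapter parameters: {total:,}")
print(f"share of base model: {share:.2%}")
```

Under these assumed values the adapter adds roughly 12.7 million trainable parameters, well under 1% of the base model, which is why only a small set of weights needs to be trained per domain.
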
## Dataset Summary

RabbitRedux is trained on datasets curated for code classification, covering both general programming functions and cybersecurity applications:

- **WhiteRabbitNeo/WRN-Chapter-1** and **WRN-Chapter-2**: Datasets targeting diverse code functions.
- **Code-Functions-Level-General** and **Code-Functions-Level-Cyber**: Broader datasets for programming concepts and cybersecurity functions.
- **Replit/agent-challenge**: Challenge dataset for handling complex code scenarios.
- **Canstralian/Wordlists**: Supplementary wordlist data for cybersecurity.

## Model Usage

To use RabbitRedux, load the base model and then attach the adapter:

```python
from adapters import AutoAdapterModel

# Load the base model, then attach the RabbitRedux adapter and make it active.
model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
model.load_adapter("Canstralian/RabbitRedux", set_active=True)
```

This model is suited to classifying code functions, especially in cybersecurity contexts.

## Community & Contributions

RabbitRedux is an open-source project and welcomes contributions. You can take part by forking the repositories, reporting issues, and sharing ideas for enhancements.

- **GitHub:** [Canstralian](https://github.com/canstralian)
- **Replit:** [Canstralian](https://replit.com/@canstralian)

## About the Author

With over 20 years of experience in IT, I specialize in developing practical tools for cybersecurity and open-source projects, including tools for penetration testing and ADHD support through executive function augmentation.

## Training Details

### Training Data

RabbitRedux is trained on the following datasets to support a wide range of code categorization tasks, with an emphasis on cybersecurity:

- **Core Data Sources:** WhiteRabbitNeo datasets and Canstralian/Wordlists, for broad programming and security-related functions.
- **Supplemental Datasets:** Code-Functions-Level-General and Code-Functions-Level-Cyber, for deeper contextual understanding.

### Hyperparameters

- **Training Regime:** fp16 mixed precision
- **Precision:** fp16

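
As a small illustration of what fp16 precision implies for training, the sketch below round-trips values through IEEE half precision using Python's standard `struct` module. It shows a small gradient-like value underflowing to zero, and how multiplying by a scale factor before the cast (the loss-scaling idea commonly paired with fp16 mixed precision) keeps it representable. This card does not say whether loss scaling was used for RabbitRedux; the helper name `to_fp16`, the value `1e-8`, and the scale `1024` are assumptions for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_gradient = 1e-8  # hypothetical value, below fp16's smallest subnormal (~6e-8)
scale = 1024.0        # hypothetical loss-scale factor

# Cast directly: the value is too small for fp16 and becomes zero.
unscaled = to_fp16(tiny_gradient)

# Scale first, cast to fp16, then undo the scaling in higher precision.
recovered = to_fp16(tiny_gradient * scale) / scale

print(unscaled)
print(recovered)
```

The direct cast loses the value entirely, while the scaled path recovers it to within about 1%.
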
## Evaluation

### Metrics & Testing

The model's performance is assessed using precision, recall, and F1 scores on code classification tasks. Further evaluation data is available upon request.

### Results

- **Precision:** 0.95
- **Recall:** 0.92
- **F1 Score:** 0.93

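
For reference, the sketch below shows how these three metrics relate. The function names and the toy counts are hypothetical. The only figures taken from this card are the reported precision and recall, which are used to check that the reported F1 score is consistent with them.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy confusion counts, invented for illustration only.
p, r = precision_recall(tp=8, fp=2, fn=2)
toy_f1 = f1_score(p, r)

# F1 implied by the precision and recall reported above.
reported_f1 = f1_score(0.95, 0.92)
print(round(reported_f1, 2))  # 0.93
```
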
## Bias, Risks, and Limitations

RabbitRedux is tuned for cybersecurity applications, so its performance may degrade in general-purpose use or on non-English datasets. Users should evaluate the model's outputs for potential bias and keep its cybersecurity-specific tuning in mind.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, especially in contexts outside its trained domain.

## Environmental Impact

Training emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware Type:** NVIDIA A100 GPUs
- **Training Hours:** 500 hours
- **Carbon Emitted:** 1.2 metric tons CO2eq

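
Estimates of this kind generally multiply the energy drawn during training by the carbon intensity of the electricity used. The sketch below shows that arithmetic. This card does not report the number of GPUs, their power draw, or the grid's carbon intensity, so the function name and every input value in the example are placeholders, and the result is not expected to match the figure reported above.

```python
def estimate_emissions_kg(
    hours: float,
    num_gpus: int,
    gpu_power_kw: float,
    carbon_intensity_kg_per_kwh: float,
) -> float:
    """Energy used (kWh) multiplied by grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = hours * num_gpus * gpu_power_kw
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder inputs, not RabbitRedux's actual training setup.
example_kg = estimate_emissions_kg(
    hours=100,
    num_gpus=1,
    gpu_power_kw=0.4,
    carbon_intensity_kg_per_kwh=0.5,
)
print(f"{example_kg:.1f} kg CO2eq")
```
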
## Citation

If citing RabbitRedux in research, please use the following format:

**BibTeX**

```bibtex
@misc{canstralian2024rabbitredux,
  author = {Canstralian},
  title = {RabbitRedux: A Model for Code Classification in Cybersecurity},
  year = {2024},
  url = {https://github.com/canstralian/RabbitRedux},
}
```

**APA**

Canstralian. (2024). *RabbitRedux: A Model for Code Classification in Cybersecurity*. Retrieved from https://github.com/canstralian/RabbitRedux

## Contact

For more information, reach out via GitHub at [Canstralian](https://github.com/canstralian).