---
license: mit
datasets:
- Canstralian/Wordlists
- Canstralian/CyberExploitDB
- Canstralian/pentesting_dataset
- Canstralian/ShellCommands
language:
- en
metrics:
- accuracy
- code_eval
- bertscore
base_model:
- replit/replit-code-v1_5-3b
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
library_name: transformers
tags:
- code
- text-generation-inference
---
# Model Card for RabbitRedux
RabbitRedux is a code classification model for cybersecurity applications, built on `replit/replit-code-v1_5-3b`. It categorizes code snippets by function, covering both general programming and cybersecurity-specific contexts.
## Model Details
### Overview
**RabbitRedux** extends the `replit/replit-code-v1_5-3b` model with specialized support for penetration testing and ransomware analysis. It uses adapters, so it can be trained modularly and adapted to new contexts without retraining the full base model.
- **Developer:** [Canstralian](https://github.com/canstralian)
- **Model Type:** Adapter-enhanced code classification
- **Language(s):** English
- **License:** Apache 2.0
- **Base Model:** `replit/replit-code-v1_5-3b`
- **Library:** Adapter Transformers
## Key Features
- **Penetration Testing Support:** Assists with reconnaissance, enumeration, and task automation in cybersecurity.
- **Ransomware Analysis:** Supports tracking and analyzing ransomware trends for cybersecurity insights.
- **Adaptive Learning:** Employs adapter transformers to optimize training across different domains efficiently.
## Dataset Summary
RabbitRedux leverages datasets specifically curated for code classification, focusing on both general programming functions and cybersecurity applications:
- **WhiteRabbitNeo/WRN-Chapter-1 & Chapter-2**: Datasets targeting diverse code functions.
- **Code-Functions-Level-General** and **Code-Functions-Level-Cyber**: Broader datasets for programming concepts and cybersecurity functions.
- **Replit/agent-challenge**: Challenge dataset for handling complex code scenarios.
- **Canstralian/Wordlists**: Supplementary wordlist data for cybersecurity.
## Model Usage
To use RabbitRedux, initialize and load the adapter with the following code:
```python
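# Requires the AdapterHub `adapters` package (pip install adapters)
from adapters import AutoAdapterModel

# Load the base model, then attach and activate the RabbitRedux adapter
model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
model.load_adapter("Canstralian/RabbitRedux", set_active=True)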
```
This model is ideal for classifying code functions, especially in cybersecurity contexts.
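A classifier's raw scores still have to be turned into a label. The following is a minimal sketch of that post-processing step, assuming the classification head returns one logit per class. The label names and the `top_label` helper are hypothetical; this card does not publish the actual class list or head configuration.

```python
import math

# Hypothetical label set; the card does not publish the real class names.
LABELS = ["general", "cybersecurity"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Example with made-up logits for a single code snippet.
label, prob = top_label([0.3, 2.1])
print(f"{label}: {prob:.2f}")
```

Keeping this step separate from model loading makes it easy to swap in the real label list once it is known.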
## Community & Contributions
RabbitRedux is an open-source project, and contributions are welcome. You can contribute by forking the repository, reporting issues, and suggesting enhancements.
- **GitHub:** [Canstralian](https://github.com/canstralian)
- **Replit:** [Canstralian](https://replit.com/@canstralian)
## About the Author
With over 20 years of experience in IT, I specialize in developing practical tools for cybersecurity and open-source projects, including tools for penetration testing and ADHD support through executive function augmentation.
## Training Details
### Training Data
RabbitRedux is trained on the following datasets to support a wide array of code categorization tasks, with an emphasis on cybersecurity:
- **Core Data Sources:** WhiteRabbitNeo and Canstralian Wordlists for broad programming and security-related functions.
- **Supplemental Datasets:** Code-Functions-Level-General and Code-Functions-Level-Cyber for deeper contextual understanding.
### Hyperparameters
- **Training Regime:** fp16 mixed precision
## Evaluation
### Metrics & Testing
The model's performance is assessed using precision, recall, and F1 scores on code classification tasks. Further evaluation data is available upon request.
### Results
- **Precision:** 0.95
- **Recall:** 0.92
- **F1 Score:** 0.93
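F1 is the harmonic mean of precision and recall, so the three reported figures can be cross-checked against each other. The sketch below shows the standard definitions; the confusion-matrix counts in the second example are hypothetical and are not taken from the model's evaluation.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Cross-check the figures reported in this card.
reported_f1 = f1_score(0.95, 0.92)
print(round(reported_f1, 2))  # 0.93, matching the reported F1 score

# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=92, fp=5, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
```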
## Bias, Risks, and Limitations
RabbitRedux is tuned for cybersecurity code classification, so it may perform worse on general-purpose tasks or on non-English data. Users should check outputs for potential bias and keep this domain-specific tuning in mind.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, especially in contexts that are outside its trained domain.
## Environmental Impact
Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):
- **Hardware Type:** NVIDIA A100 GPUs
- **Training Hours:** 500 hours
- **Carbon Emitted:** 1.2 metric tons CO2eq
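Estimates of this kind are roughly the product of energy used and the carbon intensity of the local grid. The sketch below illustrates that arithmetic under stated assumptions: the function name, the per-device power draw, the device count, and the grid intensity are all hypothetical placeholders, since this card does not state the GPU count, region, or provider behind its figures.

```python
def estimate_emissions_kg(power_kw, hours, intensity_kg_per_kwh, device_count=1):
    """Estimate emissions as energy used (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = power_kw * hours * device_count
    return energy_kwh * intensity_kg_per_kwh


# Figures reported in this card: 1.2 metric tons CO2eq over 500 training hours.
reported_kg = 1200.0
training_hours = 500
print(f"implied rate: {reported_kg / training_hours:.1f} kg CO2eq per training hour")

# Placeholder inputs, NOT the values used for this card:
# one device drawing 0.4 kW on a grid emitting 0.4 kg CO2eq per kWh.
example_kg = estimate_emissions_kg(power_kw=0.4, hours=500, intensity_kg_per_kwh=0.4)
print(f"example estimate: {example_kg:.1f} kg CO2eq")
```

The result scales linearly with device count and grid intensity, so those two inputs matter most when reproducing an estimate.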
## Citation
If citing RabbitRedux in research, please use the following format:
**BibTeX**
```bibtex
@misc{canstralian2024rabbitredux,
author = {Canstralian},
title = {RabbitRedux: A Model for Code Classification in Cybersecurity},
year = {2024},
url = {https://github.com/canstralian/RabbitRedux},
}
```
**APA**
Canstralian. (2024). *RabbitRedux: A Model for Code Classification in Cybersecurity*. Retrieved from https://github.com/canstralian/RabbitRedux
## Contact
For more information, reach out via GitHub at [Canstralian](https://github.com/canstralian).