metadata
license: mit
datasets:
  - Canstralian/Wordlists
  - Canstralian/CyberExploitDB
  - Canstralian/pentesting_dataset
  - Canstralian/ShellCommands
language:
  - en
metrics:
  - accuracy
  - code_eval
  - bertscore
base_model:
  - replit/replit-code-v1_5-3b
  - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
  - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
library_name: transformers
tags:
  - code
  - text-generation-inference


Model Card for RabbitRedux

RabbitRedux is a code classification model tailored for cybersecurity applications, based on the replit/replit-code-v1_5-3b model. It categorizes code snippets by function, covering both general-purpose and cybersecurity-specific code.

Model Details

Overview

RabbitRedux extends the replit/replit-code-v1_5-3b model with specialized support for areas such as penetration testing and ransomware analysis. It is trained as an adapter, which keeps training modular and lets the model be adapted to new contexts without retraining the full base model.

  • Developer: Canstralian
  • Model Type: Adapter-enhanced code classification
  • Language(s): English
  • License: MIT
  • Base Model: replit/replit-code-v1_5-3b
  • Library: Adapters (formerly adapter-transformers)

Key Features

  • Penetration Testing Support: Assists with reconnaissance, enumeration, and task automation in cybersecurity.
  • Ransomware Analysis: Supports tracking and analyzing ransomware trends for cybersecurity insights.
  • Adaptive Learning: Uses adapter modules so the model can be tuned for new domains efficiently.

Dataset Summary

RabbitRedux leverages datasets specifically curated for code classification, focusing on both general programming functions and cybersecurity applications:

  • WhiteRabbitNeo/WRN-Chapter-1 & Chapter-2: Datasets targeting diverse code functions.
  • Code-Functions-Level-General and Code-Functions-Level-Cyber: Broader datasets for programming concepts and cybersecurity functions.
  • Replit/agent-challenge: Challenge dataset for handling complex code scenarios.
  • Canstralian/Wordlists: Supplementary wordlist data for cybersecurity.

Model Usage

To use RabbitRedux, load the base model and then attach the adapter:

# Requires the Adapters library: pip install adapters
from adapters import AutoAdapterModel

# Load the base model, then attach and activate the RabbitRedux adapter
model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
model.load_adapter("Canstralian/RabbitRedux", set_active=True)

This model is ideal for classifying code functions, especially in cybersecurity contexts.
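
A classification head typically returns one raw score (logit) per category. The sketch below shows one way such scores could be turned into a label with a confidence value. It is an illustration only: the label names and the `pick_label` helper are hypothetical, since this card does not publish RabbitRedux's label set or output format.

```python
import math

# Hypothetical label set; the model card does not list the real categories.
LABELS = ["general", "reconnaissance", "enumeration", "ransomware"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Example with made-up logits for a single code snippet.
label, confidence = pick_label([0.2, 2.9, 1.1, -0.5])
print(label, round(confidence, 3))
```

In practice the logits would come from the model's forward pass, and the label list would need to match the order used when the classification head was trained.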

Community & Contributions

RabbitRedux is an open-source project, and contributions are welcome. You can take part by forking the repository, reporting issues, and sharing ideas for enhancements.

About the Author

With over 20 years of experience in IT, I specialize in developing practical tools for cybersecurity and open-source projects, including tools for penetration testing and ADHD support through executive function augmentation.

Training Details

Training Data

RabbitRedux is trained on the following datasets to support a wide array of code categorization tasks, with an emphasis on cybersecurity:

  • Core Data Sources: WhiteRabbitNeo/WRN-Chapter-1 and Chapter-2, plus Canstralian/Wordlists, for broad programming and security-related functions.
  • Supplemental Datasets: Code-Functions-Level-General and Code-Functions-Level-Cyber for deeper contextual understanding.

Hyperparameters

  • Training Regime: fp16 mixed precision

Evaluation

Metrics & Testing

The model's performance is assessed using precision, recall, and F1 scores on code classification tasks. Further evaluation data is available upon request.

Results

  • Precision: 0.95
  • Recall: 0.92
  • F1 Score: 0.93
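
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two values above. The sketch below does that; the function names are illustrative, and the confusion-matrix counts in the second example are made up, not taken from the RabbitRedux evaluation.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Recompute F1 from the precision and recall reported in this card.
print(f"{f1_score(0.95, 0.92):.2f}")  # prints 0.93

# Made-up counts, only to show how the three metrics relate.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
print(p, r, f1_score(p, r))
```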

Bias, Risks, and Limitations

RabbitRedux is tuned specifically for cybersecurity applications, so it may perform worse in general-purpose use or on non-English data. Users should evaluate its outputs for potential bias and keep this domain-specific tuning in mind.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model, especially in contexts that are outside its trained domain.

Environmental Impact

Carbon emissions for training are estimated using the Machine Learning Impact calculator:

  • Hardware Type: NVIDIA A100 GPUs
  • Training Hours: 500 hours
  • Carbon Emitted: 1.2 metric tons CO2eq
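
Estimates of this kind generally multiply the energy drawn by the hardware by the carbon intensity of the electricity grid. The sketch below shows that arithmetic only. The card does not state the GPU count, the per-GPU power draw, or the grid region, so every input in the example call is a hypothetical placeholder and the result is not meant to reproduce the figure above.

```python
def estimate_emissions_kg(
    gpu_count: int,
    hours: float,
    watts_per_gpu: float,
    grid_kg_per_kwh: float,
) -> float:
    """Rough CO2eq estimate: energy used (kWh) times grid carbon intensity."""
    energy_kwh = gpu_count * hours * watts_per_gpu / 1000.0
    return energy_kwh * grid_kg_per_kwh


# Hypothetical inputs: 8 GPUs at 400 W each for 500 hours,
# on a grid emitting 0.4 kg CO2eq per kWh.
example = estimate_emissions_kg(
    gpu_count=8, hours=500, watts_per_gpu=400, grid_kg_per_kwh=0.4
)
print(f"{example:.0f} kg CO2eq")
```

Reporting the GPU count and the region alongside the hours would let readers repeat the estimate themselves.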

Citation

If citing RabbitRedux in research, please use the following format:

BibTeX

@misc{canstralian2024rabbitredux,
  author = {Canstralian},
  title = {RabbitRedux: A Model for Code Classification in Cybersecurity},
  year = {2024},
  url = {https://github.com/canstralian/RabbitRedux},
}

APA
Canstralian. (2024). RabbitRedux: A Model for Code Classification in Cybersecurity. Retrieved from https://github.com/canstralian/RabbitRedux

Contact

For more information, reach out via GitHub at Canstralian.