---
license: apache-2.0
datasets:
- Canstralian/Wordlists
- Canstralian/CyberExploitDB
- Canstralian/pentesting_dataset
language:
- en
metrics:
- accuracy
- code_eval
- bertscore
base_model:
- replit/replit-code-v1_5-3b
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
- WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
library_name: adapter-transformers
tags:
- code
- text-generation-inference
---

# Model Card for RabbitRedux

RabbitRedux is a code classification model for cybersecurity applications, built on the `replit/replit-code-v1_5-3b` model. It categorizes and analyzes code snippets, with an emphasis on functions from both general-purpose and cybersecurity-specific code.

## Model Details

### Overview

**RabbitRedux** extends the `replit/replit-code-v1_5-3b` model to provide specialized support in areas such as penetration testing and ransomware analysis. It uses adapter transformers for modular training, so it can be adapted to new contexts without extensive retraining.

- **Developer:** [Canstralian](https://github.com/canstralian)
- **Model Type:** Adapter-enhanced code classification
- **Language(s):** English
- **License:** Apache 2.0
- **Base Model:** `replit/replit-code-v1_5-3b`
- **Library:** Adapter Transformers

## Key Features

- **Penetration Testing Support:** Assists with reconnaissance, enumeration, and task automation in cybersecurity.
- **Ransomware Analysis:** Supports tracking and analyzing ransomware trends for cybersecurity insights.
- **Adaptive Learning:** Employs adapter transformers to train efficiently across different domains.

## Dataset Summary

RabbitRedux uses datasets curated for code classification, covering both general programming functions and cybersecurity applications:

- **WhiteRabbitNeo/WRN-Chapter-1 & Chapter-2**: Datasets targeting diverse code functions.
- **Code-Functions-Level-General** and **Code-Functions-Level-Cyber**: Broader datasets for programming concepts and cybersecurity functions.
- **Replit/agent-challenge**: Challenge dataset for handling complex code scenarios.
- **Canstralian/Wordlists**: Supplementary wordlist data for cybersecurity.

## Model Usage

To use RabbitRedux, load the base model and then load the adapter on top of it:

```python
from adapters import AutoAdapterModel

# Load the base model that the adapter was trained on.
model = AutoAdapterModel.from_pretrained("replit/replit-code-v1_5-3b")
# Load the RabbitRedux adapter and make it the active adapter.
model.load_adapter("Canstralian/RabbitRedux", set_active=True)
```

The model is intended for classifying code functions, especially in cybersecurity contexts.

## Community & Contributions

RabbitRedux is an open-source project and welcomes contributions and collaboration. You can take part by forking the repositories, reporting issues, and sharing ideas for enhancements.

- **GitHub:** [Canstralian](https://github.com/canstralian)
- **Replit:** [Canstralian](https://replit.com/@canstralian)

## About the Author

With over 20 years of experience in IT, I specialize in developing practical tools for cybersecurity and open-source projects, including tools for penetration testing and ADHD support through executive function augmentation.

## Training Details

### Training Data

RabbitRedux is trained on the following datasets to support a wide range of code categorization tasks, with an emphasis on cybersecurity:

- **Core Data Sources:** WhiteRabbitNeo and Canstralian Wordlists for broad programming and security-related functions.
- **Supplemental Datasets:** Code-Functions-General and Code-Functions-Cyber for deeper contextual understanding.

### Hyperparameters

- **Training Regime:** fp16 mixed precision
- **Precision:** fp16

## Evaluation

### Metrics & Testing

The model's performance is assessed using precision, recall, and F1 scores on code classification tasks. Further evaluation data is available upon request.
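As a minimal sketch of how these three metrics relate, the snippet below computes precision, recall, and F1 from confusion counts for a single class. The `Counts` class, the helper names, and the example counts are illustrative assumptions; they are not part of the RabbitRedux evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one class (hypothetical container for illustration)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: Counts) -> float:
    """Fraction of predicted positives that are correct."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: Counts) -> float:
    """Fraction of actual positives that were found."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts, used only to show the calculation.
example = Counts(true_positives=8, false_positives=2, false_negatives=2)
p, r = precision(example), recall(example)
print(f"precision={p:.2f} recall={r:.2f} f1={f1_score(p, r):.2f}")
```

Applying the same F1 formula to a precision of 0.95 and a recall of 0.92, the values reported in the Results section, gives roughly 0.93, which agrees with the reported F1 score.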
### Results

- **Precision:** 0.95
- **Recall:** 0.92
- **F1 Score:** 0.93

## Bias, Risks, and Limitations

RabbitRedux is specialized for cybersecurity applications, so limitations may arise in general-purpose use or when it is applied to non-English datasets. Users should evaluate the model's outputs for potential bias and keep its cybersecurity-specific tuning in mind.

### Recommendations

Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations, especially in contexts outside its trained domain.

## Environmental Impact

Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware Type:** NVIDIA A100 GPUs
- **Training Hours:** 500 hours
- **Carbon Emitted:** 1.2 metric tons CO2eq

## Citation

If you cite RabbitRedux in research, please use the following format:

**BibTeX**

```bibtex
@misc{canstralian2024rabbitredux,
  author = {Canstralian},
  title = {RabbitRedux: A Model for Code Classification in Cybersecurity},
  year = {2024},
  url = {https://github.com/canstralian/RabbitRedux},
}
```

**APA**

Canstralian. (2024). *RabbitRedux: A Model for Code Classification in Cybersecurity*. Retrieved from https://github.com/canstralian/RabbitRedux

## Contact

For more information, reach out via GitHub at [Canstralian](https://github.com/canstralian).