Commit 9d84683 by Canstralian · verified · 1 parent: bff2939

Update README.md

Files changed (1): README.md +81 -1
@@ -18,4 +18,84 @@ library_name: transformers
  tags:
  - code
  - text-generation-inference
- ---
+ ---
+
+ **Model Card for the Code Generation Model**
+
+ **Model Details**
+
+ - **Model Name**: CodeGen-Enhanced
+ - **Model ID**: codegen-enhanced-v1
+ - **License**: MIT
+ - **Base Models**:
+   - replit/replit-code-v1_5-3b
+   - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B
+   - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+
+ **Model Description**
+
+ CodeGen-Enhanced is a code generation model that assists developers by generating code snippets, completing code blocks, and suggesting code-related changes. It builds on Replit's Code v1.5 and WhiteRabbitNeo's Llama 3.1 models and targets code generation across multiple programming languages.
+
+ **Training Data**
+
+ The model was trained on a diverse dataset comprising:
+
+ - **Wordlists**: A comprehensive collection of programming language keywords and syntax.
+ - **CyberExploitDB**: A curated database of cybersecurity exploits and related code snippets.
+ - **Pentesting Dataset**: A compilation of penetration testing scripts and tools.
+ - **Shell Commands**: A repository of Unix/Linux shell commands and scripts.
+
+ These datasets were sourced from Canstralian's repositories:
+
+ - Canstralian/Wordlists
+ - Canstralian/CyberExploitDB
+ - Canstralian/pentesting_dataset
+ - Canstralian/ShellCommands
+
+ **Intended Use**
+
+ CodeGen-Enhanced is intended for:
+
+ - **Code Completion**: Assisting developers by suggesting code completions in real time.
+ - **Code Generation**: Creating boilerplate code or entire functions based on user prompts.
+ - **Educational Purposes**: Serving as a learning tool for understanding coding patterns and best practices.
+
+ **Performance Metrics**
+
+ The model's performance was evaluated using the following metrics:
+
+ - **Accuracy**: Measures the correctness of the generated code snippets.
+ - **Code Evaluation**: Assesses the functionality and efficiency of the generated code through execution tests.
+
+ **Ethical Considerations**
+
+ While CodeGen-Enhanced aims to provide accurate and helpful code suggestions, users should:
+
+ - **Verify Generated Code**: Always review and test generated code to ensure it meets security and performance standards.
+ - **Avoid Sensitive Data**: Do not input sensitive or proprietary information into the model to prevent potential data leakage.
+
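As one possible first step toward reviewing generated code, the sketch below parses a Python snippet with the standard `ast` module and flags a few call names that usually deserve a closer look. The function `review_generated_code` and the `RISKY_CALLS` set are illustrative assumptions, not part of CodeGen-Enhanced, and a clean result does not mean the code is safe.

```python
import ast
from typing import List

# Illustrative, deliberately short list of call names to flag for manual review.
RISKY_CALLS = {"eval", "exec", "system", "popen"}


def review_generated_code(source: str) -> List[str]:
    """Return human-readable findings for a generated Python snippet."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    findings: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls like eval(...) are ast.Name; os.system(...) is ast.Attribute.
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name in RISKY_CALLS:
                findings.append(f"line {node.lineno}: call to {name}()")
    return findings


snippet = "import os\nos.system('ls')\n"
for finding in review_generated_code(snippet):
    print(finding)
```

A check like this only catches obvious patterns by name. It complements, and does not replace, human review and tests.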
+ **Limitations**
+
+ CodeGen-Enhanced may:
+
+ - **Produce Inaccurate Code**: Occasionally generate code with errors or inefficiencies.
+ - **Lack Context**: Miss the broader context of a project, leading to less relevant suggestions.
+
+ **Future Improvements**
+
+ Plans for future enhancements include:
+
+ - **Expanded Language Support**: Incorporating additional programming languages to broaden usability.
+ - **Contextual Understanding**: Improving the model's ability to comprehend and generate context-aware code snippets.
+
+ **Acknowledgments**
+
+ We thank the Canstralian community for providing the datasets used in training and the open-source community for developing the base models.
+
+ **References**
+
+ - [Replit Code v1.5 Model Card](https://huggingface.co/replit/replit-code-v1_5-3b)
+ - [WhiteRabbitNeo Llama-3.1 Model Card](https://huggingface.co/WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-8B)
+ - [Canstralian GitHub Repositories](https://github.com/canstralian)
+
+ This model card provides a comprehensive overview of the CodeGen-Enhanced model, its capabilities, and considerations for its use.