---
license: mit
datasets:
- HuggingFaceFW/fineweb-2
- amphora/QwQ-LongCoT-130K
- bigcode/the-stack
- codeparrot/github-code
- code_search_net/code_search_net
- google/pythia-code-dataset
- DeepMind/alphacode_data
- jsdatasets/crosswoz
- google/web-questions-sp
- facebook/react
- react-community/react-native-datasets
- nodejs/node-test-commit
- your-org/awesome-nodejs-curated
- edx/edx-platform
- django/django
- W3C/web-platform-tests
- your-org/diverse-html-dataset
- DeepMind/alphamind_data
- OpenAI/human-eval
language:
- en
metrics:
- accuracy
- code_bleu
- execution_accuracy
- unit_test_accuracy
- code_coverage
- human_evaluation_results
base_model:
- codellama/CodeLlama-70b-Instruct-hf
- prithivMLmods/Codepy-Deepthink-3B
pipeline_tag: text-generation
tags:
- code
- ide
- code-generation
- code-completion
- code-refactoring
- bug-detection
- code-review
- security
- best-practices
- web-development
- react
- nodejs
- python
- html
inference:
optimizations:
- quantization
---

# Detailed Model Description (Fill this in after training)

## Model Description
This model is designed to power an AI-driven IDE with a focus on web development, particularly React, Node.js, Python, and HTML. It has been trained on a diverse range of datasets (a sketch of how such a mixture can be assembled follows this list), including:
- General web text and code for broad language understanding.
- Code in multiple programming languages (with a focus on web-related languages).
- Datasets specifically related to React, Node.js, and general web development tasks.
- Data to enhance deep thinking and reasoning capabilities.
- Synthetic and/or collected data simulating IDE interactions (code editing, debugging, UI element navigation).
- Datasets focused on security vulnerabilities and coding best practices.
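
A minimal sketch of how such a multi-source mixture can be assembled with the `datasets` library; the subsets, splits, and sampling weights below are illustrative assumptions, not the actual training recipe:

```python
from datasets import interleave_datasets, load_dataset

# Stream a few representative sources. bigcode/the-stack is gated: accept the
# dataset terms and log in with `huggingface-cli login` first.
python_code = load_dataset(
    "bigcode/the-stack", data_dir="data/python", split="train", streaming=True
)
js_code = load_dataset(
    "bigcode/the-stack", data_dir="data/javascript", split="train", streaming=True
)
reasoning = load_dataset(
    "amphora/QwQ-LongCoT-130K", split="train", streaming=True
)

# Interleave with hypothetical mixture weights favoring code.
mixture = interleave_datasets(
    [python_code, js_code, reasoning],
    probabilities=[0.4, 0.4, 0.2],
    seed=42,
)

for example in mixture.take(3):
    print(sorted(example.keys()))
```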
The model is intended to assist developers with the following tasks (a basic usage example appears after the list):
- Code generation
- Code completion
- Code refactoring
- Bug detection and fixing
- Code review
- Adherence to security and best practices
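
A basic usage sketch, assuming the trained checkpoint is published under a hypothetical repo id (`your-org/web-ide-coder` below is a placeholder, not a real model); 4-bit loading reflects the `quantization` optimization listed in the metadata:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/web-ide-coder"  # hypothetical placeholder; use the real checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

prompt = "Write a React hook that debounces a value.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```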
Intended Uses & Limitations
- Intended Use: To be integrated into an IDE to enhance developer productivity and code quality, especially in the context of web development.
- Limitations:
  - The model may still generate incorrect or suboptimal code. Human oversight is always required.
  - Performance may vary across programming languages and specific coding tasks.
  - The model's knowledge is limited to the data it was trained on.
## Evaluation Results
- Provide detailed quantitative evaluation results using the metrics specified above (a sketch of an execution-based evaluation harness follows this list).
- Summarize the findings from human evaluations and user studies.
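
As a starting point, execution-based metrics such as pass@k can be computed with the `evaluate` library's `code_eval` metric; the test case and candidate completions below are placeholders, not real model outputs:

```python
import os

import evaluate

# code_eval executes untrusted model-generated code; opt in explicitly
# and run it inside a sandbox.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = evaluate.load("code_eval")

# One task: a unit test plus two candidate completions for that task.
test_cases = ["assert add(2, 3) == 5"]
candidates = [[
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]]

pass_at_k, results = code_eval.compute(
    references=test_cases,
    predictions=candidates,
    k=[1, 2],
)
print(pass_at_k)  # e.g. {'pass@1': 0.5, 'pass@2': 1.0}
```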
## Training Procedure

- Describe the fine-tuning process, including hyperparameters, training duration, and any special techniques used (an illustrative LoRA setup is sketched below).
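
A minimal sketch of what such a fine-tuning setup could look like, assuming LoRA adapters via the `peft` library on the listed base model; the hyperparameters, target modules, and stand-in corpus are illustrative assumptions, not the values actually used:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "codellama/CodeLlama-70b-Instruct-hf"  # base model listed in the metadata
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # CodeLlama ships without a pad token

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = get_peft_model(
    model,
    LoraConfig(
        r=16,                                 # illustrative rank
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        task_type="CAUSAL_LM",
    ),
)
model.print_trainable_parameters()

# Tiny stand-in corpus; a real run would use the tokenized training mixture.
train_ds = Dataset.from_dict({"text": ["def add(a, b):\n    return a + b\n"]})
train_ds = train_ds.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ide-coder-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```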
## Ethical Considerations
- Discuss any potential biases in the training data or model behavior.
- Address the responsible use of AI for code generation.