metadata

license: mit
datasets:
  - HuggingFaceFW/fineweb-2
  - amphora/QwQ-LongCoT-130K
  - bigcode/the-stack
  - codeparrot/github-code
  - code_search_net/code_search_net
  - google/pythia-code-dataset
  - DeepMind/alphacode_data
  - jsdatasets/crosswoz
  - google/web-questions-sp
  - facebook/react
  - react-community/react-native-datasets
  - nodejs/node-test-commit
  - your-org/awesome-nodejs-curated
  - edx/edx-platform
  - django/django
  - W3C/web-platform-tests
  - your-org/diverse-html-dataset
  - DeepMind/alphamind_data
  - OpenAI/human-eval
language:
  - en
metrics:
  - accuracy
  - code_bleu
  - execution_accuracy
  - unit_test_accuracy
  - code_coverage
  - human_evaluation_results
base_model:
  - codellama/CodeLlama-70b-Instruct-hf
  - prithivMLmods/Codepy-Deepthink-3B
pipeline_tag: text-generation
tags:
  - code
  - ide
  - code-generation
  - code-completion
  - code-refactoring
  - bug-detection
  - code-review
  - security
  - best-practices
  - web-development
  - react
  - nodejs
  - python
  - html
inference:
  optimizations:
    - quantization

Detailed Model Description (Fill this in after training)

Model Description

This model is designed to power an AI-driven IDE with a focus on web development, particularly React, Node.js, Python, and HTML. It has been trained on a diverse range of datasets, including:

General web text and code for broad language understanding.
Code in multiple programming languages (with a focus on web-related languages).
Datasets specifically related to React, Node.js, and general web development tasks.
Data to enhance deep thinking and reasoning capabilities.
Synthetic and/or collected data simulating IDE interactions (code editing, debugging, UI element navigation).
Datasets focused on security vulnerabilities and coding best practices.

The model is intended to assist developers with:

Code generation
Code completion
Code refactoring
Bug detection and fixing
Code review
Adherence to security and best practices

Intended Uses & Limitations

Intended Use: To be integrated into an IDE to enhance developer productivity and code quality, especially in the context of web development.
Limitations:
- The model may still generate incorrect or suboptimal code. Human oversight is always required.
- Performance may vary across programming languages and specific coding tasks.
- The model's knowledge is limited to the data it was trained on.

Evaluation Results

Provide detailed quantitative evaluation results using the metrics specified above.
Summarize the findings from human evaluations and user studies.

Training Procedure

Describe the fine-tuning process, including hyperparameters, training duration, and any special techniques used.

Ethical Considerations

Discuss any potential biases in the training data or model behavior.
Address the responsible use of AI for code generation.