CodeGen-350M-multi-xlcost

CodeGen-350M-multi-xlcost is a CodeGen model fine-tuned on the Python split of the XLCost dataset.

Usage

You can load the CodeGen-350M-multi-xlcost model and tokenizer directly in transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost")

# Prompt format: EOS token, the task description between two ''' lines, then a "###" separator line.
text = tokenizer.eos_token + "'''\n" + "function to add two numbers" + "\n'''\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# max_length counts the prompt tokens as well as the newly generated ones.
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

Output:

'''
function to add two numbers 
'''
###
def add(a, b):
    return a + b
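The prompt layout used above (EOS token, description between two ''' lines, then a ### line) can be wrapped in two small helpers, one to build the prompt and one to strip it from the decoded text. This is only a sketch: build_prompt and extract_completion are hypothetical names that are not part of the model or of transformers, and the literal "<|endoftext|>" stands in for tokenizer.eos_token so that the snippet runs without downloading the model.

```python
SEPARATOR = "###\n"  # separator line that ends the prompt in the usage example


def build_prompt(description: str, eos_token: str) -> str:
    """Wrap a task description in the layout shown in the usage example."""
    return f"{eos_token}'''\n{description}\n'''\n{SEPARATOR}"


def extract_completion(decoded: str) -> str:
    """Return the text after the first separator line, or the input if none is found."""
    _, found, completion = decoded.partition(SEPARATOR)
    return completion if found else decoded


# In real use, pass tokenizer.eos_token instead of the literal (assumed) token string.
prompt = build_prompt("function to add two numbers", "<|endoftext|>")

# Decoded text shaped like the output above (special tokens already skipped).
decoded = "'''\nfunction to add two numbers\n'''\n###\ndef add(a, b):\n    return a + b"
print(extract_completion(decoded))
```

Splitting on the first separator keeps any later "###" inside the generated code intact.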

Training

The model was fine-tuned on XLCost-single-prompt, an improved version of the original XLCost dataset, xlcost-text-to-code. The hyperparameters are listed below.

| Hyperparameter | Value |
| --- | --- |
| Per device train batch size | 8 |
| Context size | 1024 |
| Training steps | 258 |
| Gradient accumulation | 4 |
| Gradient checkpointing | True |
| Learning rate | 1.8e-05 |
| Weight decay | 0.0 |
| Warmup steps | 10 |
| Schedule | linear |

Training ran on a single V100 (16GB) GPU for 6h 42m.
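As a rough reading of these settings, the sketch below computes the effective batch size and the learning rate under a linear warmup and decay schedule. It rests on assumptions the card does not confirm: the dictionary keys mirror the parameter names of transformers.TrainingArguments, "Training steps" is taken to mean optimizer steps, and the schedule is assumed to ramp linearly from 0 to the peak rate over the warmup steps and then decay linearly to 0. The helper names are hypothetical.

```python
# Values copied from the hyperparameter list; key names mirror
# transformers.TrainingArguments, which is an assumption about the training script.
HYPERPARAMS = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "max_steps": 258,
    "learning_rate": 1.8e-5,
    "weight_decay": 0.0,
    "warmup_steps": 10,
    "lr_scheduler_type": "linear",
    "gradient_checkpointing": True,
}
NUM_GPUS = 1  # training ran on a single V100


def effective_batch_size(hp, num_gpus=NUM_GPUS):
    """Sequences contributing to one optimizer step."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_gpus


def linear_lr(step, hp=HYPERPARAMS):
    """Learning rate at a given optimizer step under linear warmup, then linear decay (assumed)."""
    peak, warmup, total = hp["learning_rate"], hp["warmup_steps"], hp["max_steps"]
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(HYPERPARAMS))                             # 32
print(effective_batch_size(HYPERPARAMS) * HYPERPARAMS["max_steps"])  # 8256
```

Under these assumptions each optimizer step sees 32 sequences of at most 1024 tokens, and the run processes 8256 sequences in total.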

Performance

We evaluated the model on the first 400 samples of the XLCost-single-prompt test split, comparing the output of each generated program against the expected output and scoring the results with the pass@k metric.
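One way to compare a generated program's output with the expected output is to run it in a separate interpreter and compare what it prints. The function below is a minimal sketch of that idea, not the evaluation harness actually used for this model: the name passes, the timeout, and the whitespace-stripping comparison are all assumptions. Generated code is untrusted, so a real harness should run it inside a sandbox.

```python
import subprocess
import sys


def passes(generated_code: str, expected_output: str, timeout_s: float = 5.0) -> bool:
    """Run generated_code in a fresh Python process and compare its stdout with expected_output."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", generated_code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a program that hangs as a failure
    if result.returncode != 0:
        return False  # crashes and uncaught exceptions count as failures
    # Ignore leading and trailing whitespace (an assumption about the comparison).
    return result.stdout.strip() == expected_output.strip()


print(passes("print(1 + 2)", "3"))
```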

| Metric | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
| --- | --- | --- | --- | --- |
| pass@1 | 3.70% | 0.4% | 0.35% | 0.48% |
| pass@10 | 14.5% | 3.5% | 3% | 3.75% |

The pass@k metric estimates the probability that at least one of k generated samples passes the tests.
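A common way to compute pass@k is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass. The card does not state which estimator or how many samples per problem were used, so the sketch below is an illustration of the metric rather than the exact procedure behind the numbers above; pass_at_k is a hypothetical helper name.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which pass the tests."""
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains at least one pass.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Example: 5 samples for one problem, 2 of them pass.
print(pass_at_k(5, 2, 1))
print(pass_at_k(5, 2, 2))
```

The score for a test set is the mean of this per-problem estimate over all problems.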

Citations

@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}
