Model Card for CodeCSE

CodeCSE is a simple pretrained model that produces sentence embeddings for code and comments using contrastive learning. It was pretrained on CodeSearchNet.

To use this pretrained model, clone the CodeCSE repository, which provides GraphCodeBERTForCL and the other dependencies: https://github.com/emu-se/CodeCSE

Detailed instructions are listed in the repository's README.md. Overall, you will need:

  1. GraphCodeBERT (CodeCSE uses GraphCodeBERT's input format for code)
  2. GraphCodeBERTForCL defined in codecse/codecse

Inference example

NL input example: example_nl.json

{
    "original_string": "", 
    "docstring_tokens": ["Save", "model", "to", "a", "pickle", "located", "at", "path"], 
    "url": "https://github.com/openai/baselines/blob/3301089b48c42b87b396e246ea3f56fa4bfc9678/baselines/deepq/deepq.py#L55-L72"
}
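
As a minimal sketch of how a record in this format could be read, the snippet below parses the example and rebuilds the docstring text from its tokens. The helper `parse_nl_example` is hypothetical and is not the repository's `load_example`. Joining tokens with single spaces is an assumption made for illustration.

```python
import json

# The record from the example above, inlined so the sketch is self-contained.
EXAMPLE_NL = """{
    "original_string": "",
    "docstring_tokens": ["Save", "model", "to", "a", "pickle", "located", "at", "path"],
    "url": "https://github.com/openai/baselines/blob/3301089b48c42b87b396e246ea3f56fa4bfc9678/baselines/deepq/deepq.py#L55-L72"
}"""


def parse_nl_example(text):
    """Hypothetical helper: parse one NL record and rebuild its docstring text."""
    record = json.loads(text)
    tokens = record["docstring_tokens"]
    if not isinstance(tokens, list):
        raise ValueError("docstring_tokens must be a list of strings")
    # Joining with spaces is an assumption; the repository may process tokens differently.
    return record, " ".join(tokens)


record, docstring = parse_nl_example(EXAMPLE_NL)
print(docstring)
```

For actual inference, the repository's own loading and input-preparation code should be used, since the model expects GraphCodeBERT's input format.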

Code snippet to get the embedding of an NL document (link to complete code):

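import torch

# load_example, prepare_inputs, tokenizer, args, and model are set up in the complete example in the CodeCSE repository.
nl_json = load_example("example_nl.json")
batch = prepare_inputs(nl_json, tokenizer, args)
nl_inputs = batch[3]  # the NL inputs are the fourth element of the batch
with torch.no_grad():  # inference only, so gradients are not tracked
    nl_vec = model(input_ids=nl_inputs, sent_emb="nl")[1]  # index 1 of the output holds the NL embedding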
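
Once an NL embedding and some code embeddings are available, one common way to compare them is cosine similarity. The sketch below only illustrates that idea under stated assumptions. The vectors are small made-up lists rather than real CodeCSE outputs. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of the CodeCSE repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up 3-dimensional vectors standing in for real embeddings (assumption).
nl_vec = [1.0, 0.0, 1.0]
code_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(nl_vec, code_vecs)
print(ranking)
```

A real embedding tensor would first need to be converted to a plain list, for example with `tolist()`, or the same computation could be done directly on tensors.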