A simple Phi-2 model fine-tuned to identify the names of disassembled binary functions. It outputs each function name as a JSON object. You can use the following code to identify a function name:

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

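# flash_attention_2 requires the flash-attn package and a supported GPU;
# drop that argument to fall back to the default attention implementation.
model = AutoModelForCausalLM.from_pretrained(
    "seanmor5/phi-2-function-identification",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
model.to(torch.device("cuda"))
tokenizer = AutoTokenizer.from_pretrained("seanmor5/phi-2-function-identification")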

def prompt(code):
    return (
        "Input: Given the following disassembled code, provide a descriptive"
        + " function name for the code. Your function name should"
        + " accurately describe the purpose of the code. It should"
        + " be formatted in C style with lowercase and snakecase."
        + f" Only output the name as valid JSON, e.g. {json.dumps({'name': 'function_name'})}"
        + f"\nCode: {code}\nOutput:"
    )

def identify_function(code):
    # Stop at the closing '"}' of the JSON object or at the end-of-text token.
    eos_tokens = tokenizer.convert_tokens_to_ids(['"}', "<|endoftext|>"])
    inputs = tokenizer(prompt(code), return_tensors="pt")
    inputs = inputs.to(torch.device("cuda"))

    outputs = model.generate(**inputs, max_new_tokens=64, eos_token_id=eos_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    text = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1] :])[0]
    return text

func = """
void fcn.140030b80(ulong param_1, ulong param_2, ulong param_3) {
    ulong uVar1; uVar1 = fcn.140030ae0(param_3);
    fcn.14002efc0(param_1, param_2, uVar1); return;
}
"""

print(identify_function(func))

The model tends to repeat itself excessively, so you should stop generation at the "} token by passing it as an EOS token when generating, as identify_function does.
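Because the model returns the name wrapped in a JSON object, and the decoded text may still carry the <|endoftext|> marker or repeated output, it can help to post-process the string before using it. The following is a minimal sketch, not part of the original model card: parse_function_name is a hypothetical helper, and the sample strings in the usage lines are made up for illustration rather than real model output.

```python
import json
import re


def parse_function_name(text):
    """Return the "name" field from generated text, or None if it cannot be parsed.

    Hypothetical helper: assumes the output looks like {"name": "..."},
    possibly followed by an end-of-text marker or repeated content.
    """
    cleaned = text.replace("<|endoftext|>", "").strip()
    # Keep only the first {...} span and ignore anything repeated after it.
    match = re.search(r"\{.*?\}", cleaned, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    name = parsed.get("name") if isinstance(parsed, dict) else None
    return name if isinstance(name, str) else None


# Illustrative inputs only; these are not actual outputs of the model.
print(parse_function_name(' {"name": "copy_buffer"}<|endoftext|>'))
print(parse_function_name("no json here"))
```

Returning None on malformed output, instead of raising, lets the caller decide whether to retry generation or skip the function.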

Model size: 2.78B params (Safetensors). Tensor type: BF16.

Dataset used to train seanmor5/phi-2-function-identification-v0.1