|
---
license: wtfpl
datasets:
- cakiki/rosetta-code
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- code
- programming-language
- code-classification
base_model: huggingface/CodeBERTa-small-v1
---
|
This model is a fine-tuned version of *huggingface/CodeBERTa-small-v1* on the *cakiki/rosetta-code* dataset, covering the 26 programming languages listed below.
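For quick experimentation, the model can also be called through the `transformers` pipeline API. A minimal sketch (the example snippet and the printed label are illustrative):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline
classifier = pipeline("text-classification",
                      model="philomath-1209/programming-language-identification")

print(classifier('println!("Hello, World!");'))
# Illustrative output: [{'label': 'Rust', 'score': 0.99...}]
```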
|
## Training Details
|
The model was trained for 25 epochs on Azure, using nearly 26,000 data points spanning the 26 programming languages listed below, extracted from a dataset that contains 1,006 programming languages in total.
|
### Programming languages this model can detect
|
<ol>
<li>ARM Assembly</li>
<li>AppleScript</li>
<li>C</li>
<li>C#</li>
<li>C++</li>
<li>COBOL</li>
<li>Erlang</li>
<li>Fortran</li>
<li>Go</li>
<li>Java</li>
<li>JavaScript</li>
<li>Kotlin</li>
<li>Lua</li>
<li>Mathematica/Wolfram Language</li>
<li>PHP</li>
<li>Pascal</li>
<li>Perl</li>
<li>PowerShell</li>
<li>Python</li>
<li>R</li>
<li>Ruby</li>
<li>Rust</li>
<li>Scala</li>
<li>Swift</li>
<li>Visual Basic .NET</li>
<li>jq</li>
</ol>
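The same 26 labels ship inside the model configuration, so the list above can also be recovered programmatically; a small sketch:

```python
from transformers import AutoConfig

# id2label in the config enumerates all 26 supported languages
config = AutoConfig.from_pretrained("philomath-1209/programming-language-identification")
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```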
|
|
|
|
## Training Results for 25 Epochs
|
<ul>
<li>Training machine configuration:
<ul>
<li>GPU: 1× NVIDIA Tesla T4</li>
<li>VRAM: 16 GB</li>
<li>RAM: 112 GB</li>
<li>CPU cores: 6</li>
</ul>
</li>
<li>Training time: exactly 7 hours for 25 epochs</li>
<li>Training hyperparameters: see the screenshots below</li>
</ul>
|
|
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/YIYl1XZk0zpi3DCvn3D80.png)
|
|
|
|
|
|
|
![training detail.png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/Oi9TuJ8nEjtt6Z_W56myn.png)
|
|
|
## Inference Code |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'philomath-1209/programming-language-identification'
loaded_tokenizer = AutoTokenizer.from_pretrained(model_name)
loaded_model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Run on GPU when available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
loaded_model.to(device)
loaded_model.eval()

text = """
PROGRAM Triangle
IMPLICIT NONE
REAL :: a, b, c, Area
PRINT *, 'Welcome, please enter the&
&lengths of the 3 sides.'
READ *, a, b, c
PRINT *, 'Triangle''s area: ', Area(a,b,c)
END PROGRAM Triangle
FUNCTION Area(x,y,z)
IMPLICIT NONE
REAL :: Area ! function type
REAL, INTENT( IN ) :: x, y, z
REAL :: theta, height
theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
height = x*SIN(theta); Area = 0.5*y*height
END FUNCTION Area
"""

# Tokenize (truncating long inputs to the model's maximum length) and classify
inputs = loaded_tokenizer(text, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    logits = loaded_model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(loaded_model.config.id2label[predicted_class_id])  # e.g. 'Fortran'
```
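To get ranked scores instead of a single label, the model and tokenizer loaded above can be wrapped in a `TextClassificationPipeline`. A sketch, assuming a transformers version recent enough to support the `top_k` argument:

```python
from transformers import TextClassificationPipeline

# Reuse the model/tokenizer from above; top_k=3 returns the three best-scoring languages
pipe = TextClassificationPipeline(model=loaded_model, tokenizer=loaded_tokenizer,
                                  device=device, top_k=3)
for pred in pipe(text, truncation=True):
    print(f"{pred['label']}: {pred['score']:.3f}")
```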