---
license: mit
pipeline_tag: text-classification
tags:
- TEXT
- MODEL
---
# Text Detector
## Model Description
This model classifies a text as AI-generated or human-written. It is built on the XLM-RoBERTa architecture, so it can classify text in multiple languages.
## Model Usage
### Python Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("yaya36095/text-detector")
model = AutoModelForSequenceClassification.from_pretrained("yaya36095/text-detector")


def detect_text(text):
    # Tokenize input, truncating to the model's 512-token limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Process results (index 0 = HUMAN, index 1 = AI)
    scores = predictions[0].tolist()
    results = [
        {"label": "HUMAN", "score": scores[0]},
        {"label": "AI", "score": scores[1]},
    ]

    # Report the label with the highest score, not always the first entry
    best = max(results, key=lambda r: r["score"])

    return {
        "prediction": best["label"],
        "confidence": f"{best['score']*100:.2f}%",
        "detailed_scores": [
            f"{r['label']}: {r['score']*100:.2f}%"
            for r in results
        ],
    }
```
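To make the softmax step concrete, here is a minimal pure-Python sketch of how two raw logits become the percentages reported above. The logit values are made up for illustration, and the label order (index 0 = HUMAN, index 1 = AI) is the one the snippet above assumes.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single input (not real model output).
LABELS = ["HUMAN", "AI"]
example_logits = [2.0, -1.0]

probs = softmax(example_logits)
# Index of the highest probability decides the predicted label.
best = max(range(len(probs)), key=probs.__getitem__)
print(LABELS[best], f"{probs[best] * 100:.2f}%")
```

A larger gap between the two logits produces a more confident prediction; equal logits give 50% for each label.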
### API Usage (Supabase Edge Function)
```typescript
import { serve } from 'https://deno.land/[email protected]/http/server.ts'
// CORS headers so browsers can call the function from any origin
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
}

serve(async (req) => {
  // Answer CORS preflight requests
  if (req.method === 'OPTIONS') {
    return new Response('ok', { headers: corsHeaders })
  }

  try {
    const { text } = await req.json()
    if (!text) throw new Error('No text provided')

    const response = await fetch(
      `https://api-inference.huggingface.co/models/yaya36095/text-detector`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${Deno.env.get('HUGGINGFACE_API_KEY')}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          inputs: text,
          options: {
            wait_for_model: true,
            use_cache: true
          }
        })
      }
    )

    if (!response.ok) {
      // Include the API's own error message when the body is valid JSON
      const errorData = await response.json().catch(() => ({}))
      throw new Error(`API error: ${errorData.error ?? response.statusText}`)
    }

    const result = await response.json()
    // The API may nest the scores in an outer array (one entry per input); unwrap if so
    const scores = Array.isArray(result[0]) ? result[0] : result
    // Pick the highest-scoring label instead of relying on the order of the list
    const top = scores.reduce((a, b) => (b.score > a.score ? b : a))

    const formattedResult = {
      success: true,
      prediction: top.label,
      confidence: `${(top.score * 100).toFixed(2)}%`,
      detailed_scores: scores.map((r) => ({
        label: r.label,
        score: `${(r.score * 100).toFixed(2)}%`
      }))
    }

    return new Response(
      JSON.stringify(formattedResult),
      { headers: { 'Content-Type': 'application/json', ...corsHeaders } }
    )
  } catch (error) {
    return new Response(
      JSON.stringify({
        success: false,
        error: 'Error analyzing text',
        details: error instanceof Error ? error.message : String(error)
      }),
      { status: 500, headers: { 'Content-Type': 'application/json', ...corsHeaders } }
    )
  }
})
```
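Once the function is deployed, a client only needs to POST a JSON body with a `text` field. The following is a minimal Python sketch using the standard library. `FUNCTION_URL` is a placeholder (the project and function names are hypothetical), and the `api_key` parameter is an assumption for deployments that require an `Authorization` header.

```python
import json
import urllib.request

# Placeholder URL: replace with the address of your own deployed function.
FUNCTION_URL = "https://YOUR-PROJECT.supabase.co/functions/v1/detect-text"


def build_request(text, api_key=None, url=FUNCTION_URL):
    """Build a POST request carrying {"text": ...} as a JSON body."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only sent when a key is supplied (assumed deployment setting).
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def detect_remote(text, api_key=None, url=FUNCTION_URL):
    """Send the text to the edge function and return the decoded JSON reply."""
    request = build_request(text, api_key, url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for the 500 responses the function returns on failure, so callers that want the `details` field should catch that exception and read its body.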
### Examples
#### Example Response
```json
{
  "success": true,
  "prediction": "HUMAN",
  "confidence": "92.45%",
  "detailed_scores": [
    {
      "label": "HUMAN",
      "score": "92.45%"
    },
    {
      "label": "AI",
      "score": "7.55%"
    }
  ]
}
```
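Because the scores arrive as percentage strings, a client usually has to convert them back to numbers before comparing or thresholding them. A small sketch of that step, using the example response above as input; the helper names `parse_percent` and `summarize` are illustrative, not part of any API.

```python
import json


def parse_percent(value):
    """Turn a string such as "92.45%" into a fraction such as 0.9245."""
    return float(value.rstrip("%")) / 100.0


def summarize(response):
    """Return (top_label, {label: fraction}) from a successful response."""
    if not response.get("success"):
        raise ValueError(response.get("details", "request failed"))
    scores = {
        item["label"]: parse_percent(item["score"])
        for item in response["detailed_scores"]
    }
    top_label = max(scores, key=scores.get)
    return top_label, scores


raw = """
{
  "success": true,
  "prediction": "HUMAN",
  "confidence": "92.45%",
  "detailed_scores": [
    {"label": "HUMAN", "score": "92.45%"},
    {"label": "AI", "score": "7.55%"}
  ]
}
"""

label, scores = summarize(json.loads(raw))
print(label, scores)
```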
## Technical Details
- **Architecture**: XLM-RoBERTa
- **Task**: Text Classification (Human vs AI)
- **Model Size**: ~1.1GB
- **Max Length**: 512 tokens
- **Languages**: Multilingual support
## Requirements
- `transformers>=4.30.0`
- `torch>=2.0.0`
## Limitations
### Text Length
- Best results with texts longer than 3-4 sentences.
- Maximum input length: 512 tokens.
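
One possible way to work around the 512-token limit is to split a long input into overlapping windows, score each window separately, and average the results. This is a sketch of that idea, not something the model card prescribes: the function names, the `stride` value, and the averaging rule are all assumptions. In practice the window size should also leave room for the special tokens the tokenizer adds.

```python
def split_into_windows(token_ids, max_length=512, stride=128):
    """Split token ids into windows of at most max_length that overlap by stride."""
    if max_length <= stride:
        raise ValueError("max_length must be larger than stride")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the input.
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


def average_scores(per_window_scores):
    """Average [human, ai] score pairs across all windows."""
    count = len(per_window_scores)
    return [sum(pair[i] for pair in per_window_scores) / count for i in range(2)]


# Usage with dummy token ids; real ids would come from the tokenizer.
windows = split_into_windows(list(range(1000)))
print(len(windows), [len(w) for w in windows])
```

Averaging treats every window equally, so a short final window counts as much as a full one; weighting by window length is a reasonable alternative.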
### Language Support
- Works with multiple languages.
- Performance may vary by language.
### AI Detection
- Trained on current AI text patterns.
- May need updates as AI technology evolves.
## Developer
- **Created by**: yaya36095
- **License**: MIT
- **Repository**: https://huggingface.co/yaya36095/text-detector