Text Detector
Model Description
This model detects whether a given text is AI-generated or human-written. It is based on the XLM-RoBERTa architecture and supports multilingual text classification.
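For a quick test, the model can also be loaded through the transformers pipeline API. This is a minimal sketch; the label names in the output depend on the label mapping stored in the model config:

from transformers import pipeline

detector = pipeline("text-classification", model="yaya36095/text-detector")
print(detector("Text to analyze"))  # e.g. [{'label': ..., 'score': ...}]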
Model Usage
Python Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("yaya36095/text-detector")
model = AutoModelForSequenceClassification.from_pretrained("yaya36095/text-detector")

def detect_text(text):
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Process results (label order assumes index 0 = HUMAN, index 1 = AI)
    scores = predictions[0].tolist()
    results = [
        {"label": "HUMAN", "score": scores[0]},
        {"label": "AI", "score": scores[1]}
    ]
    # Sort so the highest-scoring label comes first and is reported as the prediction
    results.sort(key=lambda r: r["score"], reverse=True)

    return {
        "prediction": results[0]["label"],
        "confidence": f"{results[0]['score']*100:.2f}%",
        "detailed_scores": [
            f"{r['label']}: {r['score']*100:.2f}%"
            for r in results
        ]
    }
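Example call (the values in the comments are illustrative; actual scores depend on the input):

result = detect_text("The quick brown fox jumps over the lazy dog.")
print(result["prediction"])       # "HUMAN" or "AI"
print(result["confidence"])       # e.g. "92.45%"
print(result["detailed_scores"])  # e.g. ["HUMAN: 92.45%", "AI: 7.55%"]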
API Usage (Supabase Edge Function)
import { serve } from 'https://deno.land/[email protected]/http/server.ts'
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
}

serve(async (req) => {
  // Handle CORS preflight requests
  if (req.method === 'OPTIONS') {
    return new Response('ok', { headers: corsHeaders })
  }

  try {
    const { text } = await req.json()
    if (!text) throw new Error('No text provided')

    // Forward the text to the Hugging Face Inference API
    const response = await fetch(
      'https://api-inference.huggingface.co/models/yaya36095/text-detector',
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${Deno.env.get('HUGGINGFACE_API_KEY')}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          inputs: text,
          options: {
            wait_for_model: true,
            use_cache: true
          }
        })
      }
    )

    if (!response.ok) {
      const errorData = await response.json().catch(() => ({}))
      throw new Error(`API error: ${response.statusText} ${JSON.stringify(errorData)}`)
    }

    const result = await response.json()
    // The Inference API may return a nested array ([[{label, score}, ...]]);
    // flatten it to a single list of predictions.
    const predictions = Array.isArray(result[0]) ? result[0] : result

    const formattedResult = {
      success: true,
      prediction: predictions[0].label,
      confidence: `${(predictions[0].score * 100).toFixed(2)}%`,
      detailed_scores: predictions.map(r => ({
        label: r.label,
        score: `${(r.score * 100).toFixed(2)}%`
      }))
    }

    return new Response(
      JSON.stringify(formattedResult),
      { headers: { 'Content-Type': 'application/json', ...corsHeaders } }
    )
  } catch (error) {
    return new Response(
      JSON.stringify({
        success: false,
        error: 'Error analyzing text',
        details: error.message
      }),
      { status: 500, headers: { 'Content-Type': 'application/json', ...corsHeaders } }
    )
  }
})
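Once deployed, the function can be called from any HTTP client. This sketch uses Python with the requests library; the project URL, function name, and anon key are placeholders for your own Supabase project:

import requests

# Placeholders: replace with your own Supabase project URL, function name, and anon key.
FUNCTION_URL = "https://<project-ref>.supabase.co/functions/v1/<function-name>"
ANON_KEY = "<supabase-anon-key>"

response = requests.post(
    FUNCTION_URL,
    headers={
        "Authorization": f"Bearer {ANON_KEY}",
        "Content-Type": "application/json",
    },
    json={"text": "Text to analyze"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # matches the example response shown below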
Examples
Example Response
{
  "success": true,
  "prediction": "HUMAN",
  "confidence": "92.45%",
  "detailed_scores": [
    {
      "label": "HUMAN",
      "score": "92.45%"
    },
    {
      "label": "AI",
      "score": "7.55%"
    }
  ]
}
Technical Details
- Architecture: XLM-RoBERTa
- Task: Text Classification (Human vs AI)
- Model Size: ~1.1GB
- Max Length: 512 tokens
- Languages: Multilingual support
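The details above can be checked locally by inspecting the model config. This is a sketch; the id2label mapping may show generic names such as LABEL_0/LABEL_1 if the config does not define custom label names:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("yaya36095/text-detector")
print(config.model_type)   # expected: "xlm-roberta"
print(config.num_labels)   # expected: 2 (HUMAN vs AI)
print(config.id2label)     # index -> label name mapping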
Requirements
transformers>=4.30.0
torch>=2.0.0
Limitations
Text Length:
- Best results with texts longer than 3-4 sentences.
- Maximum input length: 512 tokens; longer inputs are truncated (see the token-count check after this list).
Language Support:
- Works with multiple languages.
- Performance may vary by language.
AI Detection:
- Trained on current AI text patterns.
- May need updates as AI technology evolves.
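Because inputs beyond 512 tokens are truncated, it can help to check the token count of a document before classifying it. A minimal sketch using the model's own tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yaya36095/text-detector")

def fits_in_context(text, max_length=512):
    # Count tokens the same way the model does; anything beyond
    # max_length would be cut off by the truncation in detect_text above.
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens <= max_length, n_tokens

ok, n_tokens = fits_in_context("Some long document ...")
print(f"{n_tokens} tokens, fits in context: {ok}")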
Developer
- Created by: yaya36095
- License: MIT
- Repository: https://huggingface.co/yaya36095/text-detector