Token Classification Model

Description

This project develops a token classification model for Named Entity Recognition (NER). It fine-tunes a BERT model with the Hugging Face transformers library so that each token in a text is assigned to a predefined entity category, such as person, organization, or location.

The model is trained on a dataset annotated with entity labels to accurately classify each token. This token classification system is useful for information extraction, document processing, and conversational AI applications.

Technologies Used

Dataset

  • Source: conll2003 (Kaggle)
  • Purpose: Contains text data with annotated entities for token classification.

Model

  • Base Model: BERT (bert-base-uncased)
  • Library: Hugging Face transformers
  • Task: Token Classification (Named Entity Recognition)

Approach

Preprocessing:

  • Load and preprocess the dataset.
  • Tokenize the text data and align labels with tokens.
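
The label alignment step can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: it assumes word-level integer labels and a `word_ids` list of the kind a fast Hugging Face tokenizer returns from `word_ids()`, where special tokens map to `None`. The function name `align_labels` and the sample values are hypothetical.

```python
from typing import List, Optional

# -100 is the label value that PyTorch's cross-entropy loss skips by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level labels to subword tokens.

    word_ids maps each token to the index of its source word, with None
    for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token: no label
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword keeps the word's label
        else:
            aligned.append(IGNORE_INDEX)          # later subwords are masked out
        previous = word_id
    return aligned


# Hypothetical example: three words, the second split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 0, 5]
print(align_labels(word_ids, word_labels))  # [-100, 1, 0, -100, 5, -100]
```

Labeling only the first subword of each word is one common convention; copying the word's label to every subword is an alternative, and the source does not say which one this project uses.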

Fine-Tuning:

  • Fine-tune the BERT model on the token classification dataset.
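
A possible starting point for the fine-tuning setup is shown below. It is a sketch under assumptions: the label list follows the order used by the Hugging Face copy of conll2003 (the Kaggle copy may encode labels differently), and `build_model` is a hypothetical helper. Calling it downloads the bert-base-uncased weights, so it is defined here but not invoked.

```python
# Assumed label order (Hugging Face conll2003 convention); verify against the
# dataset actually used before training.
LABELS = [
    "O",
    "B-PER", "I-PER",
    "B-ORG", "I-ORG",
    "B-LOC", "I-LOC",
    "B-MISC", "I-MISC",
]

id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def build_model():
    """Load bert-base-uncased with a fresh token classification head.

    Requires the transformers package and network access on first call.
    """
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )


print(label2id["B-LOC"])  # 5
```

Passing `id2label` and `label2id` stores readable tag names in the model config, so predictions can be reported as tags rather than integer ids.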

Training:

  • Train the model to classify each token into predefined entity labels.
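
The objective behind this step is typically a per-token cross-entropy loss, averaged over the tokens that carry a real label. The plain-Python sketch below illustrates that computation; it is not the project's training loop, and the function name `token_cross_entropy` is hypothetical. It assumes masked positions are marked with -100, the default ignore index of PyTorch's cross-entropy loss.

```python
import math
from typing import List

IGNORE_INDEX = -100


def token_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over tokens whose label is not IGNORE_INDEX.

    logits holds one list of class scores per token; labels holds one
    class index per token. Returns 0.0 if every token is masked.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked token contributes nothing to the loss
        top = max(scores)
        # Numerically stable log-sum-exp of the class scores.
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[label]  # negative log-probability of the gold class
        count += 1
    return total / count if count else 0.0


# Uniform scores over three classes give a loss of log(3); the second token is masked.
print(token_cross_entropy([[0.0, 0.0, 0.0], [5.0, 1.0, 1.0]], [1, IGNORE_INDEX]))
```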

Inference:

  • Use the trained model to predict entity labels for new text inputs.
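
Predicted tags usually need one more step before they are useful: merging consecutive tokens into entity spans. The sketch below assumes word-level predictions in the BIO scheme (B-PER, I-PER, O, and so on); the function name `group_entities` and the sample sentence are illustrative, and the source does not describe how the project post-processes its predictions.

```python
from typing import List, Optional, Tuple


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged words into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_words:
            entities.append((" ".join(current_words), current_type))

    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open entity,
            # starts a new entity.
            flush()
            current_words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            current_words.append(word)  # continue the open entity
        else:
            flush()                     # "O" closes any open entity
            current_words, current_type = [], None
    flush()
    return entities


words = ["Angela", "Merkel", "visited", "New", "York"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(group_entities(words, tags))  # [('Angela Merkel', 'PER'), ('New York', 'LOC')]
```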

Key Technologies

  • Deep Learning (BERT): For advanced token classification and contextual understanding.
  • Natural Language Processing (NLP): For text preprocessing, tokenization, and entity recognition.
  • Machine Learning Algorithms: For model training and prediction tasks.

Streamlit App

You can view and interact with the Streamlit app for token classification here.

Examples

Here are some examples of outputs from the model:

[Images: example1, example2]

Google Colab Notebook

You can view and run the Google Colab notebook for this project here.

Acknowledgements

  • Hugging Face for transformer models and libraries.
  • Streamlit for creating the interactive web interface.
  • Kaggle for hosting the conll2003 token classification dataset.

Author

Feedback

If you have any feedback, please reach out to us at [email protected].

