RF 48 Sectors Classification Model

Overview

This model is a Random Forest classifier that assigns a dataset to one of 48 predefined sectors based on its column names. The column names are embedded with BERT (bert-base-uncased), and the resulting vectors are passed to the Random Forest for classification.

Model Details

  • Model Type: Random Forest Classifier
  • Embedding Method: BERT (bert-base-uncased)
  • Number of Sectors: 48
  • Classification Approach: Column name embedding and prediction

48 Supported Sectors

The model can classify datasets into the following 48 sub-sectors, grouped here under 11 parent sectors:

  1. Agriculture Sector

    • Crop Production
    • Livestock Farming
    • Agricultural Equipment
    • Agri-tech
  2. Banking & Finance Sector

    • Retail Banking
    • Corporate Banking
    • Investment Banking
    • Digital Banking
    • Asset Management
    • Securities & Investments
    • Financial Planning & Advice
  3. Construction & Infrastructure

    • Residential Construction
    • Commercial Construction
    • Industrial Construction
    • Infrastructure
  4. Consulting Sector

    • Management Consulting
    • IT Consulting
    • Human Resources Consulting
    • Legal Consulting
  5. Education Sector

    • Early Childhood Education
    • Primary & Secondary Education
    • Higher Education
    • Adult Education & Vocational Training
  6. Engineering Sector

    • Civil Engineering
    • Mechanical Engineering
    • Electrical Engineering
    • Chemical Engineering
  7. Entertainment & Media

    • Film & Television
    • Music Industry
    • Video Games
    • Live Events
  8. Environmental Sector

    • Environmental Protection
    • Waste Management
    • Renewable Energy
    • Wildlife Conservation
  9. Insurance Sector

    • General Insurance Services
    • Life Insurance
    • Health Insurance
    • Property & Casualty Insurance
    • Reinsurance
  10. Food Industry

    • Food Processing
    • Food Retail
    • Food Services
    • Food Safety & Quality Control
  11. Healthcare Sector

    • Hospitals
    • Clinics & Outpatient Care
    • Pharmaceuticals
    • Medical Equipment & Supplies
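
Since the list above groups 48 labels under 11 parent sectors, a predicted label can be rolled up to its parent with a simple lookup. The sketch below assumes the label encoder returns the sub-sector names exactly as written above, which this card does not confirm. `SECTOR_TAXONOMY`, `PARENT_BY_SUBSECTOR`, and `parent_sector` are hypothetical names introduced for illustration.

```python
# Hypothetical lookup table transcribed from the sector list in this card.
# Assumption: the model's labels match these sub-sector names exactly.
SECTOR_TAXONOMY = {
    "Agriculture Sector": [
        "Crop Production", "Livestock Farming",
        "Agricultural Equipment", "Agri-tech",
    ],
    "Banking & Finance Sector": [
        "Retail Banking", "Corporate Banking", "Investment Banking",
        "Digital Banking", "Asset Management", "Securities & Investments",
        "Financial Planning & Advice",
    ],
    "Construction & Infrastructure": [
        "Residential Construction", "Commercial Construction",
        "Industrial Construction", "Infrastructure",
    ],
    "Consulting Sector": [
        "Management Consulting", "IT Consulting",
        "Human Resources Consulting", "Legal Consulting",
    ],
    "Education Sector": [
        "Early Childhood Education", "Primary & Secondary Education",
        "Higher Education", "Adult Education & Vocational Training",
    ],
    "Engineering Sector": [
        "Civil Engineering", "Mechanical Engineering",
        "Electrical Engineering", "Chemical Engineering",
    ],
    "Entertainment & Media": [
        "Film & Television", "Music Industry", "Video Games", "Live Events",
    ],
    "Environmental Sector": [
        "Environmental Protection", "Waste Management",
        "Renewable Energy", "Wildlife Conservation",
    ],
    "Insurance Sector": [
        "General Insurance Services", "Life Insurance", "Health Insurance",
        "Property & Casualty Insurance", "Reinsurance",
    ],
    "Food Industry": [
        "Food Processing", "Food Retail",
        "Food Services", "Food Safety & Quality Control",
    ],
    "Healthcare Sector": [
        "Hospitals", "Clinics & Outpatient Care",
        "Pharmaceuticals", "Medical Equipment & Supplies",
    ],
}

# Invert the taxonomy once: sub-sector name -> parent sector name.
PARENT_BY_SUBSECTOR = {
    sub: parent for parent, subs in SECTOR_TAXONOMY.items() for sub in subs
}


def parent_sector(sub_sector):
    """Return the parent sector for a sub-sector label, or None if unknown."""
    return PARENT_BY_SUBSECTOR.get(sub_sector.strip())
```

A lookup such as `parent_sector("Reinsurance")` returns `"Insurance Sector"`, and an unrecognized label returns `None` rather than raising.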

Installation

pip install transformers torch joblib scikit-learn huggingface_hub

Usage

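from huggingface_hub import hf_hub_download
from transformers import BertTokenizer, BertModel
import joblib
import torch

# Load the BERT tokenizer and encoder used to embed column names
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased', ignore_mismatched_sizes=True)
bert_model.eval()  # inference mode: disables dropout

# Download and load the Random Forest model and its label encoder.
# The repository is gated: accept its conditions and authenticate first (for example with `huggingface-cli login`).
model_path = hf_hub_download(repo_id="Mageswaran/rf_48_sectors", filename="model_48_sectors.pkl")
label_encoder_path = hf_hub_download(repo_id="Mageswaran/rf_48_sectors", filename="label_encoder_48_sectors.pkl")

rf = joblib.load(model_path)
label_encoder = joblib.load(label_encoder_path)

def get_bert_embeddings(texts):
    # NOTE: this helper is not defined in the original card. Masked mean pooling
    # over the last hidden state is an ASSUMPTION; it must match the pooling used
    # when the Random Forest was trained, otherwise predictions are unreliable.
    inputs = bert_tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = bert_model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return (summed / mask.sum(dim=1)).numpy()

def predict_sector(column_names):
    # Convert the comma-separated column names to a single BERT embedding
    embeddings = get_bert_embeddings([column_names])

    # Predict the encoded label and map it back to the sector name
    prediction = rf.predict(embeddings)
    return label_encoder.inverse_transform(prediction)[0]

# Example
column_names = "clinical_trail_duration, computer_analysis_score, customer_feedback_score"
sector = predict_sector(column_names)
print(f"Predicted Sector: {sector}")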
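
`rf.predict` returns only the single most likely label. scikit-learn's `RandomForestClassifier` also provides `predict_proba`, so several candidate sectors can be ranked when the column names are ambiguous. The helper below is a minimal sketch; `top_k_sectors` is a hypothetical name, and it accepts plain sequences so it can be tried without downloading the model.

```python
def top_k_sectors(probabilities, class_names, k=3):
    """Return the k most probable (label, probability) pairs, best first.

    probabilities: one row of class probabilities, e.g. rf.predict_proba(X)[0]
    class_names:   labels in the same column order as that row
    """
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must have equal length")
    ranked = sorted(
        zip(class_names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(name, float(p)) for name, p in ranked[:k]]


# Toy values for illustration only; these are not real model outputs.
example = top_k_sectors(
    [0.1, 0.6, 0.3],
    ["Hospitals", "Pharmaceuticals", "Life Insurance"],
    k=2,
)
print(example)  # [('Pharmaceuticals', 0.6), ('Life Insurance', 0.3)]
```

With the objects from the usage code, and assuming the forest was trained on labels produced by `label_encoder`, the call would look like `top_k_sectors(rf.predict_proba(embeddings)[0], label_encoder.inverse_transform(rf.classes_))`, where `embeddings` is computed as in `predict_sector`.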

Model Performance

  • Embedding Technique: BERT embeddings from 'bert-base-uncased'
  • Classification Algorithm: Random Forest
  • Unique Feature: Sector classification based on column name semantics

Limitations

  • Model performance depends on the semantic similarity of column names to training data
  • Works best with column names that clearly represent the dataset's domain
  • Requires careful preprocessing of column names
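
The card does not say which preprocessing was applied during training, so the following is only a sketch of one plausible normalization: split camelCase, replace separators with underscores, lowercase, drop empty and duplicate names, and join the result into the comma-separated string format used in the usage example. `normalize_column_names` is a hypothetical helper and not part of the model.

```python
import re


def normalize_column_names(columns):
    """Turn raw column headers into one lowercase, snake_case, comma-separated string."""
    cleaned = []
    for name in columns:
        # Insert an underscore at camelCase boundaries: "claimAmount" -> "claim_Amount"
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", str(name).strip())
        # Replace each run of non-alphanumeric characters with a single underscore
        name = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
        if name and name not in cleaned:  # skip empty and duplicate names
            cleaned.append(name)
    return ", ".join(cleaned)


print(normalize_column_names(["Customer ID", "claimAmount", " policy-start.date ", ""]))
# customer_id, claim_amount, policy_start_date
```

Whether this helps depends on how the training column names were formatted, so it is worth comparing predictions with and without normalization on a few datasets whose sector is already known.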

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

License and Usage Restrictions

Proprietary Usage Policy

IMPORTANT: This model is NOT freely available for unrestricted use.

Usage Restrictions

  • Prior written permission is REQUIRED before using this model
  • Commercial use is strictly prohibited without explicit authorization
  • Academic or research use requires formal permission from the model's creator
  • Unauthorized use, distribution, or reproduction is prohibited

Licensing Terms

  • This model is protected under proprietary intellectual property rights
  • Any use of the model requires a formal licensing agreement
  • Contact the model's creator for licensing inquiries and permissions

Permissions and Inquiries

To request permission for model usage, please contact:

  • Email: [Your Contact Email]
  • Hugging Face Profile: [Your Hugging Face Profile URL]

Unauthorized use will result in legal action.

Contact

[email protected]

Citing this Model

If you use this model in your research, please cite it using the following BibTeX entry:

@misc{mageswaran_rf_48_sectors,
  title = {Random Forest 48 Sectors Classification Model},
  author = {Mageswaran},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Mageswaran/rf_48_sectors}}
}

Acknowledgments

  • Hugging Face Transformers