Model Description
This is a logistic regression model, built with the scikit-learn library and trained on a bank's customer credit card risk data. The model predicts a customer's credit risk (the RISK variable) to indicate whether the customer is worth issuing a credit card. The full dataset is available at https://huggingface.co/datasets/saifhmb/CreditCardRisk
Training Procedure
The following data preprocessing steps were applied:
- Dropping the high-cardinality ID feature
- Encoding the categorical features GENDER, MARITAL, HOWPAID and MORTGAGE (one-hot encoding), as well as the target variable, RISK
- Splitting the dataset into training and test sets with an 85/15 ratio
- Scaling the numerical features AGE, INCOME, NUMKIDS, NUMCARDS, STORECAR and LOANS with StandardScaler
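The preprocessing and training steps listed above can be sketched as the scikit-learn pipeline below. The pipeline is reconstructed from the hyperparameter table. This is a sketch under stated assumptions:
- The data is a randomly generated stand-in that only reuses the real column names. The category values and RISK labels are hypothetical.
- The target is encoded with LabelEncoder.
- random_state=42 is used for the split.

The card does not state which target encoder or random seed was actually used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

CAT_COLS = ["GENDER", "MARITAL", "HOWPAID", "MORTGAGE"]
NUM_COLS = ["AGE", "INCOME", "NUMKIDS", "NUMCARDS", "STORECAR", "LOANS"]


def make_synthetic_frame(n=200, seed=0):
    """Hypothetical stand-in for the CreditCardRisk data: real column names, random values."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "ID": np.arange(n),                              # unique per row, hence high cardinality
        "GENDER": rng.choice(["g1", "g2"], n),           # placeholder category codes
        "MARITAL": rng.choice(["m1", "m2", "m3"], n),
        "HOWPAID": rng.choice(["p1", "p2"], n),
        "MORTGAGE": rng.choice(["y", "n"], n),
        "AGE": rng.integers(18, 70, n),
        "INCOME": rng.normal(25000, 8000, n),
        "NUMKIDS": rng.integers(0, 5, n),
        "NUMCARDS": rng.integers(0, 6, n),
        "STORECAR": rng.integers(0, 6, n),
        "LOANS": rng.integers(0, 4, n),
        "RISK": rng.choice(["risk_a", "risk_b", "risk_c"], n),  # hypothetical labels
    })


df = make_synthetic_frame()
df = df.drop(columns=["ID"])                  # step 1: drop the high-cardinality ID column
X = df.drop(columns=["RISK"])
y = LabelEncoder().fit_transform(df["RISK"])  # step 2 (target): encoder choice is an assumption

# Step 3: 85/15 train/test split (random_state is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Steps 2 and 4: one-hot encode categoricals, standardize numericals
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))]), CAT_COLS),
        ("num", Pipeline(steps=[("scale", StandardScaler())]), NUM_COLS),
    ],
    remainder="passthrough",
)
model = Pipeline(steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())])
model.fit(X_train, y_train)

test_score = model.score(X_test, y_test)  # accuracy on the held-out 15%
print(test_score)
```

Wrapping the encoder and scaler in the same Pipeline as the classifier means their statistics are learned from the training split only. The identical transformations are then reapplied at prediction time.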
Hyperparameters
Hyperparameter | Value |
---|---|
memory | |
steps | [('preprocessor', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))])), ('classifier', LogisticRegression())] |
verbose | False |
preprocessor | ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))]) |
classifier | LogisticRegression() |
preprocessor__n_jobs | |
preprocessor__remainder | passthrough |
preprocessor__sparse_threshold | 0.3 |
preprocessor__transformer_weights | |
preprocessor__transformers | [('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS'], dtype='object'))] |
preprocessor__verbose | False |
preprocessor__verbose_feature_names_out | True |
preprocessor__cat | Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))]) |
preprocessor__num | Pipeline(steps=[('scale', StandardScaler())]) |
preprocessor__cat__memory | |
preprocessor__cat__steps | [('onehot', OneHotEncoder(handle_unknown='ignore'))] |
preprocessor__cat__verbose | False |
preprocessor__cat__onehot | OneHotEncoder(handle_unknown='ignore') |
preprocessor__cat__onehot__categories | auto |
preprocessor__cat__onehot__drop | |
preprocessor__cat__onehot__dtype | <class 'numpy.float64'> |
preprocessor__cat__onehot__handle_unknown | ignore |
preprocessor__cat__onehot__max_categories | |
preprocessor__cat__onehot__min_frequency | |
preprocessor__cat__onehot__sparse | deprecated |
preprocessor__cat__onehot__sparse_output | True |
preprocessor__num__memory | |
preprocessor__num__steps | [('scale', StandardScaler())] |
preprocessor__num__verbose | False |
preprocessor__num__scale | StandardScaler() |
preprocessor__num__scale__copy | True |
preprocessor__num__scale__with_mean | True |
preprocessor__num__scale__with_std | True |
classifier__C | 1.0 |
classifier__class_weight | |
classifier__dual | False |
classifier__fit_intercept | True |
classifier__intercept_scaling | 1 |
classifier__l1_ratio | |
classifier__max_iter | 100 |
classifier__multi_class | auto |
classifier__n_jobs | |
classifier__penalty | l2 |
classifier__random_state | |
classifier__solver | lbfgs |
classifier__tol | 0.0001 |
classifier__verbose | 0 |
classifier__warm_start | False |
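The table above appears to be the flattened parameter dictionary returned by scikit-learn's get_params() on the pipeline. In that dictionary, nested parameters are joined with double underscores (step__substep__parameter).

The minimal sketch below assumes the pipeline is rebuilt exactly as listed, with the column lists copied from the table. It prints a few rows in the same "name | value |" layout. Depending on the installed scikit-learn version, some rows may be missing or show different defaults. Examples include preprocessor__cat__onehot__sparse and classifier__multi_class.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CAT_COLS = ["GENDER", "MARITAL", "HOWPAID", "MORTGAGE"]
NUM_COLS = ["AGE", "INCOME", "NUMKIDS", "NUMCARDS", "STORECAR", "LOANS"]

model = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(
        transformers=[
            ("cat", Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))]), CAT_COLS),
            ("num", Pipeline(steps=[("scale", StandardScaler())]), NUM_COLS),
        ],
        remainder="passthrough",
    )),
    ("classifier", LogisticRegression()),
])

# deep=True (the default) includes nested parameters such as classifier__C
params = model.get_params()
for name in [
    "preprocessor__remainder",
    "preprocessor__cat__onehot__handle_unknown",
    "classifier__C",
    "classifier__solver",
    "classifier__max_iter",
    "classifier__tol",
]:
    print(f"{name} | {params[name]} |")
```

The same double-underscore names can be passed to set_params or to a GridSearchCV parameter grid, for example {"classifier__C": [0.1, 1.0, 10.0]}.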
Model Plot
The model is a scikit-learn Pipeline with two steps:
- preprocessor: ColumnTransformer(remainder='passthrough')
  - cat: OneHotEncoder(handle_unknown='ignore') applied to ['GENDER', 'MARITAL', 'HOWPAID', 'MORTGAGE']
  - num: StandardScaler() applied to ['AGE', 'INCOME', 'NUMKIDS', 'NUMCARDS', 'STORECAR', 'LOANS']
  - remainder: passthrough (no remaining columns)
- classifier: LogisticRegression()
Evaluation Results
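- The target variable, RISK, is multiclass. scikit-learn's precision_score and recall_score functions take an average parameter. Its default, average='binary', raises an error for multiclass or multilabel targets, so another setting is required. average='micro' was used, which computes precision and recall globally by counting the total true positives, false negatives and false positives across all classes. In a single-label multiclass problem, every misclassified sample counts as exactly one false positive and one false negative. Micro-averaged precision and recall therefore both equal accuracy, which is why the three values below are identical.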
Metric | Value |
---|---|
accuracy | 0.699187 |
precision | 0.699187 |
recall | 0.699187 |
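The equality of the three metrics can be checked on a toy example. The labels below are hypothetical encoded RISK classes for eight customers, not the model's real test predictions. The sketch uses scikit-learn's accuracy_score, precision_score and recall_score.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical encoded RISK labels for eight customers (not the real test set)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 1, 2, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="micro")
rec = recall_score(y_true, y_pred, average="micro")
print(acc, prec, rec)  # 0.625 0.625 0.625 (5 of 8 correct)

# The default average="binary" is rejected for a multiclass target
try:
    precision_score(y_true, y_pred)
except ValueError as err:
    print("ValueError:", err)
```

Micro averaging gives no extra information beyond accuracy here. Per-class behaviour would need average=None, 'macro' or 'weighted'.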
Model Explainability
SHAP (SHapley Additive exPlanations) was used to identify the features that contribute most to the model's predictions.
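For linear models, SHAP values have a closed form when features are treated as independent. Feature j contributes coef_j * (x_j - mean(x_j)) to each class's linear score (logit). This is, to our understanding, the quantity SHAP's linear explainer computes under an independence assumption.

The sketch below computes that formula directly with NumPy rather than calling the shap package. It runs on a synthetic, already-numeric feature matrix with random data and a hypothetical three-class target. On the real pipeline, the same formula would apply to the preprocessed (one-hot encoded and scaled) columns.

The sketch ranks features by mean absolute contribution, a common way to summarise SHAP importance. It also checks the additivity property.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["AGE", "INCOME", "NUMKIDS", "NUMCARDS", "STORECAR", "LOANS"]

# Synthetic standardized features and a hypothetical 3-class target driven by INCOME and LOANS
X = rng.normal(size=(300, 6))
score = X[:, 1] - X[:, 5] + 0.5 * rng.normal(size=300)
y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3]))  # classes 0, 1, 2

clf = LogisticRegression().fit(X, y)

# Linear SHAP values: phi[i, k, j] = coef_[k, j] * (X[i, j] - mean_j)
background_mean = X.mean(axis=0)
phi = (X - background_mean)[:, None, :] * clf.coef_[None, :, :]

# Global importance: mean |phi| over samples and classes
importance = np.abs(phi).mean(axis=(0, 1))
ranking = [feature_names[j] for j in np.argsort(importance)[::-1]]
print(ranking)

# Additivity: per sample and class, contributions sum to logit minus average logit
logits = clf.decision_function(X)
assert np.allclose(phi.sum(axis=2), logits - logits.mean(axis=0))
```

Because the numerical inputs are standardized, coefficient magnitudes are comparable across features. One-hot columns, however, are 0/1 indicators, so their contributions are best read per category rather than compared directly with scaled features.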
Confusion Matrix
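As a hedged illustration of how a confusion matrix for the multiclass RISK target is read, the sketch below uses scikit-learn's confusion_matrix on hypothetical toy labels, not the real test results. Rows are true classes and columns are predicted classes. The diagonal holds correct predictions, so the trace divided by the total gives the accuracy reported above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical encoded RISK labels (not the real test set)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
print(cm)
# [[1 1 0]
#  [1 1 1]
#  [0 0 3]]

accuracy = np.trace(cm) / cm.sum()  # correct predictions / all predictions
print(accuracy)  # 0.625
```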
Model Card Authors
This model card was written by the following author: Seifullah Bello
Model Card Contact
The model card authors can be contacted through the following channel: [email protected]