gabrieltxs committed on

Commit fb6cf6d · 1 Parent(s): da28fc4

Update README.md

Files changed (1): README.md (+71 -3)
README.md CHANGED
The hunk `@@ -1,3 +1,71 @@` removes the YAML front matter (`---`, `license: mit`, `---`) and replaces it with the following content.

# AMR prediction with LGBMClassifier models

This repository contains a Python script for predicting antimicrobial resistance (AMR) with LightGBM's LGBMClassifier, evaluated alongside a Random Forest model. The script reads the input datasets from a directory, applies feature extraction to obtain k-mer features, trains and tests the models with k-fold cross-validation, and writes the results to text files.

![Retrospectives](https://user-images.githubusercontent.com/43249674/224884310-71214a69-3f27-4628-ad21-bb34c6daac45.jpg)

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing.

### Prerequisites

The script requires the following Python libraries:

- pandas
- scikit-learn
- numpy
- tqdm
- lightgbm
- hyperopt
- joblib
- bayesian-optimization
- skopt (published on PyPI as `scikit-optimize`)

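Before running the script, a quick way to confirm that these libraries are importable is a small check like the one below. It is a sketch, not a file from the repository; the module names (`sklearn`, `bayes_opt`, `skopt`) are the usual import names for the corresponding PyPI packages, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Map each PyPI package listed above to the module name it is imported as.
# Assumption: these are the standard import names for those packages.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "tqdm": "tqdm",
    "lightgbm": "lightgbm",
    "hyperopt": "hyperopt",
    "joblib": "joblib",
    "bayesian-optimization": "bayes_opt",
    "scikit-optimize": "skopt",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the PyPI names of packages whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as missing can then be installed with `pip install <name>`.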
### Installing

Clone the repository to your local machine and install the required libraries:

```bash
$ git clone https://github.com/username/repo.git
$ cd repo
$ pip install -r requirements.txt
```

33
+
34
+ ### Usage
35
+ To use the script, execute the following command:
36
+
```bash
$ python main.py
```

## Code Structure

The main script consists of the following sections:

1. Import the necessary libraries.
2. Set the random seed for reproducibility.
3. Define a function that returns the list of models to evaluate.
4. Load the list of selected samples.
5. Call that function to get the list of models.
6. Initialize KFold cross-validation.
7. Iterate over the values of k, reading the corresponding k-mer feature dataset.
8. Iterate over the list of models.
9. Write the results to a text file.

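The sections listed above could fit together roughly as follows. This is a minimal sketch, not the repository's `main.py`: the seed value, the `get_models`/`evaluate` names, the Random Forest settings and the use of mean accuracy as the single score are assumptions. Each k-mer dataset is passed in memory as an `(X, y)` pair instead of being read from a file, and LGBMClassifier is only added when `lightgbm` is installed.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

SEED = 42  # hypothetical seed value


def set_seed(seed=SEED):
    """Seed Python's and NumPy's global random generators."""
    random.seed(seed)
    np.random.seed(seed)


def get_models(seed=SEED):
    """Return (name, estimator) pairs to evaluate.

    LGBMClassifier is appended only if lightgbm can be imported.
    """
    models = [
        ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ]
    try:
        from lightgbm import LGBMClassifier

        models.append(("LGBMClassifier", LGBMClassifier(random_state=seed, verbose=-1)))
    except ImportError:
        pass
    return models


def evaluate(datasets, n_splits=5, seed=SEED):
    """Cross-validate every model on every k-mer dataset.

    `datasets` maps k to an (X, y) pair; in the real script each pair would
    come from the k-mer feature file for that k.
    Returns {(k, model_name): mean cross-validated accuracy}.
    """
    set_seed(seed)
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for k in sorted(datasets):
        X, y = datasets[k]
        for name, model in get_models(seed):
            scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
            results[(k, name)] = float(scores.mean())
    return results
```

Passing a fixed `random_state` to both `KFold` and the estimators keeps the folds and the fitted models identical across runs, which is what the "set seed for reproducibility" step is meant to achieve.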
## Data Description

The input datasets are CSV files containing bacterial genomic sequences and their resistance profiles for selected antibiotics. The script reads these files from a directory and applies k-mer feature extraction to convert the sequences into numerical feature vectors.

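As an illustration of what k-mer feature extraction means here, the sketch below counts overlapping substrings of length k and lays the counts out over a fixed vocabulary, so every genome yields a vector of the same length (4^k entries for DNA). The function names, the `ACGT` alphabet and the choice to skip k-mers containing ambiguous bases such as `N` are assumptions, not details taken from the repository.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k, alphabet=ALPHABET):
    """All possible k-mers over the alphabet, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_counts(sequence, k, alphabet=ALPHABET):
    """Count overlapping k-mers in a sequence.

    The sequence is upper-cased; k-mers containing characters outside the
    alphabet (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    allowed = set(alphabet)
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i : i + k]
        if set(kmer) <= allowed:
            counts[kmer] += 1
    return counts


def kmer_vector(sequence, k, alphabet=ALPHABET):
    """Fixed-length count vector aligned with kmer_vocabulary(k)."""
    counts = kmer_counts(sequence, k, alphabet)
    return [counts[kmer] for kmer in kmer_vocabulary(k, alphabet)]
```

For example, `kmer_counts("ACGTAC", 2)` sees the windows AC, CG, GT, TA, AC and returns `Counter({"AC": 2, "CG": 1, "GT": 1, "TA": 1})`. Because the vocabulary grows as 4^k, larger values of k quickly produce very wide, sparse feature matrices.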
## Models

The script uses two models for AMR prediction: a Random Forest classifier and LGBMClassifier.

## Output

For each model, the script writes the results to a text file in the specified output directory. The results include accuracy, precision, recall, F1 score, and area under the ROC curve (ROC AUC).

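The text-file output could be produced along these lines. This sketch assumes binary labels (resistant vs. susceptible), positive-class probabilities from `predict_proba` for the ROC AUC, one file per model and a simple `metric: value` layout; `compute_metrics`, `write_results` and the file naming are hypothetical, and the repository's actual format may differ.

```python
from pathlib import Path

from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def compute_metrics(y_true, y_pred, y_score):
    """Binary-classification metrics; y_score holds positive-class probabilities."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),
    }


def write_results(metrics, model_name, output_dir):
    """Write one 'metric: value' line per metric to <output_dir>/<model_name>.txt."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{model_name}.txt"
    with path.open("w") as handle:
        for name, value in metrics.items():
            handle.write(f"{name}: {value:.4f}\n")
    return path
```

ROC AUC is computed from scores rather than hard predictions, which is why the sketch takes `y_score` separately from `y_pred`.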
## Authors

Gabriel Sousa - gabrieltxs

## License

This project is licensed under the MIT License; see the LICENSE.md file for details.

[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)