upload model weights, readme, and config
- README.md +76 -3
- config.json +41 -0
- model-with-instance-classifiers.safetensors +3 -0
- model.safetensors +3 -0
- torchscript_model.pt +3 -0
README.md
CHANGED
@@ -1,3 +1,76 @@
# Pancancer tissue classifier

This model classifies whole slide images among 32 cancer types from TCGA. It was trained by Jakub Kaczmarzyk using CLAM.

Output classes: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS, UVM.

Please see the [TCGA study abbreviations](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations) to map these class names to the TCGA study names.

## Data

Diagnostic slides in TCGA (i.e., `DX` slides) were used to train the model. The whole slide images were tiled into 128x128 µm patches, and each patch was encoded using CTransPath, which produces 768-dimensional embeddings.

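To make the 128 µm patch size concrete: the patch edge length in pixels depends on the slide resolution in microns per pixel (MPP). The sketch below is only illustrative. The function name `patch_size_px` and the example MPP values are assumptions, not taken from the preprocessing code used for this model.

```python
def patch_size_px(patch_size_um: float, mpp: float) -> int:
    """Edge length in pixels of a square patch at a given microns-per-pixel."""
    if mpp <= 0:
        raise ValueError("mpp must be positive")
    return round(patch_size_um / mpp)


# Hypothetical slide resolutions (roughly 40x and 20x scans).
for mpp in (0.25, 0.5):
    print(f"MPP {mpp}: {patch_size_px(128, mpp)} px per side")
```

Under these assumed resolutions, the same physical patch covers four times as many pixels on the higher-resolution scan.
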
Train, validation, and test splits were stratified by TCGA study, and patients did not cross split boundaries.

Sample sizes:
- Train: 9,257 slides (7,633 patients)
- Validation: 1,186 slides (955 patients)
- Test: 1,163 slides (955 patients)

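A split that is stratified by study and keeps each patient in a single split can be sketched as follows. This is not the author's splitting code: the function name, the 80/10/10 fractions (chosen only because they roughly match the patient counts above), the seed, and the toy patient and slide IDs are all assumptions.

```python
import random
from collections import defaultdict


def split_patients(patient_to_study, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign each patient to train/val/test, stratified by study.

    Only the first two fractions are used; the remainder goes to test.
    Splitting patients (not slides) keeps all slides of a patient together.
    """
    names = ("train", "val", "test")
    by_study = defaultdict(list)
    for patient, study in sorted(patient_to_study.items()):
        by_study[study].append(patient)

    rng = random.Random(seed)
    assignment = {}
    for patients in by_study.values():
        rng.shuffle(patients)
        n_train = int(len(patients) * fractions[0])
        n_val = int(len(patients) * fractions[1])
        bounds = (n_train, n_train + n_val, len(patients))
        start = 0
        for name, end in zip(names, bounds):
            for patient in patients[start:end]:
                assignment[patient] = name
            start = end
    return assignment


# Toy cohort with hypothetical IDs: two slides per patient.
patient_to_study = {f"KIRC-{i:02d}": "KIRC" for i in range(20)}
patient_to_study.update({f"KIRP-{i:02d}": "KIRP" for i in range(10)})
slide_to_patient = {f"{p}-DX{j}": p for p in patient_to_study for j in (1, 2)}

patient_split = split_patients(patient_to_study)
# A slide inherits the split of its patient, so no patient crosses a boundary.
slide_split = {s: patient_split[p] for s, p in slide_to_patient.items()}
```

Because slides inherit their split from the patient, the slide counts per split need not follow the patient fractions exactly, which is consistent with the uneven slide-to-patient ratios listed above.
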
## Model performance

The model achieved a weighted average AUROC of 0.99 (one-vs-rest).

Here are the one-vs-rest AUROC values for each TCGA study.

- ACC: 0.9993
- BLCA: 0.9814
- BRCA: 0.9908
- CESC: 0.9868
- CHOL: 0.9972
- COAD: 0.9927
- DLBC: 0.9996
- ESCA: 0.9571
- GBM: 0.9984
- HNSC: 0.9974
- KICH: 0.9998
- KIRC: 0.9993
- KIRP: 0.9952
- LGG: 0.9984
- LIHC: 0.9988
- LUAD: 0.9879
- LUSC: 0.9868
- MESO: 0.9961
- OV: 0.9900
- PAAD: 0.9897
- PCPG: 0.9944
- PRAD: 1.0000
- READ: 0.9752
- SARC: 0.9946
- SKCM: 0.9957
- STAD: 0.9932
- TGCT: 0.9957
- THCA: 1.0000
- THYM: 0.9991
- UCEC: 0.9971
- UCS: 0.9863
- UVM: 0.9997

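For readers unfamiliar with the metric, here is a minimal sketch of one-vs-rest AUROC and its weighted average. It assumes that "weighted" means weighting each class by its number of test slides, which the README does not state explicitly. The helper names and the toy scores are hypothetical, and the pairwise AUROC below is written for clarity, not speed.

```python
def binary_auroc(labels, scores):
    """Probability that a positive outranks a negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ovr_auroc(y_true, probs, class_names):
    """Per-class one-vs-rest AUROC and the support-weighted average."""
    per_class = {}
    total = 0.0
    for k, name in enumerate(class_names):
        labels = [y == k for y in y_true]
        per_class[name] = binary_auroc(labels, [row[k] for row in probs])
        total += per_class[name] * sum(labels)  # weight by class support
    return per_class, total / len(y_true)


# Toy example: three classes, two slides each (scores are made up).
class_names = ["KICH", "KIRC", "KIRP"]
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
per_class, weighted = weighted_ovr_auroc(y_true, probs, class_names)
```

In this toy data only the second class has an imperfect ranking, so the weighted average falls slightly below 1.
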
### Renal cell carcinoma (RCC) subtyping

RCC subtyping is a relatively common benchmark task for slide-level classification, so we also evaluated this model on it.

When tested on a set of 52 KIRC slides and 28 KIRP slides (from the overall test set), the model achieved a balanced accuracy of 0.88.

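Balanced accuracy is the mean of the per-class recalls, so the larger class does not dominate the score. The README does not say how the 32-way outputs were reduced to a two-class decision. The sketch below assumes one plausible approach, taking the higher-scoring of the two classes of interest, and uses hypothetical helper names and made-up scores.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / len(recalls)


def predict_within(probs, class_names, subset):
    """For each slide, pick the highest-scoring class among `subset` only."""
    idx = [class_names.index(name) for name in subset]
    return [class_names[max(idx, key=lambda k: row[k])] for row in probs]


# Toy example with three of the model's classes (scores are made up).
class_names = ["KICH", "KIRC", "KIRP"]
y_true = ["KIRC", "KIRC", "KIRC", "KIRC", "KIRP", "KIRP"]
probs = [
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],  # KICH scores highest overall but is outside the subset
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
]
y_pred = predict_within(probs, class_names, ("KIRC", "KIRP"))
score = balanced_accuracy(y_true, y_pred)
```

Here the KIRC recall is 3/4 and the KIRP recall is 1/2, giving a balanced accuracy of 0.625, whereas plain accuracy would be 4/6.
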
### Non-small cell lung cancer (NSCLC) subtyping

NSCLC subtyping is a relatively common benchmark task for slide-level classification, so we also evaluated this model on it.

When tested on a set of 55 LUAD slides and 58 LUSC slides (from the overall test set), the model achieved a balanced accuracy of 0.76.


# Intended uses

This model is ONLY intended for research purposes.

**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.
config.json
ADDED
@@ -0,0 +1,41 @@
{
  "spec_version": "1.0",
  "type": "clam",
  "patch_size_um": 128,
  "feature_extractor": "ctranspath",
  "num_classes": 32,
  "class_names": [
    "ACC",
    "BLCA",
    "BRCA",
    "CESC",
    "CHOL",
    "COAD",
    "DLBC",
    "ESCA",
    "GBM",
    "HNSC",
    "KICH",
    "KIRC",
    "KIRP",
    "LGG",
    "LIHC",
    "LUAD",
    "LUSC",
    "MESO",
    "OV",
    "PAAD",
    "PCPG",
    "PRAD",
    "READ",
    "SARC",
    "SKCM",
    "STAD",
    "TGCT",
    "THCA",
    "THYM",
    "UCEC",
    "UCS",
    "UVM"
  ]
}
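A consumer of this config would typically check that it is internally consistent and use `class_names` to turn an output index into a label. The sketch below assumes that the order of `class_names` matches the order of the model's outputs, which the config implies but does not state. The helper names are hypothetical, and the inline config is shortened to three classes for illustration.

```python
import json


def load_config(text):
    """Parse the config and check that the class fields agree."""
    cfg = json.loads(text)
    if cfg["num_classes"] != len(cfg["class_names"]):
        raise ValueError("num_classes does not match len(class_names)")
    if len(set(cfg["class_names"])) != len(cfg["class_names"]):
        raise ValueError("class_names contains duplicates")
    return cfg


def top_class(scores, cfg):
    """Return the class name with the highest score."""
    if len(scores) != cfg["num_classes"]:
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=scores.__getitem__)
    return cfg["class_names"][best]


# Shortened stand-in for config.json (three classes instead of 32).
example = """
{
  "spec_version": "1.0",
  "type": "clam",
  "patch_size_um": 128,
  "feature_extractor": "ctranspath",
  "num_classes": 3,
  "class_names": ["KICH", "KIRC", "KIRP"]
}
"""
cfg = load_config(example)
label = top_class([0.1, 0.7, 0.2], cfg)
```

With the real file, `example` would be replaced by the contents of `config.json`, and `scores` by the model's 32 output values.
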
model-with-instance-classifiers.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c15dcf4dd1d901acd0581850edae97d837fbc920c6f0592baebcb7c7aa2542e
size 2830572
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e0539cb88046f2b8515c149b525af8f05e505e5e81e59b5405ebdf07b64e4f4
size 2693188
torchscript_model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c9737b34ba3c1041de80e9b1f42096ebf11438acf0a80d542fdf8d9aed7ed98
size 2711792
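The three weight files are stored as Git LFS pointers, each recording the SHA-256 digest and byte size of the real file. After downloading a weight file, those two values can be used to check that it arrived intact. The sketch below uses hypothetical helper names and a small in-memory payload so that it runs without the actual weights.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into (sha256 hex digest, size in bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """Check that `data` has the size and SHA-256 digest the pointer records."""
    digest, size = parse_lfs_pointer(pointer_text)
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


# Pointer for model.safetensors, copied from the diff above.
model_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8e0539cb88046f2b8515c149b525af8f05e505e5e81e59b5405ebdf07b64e4f4
size 2693188
"""
expected_digest, expected_size = parse_lfs_pointer(model_pointer)

# Stand-in payload and a pointer built for it, to exercise the check.
payload = b"not the real weights"
toy_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
ok = matches_pointer(payload, toy_pointer)
```

For the real files, `data` would be the bytes read from disk. A multi-megabyte file can be hashed in one call as shown, though reading in chunks and calling `update` on the hash object is the usual choice for much larger files.
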