File size: 2,109 Bytes
43ca29c 0eab3b8 2cf93bf 0eab3b8 79c2fb3 0eab3b8 0148c4a 0eab3b8 79c2fb3 0eab3b8 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
---
license: afl-3.0
---
## Suri: Making Large Model Training Efficient for Single-cell RNA-seq data
## Data
You can try pretraining on the PanglaoDB dataset in the 'data' folder. This dataset is the same as that provided in the scBERT paper.
The datasets available from the CELLxGENE website almost are organized as '.h5ad' file formats. It is also easy to use those single-cell RNA-seq dataset for pretraining.
## Usage
- Pretrain on single-cell RNA-seq data
```
python --data_path "data_path" pretrain.py
```
## Hardware
- GPU: Nvidia A10
- GPU Memory: 24 GB
- CPU: 30 vCPU
- Memory: 200 GB
## Time cost
The estimated pretraining time of one epoch using about 1,000,000 cells on an NVIDIA A10 is about 4 hours.
## Disclaimer
This project is used for academic research purposes.
## Citations
You can find more information in these citations if you are interested in the technical details.
```bibtex
@article{yang2022scbert,
title={scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data},
author={Yang, Fan and Wang, Wenchuan and Wang, Fang and Fang, Yuan and Tang, Duyu and Huang, Junzhou and Lu, Hui and Yao, Jianhua},
journal={Nature Machine Intelligence},
volume={4},
number={10},
pages={852--866},
year={2022},
publisher={Nature Publishing Group UK London}
}
```
```bibtex
@inproceedings{choromanski2020rethinking,
title = {Rethinking Attention with Performers},
author = {Krzysztof Choromanski and Valerii Likhosherstov and David Dohan and Xingyou Song and Andreea Gane and Tamas Sarlos and Peter Hawkins and Jared Davis and Afroz Mohiuddin and Lukasz Kaiser and David Belanger and Lucy Colwell and Adrian Weller},
booktitle = {International Conference on Learning Representations},
year = {2021},
}
```
```bibtex
@article{liu2023sophia,
title={Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training},
author={Liu, Hong and Li, Zhiyuan and Hall, David and Liang, Percy and Ma, Tengyu},
journal={arXiv preprint arXiv:2305.14342},
year={2023}
}
```
|