monai
medical
Commit d0252ec by katielink · 1 parent: 2349c14

enable deterministic training
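
The config diffs in this commit drop the `setattr(torch.backends.cudnn, 'benchmark', True)` statement that ran after `monai.utils.set_determinism(seed=123)`. As far as I understand it, `set_determinism` with a seed also switches cuDNN to deterministic mode and turns benchmarking off, so the later `setattr` would have undone part of that. The sketch below only illustrates this ordering effect. It uses a plain namespace as a stand-in for `torch.backends.cudnn`, and `set_determinism_sketch` and `run_initialize` are hypothetical helpers, not MONAI functions.

```python
import random
from types import SimpleNamespace


def make_backend():
    # Stand-in for torch.backends.cudnn (assumption: only these two flags matter here).
    return SimpleNamespace(deterministic=False, benchmark=False)


def set_determinism_sketch(seed, backend):
    # Simplified imitation: seed the RNG and select deterministic backend settings.
    random.seed(seed)
    backend.deterministic = True
    backend.benchmark = False


def run_initialize(steps, backend):
    # Statements in an "initialize" list are applied in order, so later ones win.
    for step in steps:
        step(backend)
    return backend


# Before this commit: benchmark mode is re-enabled after the determinism call.
old_steps = [
    lambda b: set_determinism_sketch(123, b),
    lambda b: setattr(b, "benchmark", True),
]
# After this commit: only the determinism call remains.
new_steps = [lambda b: set_determinism_sketch(123, b)]

old_backend = run_initialize(old_steps, make_backend())
new_backend = run_initialize(new_steps, make_backend())
print(old_backend.benchmark, new_backend.benchmark)  # True False
```

With the old ordering the benchmark flag ends up enabled again, which can let cuDNN pick different kernels between runs. With the new list the deterministic settings stay in place.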

README.md CHANGED
@@ -105,7 +105,7 @@ Example `dataset.json` in output folder:
105
 
106
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_in_out.jpeg)
107
 
108
- ## Scores
109
  This model achieves the following F1 score on the validation data provided as part of the dataset:
110
 
111
  - Train F1 score = 0.941
@@ -132,26 +132,29 @@ Confusion Metrics for <b>Training</b> for individual classes are (at epoch 50):
132
 
133
 
134
 
135
- ## Training Performance
136
  A graph showing the training Loss and F1-score over 50 epochs.
137
 
138
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v2.png) <br>
139
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v2.png) <br>
140
 
141
- ## Validation Performance
142
  A graph showing the validation F1-score over 50 epochs.
143
 
144
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v2.png) <br>
145
 
 
 
146
 
147
- ## commands example
148
- Execute training:
 
149
 
150
  ```
151
  python -m monai.bundle run --config_file configs/train.json
152
  ```
153
 
154
- Override the `train` config to execute multi-GPU training:
155
 
156
  ```
157
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
@@ -160,19 +163,19 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
160
  Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
161
  Please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
162
 
163
- Override the `train` config to execute evaluation with the trained model:
164
 
165
  ```
166
  python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
167
  ```
168
 
169
- Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
170
 
171
  ```
172
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
173
  ```
174
 
175
- Execute inference:
176
 
177
  ```
178
  python -m monai.bundle run --config_file configs/inference.json
 
105
 
106
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_in_out.jpeg)
107
 
108
+ ## Performance
109
  This model achieves the following F1 score on the validation data provided as part of the dataset:
110
 
111
  - Train F1 score = 0.941
 
132
 
133
 
134
 
135
+ #### Training Performance
136
  A graph showing the training Loss and F1-score over 50 epochs.
137
 
138
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v2.png) <br>
139
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v2.png) <br>
140
 
141
+ #### Validation Performance
142
  A graph showing the validation F1-score over 50 epochs.
143
 
144
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v2.png) <br>
145
 
146
+ ## MONAI Bundle Commands
147
+ In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
148
 
149
+ For more detailed usage instructions, visit the [MONAI Bundle Configuration Page](https://docs.monai.io/en/latest/config_syntax.html).
150
+
151
+ #### Execute training:
152
 
153
  ```
154
  python -m monai.bundle run --config_file configs/train.json
155
  ```
156
 
157
+ #### Override the `train` config to execute multi-GPU training:
158
 
159
  ```
160
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
 
163
  Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
164
  Please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
165
 
166
+ #### Override the `train` config to execute evaluation with the trained model:
167
 
168
  ```
169
  python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
170
  ```
171
 
172
+ #### Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
173
 
174
  ```
175
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
176
  ```
177
 
178
+ #### Execute inference:
179
 
180
  ```
181
  python -m monai.bundle run --config_file configs/inference.json
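
The commands above pass a list of config files to `--config_file`, with later files overriding earlier ones. A rough way to picture this is shown below, assuming the files are applied in order and that a key may use `#` to address a nested entry. The helper names and the sample values are hypothetical stand-ins, not the bundle's real configs or MONAI's actual implementation.

```python
import copy
import json


def set_by_id(config, key, value):
    # '#' separates nested levels, e.g. "train#sampler" -> config["train"]["sampler"].
    parts = key.split("#")
    node = config
    for part in parts[:-1]:
        node = node[part]
    node[parts[-1]] = value


def merge_configs(*configs):
    # Apply configs in order; entries from later configs override earlier ones.
    merged = {}
    for cfg in configs:
        for key, value in cfg.items():
            set_by_id(merged, key, copy.deepcopy(value))
    return merged


# Hypothetical stand-ins for configs/train.json and configs/multi_gpu_train.json.
train = {
    "device": "cuda:0",
    "train": {"trainer": {"max_epochs": 50}, "sampler": None},
}
multi_gpu = {"device": "cuda:1", "train#sampler": "DistributedSampler"}

merged = merge_configs(train, multi_gpu)
print(json.dumps(merged, indent=2))
```

Here the override file replaces `device` and the nested `sampler` entry, while everything else from the base config is kept.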
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
1
  {
2
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
3
- "version": "0.0.7",
4
  "changelog": {
 
5
  "0.0.7": "update benchmark on A100",
6
  "0.0.6": "adapt to BundleWorkflow interface",
7
  "0.0.5": "add name tag",
 
1
  {
2
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
3
+ "version": "0.0.8",
4
  "changelog": {
5
+ "0.0.8": "enable deterministic training",
6
  "0.0.7": "update benchmark on A100",
7
  "0.0.6": "adapt to BundleWorkflow interface",
8
  "0.0.5": "add name tag",
configs/multi_gpu_evaluate.json CHANGED
@@ -21,7 +21,6 @@
21
  "$import torch.distributed as dist",
22
  "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
23
  "$torch.cuda.set_device(@device)",
24
- "$setattr(torch.backends.cudnn, 'benchmark', True)",
25
  "$import logging",
26
  "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
27
  "$import scripts",
 
21
  "$import torch.distributed as dist",
22
  "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
23
  "$torch.cuda.set_device(@device)",
 
24
  "$import logging",
25
  "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
26
  "$import scripts",
configs/multi_gpu_train.json CHANGED
@@ -31,7 +31,6 @@
31
  "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
32
  "$torch.cuda.set_device(@device)",
33
  "$monai.utils.set_determinism(seed=123)",
34
- "$setattr(torch.backends.cudnn, 'benchmark', True)",
35
  "$import logging",
36
  "$@train#trainer.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
37
  "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)"
 
31
  "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
32
  "$torch.cuda.set_device(@device)",
33
  "$monai.utils.set_determinism(seed=123)",
 
34
  "$import logging",
35
  "$@train#trainer.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)",
36
  "$@validate#evaluator.logger.setLevel(logging.WARNING if dist.get_rank() > 0 else logging.INFO)"
configs/train.json CHANGED
@@ -343,8 +343,7 @@
343
  "initialize": [
344
  "$import sys",
345
  "$sys.path.append(@bundle_root)",
346
- "$monai.utils.set_determinism(seed=123)",
347
- "$setattr(torch.backends.cudnn, 'benchmark', True)"
348
  ],
349
  "run": [
350
  "$@train#trainer.run()"
 
343
  "initialize": [
344
  "$import sys",
345
  "$sys.path.append(@bundle_root)",
346
+ "$monai.utils.set_determinism(seed=123)"
 
347
  ],
348
  "run": [
349
  "$@train#trainer.run()"
docs/README.md CHANGED
@@ -98,7 +98,7 @@ Example `dataset.json` in output folder:
98
 
99
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_in_out.jpeg)
100
 
101
- ## Scores
102
  This model achieves the following F1 score on the validation data provided as part of the dataset:
103
 
104
  - Train F1 score = 0.941
@@ -125,26 +125,29 @@ Confusion Metrics for <b>Training</b> for individual classes are (at epoch 50):
125
 
126
 
127
 
128
- ## Training Performance
129
  A graph showing the training Loss and F1-score over 50 epochs.
130
 
131
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v2.png) <br>
132
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v2.png) <br>
133
 
134
- ## Validation Performance
135
  A graph showing the validation F1-score over 50 epochs.
136
 
137
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v2.png) <br>
138
 
 
 
139
 
140
- ## commands example
141
- Execute training:
 
142
 
143
  ```
144
  python -m monai.bundle run --config_file configs/train.json
145
  ```
146
 
147
- Override the `train` config to execute multi-GPU training:
148
 
149
  ```
150
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
@@ -153,19 +156,19 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
153
  Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
154
  Please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
155
 
156
- Override the `train` config to execute evaluation with the trained model:
157
 
158
  ```
159
  python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
160
  ```
161
 
162
- Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
163
 
164
  ```
165
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
166
  ```
167
 
168
- Execute inference:
169
 
170
  ```
171
  python -m monai.bundle run --config_file configs/inference.json
 
98
 
99
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_in_out.jpeg)
100
 
101
+ ## Performance
102
  This model achieves the following F1 score on the validation data provided as part of the dataset:
103
 
104
  - Train F1 score = 0.941
 
125
 
126
 
127
 
128
+ #### Training Performance
129
  A graph showing the training Loss and F1-score over 50 epochs.
130
 
131
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_loss_v2.png) <br>
132
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_train_f1_v2.png) <br>
133
 
134
+ #### Validation Performance
135
  A graph showing the validation F1-score over 50 epochs.
136
 
137
  ![](https://developer.download.nvidia.com/assets/Clara/Images/monai_pathology_classification_val_f1_v2.png) <br>
138
 
139
+ ## MONAI Bundle Commands
140
+ In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
141
 
142
+ For more detailed usage instructions, visit the [MONAI Bundle Configuration Page](https://docs.monai.io/en/latest/config_syntax.html).
143
+
144
+ #### Execute training:
145
 
146
  ```
147
  python -m monai.bundle run --config_file configs/train.json
148
  ```
149
 
150
+ #### Override the `train` config to execute multi-GPU training:
151
 
152
  ```
153
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/multi_gpu_train.json']"
 
156
  Please note that the distributed training related options depend on the actual running environment, thus you may need to remove `--standalone`, modify `--nnodes` or do some other necessary changes according to the machine you used.
157
  Please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
158
 
159
+ #### Override the `train` config to execute evaluation with the trained model:
160
 
161
  ```
162
  python -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json']"
163
  ```
164
 
165
+ #### Override the `train` config and `evaluate` config to execute multi-GPU evaluation:
166
 
167
  ```
168
  torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config_file "['configs/train.json','configs/evaluate.json','configs/multi_gpu_evaluate.json']"
169
  ```
170
 
171
+ #### Execute inference:
172
 
173
  ```
174
  python -m monai.bundle run --config_file configs/inference.json
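
For the `torchrun` commands above, each launched process receives its position in the job through environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The multi-GPU configs in this commit lower the log level on every rank except rank 0. The sketch below imitates that choice with the standard library only. `read_dist_env` and `level_for_rank` are hypothetical helpers, and the environment is simulated rather than produced by a real launch.

```python
import logging
import os


def read_dist_env(environ=None):
    # torchrun exports these variables; defaults cover a plain single-process run.
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def level_for_rank(rank):
    # Same rule as in the configs: only rank 0 logs at INFO level.
    return logging.WARNING if rank > 0 else logging.INFO


# Simulated environment for the second of two processes on one node.
fake_env = {"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"}
info = read_dist_env(fake_env)

logger = logging.getLogger("sketch.trainer")
logger.setLevel(level_for_rank(info["rank"]))
print(info["rank"], logging.getLevelName(logger.level))  # 1 WARNING
```

Keeping INFO output on a single rank avoids duplicated progress lines when several processes train the same model.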