colizz committed
Commit 59dc4b7 · verified · 1 Parent(s): 632d7bb

Update README.md upon finalizing the dataset

Files changed (1)
  1. README.md +7 -5
README.md CHANGED
@@ -16,7 +16,7 @@ This model represents the first practical implementation under the **Sophon** (S

  For more details, refer to the following links: [[Paper]](https://arxiv.org/abs/2405.12972), [[Github]](https://github.com/jet-universe/sophon).

- Try out this [[demo on Colab]](https://colab.research.google.com/github/jet-universe/sophon/blob/main/notebooks/Interacting_with_JetClassII_and_Sophon.ipynb) to get started with the model.


  ## Model Details
@@ -34,9 +34,11 @@ Key features of the model include:

  ## Uses and Impact

  The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or integrating this model in C++ workflows to process Delphes files, check [[here]](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc).

- This model also offers insights for the future development of generic and foundational AI models for particle physics experiments.


  ## Training Details
@@ -62,13 +64,13 @@ cd sophon

  ### Download dataset

- Download the JetClass-II dataset from [[Hugging Face Dataset]]().
  The training and validation files are used in this work, while the test files are not used.

  Ensure that all ROOT files are accessible from:

  ```bash
- ./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.root
  ```

  ### Training
@@ -87,7 +89,7 @@ Ensure that all ROOT files are accessible from:

  > **Note:** Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:
  > - Enable PyTorch's DDP for parallel training, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3` (the learning rate should be scaled according to `DDP_NGPUS`).
- > - Configure the number of data loader workers, the fetch step for loading each ROOT file, and the dataset split number to alleviate memory burden. Example command: `./train_sophon.sh train --num-workers 8 --fetch-step 0.02 --data-split-num 4`.

  **Step 3** (optional): Convert the model to ONNX.

 

  For more details, refer to the following links: [[Paper]](https://arxiv.org/abs/2405.12972), [[Github]](https://github.com/jet-universe/sophon).

+ Try out this [[Demo on Colab]](https://colab.research.google.com/github/jet-universe/sophon/blob/main/notebooks/Interacting_with_JetClassII_and_Sophon.ipynb) to get started with the model.


  ## Model Details
 

  ## Uses and Impact

+ ### Inferring Sophon model via ONNX
+
  The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or integrating this model in C++ workflows to process Delphes files, check [[here]](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc).

+ This model also offers insights for the future development of generic and foundation AI models for particle physics experiments.


  ## Training Details
 

  ### Download dataset

+ Download the JetClass-II dataset from [[HuggingFace Dataset]](https://huggingface.co/datasets/jet-universe/jetclass2).
  The training and validation files are used in this work, while the test files are not used.

  Ensure that all ROOT files are accessible from:

  ```bash
+ ./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.parquet
  ```
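
Before launching the training, it can help to confirm that files matching the pattern above are actually present. The following is a minimal sketch, not part of the Sophon repository: the helper names `find_dataset_files` and `missing_prefixes` are hypothetical, and the base directory and `.parquet` suffix are simply taken from the updated path pattern.

```python
from pathlib import Path

# Sample prefixes taken from the path pattern above.
SAMPLE_PREFIXES = ("Res2P", "Res34P", "QCD")


def find_dataset_files(base_dir="./datasets/JetClassII/Pythia", suffix=".parquet"):
    """Return {prefix: sorted list of matching file paths} for each sample prefix.

    Python's glob does not expand shell braces such as {Res2P,Res34P,QCD},
    so each prefix is globbed separately.
    """
    base = Path(base_dir)
    return {
        prefix: sorted(str(p) for p in base.glob(f"{prefix}_*{suffix}"))
        for prefix in SAMPLE_PREFIXES
    }


def missing_prefixes(files_by_prefix):
    """Return the prefixes for which no file was found."""
    return [prefix for prefix, files in files_by_prefix.items() if not files]


if __name__ == "__main__":
    found = find_dataset_files()
    for prefix, files in found.items():
        print(f"{prefix}: {len(files)} file(s)")
    missing = missing_prefixes(found)
    if missing:
        print("No files found for:", ", ".join(missing))
```

An empty result for any prefix usually means the download location and the expected directory layout do not match.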

  ### Training
 

  > **Note:** Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:
  > - Enable PyTorch's DDP for parallel training, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3` (the learning rate should be scaled according to `DDP_NGPUS`).
+ > - Configure the number of data loader workers and the number of splits for the entire dataset. The script uses the default configuration `--num-workers 5 --data-split-num 200`, which means there are 5 workers, each responsible for processing 1/5 of the data files and reading the data synchronously; the data assigned to each worker is split into 200 parts, with each worker sequentially reading 1/200 of the total data in order.
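
The partitioning described in the note above can be pictured with a small sketch. This is an illustration only, not the actual data loader used by `train_sophon.sh`: the function names are hypothetical, and the round-robin file assignment and integer entry ranges are assumptions made for the example.

```python
def assign_files(files, num_workers):
    """Split the file list across workers.

    Round-robin assignment is an assumption for illustration; it gives each
    worker roughly 1/num_workers of the files.
    """
    return [files[w::num_workers] for w in range(num_workers)]


def split_ranges(num_entries, data_split_num):
    """Yield (start, stop) entry ranges that cut num_entries into data_split_num parts.

    A worker would read these ranges one after another, holding only about
    1/data_split_num of its assigned data in memory at a time.
    """
    for i in range(data_split_num):
        start = num_entries * i // data_split_num
        stop = num_entries * (i + 1) // data_split_num
        yield start, stop


if __name__ == "__main__":
    # Hypothetical file names, mirroring the defaults --num-workers 5 --data-split-num 200.
    files = [f"QCD_{i:04d}.parquet" for i in range(10)]
    per_worker = assign_files(files, num_workers=5)
    print(per_worker[0])
    print(list(split_ranges(1000, data_split_num=200))[:3])
```

Under these assumptions, a larger `--data-split-num` lowers the peak memory per worker at the cost of more read passes over the files.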

  **Step 3** (optional): Convert the model to ONNX.