Model metrics

Model testing was performed on the held-out test set of the dataset. The Dice similarity index (Dice) and the normalized surface distance (NSD) were calculated for each label individually, and 95% confidence intervals were computed using bootstrap resampling with 1000 iterations.

Class ID	Class Description	Dice [95% CI]	NSD [95% CI]
0	background	1.0 [1.0 - 1.0]	0.999 [0.999 - 1.0]
1	T1	0.936 [0.915 - 0.951]	0.976 [0.955 - 0.989]
2	T2	0.947 [0.925 - 0.962]	0.986 [0.966 - 0.998]
3	T3	0.954 [0.94 - 0.965]	0.993 [0.982 - 0.999]
4	T4	0.944 [0.92 - 0.963]	0.981 [0.961 - 0.996]
5	T5	0.946 [0.925 - 0.964]	0.984 [0.965 - 0.998]
6	T6	0.936 [0.906 - 0.96]	0.972 [0.946 - 0.991]
7	T7	0.924 [0.886 - 0.952]	0.962 [0.933 - 0.985]
8	T8	0.916 [0.876 - 0.949]	0.951 [0.919 - 0.978]
9	T9	0.921 [0.886 - 0.95]	0.953 [0.924 - 0.976]
10	T10	0.924 [0.89 - 0.951]	0.955 [0.926 - 0.977]
11	T11	0.919 [0.884 - 0.95]	0.947 [0.914 - 0.976]
12	T12	0.927 [0.892 - 0.955]	0.952 [0.918 - 0.98]
13	L1	0.926 [0.893 - 0.954]	0.95 [0.919 - 0.977]
14	L2	0.948 [0.921 - 0.968]	0.969 [0.943 - 0.988]
15	L3	0.939 [0.908 - 0.963]	0.958 [0.927 - 0.981]
16	L4	0.92 [0.884 - 0.947]	0.942 [0.908 - 0.966]
17	L5	0.911 [0.876 - 0.941]	0.937 [0.906 - 0.963]
18	L6	0.0 [0.0 - 0.0]	0.0 [0.0 - 0.0]
19	Sacrum	0.955 [0.946 - 0.962]	0.981 [0.973 - 0.988]
20	Os coccygis	NA	NA
21	T13	0.0 [0.0 - 0.0]	0.0 [0.0 - 0.0]
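
The per-label Dice computation and the bootstrap confidence intervals described above can be illustrated with a minimal sketch. It rests on assumptions that this section does not state: that the reported value is the mean of per-case scores, that resampling is done over cases with replacement, and that the interval is a percentile interval. The function names `dice_score` and `bootstrap_ci` and the `seed` parameter are hypothetical and are not taken from the model's evaluation code. NSD is not covered here because it additionally requires surface extraction and a distance tolerance.

```python
import numpy as np


def dice_score(pred, truth, label):
    """Dice for a single label, given two integer label maps of equal shape."""
    p = np.asarray(pred) == label
    t = np.asarray(truth) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        # Label absent from both prediction and reference: score is undefined.
        return float("nan")
    return 2.0 * np.logical_and(p, t).sum() / denom


def bootstrap_ci(scores, n_iter=1000, alpha=0.05, seed=0):
    """Mean of per-case scores with a percentile bootstrap interval.

    Assumption: cases are resampled with replacement and the statistic
    is the mean; undefined (NaN) cases are dropped before resampling.
    """
    scores = np.asarray(scores, dtype=float)
    scores = scores[~np.isnan(scores)]
    if scores.size == 0:
        nan = float("nan")
        return nan, (nan, nan)
    rng = np.random.default_rng(seed)
    # One row of resampled case indices per bootstrap iteration.
    idx = rng.integers(0, scores.size, size=(n_iter, scores.size))
    means = scores[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(scores.mean()), (float(lo), float(hi))


# Usage with toy data: two tiny "cases" evaluated for label 1.
cases = [
    (np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2])),
    (np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0])),
]
per_case = [dice_score(pred, truth, label=1) for pred, truth in cases]
mean, (low, high) = bootstrap_ci(per_case, n_iter=1000)
print(f"{mean:.3f} [{low:.3f} - {high:.3f}]")
```

Under these assumptions, a label that never appears in either the prediction or the reference yields an undefined score, so in this sketch it would be reported as missing rather than as a number.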