How inter-rater variability relates to aleatoric and epistemic uncertainty: a case study with deep learning-based paraspinal muscle segmentation
Authors: Parinaz Roshanzamir, Hassan Rivaz, Joshua Ahn, Hamza Mirza, Neda Naghdi, Meagan Anstruther, Michele C. Battié, Maryse Fortin, Yiming Xiao
Abstract: Recent developments in deep learning (DL) techniques have led to great performance improvements in medical image segmentation tasks, especially with the latest Transformer model and its variants. While labels from fusing multi-rater manual segmentations are often employed as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extreme anatomical variability can still affect the performance and uncertainty of the resulting algorithms. Knowledge of how inter-rater variability affects the reliability of the resulting DL algorithms, a key element in clinical deployment, can help inform better training data construction and DL models, but has not been explored extensively. In this paper, we measure aleatoric and epistemic uncertainties using test-time augmentation (TTA), test-time dropout (TTD), and deep ensemble to explore their relationship with inter-rater variability. Furthermore, we compare UNet and TransUNet to study the impact of Transformers on model uncertainty with two label fusion strategies. We conduct a case study using multi-class paraspinal muscle segmentation from T2w MRIs. Our study reveals the interplay between inter-rater variability and uncertainties, affected by choices of label fusion strategies and DL models.
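To make the sampling-based idea concrete, the sketch below shows one common way to turn repeated stochastic predictions into a voxel-wise uncertainty map. TTA, TTD, and deep ensembles all produce several softmax outputs for the same image (one per augmentation, dropout mask, or ensemble member), and the spread across them can be summarized per voxel. The entropy-based measure, the function name `predictive_entropy`, and the toy arrays are assumptions made for illustration; the abstract does not state which uncertainty metric the authors use.

```python
import numpy as np

def predictive_entropy(prob_samples, eps=1e-12):
    """Voxel-wise entropy of the mean class probabilities.

    prob_samples has shape (n_samples, n_classes, H, W). Each sample is a
    softmax output from one stochastic pass: one augmentation (TTA), one
    dropout mask (TTD), or one ensemble member.
    """
    mean_probs = prob_samples.mean(axis=0)                        # (n_classes, H, W)
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)   # (H, W)

# Toy data (hypothetical): 8 passes, 3 classes, a 4x4 image.
n_samples, n_classes, height, width = 8, 3, 4, 4

# Case 1: every pass assigns all probability to class 0.
agree = np.zeros((n_samples, n_classes, height, width))
agree[:, 0] = 1.0

# Case 2: passes disagree, cycling through the classes.
disagree = np.zeros((n_samples, n_classes, height, width))
for i in range(n_samples):
    disagree[i, i % n_classes] = 1.0

u_agree = predictive_entropy(agree)
u_disagree = predictive_entropy(disagree)
print(u_agree.max(), u_disagree.min())
```

When all passes agree the entropy is essentially zero, and it grows as the passes disagree. The same function applies regardless of which method generated the samples; whether the resulting map is read as aleatoric (TTA) or epistemic (TTD, ensemble) uncertainty depends on the source of the randomness.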