Article summary:

1. UT-Net is a novel segmentation pipeline that combines U-Net and transformer models to segment the optic disc (OD) and optic cup (OC) from retinal fundus images automatically and accurately.

2. Multi-Head Contextual attention is incorporated to enhance the regular self-attention used in conventional vision transformers, allowing the model to better explore receptive fields and learn deep hierarchical representations.

3. The proposed model was implemented and tested on three publicly available datasets, where it outperformed state-of-the-art methods both in joint OD and OC segmentation and in glaucoma detection based on the measured cup-to-disc ratio (CDR).
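Because the glaucoma-detection claim rests on the cup-to-disc ratio, a minimal sketch of how a CDR value can be derived from segmentation output may help. It assumes the OD and OC predictions are available as 2-D binary masks and uses the vertical CDR (vertical cup diameter divided by vertical disc diameter), one common clinical definition. The summary does not say which CDR variant UT-Net uses, and the helper names (`vertical_extent`, `vertical_cdr`, `is_glaucoma_suspect`) and the screening threshold in the usage are hypothetical.

```python
from typing import Sequence

# Assumed input format: 2-D binary mask, 1 = structure (disc or cup), 0 = background.
Mask = Sequence[Sequence[int]]


def vertical_extent(mask: Mask) -> int:
    """Rows spanned by the foreground, from the first to the last row containing a 1."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1


def vertical_cdr(cup_mask: Mask, disc_mask: Mask) -> float:
    """Vertical cup-to-disc ratio: vertical cup diameter / vertical disc diameter."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty; CDR is undefined")
    return vertical_extent(cup_mask) / disc_height


def is_glaucoma_suspect(cdr: float, threshold: float) -> bool:
    """Hypothetical screening rule: flag the eye when CDR exceeds the given threshold."""
    return cdr > threshold


if __name__ == "__main__":
    # Toy 10x10 masks: disc spans rows 1-8 (height 8), cup spans rows 3-6 (height 4).
    disc = [[1 if 1 <= r <= 8 and 2 <= c <= 7 else 0 for c in range(10)] for r in range(10)]
    cup = [[1 if 3 <= r <= 6 and 3 <= c <= 6 else 0 for c in range(10)] for r in range(10)]
    cdr = vertical_cdr(cup, disc)
    print(cdr)  # 0.5
    print(is_glaucoma_suspect(cdr, threshold=0.6))  # illustrative threshold only; prints False
```

Measuring extents along rows keeps the sketch independent of any imaging library; an area-based ratio would instead compare the pixel counts of the two masks.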

