GZMH: a breast cancer pathological image dataset for mitotic nuclei detection and segmentation
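Wang Huadeng1,2, Wang Xuexin2, Li Bingbing3, Liu Zhipeng2, Xu Hao2, Pan Xipeng1,2, Lan Rushi1,2, Luo Xiaonan1,2 (1. Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guilin 541004; 2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004; 3. Guangdong Provincial People's Hospital Ganzhou Hospital, Department of Pathology, Ganzhou Municipal Hospital, Ganzhou 341000) Abstract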
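Objective Mitotic nuclei count is one of the three important scoring indicators for the diagnosis and histological grading of breast cancer, and deep learning-based automatic detection methods can effectively assist doctors in identifying and counting mitotic nuclei in breast pathological images. However, the public datasets used in current research are mostly built for competitions and are selected by the organizers together with the data providers. They differ considerably from the data used in hospital clinical practice, which hinders the testing and validation of model performance and generalization ability. To address these problems, this paper releases GZMH (Ganzhou municipal hospital), a dataset from the clinical environment of Ganzhou Municipal Hospital in China. Method The curated and publicly released GZMH dataset contains 55 clinical breast cancer pathological whole slide images (WSIs) and provides two kinds of annotations, one for mitotic nuclei object detection and one for semantic segmentation research. The annotations made by three junior pathologists were reviewed by two senior physicians. Five mainstream object detection methods and five classical segmentation methods were trained and tested on GZMH to examine their performance on this clinical dataset. Result Among the object detection methods, the SSD (single shot multibox detector) model achieved the best result, with an F1-score of 0.511; among the segmentation methods, R2U-Net (recurrent residual convolutional neural network based on U-Net) performed best, with an F1-score of 0.430. All methods performed markedly worse on the relatively large-scale clinical dataset GZMH than on some public datasets. Conclusion The proposed GZMH dataset can be used for mitosis object detection and semantic segmentation research tasks, and its images are closer to real application scenarios, so it is of considerable value in advancing the research and clinical application of mitotic nuclei segmentation in breast pathological images. The dataset is published online at: https://doi.org/10.57760/sciencedb.08547.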
Keywords
GZMH: a dataset of breast cancer pathological images for mitosis nuclei detection and segmentation
Wang Huadeng1,2, Wang Xuexin2, Li Bingbing3, Liu Zhipeng2, Xu Hao2, Pan Xipeng1,2, Lan Rushi1,2, Luo Xiaonan1,2 (1. Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China; 2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; 3. Department of Pathology, Ganzhou Municipal Hospital, Guangdong Provincial People's Hospital Ganzhou Hospital, Ganzhou 341000, China) Abstract
Objective Mitosis nuclei count is one of the three important scoring indexes in the diagnosis and histological grading of breast cancer because it is used to evaluate tumor aggressiveness and provides more comprehensive and reliable information for accurate diagnosis and treatment. In current clinical practice, hematoxylin and eosin (H&E) staining is mostly used for pathological sections. Histopathological images stained with H&E can intuitively display cell components and tissue structures. For deep learning-based automated mitosis detection studies, pathologists need to manually label the mitotic cells observed in each high power field (HPF), which is an extremely tedious and time-consuming task requiring extensive experience and professional equipment. Computer-assisted automatic detection, especially with the introduction of deep learning methods, has therefore attracted increasing attention from researchers in recent years because it helps reduce doctors' workload and improve diagnostic efficiency. Multiple competitions (e.g., the ICPR Mitosis Detection Challenge in 2012 and the AMIDA13 competition at MICCAI 2013) have been held internationally to study the specific application of deep learning methods to mitosis detection in breast cancer. These competitions have attracted many researchers, and numerous excellent methods based on their datasets have emerged. However, most public datasets in current research are selected by the organizers and data providers; they differ considerably from the data used in a clinical environment, which hinders the testing and verification of model performance and generalization ability. To address these problems, this research publishes the GZMH dataset from the clinical environment of Ganzhou Municipal Hospital in China.
Method The published GZMH dataset contains 55 clinical breast cancer whole slide images (WSIs) and provides two types of annotations, for mitosis nuclei object detection and for semantic segmentation research. Annotations from three primary pathologists are checked by two senior doctors. The GZMH dataset contains 1,534 RGB images with a resolution of 2,084 × 2,084 pixels and 2,355 mitotic regions. First, 55 WSIs are selected from 109 finely labeled WSIs as the original data of GZMH. Second, a sliding window is used to cut HPFs from the regions of each WSI that are specified in the XML file; an HPF is cut only once, when the center of the circumscribed rectangle of a nucleus falls within the current HPF range. To avoid numerous nuclear fragments, we keep only the grid cell in which the center of the circumscribed rectangle of the nucleus is located. After this data processing, the mitosis nuclei are labeled: the pixel-level label is a black-and-white binary mask, and the object detection label consists of the minimum circumscribed rectangle coordinates and the centroid coordinates of each mitotic region. Eventually, a large-scale dataset is formed. Five mainstream object detection methods and five classical segmentation methods are trained and tested on the GZMH dataset to assess their performance. Result This study uses five mainstream object detection models (i.e., Faster RCNN, FSAF, RetinaNet, YOLOv3, and SSD) and five classical segmentation models (i.e., U-Net, SegNet, R2U-Net, LinkNet34, and DeepLabV3+) in the experiments. Among the object detection methods, the SSD model achieved the best performance, with an F1-score of 0.511. Among the segmentation methods, R2U-Net achieved the best performance, with an F1-score of 0.430.
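The rule of keeping a nucleus only in the grid cell that contains the center of its circumscribed rectangle can be illustrated with a minimal sketch. It assumes a non-overlapping grid whose stride equals the 2,084-pixel tile size, and it clips boxes to the tile border; neither detail is stated in the abstract, and the function name `assign_to_tiles` is hypothetical.

```python
from collections import defaultdict

TILE = 2084  # tile edge in pixels, taken from the image resolution in the abstract


def assign_to_tiles(boxes, tile=TILE):
    """Assign each nucleus box to the single grid tile containing its center.

    boxes: iterable of (x_min, y_min, x_max, y_max) in WSI pixel coordinates.
    Returns {(col, row): [box in tile-local coordinates, ...]}.
    Assumption: non-overlapping grid, stride equal to the tile size.
    """
    tiles = defaultdict(list)
    for x0, y0, x1, y1 in boxes:
        # Center of the circumscribed rectangle decides the owning tile.
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        col, row = int(cx // tile), int(cy // tile)
        ox, oy = col * tile, row * tile
        # Assumption: shift to tile-local coordinates and clip to the tile border.
        local = (
            max(x0 - ox, 0),
            max(y0 - oy, 0),
            min(x1 - ox, tile),
            min(y1 - oy, tile),
        )
        tiles[(col, row)].append(local)
    return dict(tiles)


if __name__ == "__main__":
    # A box straddling the border between tile columns 0 and 1 lands in one tile only.
    print(assign_to_tiles([(2070, 100, 2110, 140), (10, 10, 50, 50)]))
```

Because every nucleus maps to exactly one tile, a nucleus lying on a tile border is not duplicated as fragments in neighboring tiles.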
The performance of all methods on the large-scale GZMH clinical dataset is evidently lower than their performance on some public datasets. Conclusion We published a dataset for mitotic nuclei detection that is characterized by a large number of cases and diverse types, and its data characteristics approximate actual application scenarios. In addition, the problems of memory bottleneck and nuclear fragmentation are solved through data processing. We evaluate 10 representative object detection and semantic segmentation methods on this new dataset and review the challenging problems of the various algorithms. The proposed GZMH dataset can support research on mitosis nuclei detection and semantic segmentation. Moreover, images in this dataset approximate actual application scenarios, which is invaluable in promoting the research progress and clinical application of mitosis nuclei segmentation in breast pathological images. The proposed dataset is available at: https://doi.org/10.57760/sciencedb.08547.
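For readers unfamiliar with the metric, the F1-score used to compare the methods is the harmonic mean of precision and recall. The sketch below computes it from true positive, false positive, and false negative counts. How predictions are matched to ground-truth nuclei is not described in the abstract, so the counts are treated as given, and the numbers in the usage line are hypothetical, not results from the paper.

```python
def f1_score(tp, fp, fn):
    """Return the F1-score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(f1_score(tp=50, fp=50, fn=50))
```

A detector that finds half of the nuclei and is right half of the time scores 0.5, which is close to the best detection score reported on GZMH.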
Keywords