Ship detection in maritime surveillance videos: research status and prospects

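Ye Chen1,2, Lu Tianyang1, Xiao Yuhao1, Lu Hai3, Yang Qunhui3,4 (1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804; 2. Key Laboratory of Embedded System and Service Computing (Ministry of Education), Tongji University, Shanghai 201804; 3. Project Office of the China National Scientific Seafloor Observatory, Tongji University, Shanghai 200092; 4. State Key Laboratory of Marine Geology, Tongji University, Shanghai 200092)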

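Abstract
Ship detection is the cornerstone of intelligent maritime applications such as sea-area surveillance, port traffic statistics, ship identification, and behavior analysis and evidence collection. As China advances its construction as a maritime power, smart shipping and smart ocean engineering are developing rapidly, and the need for effective ship detection and recognition from maritime surveillance videos to ensure the safety of shipping and ocean engineering is increasingly urgent. Focusing on ship detection from maritime surveillance videos, this paper reviews the state of research at home and abroad on ship detection datasets and performance evaluation metrics, and on object detection methods based on traditional machine learning and on deep learning with convolutional neural networks. It analyzes the technical difficulties of ship detection in the marine environment: the diversity of ship scales, the diversity of ship categories, the complexity of marine weather, the dynamics of the water surface, camera motion and low image quality. Through experimental verification, it proposes optimization methods for ship detection in terms of multi-scale feature fusion, data augmentation and energy consumption reduction. Drawing on previous research, it also points out that the development of ship detection datasets should attend to appropriate classification granularity, annotation consistency and ease of extension, and that research on model structures for detecting multi-scale targets (especially small targets) should be strengthened. These findings offer new ideas for further improving the overall performance of ship detection and promoting the application of ship detection technology.
Keywords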
Maritime surveillance videos based ships detection algorithms: a survey

Ye Chen1,2, Lu Tianyang1, Xiao Yuhao1, Lu Hai3, Yang Qunhui3,4(1.College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China;2.The Key Laboratory of Embedded System and Service Computing (Ministry of Education), Tongji University, Shanghai 201804, China;3.China National Scientific Seafloor Observatory, Tongji University, Shanghai 200092, China;4.State Key Laboratory of Marine Geology, Tongji University, Shanghai 200092, China)

Abstract
Object detection in maritime surveillance videos aims to provide effective ship detection and recognition that meets the quick-response requirements of smart ocean technology. Our work covers three aspects: 1) summarizing current approaches and datasets and discussing their open problems; 2) analyzing the features and challenges of ship detection in maritime surveillance; 3) examining the accuracy and efficiency of common methods experimentally and pointing out directions for further research. In the first phase, we summarize existing ship detection algorithms based on maritime surveillance videos and introduce the commonly used ship detection datasets, detection methods and evaluation metrics. Particular attention is paid to the relationship between traditional computer vision algorithms for ship detection, which mainly consist of modules such as horizon detection, background subtraction and foreground extraction, and deep learning methods based on the fast region-based convolutional neural network (Fast R-CNN), the single shot multibox detector (SSD) and you only look once (YOLO). Although mean average precision (mAP) remains the recognized index for measuring model performance, its effectiveness for ship detection tasks is still debated, and novel metrics have been proposed, including bottom edge proximity (BEP), n-multiple object detection precision (N-MODP) and n-multiple object detection accuracy (N-MODA). Current datasets are sufficient for training deep learning models to detect vessels. However, the accuracy and robustness of the trained models still need to be improved greatly because of extreme weather conditions, illumination variation and inconsistent labels. In the second phase, we analyze the features and challenges of ship detection, which differs from general object detection in several respects.
For example, a coastline platform or a shipborne sensor has a very large visible range, which leads to large variability in object scale. In addition, it is challenging to design models that adapt to the various image domains produced by extreme marine weather conditions. The camera system also has to withstand extreme temperatures, strong vibration, humidity and chemicals. The harsh environment, combined with noise and limited network bandwidth, can degrade image quality, and the resulting information loss introduces uncertainty for the models. In the third phase, we seek to improve the accuracy and efficiency of ship detection algorithms and evaluate common methods in the following three aspects: 1) Multi-scale feature fusion: we run experiments on convolutional neural network (CNN) models with different input scales and backbones. Some object detection models degrade when facing the large scale variation among ship objects that results from the large field of view. The results suggest that input scale determines the upper bound on the accuracy of CNN models, and that CNN models or backbones specially designed for multi-scale detection narrow the accuracy gap between input scales. 2) Data augmentation: waves and wind induce pitch and roll motion at sea, which makes ship detection more demanding. Moreover, weather changes and day-night brightness variation mean that the image data span multiple domains, so detection models must be robust to images from different domains, including domains absent from the training samples. To account for these factors and for the motion of sea-based cameras, we evaluate the performance gain obtained when data augmentation by image translation and rotation is applied. We also adjust image brightness and use Gaussian blur to simulate the blur caused by water condensing on cameras.
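The augmentations described here can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nested-list grayscale image format, the replicated-border handling and the default blur parameters are all assumptions chosen for illustration, and rotation is omitted for brevity.

```python
import math

def translate(img, dx, dy, fill=0):
    """Shift a grayscale image (list of rows) by dx columns and dy rows."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands on (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

def shift_box(box, dx, dy, w, h):
    """Move an (x1, y1, x2, y2) label with the image and clip it to the frame."""
    x1, y1, x2, y2 = box
    def clip(v, hi):
        return max(0, min(hi, v))
    return (clip(x1 + dx, w), clip(y1 + dy, h), clip(x2 + dx, w), clip(y2 + dy, h))

def adjust_brightness(img, gain):
    """Scale intensities by gain and clamp to the 8-bit range [0, 255]."""
    return [[max(0, min(255, round(p * gain))) for p in row] for row in img]

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wgt / total for wgt in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    w = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(w):
            acc = 0.0
            for k, wgt in enumerate(kernel):
                sx = min(w - 1, max(0, x + k - radius))
                acc += wgt * row[sx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))
```

In a training pipeline, each transform would be applied with randomly drawn parameters, and geometric transforms such as `translate` must be paired with the matching label update (`shift_box`) so that annotations stay aligned with the ships.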
With these augmentations, an increase of almost 5% in mAP is observed, which supports the value of data augmentation for robust ship detection. Other effective approaches include domain transfer based on generative adversarial networks (GANs) and domain-independent models derived from multi-domain object detection tasks. 3) Lightweight models and energy optimization: computational complexity can be limited by semantic constraints such as the horizon; common object detection optimizations, including lightweight backbones such as MobileNet and ShuffleNet, are also used to lower the computational load. We calculate the number of parameters and the number of operations of the object detection models, together with the accuracy of each. We recommend that further studies address the following aspects: 1) developing new datasets or improving existing ones, with sufficient coverage of possible conditions, high-quality annotations, appropriate classification granularity and easy extension; 2) reducing arithmetic operations and energy consumption in object detection models; 3) strengthening models for multi-scale target detection; 4) enhancing data fusion for object detection across multi-sensor images and the ability to interpret the semantics of single or multiple images.
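To illustrate how parameter and operation counts can be compared for lightweight backbones, the sketch below counts weights and multiply-accumulate operations (MACs) for a standard convolution and for the depthwise separable convolution used in MobileNet. The function names and layer sizes are hypothetical, biases are ignored, and the figures are not taken from the paper's experiments.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a standard k x k convolution (bias ignored)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs

def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs

if __name__ == "__main__":
    # Hypothetical layer: 32 -> 64 channels, 3 x 3 kernel, 56 x 56 output map.
    std_params, std_macs = conv_cost(32, 64, 3, 56, 56)
    sep_params, sep_macs = depthwise_separable_cost(32, 64, 3, 56, 56)
    print(std_params, std_macs)      # 18432 57802752
    print(sep_params, sep_macs)      # 2336 7325696
    print(sep_params / std_params)   # about 0.127, i.e. 1/c_out + 1/k**2
```

For this layer the separable form needs roughly one eighth of the weights and MACs of the standard convolution, which is the kind of saving that makes such backbones attractive when computation and energy are limited.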
Keywords
