Underwater sea turtle behavior recognition: a lightweight pose-to-action pipeline

Our take

Automated recognition of underwater sea turtle behaviors is important for ecological monitoring and for reducing bycatch. This study presents a lightweight pose-to-action pipeline designed for low onsite computational power and variable underwater conditions. The pipeline detects key morphological points in each frame and then classifies behaviors such as U-turns and reversals, using a two-stage training strategy that combines real and simulated data. It reaches 93.2% recognition accuracy at a minimum input frame rate of 2.86 fps, which can help researchers better understand how turtles interact with fishing gear.

Automated recognition of animal behaviors is an important computer vision task for ecological monitoring and behavioral analysis. Compared with generic human action recognition, these applications often face severe constraints: low onsite computational power, limited data for training learning-based models, and poor image quality caused by environmental conditions. For sea turtles, behavior around fishing gear is particularly important for understanding and reducing bycatch (incidental take) and the associated mortality. Monitoring this behavior underwater is challenging because the viewing angle changes over time, and the observed pose and motion trajectories depend strongly on the camera angle. In this study, we address this problem with a compact pose-to-action pipeline that detects a small set of turtle morphological keypoints in each frame and then classifies short keypoint sequences as U-turn, reversal, or other routine behaviors. We use the YOLOv8n pose model for keypoint detection and a shallow fully connected network for behavior classification. Our two-stage training strategy trains the pose estimation network on real data, while the behavior recognition network is optimized on both real annotated clips and a large set of simulated trajectories covering various camera geometries and motion parameters. We further reduce the computational requirements by balancing input frame rate against recognition accuracy. Our experimental results show 93.2% recognition accuracy with a minimum input frame rate of 2.86 fps.
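The pose-to-action stage is easier to picture with an example. Below is a minimal Python sketch of the second stage. It assumes that each frame yields K (x, y) keypoints and that a short window of T frames is flattened into one feature vector for a shallow fully connected network. Several details are illustrative assumptions and do not come from the study: the window length, the keypoint count, the hidden-layer size, the center-and-scale normalization, and the names `normalize_window` and `ShallowMLP`. The weights are random, not trained.

```python
import numpy as np

CLASSES = ("U-turn", "reversal", "other")  # behavior labels named in the abstract


def normalize_window(kpts: np.ndarray) -> np.ndarray:
    """Center and scale a (T, K, 2) window of keypoints, then flatten it.

    Assumed preprocessing (not from the study): subtracting the window's mean
    position and dividing by its largest radius removes dependence on where
    the turtle sits in the frame and how large it appears.
    """
    centered = kpts - kpts.reshape(-1, 2).mean(axis=0)
    scale = np.linalg.norm(centered, axis=-1).max()
    if scale > 0:
        centered = centered / scale
    return centered.reshape(-1)


class ShallowMLP:
    """One hidden ReLU layer and a softmax output; weights are random (untrained)."""

    def __init__(self, in_dim: int, hidden: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(0.0, x @ self.w1 + self.b1)       # hidden layer with ReLU
        logits = h @ self.w2 + self.b2
        logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
        e = np.exp(logits)
        return e / e.sum(axis=-1, keepdims=True)         # softmax over behavior classes


T, K = 8, 5  # hypothetical window length (frames) and keypoint count
model = ShallowMLP(in_dim=T * K * 2, hidden=32, n_classes=len(CLASSES))
window = np.random.default_rng(1).uniform(0, 640, size=(T, K, 2))  # stand-in detector output
probs = model.predict_proba(normalize_window(window))
print(CLASSES[int(probs.argmax())], probs.round(3))
```

In practice the weights would be fit on labeled windows. Per-window normalization handles position and scale. It does not remove the dependence on viewing angle that the abstract highlights, which is one reason varied camera geometries matter in the training data.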
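The abstract says the behavior classifier is also trained on simulated trajectories with varied camera geometries and motion parameters. One way such data could be produced is sketched below. A hypothetical rigid five-point body template follows a half-circle U-turn in a plane, and a pinhole camera with random pan, tilt, and distance projects it to 2D keypoints. The template, the parameter ranges, the camera intrinsics, and all function names are assumptions made for illustration. None of them describe the authors' simulator.

```python
import numpy as np

# Hypothetical rigid keypoint template in body coordinates (metres), head along +x:
# head, left/right front flipper tips, left/right rear flipper tips.
BODY = np.array([
    [0.40, 0.00, 0.0],
    [0.10, 0.35, 0.0],
    [0.10, -0.35, 0.0],
    [-0.30, 0.15, 0.0],
    [-0.30, -0.15, 0.0],
])


def rot_x(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_z(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def uturn_poses(n_frames: int, radius: float):
    """Body rotation and position along a half circle centered at the origin.

    The heading follows the arc tangent, so the turtle ends facing opposite
    to its starting direction.
    """
    poses = []
    for theta in np.linspace(0.0, np.pi, n_frames):
        position = radius * np.array([np.sin(theta), -np.cos(theta), 0.0])
        poses.append((rot_z(theta), position))
    return poses


def project(points: np.ndarray, cam_R: np.ndarray, cam_t: np.ndarray,
            f: float = 800.0, cx: float = 320.0, cy: float = 240.0) -> np.ndarray:
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates."""
    pc = points @ cam_R.T + cam_t  # world frame -> camera frame
    return np.stack([f * pc[:, 0] / pc[:, 2] + cx,
                     f * pc[:, 1] / pc[:, 2] + cy], axis=1)


def simulate_uturn(rng: np.random.Generator, n_frames: int = 8):
    """One synthetic labeled clip: (n_frames, K, 2) keypoints and its label."""
    radius = rng.uniform(0.5, 1.5)     # assumed turning-radius range (m)
    pan = rng.uniform(-np.pi, np.pi)   # assumed camera heading range (rad)
    tilt = rng.uniform(0.0, 1.3)       # 0 = optical axis perpendicular to the swim plane
    dist = rng.uniform(4.0, 8.0)       # assumed camera distance (m); keeps all points in front
    cam_R = rot_x(tilt) @ rot_z(pan)
    cam_t = np.array([0.0, 0.0, dist])
    frames = [project(BODY @ R.T + t, cam_R, cam_t) for R, t in uturn_poses(n_frames, radius)]
    return np.stack(frames), "U-turn"


rng = np.random.default_rng(0)
dataset = [simulate_uturn(rng) for _ in range(100)]
print(dataset[0][0].shape, dataset[0][1])
```

Other behavior classes could be generated the same way by replacing the pose path. Adding keypoint jitter and occasional missing points would bring the synthetic clips closer to real detector output.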
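One way to explore the trade-off between frame rate and accuracy is to subsample footage recorded at its native rate before it reaches the pipeline. The Python sketch below keeps frames at an approximate target rate. It also converts that rate into a per-frame processing budget. Both function names and the 30 fps source rate in the usage lines are assumptions for illustration. At the reported 2.86 fps, a pipeline running in real time would have about 1000 / 2.86 ≈ 350 ms per frame for keypoint detection and classification.

```python
import math


def subsample_indices(n_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Return indices of frames to keep so the effective rate is about target_fps."""
    if not 0 < target_fps <= native_fps:
        raise ValueError("target_fps must be in (0, native_fps]")
    step = native_fps / target_fps  # source frames per kept frame (may be fractional)
    indices = []
    k = 0
    while True:
        i = math.floor(k * step + 1e-9)  # small epsilon guards against float round-down
        if i >= n_frames:
            break
        indices.append(i)
        k += 1
    return indices


def per_frame_budget_ms(target_fps: float) -> float:
    """Time available per kept frame if processing must keep pace with the input."""
    return 1000.0 / target_fps


# Usage: a 10 s clip, assumed to be recorded at 30 fps, reduced to about 2.86 fps.
kept = subsample_indices(n_frames=300, native_fps=30.0, target_fps=2.86)
print(len(kept), kept[:5])
print(round(per_frame_budget_ms(2.86), 1))
```

The sketch works in time rather than with a fixed integer stride. This lets fractional ratios such as 30 / 2.86 map to the nearest earlier source frame, so the effective rate stays close to the target.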

Tagged with

#ocean data, #climate monitoring, #in-situ monitoring, #data visualization, #autonomous underwater vehicles, #environmental DNA, #sea turtle, #underwater, #recognition accuracy, #behavior recognition, #pose-to-action, #bycatch, #automated recognition, #morphological keypoints, #ecological monitoring, #behavioral analysis, #incidental take, #YoloV8n, #motion trajectories, #frame rate