Underwater wireless sensor networks (UWSNs) are one of the key application scenarios of wireless sensor networks. Optimizing their coverage requires explicit treatment of three-dimensional spatial characteristics, which clearly distinguishes the problem from traditional terrestrial coverage. To address low coverage and uneven node distribution in three-dimensional UWSNs, we propose a Reinforcement Learning-driven Hunter-Prey Optimization (RL-HPO) algorithm. First, a nonlinear convergence factor is designed to regulate the exploration and exploitation phases...
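The abstract does not state how coverage is measured. A common choice in 3D deployment studies, assumed here and not confirmed by the source, is the Boolean (binary disk) sensing model evaluated on a discretized cubic region. Under that model, the coverage rate is the fraction of grid points that lie within the sensing radius of at least one node. A metaheuristic such as RL-HPO could use this value as the fitness of a candidate set of node positions. The function name `coverage_rate` and its parameters are hypothetical and serve only as illustration.

```python
import numpy as np

def coverage_rate(sensors, sensing_radius, side, points_per_axis):
    """Hypothetical 3D coverage metric under an assumed Boolean sensing model.

    sensors: sequence of (x, y, z) node positions inside the cube [0, side]^3.
    Returns the fraction of evenly spaced grid points covered by >= 1 node.
    """
    sensors = np.asarray(sensors, dtype=float).reshape(-1, 3)
    # Discretize the monitored cube into points_per_axis^3 sample points.
    axis = np.linspace(0.0, side, points_per_axis)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)  # shape (P, 3)
    # Euclidean distance from every grid point to every sensor: shape (P, N).
    dist = np.linalg.norm(grid[:, None, :] - sensors[None, :, :], axis=2)
    # A point is covered if any sensor lies within the sensing radius.
    covered = (dist <= sensing_radius).any(axis=1)
    return float(covered.mean())

# Example: one node at the centre of a 10 m cube with a 3 m sensing radius.
rate = coverage_rate([[5.0, 5.0, 5.0]], 3.0, 10.0, 21)
```

A finer grid (larger `points_per_axis`) gives a more accurate estimate at quadratic-in-memory cost for the distance matrix; a probabilistic sensing model could be substituted without changing the overall structure.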