To improve the efficiency of scene segmentation in content-based video retrieval, this paper proposes a multi-modal video scene segmentation optimization algorithm based on convolutional neural network feature extraction. First, the algorithm applies an improved VGG19 network to extract low-level features and semantic features from each video shot. Second, these features are combined into vectors, and triplet loss learning and shot similarity calculation are applied, converting the scene segmentation task into a binary classificatio...
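
The triplet loss and shot similarity steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact loss form, distance metric, or decision rule, so the hinge-style triplet loss on Euclidean distance, the cosine similarity measure, the `margin` and `threshold` values, and all function names below are assumptions chosen for illustration. The feature vectors are toy values standing in for the combined per-shot features.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Hinge-style triplet loss (assumed form).

    The loss is zero once the anchor is closer to the positive shot
    (same scene) than to the negative shot (different scene) by at
    least `margin`.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def same_scene(shot_a: Sequence[float],
               shot_b: Sequence[float],
               threshold: float = 0.5) -> bool:
    """Binary decision (assumed rule): do two shots belong to one scene?

    `threshold` is a hypothetical cut-off on cosine similarity.
    """
    return cosine_similarity(shot_a, shot_b) >= threshold


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for combined shot features.
    anchor, positive, negative = [0.0, 0.0], [0.0, 0.0], [3.0, 4.0]
    print(triplet_loss(anchor, positive, negative))
    print(same_scene([1.0, 2.0], [2.0, 4.0]))
```

Under these assumptions, a scene boundary would be placed between two adjacent shots whenever `same_scene` returns `False` for them, which is one way to read scene segmentation as a two-class decision over shot pairs.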