To address the "semantic gap" between low-level features and high-level semantics in video scene segmentation, a video scene segmentation algorithm based on multimodal feature fusion and competition was proposed. Image, text, and audio features were extracted as the low-level features of the video frames. Euclidean distance and cosine similarity were used to measure the similarity of homogeneous data (features of the same modality), and canonical correlation analysis (CCA) was used to compute the correlation of heterogeneous data (features of different modalities), re...
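The three measures named above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes that each frame's features for one modality form a fixed-length vector, and that for CCA the features of two modalities are stacked row-wise over the same frames. The function names and the small ridge term `reg` (added for numerical stability) are hypothetical and do not come from the source.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two same-modality feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def canonical_correlations(x, y, reg=1e-6):
    """Canonical correlations between two heterogeneous feature sets.

    x: (n_frames, p) features of one modality (e.g. image)
    y: (n_frames, q) features of another modality (e.g. audio)
    reg: hypothetical ridge term added to the covariance diagonals
    Returns min(p, q) canonical correlations in descending order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each view so the products below are covariances.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Singular values of Cxx^{-1/2} Cxy Cyy^{-1/2} are the canonical correlations.
    t = _inv_sqrt(cxx) @ cxy @ _inv_sqrt(cyy)
    return np.linalg.svd(t, compute_uv=False)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(200, 4))  # synthetic "image" features for 200 frames
    # Synthetic "audio" features that depend linearly on two image dimensions plus noise.
    aud = img[:, :2] @ np.array([[1.0, 0.5], [-0.3, 2.0]]) + 0.1 * rng.normal(size=(200, 2))
    print(euclidean_distance(img[0], img[1]), cosine_similarity(img[0], img[1]))
    print(canonical_correlations(img, aud))
```

Euclidean distance grows as two frames differ, while cosine similarity grows as they agree. A pipeline that combines them therefore has to invert or normalize one of the two. CCA sidesteps the fact that image, text, and audio vectors live in different spaces with different dimensions. It finds the linear projections under which the two views are most correlated.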