• A hybrid bidirectional network based on DB-DWConvLSTM for industrial video summarization

    • Abstract: Existing video summarization algorithms and summary evaluation methods do not fully account for the characteristics of video data perceived by industrial intelligent terminals or for the application requirements of industrial intelligent perception. To address this, the paper revises the representativeness and diversity evaluation constraints and, on that basis, proposes a hybrid bidirectional multi-layer industrial video summarization scheme that combines Depthwise Convolution (DWConv) with Convolutional Long Short-Term Memory (ConvLSTM). The scheme comprises four components: global coarse-grained feature extraction, local fine-grained feature extraction, feedback update, and query-driven feature fusion. To cope with the high redundancy and heavy noise of industrial video data, the global feature extraction module is built around ConvLSTM and an attention mechanism. To fully capture the spatiotemporal characteristics of video data, the local feature extraction module combines the attention mechanism with DB-DWConvLSTM. To exploit the periodicity and local stability of industrial data, a feedback module that incorporates DWConv is designed, drawing on the idea of residual networks. To make key-frame features more salient and ease key-frame selection, a query-driven feature fusion module is investigated, together with a secondary screening method for summary evaluation. The scheme is evaluated on the TVSum and SumMe datasets, and the experimental results show that it performs well in cross-validation, ablation studies, and comparative analyses.
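    The abstract does not state the revised representativeness and diversity constraints themselves. For orientation only, the sketch below shows the conventional form these two scores commonly take in unsupervised video summarization: diversity as the mean pairwise cosine dissimilarity among selected frames, and representativeness as the exponential of the negative mean distance from every frame to its nearest selected frame. This is not the paper's revised formulation; the function names, the per-frame feature vectors, and the toy data are all hypothetical.

```python
import math


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def diversity(features, selected):
    """Mean pairwise cosine dissimilarity among the selected frames."""
    n = len(selected)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two frames
    total = sum(
        cosine_dissimilarity(features[i], features[j])
        for i in selected
        for j in selected
        if i != j
    )
    return total / (n * (n - 1))


def representativeness(features, selected):
    """exp(-mean distance from each frame to its nearest selected frame)."""
    if not selected:
        return 0.0
    total = sum(
        min(math.dist(frame, features[j]) for j in selected)
        for frame in features
    )
    return math.exp(-total / len(features))


# Toy "periodic" clip (hypothetical data): two visual states, each seen twice.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

one_per_state = [0, 2]    # one key frame from each state
near_duplicates = [0, 1]  # two key frames from the same state

print(diversity(features, one_per_state), diversity(features, near_duplicates))
print(representativeness(features, one_per_state),
      representativeness(features, near_duplicates))
```

    On this toy clip, selecting one frame per state scores higher on both measures than selecting two near-duplicate frames, which is the behaviour a summary of highly redundant, periodic industrial video would want to reward.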

       
