• Probabilistic model for gaze target detection based on spatiotemporal Transformer

    Abstract: This paper proposes a consistency-guided probabilistic model for gaze target detection based on a spatiotemporal deformable Transformer. It targets a weakness of existing methods, namely the handling of dynamic gaze in videos. The model consists of three main parts: a frame gaze model, a spatiotemporal gaze relationship model, and a future semantic gaze estimation module. The frame gaze model processes frame features with a spatial encoder, a spatial decoder, and a prediction module to obtain outputs such as head labels and heatmaps. The spatiotemporal gaze relationship model concatenates past features with current-frame features, then fuses them through a temporal gaze relationship feature encoder and a temporal gaze relationship query encoder to enhance the gaze relationship queries of the current frame. The future semantic gaze estimation module uses the spatiotemporal gaze relationship model as its backbone to learn past-future semantics and predicts future features frame by frame with a consistency-aware future decoder. In experiments on multiple datasets, the model is reported to outperform all previous methods on both video gaze target detection and video co-gaze target detection. Ablation experiments show that the key modules and the different loss functions each contribute positively to overall performance, and visualizations illustrate the model's effectiveness in dynamic gaze scenarios. The work may help promote applications of gaze target detection in fields such as human-computer interaction.
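    The fusion step attributed to the spatiotemporal gaze relationship model (past features concatenated with current-frame features, then used to enhance the current frame's gaze relationship queries) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes plain scaled dot-product cross-attention in place of deformable attention, uses no learned projections, and the function name `enhance_queries` and all array shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def enhance_queries(queries, past_feats, current_feats):
    """Hypothetical illustration of past/current feature fusion.

    queries:       (Q, D) gaze relationship queries of the current frame
    past_feats:    (T, N, D) features from T past frames, N tokens each
    current_feats: (N, D) features of the current frame
    Returns the enhanced queries (Q, D) and the attention weights.
    """
    d = queries.shape[-1]
    # Concatenate past and current features into one memory sequence.
    memory = np.concatenate([past_feats.reshape(-1, d), current_feats], axis=0)
    # Assumption: plain scaled dot-product attention, not deformable attention.
    scores = queries @ memory.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Residual update: each query is enhanced by its attended memory summary.
    return queries + weights @ memory, weights
```

    Each query attends over every past and current token, so the enhanced queries carry temporal context. A real model would add learned projections, multiple heads, and the deformable sampling the abstract names.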

       
