Author:

Yang, Xiaoda | Cheng, Xize | Duan, Jiaqi | Qiu, Hongshun | Hong, Minjie | Fang, Minghui | Ji, Shengpeng | Zuo, Jialung | Hong, Zhiqing | Zhang, Zhimeng | Jin, Tao

Indexed by:

EI

Abstract:

Visual Speech Recognition (VSR) aims to predict spoken content by analyzing lip movements in videos. Recently reported state-of-the-art results in VSR often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are insufficient compared to the audio data. To further enhance the VSR model using the audio data, we employed a generative model for data inflation, integrating the synthetic data with the authentic visual data. Essentially, the generative model incorporates another insight, which enhances the capabilities of the recognition model. For the cross-language issue, previous work has shown poor performance with non-Indo-European languages. We trained a multi-language-family modal fusion model, AudioVSR. Leveraging the concept of modal transfer, we achieved significant results in downstream VSR tasks under conditions of data scarcity. To the best of our knowledge, AudioVSR represents the first work on cross-language-family audio-lip alignment, achieving a new SOTA in the cross-language scenario. © 2024 Association for Computational Linguistics.

Keyword:

Video analysis | Data assimilation | Data integration | Computational linguistics | Speech recognition

Author Community:

  • [ 1 ] [Yang, Xiaoda]Zhejiang University, China
  • [ 2 ] [Cheng, Xize]Zhejiang University, China
  • [ 3 ] [Duan, Jiaqi]Qingdao University, China
  • [ 4 ] [Qiu, Hongshun]Beijing University of Technology, China
  • [ 5 ] [Hong, Minjie]Zhejiang University, China
  • [ 6 ] [Fang, Minghui]Zhejiang University, China
  • [ 7 ] [Ji, Shengpeng]Zhejiang University, China
  • [ 8 ] [Zuo, Jialung]Zhejiang University, China
  • [ 9 ] [Hong, Zhiqing]Zhejiang University, China
  • [ 10 ] [Zhang, Zhimeng]Zhejiang University, China
  • [ 11 ] [Jin, Tao]Zhejiang University, China

Year: 2024

Page: 15352-15361

Language: English
