Author:

Wang, Shuo | Guo, Dan | Xu, Xin | Zhuo, Li (卓力) | Wang, Meng

Indexed by:

EI, Scopus, SCIE

Abstract:

Comprehending heterogeneous data is an indispensable part of cross-media analysis, and it poses challenges in visual question answering (VQA), visual captioning, and cross-modality retrieval. Bridging the semantic gap between the visual and textual modalities remains difficult. In this article, we address cross-modality retrieval by proposing a cross-modal learning model with joint correlative calculation learning. First, an auto-encoder embeds the visual features by minimizing the feature reconstruction error, and a multi-layer perceptron (MLP) models the textual feature embedding. We then design a joint loss function that optimizes both the intra- and inter-correlations among image-sentence pairs; it combines the reconstruction loss of the visual features, the relevant similarity loss of paired samples, and the triplet relation loss between positive and negative examples. The joint loss is optimized over a batch score matrix, and all mutually mismatched pairs in the batch are used to enhance performance. Experiments on retrieval tasks demonstrate the effectiveness of the proposed method, which achieves performance comparable to the state of the art on three benchmarks: Flickr8k, Flickr30k, and MS-COCO.
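The joint objective summarized above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes cosine similarity for the batch score matrix, mean squared error for the reconstruction term, 1 - cosine for the paired-similarity term, a hinge margin of 0.2, and equal loss weights. The auto-encoder and MLP encoders are omitted; their outputs are passed in as plain lists, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def score_matrix(img_emb: List[Vector], txt_emb: List[Vector]) -> List[List[float]]:
    """S[i][j] = similarity of image i and sentence j; the diagonal holds matched pairs."""
    return [[cosine(v, t) for t in txt_emb] for v in img_emb]


def reconstruction_loss(feats: List[Vector], recon: List[Vector]) -> float:
    """Mean squared error between visual features and their (assumed) auto-encoder reconstructions."""
    total = sum((x - y) ** 2 for f, r in zip(feats, recon) for x, y in zip(f, r))
    count = sum(len(f) for f in feats)
    return total / count


def pair_similarity_loss(S: List[List[float]]) -> float:
    """Pull matched pairs together: mean of (1 - S[i][i]) over the batch."""
    n = len(S)
    return sum(1.0 - S[i][i] for i in range(n)) / n


def triplet_loss(S: List[List[float]], margin: float = 0.2) -> float:
    """Hinge loss summed over every mismatched (off-diagonal) pair, in both directions."""
    n = len(S)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # image i ranked against wrong sentence j
            total += max(0.0, margin - S[i][i] + S[i][j])
            # sentence i ranked against wrong image j
            total += max(0.0, margin - S[i][i] + S[j][i])
    return total / n


def joint_loss(feats, recon, img_emb, txt_emb,
               w_rec=1.0, w_sim=1.0, w_tri=1.0, margin=0.2) -> float:
    """Hypothetical weighted sum of the three terms computed from one batch score matrix."""
    S = score_matrix(img_emb, txt_emb)
    return (w_rec * reconstruction_loss(feats, recon)
            + w_sim * pair_similarity_loss(S)
            + w_tri * triplet_loss(S, margin))


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    aligned = joint_loss(feats, feats, [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = joint_loss(feats, feats, [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned, swapped)
```

Summing the hinge over every off-diagonal entry of the score matrix is one plausible reading of "all mutually mismatched pairs"; a hardest-negative variant would instead keep only the largest violation per row and column. In the usage example, perfectly aligned embeddings give a joint loss of zero, while swapped sentence embeddings are penalized by both the paired-similarity and triplet terms.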

Keyword:

Cross-modality retrieval; joint loss; auto-encoder; MLP

Author Community:

  • [ 1 ] [Wang, Shuo]Hefei Univ Technol, Sch Artificial Intelligence, Sch Comp Sci & Informat, Hefei 230601, Anhui, Peoples R China
  • [ 2 ] [Guo, Dan]Hefei Univ Technol, Sch Artificial Intelligence, Sch Comp Sci & Informat, Hefei 230601, Anhui, Peoples R China
  • [ 3 ] [Wang, Meng]Hefei Univ Technol, Sch Artificial Intelligence, Sch Comp Sci & Informat, Hefei 230601, Anhui, Peoples R China
  • [ 4 ] [Xu, Xin]Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan, Hubei, Peoples R China
  • [ 5 ] [Zhuo, Li]Beijing Univ Technol, Signal & Informat Proc Lab, Beijing, Peoples R China

Reprint Author's Address:

  • [Wang, Shuo]Hefei Univ Technol, Sch Artificial Intelligence, Sch Comp Sci & Informat, Hefei 230601, Anhui, Peoples R China
  • [Guo, Dan]Hefei Univ Technol, Sch Artificial Intelligence, Sch Comp Sci & Informat, Hefei 230601, Anhui, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS

ISSN: 1551-6857

Year: 2019

Issue: 2

Volume: 15

Impact Factor: 5.100 (JCR@2022)

ESI Discipline: COMPUTER SCIENCE;

ESI HC Threshold: 147

JCR Journal Grade:1

Cited Count:

WoS CC Cited Count: 16

SCOPUS Cited Count: 16

ESI Highly Cited Papers on the List: 0
