Author:

Liu, Hancong | Qi, Jingzhong | Zhu, Qing

Indexed by:

EI

Abstract:

In today's rapidly evolving technological landscape, cross-modal retrieval plays a vital role in modern information retrieval and data analysis: it maps data of different modalities, such as images and text, into a common semantic space, enabling cross-modal similarity measurement and retrieval. Image-text cross-modal retrieval is particularly important in patent analysis, because patent documents usually contain complex technical descriptions and related explanations, and the ability to retrieve images and text efficiently and accurately from massive patent collections is of great significance for technological innovation, competitive analysis, and intellectual property protection. This paper proposes a novel visual semantic embedding model for patents that uses a low-rank multi-head self-attention mechanism to improve cross-modal retrieval performance in patent analysis. Our model uses EfficientNet as the image encoder to extract richer and more detailed feature representations from patent images, and BERT as the text encoder to handle the complex text structure and polysemy of patent documents. By optimizing the embedding generation process of the image and text encoders, the model's feature expression ability and retrieval accuracy are significantly enhanced. We also introduce a multi-instance embedding network with a new low-rank multi-head self-attention mechanism that combines global and local features and computes multiple distinct representations for each instance, further improving the model's ability to handle ambiguity and data diversity. To verify the effectiveness of the model, we conduct extensive experiments on multiple public cross-modal datasets and a patent-specific dataset. The results show that the model significantly outperforms most existing methods on multiple metrics, especially in patent image and text retrieval tasks, demonstrating higher accuracy and robustness and providing strong tool support for future patent analysis and technological innovation. © 2024 IEEE.
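The record gives no implementation details, but the component the abstract names is a low-rank multi-head self-attention block applied on top of the EfficientNet/BERT encoder outputs. The following is a minimal, hypothetical PyTorch sketch, assuming "low-rank" means factorizing each query/key/value projection through a rank-r bottleneck; the class name, dimensions, and rank are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankMHSA(nn.Module):
    """Multi-head self-attention with low-rank factorized Q/K/V projections (sketch)."""

    def __init__(self, d_model: int = 512, num_heads: int = 8, rank: int = 64):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads

        # Replace each full d_model x d_model projection with two thin matrices
        # (d_model x rank and rank x d_model); parameters shrink whenever
        # rank < d_model / 2.
        def low_rank_linear() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(d_model, rank, bias=False),
                nn.Linear(rank, d_model, bias=False),
            )

        self.q_proj = low_rank_linear()
        self.k_proj = low_rank_linear()
        self.v_proj = low_rank_linear()
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. EfficientNet region features for a
        # patent image or BERT token embeddings for its text.
        b, n, _ = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.q_proj(x))
        k = split_heads(self.k_proj(x))
        v = split_heads(self.v_proj(x))

        # Scaled dot-product attention per head, then merge heads.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out_proj(out)


# Usage sketch: pool the attended features (here by mean) to obtain one joint
# embedding per image or sentence for cross-modal similarity measurement.
features = torch.randn(2, 36, 512)              # hypothetical region features
embeddings = LowRankMHSA()(features).mean(dim=1)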

Keyword:

Content based retrieval; Patents and inventions; Image analysis; Image enhancement; Image coding; Modal analysis; Network embeddings

Author Community:

  • [ 1 ] [Liu, Hancong] Beijing University of Technology, Beijing, China
  • [ 2 ] [Qi, Jingzhong] Beijing University of Technology, Beijing, China
  • [ 3 ] [Zhu, Qing] Beijing University of Technology, Beijing, China

Source:

Year: 2024

Page: 192-198

Language: English

ESI Highly Cited Papers on the List: 0
