Abstract:
Cross-modal retrieval plays a vital role in modern information retrieval and data analysis: it maps data of different modalities, such as images and text, into a common semantic space, enabling cross-modal similarity measurement and retrieval. In patent analysis, image-text cross-modal retrieval is particularly important. Patent documents typically contain complex technical drawings and accompanying descriptions, so the ability to retrieve images and text efficiently and accurately from massive patent collections is of great significance for technological innovation, competitive analysis, and intellectual property protection. This paper proposes a novel visual semantic embedding model for patents that uses a low-rank multi-head self-attention mechanism to improve cross-modal retrieval performance in patent analysis. The model uses EfficientNet as the image encoder to extract richer and more detailed feature representations from patent images, and BERT as the text encoder to handle the complex sentence structures and polysemy found in patent documents. By optimizing the embedding generation process of the image and text encoders, the model's feature expressiveness and retrieval accuracy are significantly enhanced. In addition, we introduce a multi-instance embedding network with a new low-rank multi-head self-attention mechanism that combines global and local features and computes multiple distinct representations of each instance, further improving the model's ability to handle ambiguity and data diversity. To verify the effectiveness of the model, we conducted extensive experiments on several public cross-modal datasets and a patent-specific dataset.
The experimental results show that the model significantly outperforms most existing methods on multiple metrics; in particular, it achieves higher accuracy and robustness on patent image and text retrieval tasks, providing strong tool support for future patent analysis and technological innovation. © 2024 IEEE.
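The abstract does not specify the exact form of the low-rank multi-head self-attention, so the following is only an illustrative sketch of the general idea: each head's query/key/value projection is factorized into two thin matrices of rank r much smaller than the feature dimension, reducing parameters while still attending over a set of local features (e.g. patent image regions or text tokens). All matrix shapes and the NumPy implementation here are assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_mhsa(X, heads, rank, rng):
    """Sketch of low-rank multi-head self-attention.

    X: (n, d) array of n local feature vectors (hypothetical shapes).
    Each Q/K/V projection is factorized as (d, rank) @ (rank, d_head),
    cutting parameters from d*d_head to rank*(d + d_head) per projection.
    """
    n, d = X.shape
    d_head = d // heads
    outs = []
    for _ in range(heads):
        def proj():
            # Random factors stand in for learned weights in this sketch.
            A = rng.standard_normal((d, rank)) / np.sqrt(d)
            B = rng.standard_normal((rank, d_head)) / np.sqrt(rank)
            return X @ A @ B                      # (n, d_head)
        Q, K, V = proj(), proj(), proj()
        attn = softmax(Q @ K.T / np.sqrt(d_head)) # (n, n) attention weights
        outs.append(attn @ V)                     # (n, d_head)
    return np.concatenate(outs, axis=1)           # (n, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 32))   # e.g. 5 local region features of dim 32
out = low_rank_mhsa(X, heads=4, rank=4, rng=rng)
print(out.shape)   # (5, 32)
```

In a trained model the factor matrices would be learned parameters and the attended local features would be fused with a global embedding before the image-text similarity is computed, as the abstract describes.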
Year: 2024
Page: 192-198
Language: English