Author:

Liu, Hancong | Qi, Jingzhong | Zhu, Qing

Indexed by:

EI

Abstract:

In today's rapidly evolving technological landscape, cross-modal retrieval plays a vital role in modern information retrieval and data analysis: it maps data of different modalities, such as images and text, into a common semantic space, enabling cross-modal similarity measurement and retrieval. Image-text cross-modal retrieval is particularly important in patent analysis, because patent documents usually contain complex technical descriptions and related explanations, and the ability to retrieve images and text efficiently and accurately from massive patent collections is of great significance for technological innovation, competitive analysis, and intellectual property protection. This paper proposes a novel visual semantic embedding model for patents that uses a low-rank multi-head self-attention mechanism to improve cross-modal retrieval performance in patent analysis. Our model uses EfficientNet as the image encoder to extract richer and more detailed feature representations from patent images, and BERT as the text encoder to handle the complex text structure and polysemy of patent documents. By optimizing the embedding generation process of the image and text encoders, the model's feature expression ability and retrieval accuracy are significantly enhanced. We also introduce a multi-instance embedding network with a new low-rank multi-head self-attention mechanism that combines global and local features and computes multiple distinct representations for each instance, further improving the model's ability to handle ambiguity and data diversity. To verify the effectiveness of the model, we conduct extensive experiments on multiple public cross-modal datasets and a patent-specific dataset. The results show that the model significantly outperforms most existing methods on multiple metrics, especially in patent image and text retrieval tasks, demonstrating higher accuracy and robustness and providing strong tool support for future patent analysis and technological innovation. © 2024 IEEE.
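The record gives no implementation details, but the component the abstract names is a low-rank multi-head self-attention block applied on top of the EfficientNet/BERT encoder outputs. The following is a minimal, hypothetical PyTorch sketch, assuming "low-rank" means factorizing each query/key/value projection through a rank-r bottleneck; the class name, dimensions, and rank are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankMHSA(nn.Module):
    """Multi-head self-attention with low-rank factorized Q/K/V projections (sketch)."""

    def __init__(self, d_model: int = 512, num_heads: int = 8, rank: int = 64):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads

        # Replace each full d_model x d_model projection with two thin matrices
        # (d_model x rank and rank x d_model); parameters shrink whenever
        # rank < d_model / 2.
        def low_rank_linear() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(d_model, rank, bias=False),
                nn.Linear(rank, d_model, bias=False),
            )

        self.q_proj = low_rank_linear()
        self.k_proj = low_rank_linear()
        self.v_proj = low_rank_linear()
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. EfficientNet region features for a
        # patent image or BERT token embeddings for its text.
        b, n, _ = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.q_proj(x))
        k = split_heads(self.k_proj(x))
        v = split_heads(self.v_proj(x))

        # Scaled dot-product attention per head, then merge heads.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out_proj(out)


# Usage sketch: pool the attended features (here by mean) to obtain one joint
# embedding per image or sentence for cross-modal similarity measurement.
features = torch.randn(2, 36, 512)              # hypothetical region features
embeddings = LowRankMHSA()(features).mean(dim=1)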

Keyword:

Content based retrieval; Patents and inventions; Image analysis; Image enhancement; Image coding; Modal analysis; Network embeddings

Author Community:

  • [ 1 ] [Liu, Hancong] Beijing University of Technology, Beijing, China
  • [ 2 ] [Qi, Jingzhong] Beijing University of Technology, Beijing, China
  • [ 3 ] [Zhu, Qing] Beijing University of Technology, Beijing, China

Source:

Year: 2024

Page: 192-198

Language: English

ESI Highly Cited Papers on the List: 0
