Aesthetic multi-attributes network for image captioning

Author：

Yang H.; Li Y.; Jin X.; Zhou X.; Shi P.; Liu Y.

Indexed by：

Scopus

Abstract：

Image aesthetic quality assessment has witnessed a remarkable rise in popularity in recent years. Aesthetic captioning has emerged as a novel approach to encapsulate the overall aesthetic impression of an image. However, the inherently challenging task of annotating aesthetic attributes has constrained the scale of existing datasets. To address this limitation, the DPChallenge Multi-Attributes Captions (DPC-MAC) dataset was developed by integrating semi-automatically generated annotations from small-scale, fully annotated datasets with extensive technical reviews sourced from a photography platform. The DPC-MAC dataset encompasses four key aesthetic attributes: composition, lighting, color, and subject. To effectively leverage this data, we introduce an innovative Aesthetic Multi-Attributes Captioning Network (AMACN), comprising the Bottom-Up and Top-Down Attention Network (BUTDAN) and the Object-Semantics Aligned Pretrained Network (OSAPN). Both networks are trained using a combination of small-scale, fully annotated datasets and the large-scale DPC-MAC dataset. The performance of the proposed AMACN model on DPC-MAC surpasses existing methods based on standard image captioning evaluation metrics, demonstrating its efficacy. This groundbreaking task of aesthetic attribute assessment represents a promising avenue for advancing research in this field. By innovatively integrating aesthetic attributes with descriptive commentary, the DPC-MAC dataset provides a valuable resource for researchers to develop more precise and nuanced aesthetic models. This work not only paves the way for further exploration of image aesthetics but also holds the potential to enhance the quality and sophistication of aesthetic evaluations. © 2025 Elsevier Ltd

Keyword：

Image aesthetic quality assessment; Semi-supervised learning; Image captioning; Aesthetic attributes assessment

Author Community：

[ 1 ] [Yang H.] School of Electrical and Information Engineering, Beijing Polytechnic College, Beijing, 100042, China
[ 2 ] [Li Y.] Beijing Electronic Science and Technology Institute, Beijing, 100070, China
[ 3 ] [Jin X.] Beijing Electronic Science and Technology Institute, Beijing, 100070, China
[ 4 ] [Zhou X.] University of Science and Technology, Hefei, 230026, China
[ 5 ] [Shi P.] School of Information and Communication Engineering, Communication University of China, Beijing, 100024, China
[ 6 ] [Liu Y.] School of Electrical and Information Engineering, Beijing Polytechnic College, Beijing, 100042, China

Related Keywords：

Aesthetic Multi-attributes Captioning Network for Photos
2024
Semi-supervised facial expression recognition algorithm on the condition of multi-pose
2013, Journal of Information Hiding and Multimedia Signal Processing
Pseudo-label based semi-supervised learning in the distributed machine learning framework
2022, High Technology Letters (English edition)
Hash graph based semi-supervised learning method and its application in image segmentation
2010, Acta Automatica Sinica

Source ：

Computers and Electrical Engineering

ISSN： 0045-7906

Year： 2025

Volume： 123

Impact Factor (JCR@2022)： 4.300

ESI Highly Cited Papers on the List： 0
