The effectiveness of data augmentation in code readability classification

Author：

Mi, Qing | Xiao, Yan | Cai, Zhi | Jia, Xibin (贾熹滨)

Indexed by：

EI | Scopus | SCIE

Abstract：

Context: Training deep learning models for code readability classification requires large datasets of quality pre-labeled data. However, it is almost always time-consuming and expensive to acquire readability data with manual labels. Objective: We thus propose to introduce data augmentation approaches to artificially increase the size of the training set, thereby reducing the risk of overfitting caused by the lack of readability data and, as the ultimate goal, further improving the classification accuracy. Method: We create transformed versions of code snippets by manipulating original data from aspects such as comments, indentations, and names of classes/methods/variables based on domain-specific knowledge. In addition to basic transformations, we also explore the use of Auxiliary Classifier GANs to produce synthetic data. Results: To evaluate the proposed approach, we conduct a set of experiments. The results show that the classification performance of deep neural networks can be significantly improved when they are trained on the augmented corpus, achieving a state-of-the-art accuracy of 87.38%. Conclusion: We consider the findings of this study as primary evidence of the effectiveness of data augmentation in the field of code readability classification.
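
The basic transformations named in the abstract (comments, indentation, identifier names) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Java-like sample snippet, and the regex-based approach are all assumptions made for illustration, and the abstract does not say how each transformation affects the readability label.

```python
import re

# Hypothetical sample input; not taken from the paper's dataset.
SNIPPET = """int sum(int[] values) {
    int total = 0; // running sum
    for (int v : values) {
        total += v;
    }
    return total;
}"""


def strip_line_comments(code: str) -> str:
    """Remove `//` line comments and drop any line that ends up blank.

    Naive: also matches `//` inside string literals, and removes lines
    that were already blank in the input.
    """
    lines = [re.sub(r"\s*//.*$", "", line) for line in code.splitlines()]
    return "\n".join(line for line in lines if line.strip())


def reindent(code: str, old_width: int = 4, new_width: int = 2) -> str:
    """Rescale leading-space indentation from old_width to new_width per level."""
    out = []
    for line in code.splitlines():
        stripped = line.lstrip(" ")
        level = (len(line) - len(stripped)) // old_width
        out.append(" " * (new_width * level) + stripped)
    return "\n".join(out)


def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of an identifier.

    Naive: a real tool would use a parser so that strings and comments
    are left untouched.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def augment(code: str) -> list[str]:
    """Return one transformed variant per basic transformation."""
    return [
        strip_line_comments(code),
        reindent(code),
        rename_identifier(code, "total", "t"),
    ]


if __name__ == "__main__":
    for variant in augment(SNIPPET):
        print(variant)
        print("-" * 20)
```

Each variant would still need a readability label before joining the training set; how labels are assigned to transformed snippets is a design decision the abstract leaves open.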

Keyword：

Generative adversarial network | Deep learning | Data augmentation | Empirical software engineering | Code readability classification

Author Community：

[ 1 ] [Mi, Qing]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 2 ] [Cai, Zhi]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 3 ] [Jia, Xibin]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 4 ] [Xiao, Yan]Natl Univ Singapore, Sch Comp, Singapore, Singapore

Reprint Author's Address：

[Cai, Zhi]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Email：

miqing@bjut.edu.cn | dcsxan@nus.edu.sg | caiz@bjut.edu.cn | jiaxibin@bjut.edu.cn

Related Keywords：

Diverse sample generation with multi-branch conditional generative adversarial network for remote sensing objects detection
2020，NEUROCOMPUTING
Deep 3D reconstruction: methods, data, and challenges
2021，FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING
Efficient multi-material topology optimization design with minimum compliance based on ResUNet involved generative adversarial network
2024，ACTA MECHANICA SINICA
Person image synthesis through siamese generative adversarial network
2020，NEUROCOMPUTING

Source ：

INFORMATION AND SOFTWARE TECHNOLOGY

ISSN： 0950-5849

Year： 2021

Volume： 129

Impact Factor： 3.900 (JCR@2022)

ESI Discipline： COMPUTER SCIENCE;

ESI HC Threshold：87

JCR Journal Grade：1

Cited Count：

WoS CC Cited Count： 16

SCOPUS Cited Count： 22

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 4

Affiliated Colleges：

College of Information Science and Technology: data not clearly attributed to this college/department
