Feature Pyramid Based Scene Text Detector

Authors:

En, MengYi | Li, Rong | Li, JianQiang (李建强) | Liu, Bo (刘博)

Indexed by:

CPCI-S, EI, Scopus

Abstract:

Features are critical for detecting text in natural scene images. Most current scene text detection algorithms leverage the powerful feature learning capability of convolutional neural networks (CNNs) to learn discriminative features that distinguish text from non-text, and then perform detection on these features. Features from the low layers of a CNN are known to be high-resolution but weakly discriminative and semantically poor, which compromises their representative capacity. Feature maps from the high layers, on the other hand, are discriminative but coarse in resolution, which hurts the detection of small objects. In this paper, we present a feature pyramid based text detector (FPTD) for detecting scene text at different scales, especially small text. Our framework builds on the state-of-the-art Single Shot Detector (SSD) framework. Unlike SSD, which performs detection on feature maps from the later stages of the network whose coarse resolution leads to unsatisfactory results on small objects, our framework incorporates a feature pyramid mechanism into SSD. Specifically, we adopt a top-down fusion strategy to build new features with strong semantics while keeping fine details. Text detection is performed separately on each of the newly constructed feature maps during a single forward pass, and the detection results from all layers are gathered and passed through a non-maximum suppression (NMS) process. Because detection is performed on feature maps from several layers that differ in scale but are all discriminative, our framework can detect text across a wide range of scales. Experimental results confirm that our framework achieves competitive performance on the ICDAR2013 text localization benchmark with marginal extra cost.
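The abstract names two mechanical steps: top-down fusion of a feature pyramid, and gathering detections from every level before one NMS pass. The sketch below illustrates both in dependency-free Python under loudly stated assumptions. It is not the authors' implementation. Each feature map is reduced to a single-channel 2D list, and adjacent pyramid levels are assumed to differ by exactly 2x in resolution. Fusion is assumed to be nearest-neighbour upsampling followed by element-wise addition (the FPN-style merge); the abstract does not specify FPTD's exact fusion operator, and any learned lateral or smoothing convolutions are omitted. Detections are assumed to be axis-aligned `(score, box)` pairs, and the NMS threshold of 0.5 is an assumed value. All function names (`top_down_fuse`, `gather_and_suppress`, etc.) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical simplification: a feature map is a single-channel 2D grid.
FeatureMap = List[List[float]]
# Assumed axis-aligned box (x1, y1, x2, y2); a detection is (score, box).
Box = Tuple[float, float, float, float]
Detection = Tuple[float, Box]


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each cell becomes a 2x2 block."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(features: List[FeatureMap]) -> List[FeatureMap]:
    """Top-down fusion over a pyramid ordered finest -> coarsest.

    Starting from the coarsest (most semantic) map, each fused map is
    upsampled and added to the next finer map. Learned convolutions are
    deliberately omitted in this sketch (assumption).
    """
    fused = [features[-1]]
    for fm in reversed(features[:-1]):
        fused.append(add_maps(fm, upsample2x(fused[-1])))
    return list(reversed(fused))  # restore finest -> coarsest order


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], thresh: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    keep: List[Detection] = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept) <= thresh for _, kept in keep):
            keep.append((score, box))
    return keep


def gather_and_suppress(per_level: List[List[Detection]],
                        thresh: float = 0.5) -> List[Detection]:
    """Pool detections from every pyramid level, then run a single NMS pass."""
    pooled = [d for level in per_level for d in level]
    return nms(pooled, thresh)
```

With a three-level pyramid of constant maps (values 1, 2 and 3 from finest to coarsest), `top_down_fuse` yields maps filled with 6, 5 and 3, which shows the coarse, semantic signal propagating into the finer levels. Pooling all levels before one NMS pass, rather than suppressing each level separately, lets a confident box found on one level remove a duplicate of the same text found on another level.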

Keywords:

feature fusion; deep learning; multi-scale CNN; scene text; feature pyramid

Author Affiliations:

[1] [En, MengYi] Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
[2] [Li, Rong] Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
[3] [Li, JianQiang] Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
[4] [Liu, Bo] Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Reprint Author's Address:

[Li, Rong] Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Email:

enmengyi@emails.bjut.edu.cn
leerong@bjut.edu.cn
lijianqiang@bjut.edu.cn

Source:

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Vol. 6

ISSN: 1520-5363

Year: 2017

Pages: 3-8

Language: English

Cited Count:

WoS CC Cited Count: 5

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0

Affiliated Colleges:

College of Information Science and Technology (data not explicitly assigned to a specific school or department)
