Author:

En, MengYi | Li, Rong | Li, JianQiang (李建强) | Liu, Bo (刘博)

Indexed by:

CPCI-S EI Scopus

Abstract:

Features are critical for detecting text in natural scene images. Most current scene text detection algorithms leverage the feature learning power of convolutional neural networks (CNNs) to learn discriminative features that distinguish text from non-text, and perform detection based on these features. Features from the low layers of a CNN are high-resolution but have low discriminative power and carry less semantic information, which compromises their representational capacity. Feature maps from the high layers, on the other hand, are discriminative but coarse in resolution, which harms the detection of small objects. In this paper, we present a feature pyramid based text detector (FPTD) for detecting scene text at different scales, especially small scales. Our framework is based on the state-of-the-art "Single Shot Detector" (SSD) framework. SSD, however, performs detection on feature maps from the later stages of the network, which are coarse in resolution, so it cannot achieve satisfactory results on small objects. Our framework instead incorporates a feature pyramid mechanism into the SSD framework. Specifically, we adopt a top-down fusion strategy to build new features that have strong semantics while keeping fine details. Text detection is conducted on each of the newly constructed features during a single forward pass. The detection results from all layers are gathered and passed through non-maximum suppression (NMS). Since detection is conducted on feature maps from several layers that are at different scales but are all discriminative, our framework has strong power to detect text at different scales. Experimental results confirm that our framework achieves competitive performance on the ICDAR2013 text localization benchmark with marginal extra cost.
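
The two mechanisms named in the abstract, top-down fusion and NMS over detections pooled from several layers, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the nearest-neighbour 2x upsampling, the element-wise addition used for fusion, and the 0.5 IoU threshold are all assumptions chosen for illustration, and a real detector would operate on multi-channel tensors with learned layers.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_top_down(fine, coarse):
    """Add an upsampled coarse map to a fine map with twice its side length.

    Assumption: fusion is plain element-wise addition; the abstract does
    not specify the exact fusion operator.
    """
    up = upsample2x(coarse)
    return [[f + u for f, u in zip(frow, urow)] for frow, urow in zip(fine, up)]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs pooled from all pyramid levels."""
    keep = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept) <= iou_threshold for kept, _ in keep):
            keep.append((box, score))
    return keep


# Hypothetical usage: fuse a 2x2 coarse map into a 4x4 fine map, then
# suppress overlapping boxes gathered from different layers.
fine = [[1] * 4 for _ in range(4)]
coarse = [[10, 20], [30, 40]]
fused = fuse_top_down(fine, coarse)

pooled = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((20, 20, 30, 30), 0.7)]
final = nms(pooled)
```

In this sketch the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.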

Keyword:

feature fusion; deep learning; multi-scale; CNN; scene text; feature pyramid

Author Community:

  • [ 1 ] [En, MengYi]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 2 ] [Li, Rong]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 3 ] [Li, JianQiang]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 4 ] [Liu, Bo]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Reprint Author's Address:

  • [Li, Rong]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Source:

2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2017), VOL 6

ISSN: 1520-5363

Year: 2017

Page: 3-8

Language: English

Cited Count:

WoS CC Cited Count: 5

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0
