UltraAcc: A Customized Low Power and High Performance CNN Accelerator with Dataflow on FPGAs

Author：

Bao, Z.-S. | Guo, J.-N. | Zhang, W.-B. | Dang, H.-B.

Indexed by：

EI Scopus

Abstract：

Convolutional Neural Networks (CNNs) have achieved remarkable results in object detection, semantic segmentation, and image classification in recent years. Meeting high-precision requirements calls for CNN models with deep layers. Because such CNNs have a large number of parameters and intensive computational demands, deploying CNN applications with low-latency requirements on resource-limited edge devices is a great challenge. Although GPUs can be used for theoretical verification of accelerated CNN computation, their customization cost and power consumption prevent their use in practical low-power systems. In contrast, FPGAs combine low power consumption with high-performance computing capability and reconfigurability, which makes them suitable for customized computing of CNNs. The acceleration problem can therefore be addressed with customized computing that exploits FPGA reconfigurability: a composable accelerator can handle various CNN application scenarios, and its structure can be adjusted to suit the application to ensure power efficiency. The bottleneck of existing CNN accelerators on FPGAs lies in poor adaptation to the CNN algorithm, which leads to large computing gaps, wasted latency, and low utilization of computing resources. In this paper, we reorganize the dataflow structure to adapt to CNN parallel operation. Within the limited FPGA resources, the matrix multiplication, convolution, pooling, and other units are customized from the bottom up, and the Ultra accelerator (UltraAcc) is proposed. An evaluation model is designed for hyperparameter tuning; it evaluates storage resources, computing resources, and latency from the bottom unit to the computing-layer unit and then to the whole computing chain. Together with the precision results of CNN training, the whole application system is balanced and optimized in both software and hardware. UltraAcc achieves an average throughput of 126.72 GOPs on the Ultra96v2, 5.47 times higher than the first-place method in IEEE/ACM DAC-SDC'19 on the same platform. UltraAcc was entered in DAC-SDC'20 and won first prize with an accuracy (IoU) of 0.65, a speed of 212.73 FPS, and an energy consumption of 1.64 kJ. © 2023 Science Press. All rights reserved.
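
The abstract reports accuracy as an IoU value without defining the metric. The following is a minimal sketch, assuming the standard Intersection over Union between a predicted and a ground-truth bounding box. The function name `iou` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative assumptions and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption
    made for illustration, not the format used by the authors.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped to zero
    # when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the doubly counted overlap.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Under this definition, a score of 0.65 would mean that, on average, the overlap between predicted and ground-truth boxes covers 65% of their combined area.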

Keyword：

CNN accelerator | hardware-software co-design | FPGA | dataflow

Author Community：

[ 1 ] [Bao Z.-S.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100024, China
[ 2 ] [Guo J.-N.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100024, China
[ 3 ] [Zhang W.-B.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100024, China
[ 4 ] [Dang H.-B.] Faculty of Information Technology, Beijing University of Technology, Beijing, 100024, China

Source ：

Chinese Journal of Computers

ISSN： 0254-4164

Year： 2023

Issue： 6

Volume： 46

Page： 1139-1155

Cited Count：

WoS CC Cited Count： 0

ESI Highly Cited Papers on the List： 0