Author:

Pham, Minh | Yuan, Yongke | Li, Hao | Mou, Chengcheng | Tu, Yicheng | Xu, Zichen | Meng, Jinghan

Indexed by:

EI Scopus

Abstract:

Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today's data-intensive computing. Developing system software for massively parallel hardware that supports numerous parallel threads efficiently poses unique challenges. One such challenge is the design of a dynamic memory allocator that allocates memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. In the context of massively parallel systems, however, access to such a global data structure quickly becomes a bottleneck, even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure. Our approach lets threads use random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution, Ouroboros, in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and to further enhance performance when free memory is limited. Building upon these advancements, our mathematical proofs and experimental results show that the advanced designs yield an order-of-magnitude improvement over the basic design and consistently outperform the state of the art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. Both case studies provide compelling evidence of the pronounced performance gains of our approach. © 2025 Copyright held by the owner/author(s).
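
The abstract outlines the core mechanism: instead of dequeueing from a shared free list, each thread probes randomly chosen pages and claims one atomically. The snippet below is a minimal CUDA sketch of that basic random-search idea under simplifying assumptions (a fixed-size page pool tracked by a bitmap, a per-thread xorshift generator, and illustrative names such as PagePool, alloc_page, and free_page); it is not the paper's implementation and omits the warp-divergence optimizations the abstract mentions.

```cuda
// Sketch of random-search page allocation: no global queue, each thread
// probes random slots in a bitmap and claims a free page atomically.
// All names and constants here are illustrative assumptions.

#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

struct PagePool {
    uint32_t *bitmap;    // one bit per page: 0 = free, 1 = allocated
    uint32_t  num_pages;
};

// Cheap per-thread pseudo-random sequence (xorshift); the paper's design
// may use a different source of randomness.
__device__ uint32_t xorshift32(uint32_t &state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Try to claim a free page by random probing; returns the page index, or
// 0xFFFFFFFF if max_probes attempts all hit occupied pages.
__device__ uint32_t alloc_page(PagePool pool, uint32_t &rng, int max_probes) {
    for (int i = 0; i < max_probes; ++i) {
        uint32_t page = xorshift32(rng) % pool.num_pages;
        uint32_t word = page / 32, bit = 1u << (page % 32);
        // atomicOr returns the old word; if our bit was clear, we own the page.
        if ((atomicOr(&pool.bitmap[word], bit) & bit) == 0)
            return page;
    }
    return 0xFFFFFFFFu;  // give up; caller may retry or fall back
}

__device__ void free_page(PagePool pool, uint32_t page) {
    atomicAnd(&pool.bitmap[page / 32], ~(1u << (page % 32)));
}

// Demo: each thread allocates one page and records which one it got.
__global__ void demo_alloc(PagePool pool, uint32_t *out) {
    uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint32_t rng = tid * 2654435761u + 1u;   // nonzero per-thread seed
    out[tid] = alloc_page(pool, rng, 64);
}

int main() {
    const uint32_t num_pages = 1u << 20, num_threads = 1u << 16;
    PagePool pool;
    pool.num_pages = num_pages;
    cudaMalloc(&pool.bitmap, (num_pages / 32) * sizeof(uint32_t));
    cudaMemset(pool.bitmap, 0, (num_pages / 32) * sizeof(uint32_t));

    uint32_t *out;
    cudaMalloc(&out, num_threads * sizeof(uint32_t));
    demo_alloc<<<num_threads / 256, 256>>>(pool, out);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(out);
    cudaFree(pool.bitmap);
    return 0;
}
```

Because each probe touches an independently chosen bitmap word, contention is spread across the pool rather than concentrated on a queue head; when few pages remain free the expected number of probes grows, which is the regime the abstract's advanced designs target.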

Keyword:

Information management; Storage allocation (computer); Graphics processing unit; Memory management units; Buffer storage; Computer graphics equipment; Storage management; Dynamic random access storage; Parallel processing systems; Memory management; Memory architecture

Author Community:

  • [ 1 ] [Pham, Minh] Computer Science and Engineering, University of South Florida, Tampa, FL, United States
  • [ 2 ] [Yuan, Yongke] Beijing University of Technology, Beijing, China
  • [ 3 ] [Li, Hao] University of South Florida, Tampa, United States
  • [ 4 ] [Mou, Chengcheng] University of South Florida, Tampa, United States
  • [ 5 ] [Tu, Yicheng] University of South Florida, Tampa, United States
  • [ 6 ] [Xu, Zichen] Nanchang University, Nanchang, China
  • [ 7 ] [Meng, Jinghan] University of South Florida, Tampa, United States

Source:

ACM Transactions on Parallel Computing

ISSN: 2329-4949

Year: 2025

Issue: 1

Volume: 12

Page: 1-33

ESI Highly Cited Papers on the List: 0
