Abstract:
With the growth of data scale, machine learning has moved from centralized to distributed training. Distributed machine learning typically uses a parameter-server architecture and trains in synchronous mode: data samples are statically and evenly allocated to each computing node according to the batch size, and the workers train synchronously and iterate until the model converges. However, in mixed-load scenarios the computing nodes have different amounts of resources, while the traditional data partition strategy either configures the batch size statically or requires it to be set manually. This makes distributed model training computationally inefficient, and ad hoc data adjustment on individual nodes can affect model accuracy. To address this problem, while preserving the accuracy of the distributed training task, this paper proposes an optimal configuration scheme for the batch size of distributed machine learning training data: a data partition strategy for distributed machine learning (DQ-DPS). DQ-DPS solves the low computational efficiency caused by static data partitioning, improves the computational efficiency of distributed machine learning tasks, and preserves the accuracy of the trained model. Extensive experiments demonstrate the effectiveness of DQ-DPS: compared with the traditional data partition strategy, DQ-DPS improves the computing efficiency of each training round by 38%. © 2021 ACM.
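The abstract does not spell out the DQ-DPS algorithm, but the core idea it describes is allocating each worker a share of a fixed global batch in proportion to its compute capacity rather than splitting the batch evenly. The sketch below is a minimal illustration of that general idea under assumed inputs (measured per-worker throughput in samples/sec); the function name partition_batch and the example numbers are hypothetical and not taken from the paper.

```python
def partition_batch(global_batch_size, worker_throughputs):
    """Split a fixed global batch across workers in proportion to their
    measured throughput (samples/sec), so faster nodes receive larger
    local batches while the global batch size, and hence the gradient
    statistics, stays unchanged.

    Illustrative sketch only; DQ-DPS itself is not specified in the
    abstract and may differ.
    """
    total = sum(worker_throughputs)
    # Proportional share, rounded down; the remainder is handled below.
    shares = [int(global_batch_size * t / total) for t in worker_throughputs]
    # Hand any leftover samples to the fastest workers so the shares
    # still sum exactly to the global batch size.
    leftover = global_batch_size - sum(shares)
    for i in sorted(range(len(shares)),
                    key=lambda i: worker_throughputs[i],
                    reverse=True)[:leftover]:
        shares[i] += 1
    return shares


# Example: three heterogeneous workers, global batch of 256.
# A static partition would give each worker 85-86 samples; the
# proportional split assigns more work per iteration to the fast node.
print(partition_batch(256, [400.0, 200.0, 100.0]))  # -> [147, 73, 36]
```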
Year: 2021
Page: 20-26
Language: English
ESI Highly Cited Papers on the List: 0