The 8th International Conference on Next Generation Computing 2022 (2022.10)
Pages
pp. 72-75
Authors
Daegun Yoon, Sangyoon Oh
Language
English (ENG)
URL
https://www.earticle.net/Article/A419742
Full-Text Information
Abstract
English
To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth is still a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that reduce communication traffic significantly. Most of them, such as Top-k gradient sparsification (Top-k SGD), require gradient sorting to select meaningful gradients. However, Top-k SGD is limited in how much it can speed up overall training because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
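As an illustration of the selection step the abstract refers to, the following is a minimal sketch of Top-k gradient sparsification on a flat gradient vector. It is not taken from the paper: the function names `topk_sparsify` and `densify` and the sample values are hypothetical, and it uses plain Python on the CPU rather than the GPU kernels the authors measure.

```python
import heapq


def topk_sparsify(grad, k):
    """Return (indices, values) of the k largest-magnitude entries of grad.

    Hypothetical helper for illustration; grad is a flat list of floats.
    """
    if not 0 < k <= len(grad):
        raise ValueError("k must be between 1 and len(grad)")
    # Selection by magnitude: the sorting-like step that the abstract
    # identifies as inefficient on GPUs.
    indices = sorted(
        heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    )
    values = [grad[i] for i in indices]
    return indices, values


def densify(indices, values, size):
    """Rebuild a dense vector from the sparse (index, value) pairs."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


grad = [0.1, -2.0, 0.03, 1.5, -0.4, 0.0]
idx, vals = topk_sparsify(grad, k=2)
print(idx, vals)  # [1, 3] [-2.0, 1.5]
print(densify(idx, vals, len(grad)))
```

In this sketch each worker would exchange only the k index/value pairs instead of the full vector, which is where the traffic reduction comes from; the cost moves into the selection step.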
Table of Contents
Abstract
I. INTRODUCTION
II. EXPERIMENTAL RESULTS
  A. Convergence and Accuracy
  B. Breakdown of Iteration
  C. Further Experiments
III. CONCLUSION AND FUTURE WORK
REFERENCES
Keywords
distributed deep learning, gradient sparsification
Authors
Daegun Yoon [ Department of Artificial Intelligence, Ajou University ]
Sangyoon Oh [ Department of Artificial Intelligence, Ajou University ]
Corresponding Author