Earticle

Nano Information Technology (NIT)

Study on Accelerating Distributed ML Training in Orchestration

  • Publisher
    The International Association for Artificial Intelligence (formerly the Institute of Internet, Broadcasting and Communication)
  • Journal
    The International Journal of Advanced Smart Convergence
  • Issue
    Volume 13 Number 3 (2024.09)
  • Pages
    pp.143-149
  • Authors
    Su-Yeon Kim, Seok-Jae Moon
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A456170

Full-Text Information

Abstract

English
As the size of data and models in machine learning training continues to grow, training on a single server is becoming increasingly challenging. Consequently, the importance of distributed machine learning, which distributes computational loads across multiple machines, is becoming more prominent. However, several unresolved issues remain regarding the performance enhancement of distributed machine learning, including communication overhead, inter-node synchronization challenges, data imbalance and bias, as well as resource management and scheduling. In this paper, we propose ParamHub, which utilizes orchestration to accelerate training speed. This system monitors the performance of each node after the first iteration and reallocates resources to slow nodes, thereby speeding up the training process. This approach ensures that resources are appropriately allocated to nodes in need, maximizing the overall efficiency of resource utilization and enabling all nodes to perform tasks uniformly, resulting in a faster training speed overall. Furthermore, this method enhances the system's scalability and flexibility, allowing for effective application in clusters of various sizes.
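
The abstract describes measuring each node after the first iteration and shifting resources toward the slow ones. The following is a minimal sketch of that idea, not the ParamHub implementation: the names `NodeStats` and `rebalance` are hypothetical, CPU is expressed in abstract units, and the model assumes (this is an assumption, not a claim from the paper) that a node's iteration time is inversely proportional to the CPU allocated to it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStats:
    """Hypothetical per-node measurement taken after the first iteration."""
    name: str
    cpu_units: int            # CPU currently allocated to the worker
    iteration_seconds: float  # measured duration of the first iteration


def rebalance(nodes: List[NodeStats]) -> Dict[str, int]:
    """Propose a new CPU allocation per node under a fixed total budget.

    Assumption (not from the paper): iteration time scales inversely with
    allocated CPU, so each node's workload can be estimated as
    time * cpu. Giving every node CPU in proportion to its workload then
    equalizes the predicted iteration time across nodes.
    """
    total_cpu = sum(n.cpu_units for n in nodes)
    work = {n.name: n.iteration_seconds * n.cpu_units for n in nodes}
    total_work = sum(work.values())
    # Rounding to whole units may leave the sum slightly off the budget.
    return {name: round(total_cpu * w / total_work) for name, w in work.items()}


if __name__ == "__main__":
    measured = [
        NodeStats("worker-a", cpu_units=1000, iteration_seconds=10.0),
        NodeStats("worker-b", cpu_units=1000, iteration_seconds=30.0),
    ]
    print(rebalance(measured))
```

In this example the slower node ends up with three times the CPU of the faster one, so under the stated assumption both would finish an iteration in the same time. A real orchestrator would also have to respect per-node capacity limits and memory, which this sketch ignores.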

Table of Contents

Abstract
1. INTRODUCTION
2. PROPOSED SYSTEM
2.1. System Overview
2.2. System Component
3. COMPARATIVE ANALYSIS
4. CONCLUSION
ACKNOWLEDGMENT
REFERENCES

Keywords

Distributed Machine Learning, ML Model Training, Orchestration, Parameter Server, Resource Allocation

Authors

  • Su-Yeon Kim [ Master's course, Graduate School of Smart Convergence, Kwangwoon University, Seoul, Korea ]
  • Seok-Jae Moon [ Professor, Graduate School of Smart Convergence, Kwangwoon University, Seoul, Korea ] Corresponding Author

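Publication Information

Publisher

  • Publisher name
    The International Association for Artificial Intelligence (formerly the Institute of Internet, Broadcasting and Communication)
  • Year founded
    2000
  • Field
    Engineering > Electronics / Information and Communication Engineering
  • About
    The association aims to contribute to the advancement of scholarship and technology, both in Korea and internationally, in Internet broadcasting, Internet TV, broadcasting and communication networks, and related fields, and to contribute to the knowledge and information society.

Journal

  • Journal title
    The International Journal of Advanced Smart Convergence
  • Frequency
    Quarterly
  • pISSN
    2288-2847
  • eISSN
    2288-2855
  • Coverage
    2012~2025
  • Decimal classification
    KDC 326 DDC 380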