The 9th International Conference on Next Generation Computing 2023 (2023.12)
Pages
pp.38-39
Authors
Yunpyo Hong, Seokhun Jeon, Young-Jong Jang, Byung-Soo Kim
Language
English (ENG)
URL
https://www.earticle.net/Article/A448112
Full-text information
Abstract
English
As many LLMs have been released, modified transformer-based network layers have been studied to improve performance. However, LLMs must be large to perform well; as a result, current LLMs can only be executed on large servers, and various attempts have been made to reduce the amount of computation. In this paper, we present a method that reduces computation by exploiting a data property of the SwiGLU layer used by Meta and Google. Because SwiGLU contains an activation function, it generates a large number of near-zero values, and we reduce computation by skipping the unnecessary operations. Our experiments show that our algorithm reduces computation by 13.3% when 20% of the activation function outputs are zero.
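The abstract does not spell out the dataflow, so the following is only a minimal sketch of the zero-skip idea under stated assumptions. It assumes the common SwiGLU feed-forward form, out = W_down (SiLU(W_gate x) * (W_up x)), and skips the up-projection and down-projection work for any hidden unit whose gate activation is within a threshold of zero. The names `swiglu_ffn_zero_skip`, `estimated_saving`, and `threshold`, as well as the multiply-accumulate (MAC) accounting, are hypothetical and are not taken from the paper.

```python
import math


def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def dot(row, x):
    """Inner product of two equal-length sequences."""
    return sum(w * xi for w, xi in zip(row, x))


def swiglu_ffn_zero_skip(x, w_gate, w_up, w_down, threshold=0.0):
    """SwiGLU FFN for one token, skipping work where the gate is near zero.

    Assumed layout (illustrative, not from the paper):
      w_gate, w_up: hidden rows, each of length d_model
      w_down:       hidden rows, row j is the output contribution of unit j
    Returns (output, macs_done, macs_dense).
    """
    d_model, hidden = len(x), len(w_gate)
    out = [0.0] * d_model
    macs = 0
    for j in range(hidden):
        g = silu(dot(w_gate[j], x))   # gate projection is always computed
        macs += d_model
        if abs(g) <= threshold:       # near-zero gate: skip up and down work
            continue
        h = g * dot(w_up[j], x)       # up projection for this hidden unit
        macs += d_model
        for i in range(d_model):      # accumulate this unit's down projection
            out[i] += h * w_down[j][i]
        macs += d_model
    return out, macs, 3 * d_model * hidden


def estimated_saving(zero_fraction):
    """Fraction of MACs skipped when two of three projections are skippable."""
    return 2.0 * zero_fraction / 3.0
```

Under this accounting, a zero fraction p removes 2p/3 of the feed-forward MACs, so `estimated_saving(0.20)` is about 0.133. That is consistent with the 13.3% reported in the abstract, although the paper may count operations differently.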
Table of Contents
Abstract
I. INTRODUCTION
II. BACKGROUND
III. PROPOSED ARCHITECTURE
IV. RESULT & CONCLUSION
ACKNOWLEDGMENT
REFERENCES
Keywords
GLU, SwiGLU, dataflow, LLM, FFN, zero skip
Authors
Yunpyo Hong [ Korea Electronics Technology Institute SoC Platform Research Center ]
Seokhun Jeon [ Korea Electronics Technology Institute SoC Platform Research Center ]
Young-Jong Jang [ Korea Electronics Technology Institute SoC Platform Research Center ]
Byung-Soo Kim [ Korea Electronics Technology Institute SoC Platform Research Center ]
Corresponding Author