The 10th International Conference on Next Generation Computing 2024 (2024.11)
Pages
pp.293-294
Authors
Sung-Yoon Ahn, Chan-Young Choi, Abrar Alabdulwahab, Joo-Hee Oh, Sang-Woong Lee
Language
English (ENG)
URL
https://www.earticle.net/Article/A468866
Abstract
English
Although living organisms differ in shape and size, all are fundamentally defined by their genetic sequences. Interpreting these sequences helps explain how organisms function. With the advancement of AI, significant breakthroughs have been made in protein sequencing and in understanding protein function. However, there is still room for improvement: data-intensive models require a large number of protein sequences, many of which are not publicly available or are of low quality. In this paper, we present a semi-supervised learning scheme that addresses the shortage of high-quality data needed to train protein language models. We demonstrate that this approach improves the model's ability to classify toxic fungal protein sequences.
Table of Contents
Abstract
I. INTRODUCTION
II. RELATED WORKS
III. MATERIALS AND METHOD
  A. Dataset and Dataset Collection
  B. Semi-supervised learning scheme
  C. Model selection
IV. RESULTS
V. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
Keywords
Fungi, Protein, self-supervised
Author Affiliations
Sung-Yoon Ahn [ School of Computing, Gachon University, Seongnam-Si, Republic of Korea ]
Chan-Young Choi [ School of Computing, Gachon University, Seongnam-Si, Republic of Korea ]
Abrar Alabdulwahab [ School of Computing, Gachon University, Seongnam-Si, Republic of Korea ]
Joo-Hee Oh [ School of Computing, Gachon University, Seongnam-Si, Republic of Korea ]
Sang-Woong Lee [ School of Computing, Gachon University, Seongnam-Si, Republic of Korea ] (Corresponding Author)