Improving In-Silico Bacterial Toxin Prediction via Semi-Supervised Dataset Curation

Article Information

Abstract

English
In this study, we propose a semi-supervised dataset curation framework that uses both high-confidence labeled protein sequence data and automatically generated, weakly labeled protein sequence data to refine dataset quality before model training. The approach uses a pre-trained ProtBERT model to iteratively assign pseudo-labels to uncertain samples and then retrains the model, with the goal of improving robustness and generalization. We anticipate that a dataset curated in this way will significantly improve toxin-classification performance, measured by accuracy, F1-score, and MCC, compared with models trained solely on manually labeled or automatically annotated data.
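The iterative pseudo-labeling loop outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `CentroidClassifier` is a hypothetical stand-in for a fine-tuned ProtBERT classifier operating on fixed-length feature vectors, and the `threshold` and `max_rounds` values are illustrative choices that do not come from the paper. The `mcc` helper computes the Matthews correlation coefficient, one of the metrics the abstract names.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class CentroidClassifier:
    """Hypothetical stand-in for a fine-tuned ProtBERT classifier (assumption)."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in sorted(set(y))
        }
        return self

    def predict_proba(self, x):
        # Softmax over negative Euclidean distances to each class centroid.
        scores = {c: -math.dist(x, mu) for c, mu in self.centroids.items()}
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: v / total for c, v in exp.items()}


def pseudo_label(X_lab, y_lab, X_unl, threshold=0.9, max_rounds=5):
    """Iteratively move confidently predicted samples into the labeled set.

    threshold and max_rounds are illustrative assumptions.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    model = CentroidClassifier().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            proba = model.predict_proba(x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                # Accept the pseudo-label and grow the training set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                # Still uncertain: keep for the next round.
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # nothing new to learn from; stop early
        model.fit(X_lab, y_lab)  # retrain on labeled + pseudo-labeled data
    return model, X_lab, y_lab, pool


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In this sketch, samples whose top class probability never reaches the threshold stay in the returned pool, so they can be inspected or discarded rather than silently added to the curated dataset.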

Table of Contents

Abstract
I. INTRODUCTION
II. RELATED WORKS
A. Semi-supervised learning
B. Protein Sequence Models
III. METHODOLOGY
ACKNOWLEDGMENT
REFERENCES

Authors

  • Sung-Yoon Ahn [ School of Computing, Gachon University, Seongnam-si, Republic of Korea ]
  • Sewon Kim [ School of Computing, Gachon University, Seongnam-si, Republic of Korea ]
  • Hye Won Jeong [ Department of Microbiology and Immunology, Chosun University School of Dentistry, Gwangju, Korea ]
  • Sang-Woong Lee [ School of Computing, Gachon University, Seongnam-si, Republic of Korea ]
  • Iel Soo Bang [ Department of Microbiology and Immunology, Chosun University School of Dentistry, Gwangju, Korea ]

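    Publication Information

    • Publication
      Conference of the Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회 학술대회)
    • Frequency
      Semiannual
    • Coverage period
      2021~2025
    • Decimal classification
      KDC 566 DDC 004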