File classification is an important step in near-duplicate detection in the data mining field. This paper proposes an improved classification process consisting of a training phase and a test phase, each with its own algorithm. The process uses the MapReduce computing model created by Google to carry out the classification calculation. Specifically, Sogou news data sets of various sizes, simulating a massive data set, were used to test its effectiveness, and a comparative evaluation of execution time and speedup was carried out in the experimental environment. The results show that the classification process greatly reduces execution time and achieves the ideal speedup ratio as the data volume increases, yielding better performance.
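The abstract does not spell out the training and test algorithms, so the following is only a minimal sketch of how a Naïve Bayes classifier (the algorithm named in the keywords) might be expressed as map and reduce steps. It runs in a single Python process rather than on a MapReduce cluster, and every name in it (`map_train`, `reduce_counts`, `train`, `classify`, `speedup`) is hypothetical, as are the add-one smoothing and the toy documents. The `speedup` helper assumes the usual definition of speedup as single-node time divided by parallel time.

```python
import math
from collections import defaultdict


def map_train(label, tokens):
    """Map step (assumed design): emit count records for one labeled document."""
    yield ("doc", label), 1
    for tok in tokens:
        yield ("word", label, tok), 1


def reduce_counts(pairs):
    """Reduce step: sum the values of all records that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs):
    """Training phase: run the map step over every document, then reduce."""
    pairs = (kv for label, tokens in docs for kv in map_train(label, tokens))
    return reduce_counts(pairs)


def classify(counts, tokens):
    """Test phase: return the label with the highest log-posterior score."""
    labels = sorted({k[1] for k in counts if k[0] == "doc"})
    vocab = {k[2] for k in counts if k[0] == "word"}
    n_docs = sum(counts[("doc", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        n_words = sum(v for k, v in counts.items()
                      if k[0] == "word" and k[1] == c)
        score = math.log(counts[("doc", c)] / n_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing is an assumption of this sketch.
            seen = counts.get(("word", c, tok), 0)
            score += math.log((seen + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def speedup(t_single, t_parallel):
    """Speedup ratio: single-node execution time over parallel execution time."""
    return t_single / t_parallel


# Toy data, invented for illustration only.
docs = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["team", "win"]),
    ("finance", ["stock", "market"]),
    ("finance", ["market", "bank", "stock"]),
]
counts = train(docs)
predicted = classify(counts, ["team", "goal"])
```

On a real cluster the map and reduce functions would run as distributed tasks over partitions of the corpus; here they are ordinary functions so that the data flow can be followed end to end.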
Table of Contents
Abstract
1. Introduction
2. Relevant Work
3. Classification Course
4. Experimental Test
5. Conclusions
Acknowledgment
References
Keywords
Classification, Naïve Bayes, Algorithm, MapReduce, Massive Data
Authors
Haitao Wang [ School of Computer Science and Technology, Jilin University, QianJin Street, ChangChun, JiLin, China; Henan Polytechnic University, Shiji Street, Jiaozuo, Henan, China ]
Shunfeng Liu [ School of Computer Science and Technology, Jilin University, QianJin Street, ChangChun, JiLin, China ]
Zongpu Jia [ Henan Polytechnic University, Shiji Street, Jiaozuo, Henan, China ]
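Security Engineering Research Support Center (IJGDC) [Science & Engineering Research Support Center, Republic of Korea (IJGDC)]
Year founded
2006
Field
Engineering > Computer Science
About
1. Surveys and research on security engineering
2. Research on and presentation of applied technologies in security engineering
3. Hosting academic conferences and exhibitions on security engineering
4. Mutual cooperation and information exchange on security engineering technology
5. Standardization projects and establishment of specifications for security engineering
6. Promotion of industry-academia-research cooperation in security engineering
7. International academic exchange and technical cooperation
8. Publication of journals on security engineering
9. Other activities necessary to achieve the society's objectives
Publication
Publication title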
International Journal of Grid and Distributed Computing
Frequency
Bimonthly
pISSN
2005-4262
Coverage period
2008~2016
Decimal classification
KDC 505, DDC 605
Other articles in this issue / International Journal of Grid and Distributed Computing Vol.8 No.3