Language modeling for information retrieval (IR), proposed a few years ago, has attracted considerable attention and has effectively improved the performance of IR systems compared with classic models and approaches. Smoothing in parameter estimation is one of the main problems in implementing language models, and effective smoothing methods enhance the performance of an IR system. Semantic smoothing, which brings linguistic knowledge into language modeling, has been developed recently. This paper presents a modification to a smoothing approach in the general language model, combined with translation modeling, that takes synonyms in documents and in the collection into account for semantic smoothing and improves performance in Chinese document retrieval. The synonym knowledge comes from Tongyici Cilin (Extended), a well-known thesaurus in Chinese NLP. A comparison shows that the semantically smoothed approach brings an improvement of approximately 1.33% on average.
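The idea of combining synonym-based translation with collection smoothing can be illustrated with a minimal sketch. This is not the paper's actual formulation, which the abstract does not give. It assumes a uniform translation probability over a word's synonyms and Jelinek-Mercer interpolation with the collection model. The function name, the `synonyms` dictionary, and the weights `lam` and `beta` are all hypothetical.

```python
from collections import Counter


def semantic_smoothed_prob(term, doc_tokens, collection_tokens, synonyms,
                           lam=0.5, beta=0.3):
    """Estimate P(term | doc) with synonym-based and collection smoothing.

    Illustrative only: `synonyms` maps a word to a set of its synonyms
    (for example, drawn from a thesaurus); `lam` and `beta` are
    hypothetical interpolation weights, not values from the paper.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    doc_len = len(doc_tokens)
    coll_len = len(collection_tokens)
    if doc_len == 0 or coll_len == 0:
        return 0.0

    # Maximum-likelihood estimate from the document itself.
    p_ml = doc_counts[term] / doc_len

    # Translation-style estimate: each document word w passes probability
    # to its synonyms, split uniformly: t(term | w) = 1 / |synonyms[w]|.
    p_trans = 0.0
    for w, count in doc_counts.items():
        syns = synonyms.get(w, set())
        if term in syns:
            p_trans += (count / doc_len) / len(syns)

    # Mix the direct and synonym-based document estimates, then
    # interpolate with the collection model (Jelinek-Mercer smoothing).
    p_doc = (1 - beta) * p_ml + beta * p_trans
    p_coll = coll_counts[term] / coll_len
    return (1 - lam) * p_doc + lam * p_coll


# Toy example: the query term does not occur in the document,
# but its synonym does, so the document still receives some credit.
toy_synonyms = {"汽车": {"轿车"}, "轿车": {"汽车"}}
toy_doc = ["汽车", "价格"]
toy_collection = ["汽车", "轿车", "价格", "新闻"]
print(semantic_smoothed_prob("轿车", toy_doc, toy_collection, toy_synonyms))
```

In this sketch a document that contains only a synonym of the query term scores higher than it would under collection smoothing alone, which is the effect semantic smoothing is meant to achieve. The synonym-based estimate is not renormalized here, so a real implementation would need to handle probability mass more carefully.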
Table of Contents
Abstract
1. Introduction
2. Related Work
3. Semantic GLM for IR
4. Evaluation
5. Conclusion
References
Keywords
Language Model for Information Retrieval; Semantic Smoothing; Chinese Thesaurus
Authors
Liqi Gao [ Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology ]
Ting Liu [ Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology ]
Ru Chen [ Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology ]
Yu Zhang [ Information Retrieval Laboratory, School of Computer Science & Technology, Harbin Institute of Technology ]
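Korean Language Information Science Society (한국어정보학회)
Year founded
1990
Field
Humanities > Linguistics
About
Through scholarly research, the society aims to establish a theoretical framework for Korean language information processing. Through close cooperation with industry, it aims to advance information processing technology and support the growth of the information industry. Through public education and outreach, it aims to disseminate advanced information processing technology, thereby raising the cultural value of the Korean language and contributing to the international standing and standardization of Korean language information processing technology.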