This paper presents a simple approach to XMLifying a crude corpus in the field of Opinion Mining. The XMLification is based on regular expressions. A corpus (plural: corpora) is simply a collection of linguistic data. In the proposed work, the corpus consists of reviews posted on websites, more specifically product reviews. The reviews, or opinions, are contained in HTML files collected from sites such as Cnet.com, Epinions.com, Amazon.com and ebay.com. This crude corpus of HTML files is first refined so that only the required review details are kept from each web page and the rest is discarded. The refined corpus is then processed again to produce the final output: XML files that contain only the important parts of the review details from the raw HTML pages. These XML files are ready for the subsequent steps of Opinion Mining, such as parts-of-speech (POS) tagging or other language-processing steps for machine learning.
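The refinement and XMLification steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the review wrapper `<div class="review">`, the pattern `REVIEW_PATTERN`, the element names `reviews`/`review`, UTF-8 input, and the helper names `refine_html`, `reviews_to_xml` and `xmlify_folder` are all hypothetical. Real pages from Cnet.com, Epinions.com, Amazon.com or ebay.com would each need their own site-specific regular expression.

```python
import html
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# ASSUMPTION (hypothetical): the review text sits in <div class="review">...</div>.
# Each real review site would need its own pattern here.
REVIEW_PATTERN = re.compile(r'<div class="review">(.*?)</div>', re.DOTALL | re.IGNORECASE)
TAG_PATTERN = re.compile(r"<[^>]+>")    # any HTML tag
SPACE_PATTERN = re.compile(r"\s+")      # runs of whitespace


def refine_html(page: str, pattern: re.Pattern = REVIEW_PATTERN) -> list[str]:
    """Keep only the review fragments matched by `pattern`, as plain text."""
    reviews = []
    for fragment in pattern.findall(page):
        text = TAG_PATTERN.sub(" ", fragment)         # drop inline markup such as <b>, <br/>
        text = html.unescape(text)                    # decode entities, e.g. &amp; -> &
        text = SPACE_PATTERN.sub(" ", text).strip()   # collapse whitespace
        if text:
            reviews.append(text)
    return reviews


def reviews_to_xml(reviews: list[str], source: str) -> str:
    """Wrap each refined review in a (hypothetical) numbered <review> element."""
    root = ET.Element("reviews", source=source)
    for number, text in enumerate(reviews, start=1):
        ET.SubElement(root, "review", id=str(number)).text = text
    return ET.tostring(root, encoding="unicode")  # ElementTree escapes &, <, > in text


def xmlify_folder(input_dir: str, output_dir: str) -> int:
    """Write one .xml file per .html file in input_dir; return how many were written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for page_path in sorted(Path(input_dir).glob("*.html")):
        page = page_path.read_text(encoding="utf-8", errors="replace")
        xml_text = reviews_to_xml(refine_html(page), page_path.name)
        (out / f"{page_path.stem}.xml").write_text(xml_text, encoding="utf-8")
        written += 1
    return written


if __name__ == "__main__":
    sample = """<html><body>
<div class="nav">Home | Cameras</div>
<div class="review">Great <b>battery</b> life &amp; sharp photos.</div>
<div class="review">Lens cap feels   cheap.</div>
</body></html>"""
    print(reviews_to_xml(refine_html(sample), "sample.html"))
```

Running the module prints one `<reviews>` element with two numbered `<review>` children holding the cleaned text, while the navigation `<div>` is discarded. A regular expression is adequate here only because it targets a fixed, site-specific wrapper: the non-greedy `(.*?)</div>` stops at the first closing `</div>`, so reviews that contain nested `<div>` elements would call for a real HTML parser such as Python's `html.parser`.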
Contents
Abstract
1. Introduction
2. Terminology
  2.1. Corpus
  2.2. Natural Language Processing
  2.3. Linguistics
  2.4. Regular Expression
  2.5. Parsing
  2.6. Opinion Mining
3. Related Works
4. Our Work
  4.1. AEAXTCCTFOM_CORPUSREFINEMENT_MAIN ()
  4.2. AEAXTCCTFOM_HTML_REFINEMENT (INPUT_FOLDER, REGULAR_EXPRESSION)
  4.3. AEAXTCCTFOM_XMLFILEGENERATOR (REFINED_FILE, XML_TAG)
  4.4. AEAXTCCTFOM_FILEMAINTAIN (INPUT_FOLDER, XML_FILE)
5. Result and Discussion
  5.1. Complexity analysis of the stated algorithm
  5.2. Test Result
6. Conclusion
References
Keywords
Crude corpus, language processing, regular expression, XML, parts of speech tagging
Authors
Debnath Bhattacharyya [ Computer Science and Engineering Department Heritage Institute of Technology ]
Kheyali Mitra [ Computer Science and Engineering Department Heritage Institute of Technology ]
Minkyu Choi [ Hannam University ]
Rosslin J. Robles [ Hannam University ]
Debashis Ganguly [ Computer Science and Engineering Department Heritage Institute of Technology ]
Science & Engineering Research Support Center, Republic of Korea (IJGDC)
Year founded
2006
Field
Engineering > Computer Science
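About
1. Surveys and research on security engineering
2. Research on, and presentation of, applied security engineering technologies
3. Hosting academic conferences and exhibitions on security engineering
4. Mutual cooperation and information exchange on security engineering technologies
5. Standardization projects and the establishment of specifications for security engineering
6. Promotion of industry-academia-research cooperation in security engineering
7. International academic exchange and technical cooperation
8. Publication of journals on security engineering
9. Other activities necessary to achieve the organization's objectives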
Publication
Title
International Journal of Grid and Distributed Computing
Frequency
Bimonthly
pISSN
2005-4262
Coverage
2008~2016
Decimal classification
KDC 505, DDC 605
Published in: International Journal of Grid and Distributed Computing, vol. 2, no. 3