무작위 표본에 대한 코퍼스 언어학적 연구

홍정하

A Corpus-linguistic Approach to Random Samples

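Publisher: 고려대학교 언어정보연구소 (Research Institute for Language and Information, Korea University)
Journal: 언어정보 (Language Information), KCI-listed
Issue: No. 18 (2014.03)
Pages: pp. 137-162
Author: Hong, Jungha (홍정하)
Language: Korean (KOR)
URL: https://www.earticle.net/Article/A218341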


Abstract

In quantitative studies, a random sample is supposed to be selected by probability sampling in such a way that it represents a population. The statistical analysis of corpus frequency data is based on a random sample model, which assumes that the corpus was randomly selected from the language. However, Kilgarriff (2005), Evert (2006), and Goh (2011) show that typical corpus data severely violate the randomness assumption. This paper aims to evaluate random sampling methods for corpus linguistics and to explore their characteristics and applicability. The methods are evaluated on the relative frequencies of 30 morphemes and on the frequencies of all morpheme types that occur in each sample, observed over 1,000 resampling trials, according to how close each random sample is to the normal distribution and to the Zipf-Mandelbrot law (Mandelbrot 1977). The study reports three findings. First, systematic sampling at the unit of measurement, i.e., individual words drawn from an entire corpus, is the best way to construct random samples for corpus linguistics. Second, the closer the relative frequencies of the 30 morphemes in a sample lie to the normal distribution, the closer the frequency distribution of all morpheme types lies to the Zipf-Mandelbrot distribution. Third, random samples are an effective way to address problems that stem from differing sample sizes and data sparseness. Moreover, using them makes it easier to detect relatively large differences in word frequencies obtained from different corpora.
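As a rough illustration of the first finding, the following Python sketch draws a systematic sample at the word level (every k-th token after a random start) and records the relative frequency of one item over repeated trials. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names, the token-list input, and the default of 1,000 trials as a parameter are hypothetical choices made for this example.

```python
import random
from collections import Counter


def systematic_sample(tokens, sample_size, rng):
    """Return every k-th token after a random start (systematic sampling)."""
    if not 0 < sample_size <= len(tokens):
        raise ValueError("sample_size must be between 1 and len(tokens)")
    step = len(tokens) // sample_size   # sampling interval k
    start = rng.randrange(step)         # random start in [0, k)
    return [tokens[start + i * step] for i in range(sample_size)]


def relative_frequency(sample, item):
    """Share of the sample's tokens that equal `item`."""
    return Counter(sample)[item] / len(sample)


def resample_relative_frequencies(tokens, item, sample_size, trials=1000, seed=0):
    """Relative frequency of `item` in each of `trials` systematic samples."""
    rng = random.Random(seed)           # seeded so runs are repeatable
    return [
        relative_frequency(systematic_sample(tokens, sample_size, rng), item)
        for _ in range(trials)
    ]
```

The list returned by `resample_relative_frequencies` could then be compared against a normal distribution, which is the kind of check the abstract describes. Note that with a fixed interval only `step` distinct samples exist, so repeated trials will revisit the same samples when `trials` exceeds `step`.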

Keywords

corpus, random sample, probability sampling, simple random sampling, systematic sampling, unit of measurement, unit of sampling, Zipf-Mandelbrot's law, normal distribution, variation

Author

Hong, Jungha (홍정하), Department of Linguistics, Korea University


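Publication information

Publisher

Name: 고려대학교 언어정보연구소 [Research Institute for Language and Information]
Established: 1993
Field: Humanities > Linguistics
About: The Research Institute for Language and Information was established to study new theories and technologies for natural-language text and information processing and to advance their application in the humanities and social sciences. More specifically, the institute:
1) builds large-scale computer databases for Korean and various foreign languages, and pursues natural language processing and humanities and social science research based on them;
2) develops new theories and technologies for electronic text and information processing, and promotes the development of related academic fields at the university through cooperation and exchange with academia at home and abroad;
3) compiles and publishes new kinds of dictionaries, thesauri, and related research results based on the databases it accumulates, thereby contributing to a new culture of information publishing;
4) contributes to methodological development and interdisciplinary cooperation in fields related to the analysis of linguistic information, including general linguistics, the linguistics of individual languages (Korean, English, German, French, and others), computer science, literature, psychology, sociology, and mass communication.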

Journal

Title: 언어정보 [LANGUAGE INFORMATION]
Publication frequency: not published (미발행)
pISSN: 1226-8011
eISSN: 2233-9213
Coverage: 1997-2019
Decimal classification: KDC 705, DDC 405
