A Comparative Study on OCR using Super-Resolution for Small Fonts

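  • Publisher
    The International Association for Artificial Intelligence (formerly the Institute of Internet, Broadcasting and Communication)
  • Journal
    The International Journal of Advanced Smart Convergence (KCI-indexed)
  • Issue
    Volume 8 Number 3 (2019.09)
  • Pages
    pp.95-101
  • Authors
    Wooyeong Cho, Juwon Kwon, Soonchu Kwon, Jisang Yoo
  • Language
    English (ENG)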
  • URL
    https://www.earticle.net/Article/A362980

Article Information

Abstract

English
Recently, there have been many issues related to text recognition using Tesseract. One of these issues is that text recognition accuracy is significantly lower for smaller fonts. Tesseract extracts text by creating directed outlines in the image. It then searches the Tesseract database and uses template matching against characters with similar feature points to select the character with the lowest error. When text extraction is poor, recognition accuracy is lowered. In this paper, we compared text recognition accuracy after applying various super-resolution methods to smaller text images and examined how recognition accuracy varies across image sizes. To recognize small Korean text images, we used super-resolution algorithms based on deep learning models: SRCNN, ESRCNN, DSRCNN, and DCSCN. The dataset for training and testing consisted of scanned images of Korean text. The images, set in a 12pt font, were resized by factors from 0.5 to 0.8. The experiment was performed on the x0.5 resized images, and the results showed that DCSCN super-resolution was the most effective method, reducing the precision error rate by 7.8% and the recall error rate by 8.4%. These results demonstrate that the accuracy of text recognition for smaller Korean fonts can be improved by adding super-resolution methods to the OCR preprocessing module.
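
The abstract reports results as precision and recall error rates but does not define them here. As an illustration only, the sketch below assumes a character-level definition based on multiset overlap between the OCR output and the ground-truth text, with the error rate taken as one minus the score. The function name `char_precision_recall` and this matching rule are hypothetical and may differ from the metric used in the paper.

```python
from collections import Counter


def char_precision_recall(recognized: str, reference: str) -> tuple[float, float]:
    """Character-level precision and recall (assumed definition, not the paper's).

    Characters are compared as multisets, ignoring whitespace and position:
    precision = matched / characters in the OCR output
    recall    = matched / characters in the ground truth
    """
    # Drop all whitespace so spacing differences are not counted as errors.
    rec_counts = Counter("".join(recognized.split()))
    ref_counts = Counter("".join(reference.split()))

    # Multiset intersection keeps the smaller count of each shared character.
    matched = sum((rec_counts & ref_counts).values())

    precision = matched / sum(rec_counts.values()) if rec_counts else 0.0
    recall = matched / sum(ref_counts.values()) if ref_counts else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up strings for illustration; they are not taken from the paper's dataset.
    p, r = char_precision_recall("한글 인삭", "한글 인식 결과")
    print(f"precision error rate: {1 - p:.2f}, recall error rate: {1 - r:.2f}")
```

Under this assumed definition, comparing the scores obtained on the raw x0.5 image with those obtained after super-resolution would give the kind of error-rate reduction the abstract describes. A position-aware alignment (for example, edit distance) would be a stricter alternative.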

Table of Contents

Abstract
1. INTRODUCTION
2. TESSERACT-OCR
3. SUPER-RESOLUTION
3.1 SRCNN (SUPER-RESOLUTION CNN)
3.2 ESRCNN (EXPANDED SUPER RESOLUTION CNN)
3.3 DSRCNN (DENOISING SUPER RESOLUTION CNN)
3.4 DCSCN (DEEP CNN WITH SKIP CONNECTION AND NETWORK IN NETWORK)
4. EXPERIMENTS AND RESULTS
4.1 DATASET
4.2 EXPERIMENT METHOD
4.3 RESULTS
5. CONCLUSION
REFERENCES

Keywords

Korean OCR, Tesseract, Super-resolution, Text-recognition, Deep-learning

Authors

  • Wooyeong Cho [ Department of Electronics Engineering, Kwangwoon University, Korea ]
  • Juwon Kwon [ Department of Electronics Engineering, Kwangwoon University, Korea ]
  • Soonchu Kwon [ Graduate School of Smart Convergence, Kwangwoon University, Korea ]
  • Jisang Yoo [ Department of Electronics Engineering, Kwangwoon University, Korea ] Corresponding author

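Journal Information

Publisher

  • Publisher name
    The International Association for Artificial Intelligence (formerly the Institute of Internet, Broadcasting and Communication)
  • Year founded
    2000
  • Field
    Engineering > Electronics / Information and Communication Engineering
  • About
    The association aims to contribute to the advancement of scholarship and technology, both in Korea and internationally, in Internet broadcasting, Internet TV, broadcasting and communication networks, and related fields, and to contribute to the knowledge and information society.

Journal

  • Journal title
    The International Journal of Advanced Smart Convergence
  • Frequency
    Quarterly
  • pISSN
    2288-2847
  • eISSN
    2288-2855
  • Coverage period
    2012~2025
  • Decimal classification
    KDC 326, DDC 380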
