Apache Spark is an open-source, distributed, in-memory computing framework and architecture. Because it runs on a distributed cluster with data parallelism and fault tolerance, it can be applied in various fields, such as agriculture and the information and communication industries. Meanwhile, in the era of big data, large amounts of spatio-temporal data are being generated. However, Apache Spark cannot process join operations on such data efficiently, because it does not support join operations that require many computations in a distributed computing environment. Therefore, in this paper, we propose join query processing algorithms, namely the withindistance and contain joins, based on a grid partitioning technique for large-scale spatio-temporal data. In the performance evaluation, our algorithm shows 20% better performance than the existing algorithm in terms of query processing time.
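Korean abstract (translated)
Apache Spark is an open-source, distributed, in-memory computing framework and architecture. Because it runs on a distributed cluster with data parallelism and fault tolerance, it can be applied and used in various fields, such as agriculture and the information and communication industries. Meanwhile, in the era of big data, a large amount of spatio-temporal data is being generated. However, Apache Spark does not provide operations such as joins, which require many computations in a distributed computing environment, so it cannot process them efficiently. Therefore, this paper proposes withindistance and contain join query processing algorithms based on a grid partitioning technique for large-scale spatio-temporal data. In the performance evaluation, the proposed algorithm performs about 20% better than the existing algorithm.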
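The abstract names a withindistance join built on grid partitioning but does not spell out the procedure. The following is a minimal single-machine sketch of that general idea, not the paper's algorithm: it assumes 2D points without a time dimension, a cell size equal to the query distance, and hypothetical helper names (`cell_of`, `within_distance_join`). In a Spark setting, each grid cell would presumably correspond to a partition that is joined locally.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def cell_of(point, cell_size):
    """Map an (x, y) point to the integer grid cell that contains it."""
    x, y = point
    return (floor(x / cell_size), floor(y / cell_size))


def within_distance_join(left, right, distance):
    """Return sorted (left_id, right_id) pairs whose points are within `distance`.

    `left` and `right` are dicts mapping an id to an (x, y) tuple.
    Assumption: the cell size equals the query distance, so every matching
    pair lies in the same cell or in two adjacent cells.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    cell_size = distance

    # Partition step: each left point is assigned to exactly one cell.
    left_cells = defaultdict(list)
    for lid, p in left.items():
        left_cells[cell_of(p, cell_size)].append((lid, p))

    # Replication step: each right point is copied to its own cell and the
    # eight neighbouring cells, so a local join per cell sees every candidate.
    right_cells = defaultdict(list)
    for rid, p in right.items():
        cx, cy = cell_of(p, cell_size)
        for dx, dy in product((-1, 0, 1), repeat=2):
            right_cells[(cx + dx, cy + dy)].append((rid, p))

    # Local join step: compare only points that share a cell.
    result = []
    for cell, lpoints in left_cells.items():
        candidates = right_cells.get(cell, ())
        for lid, (lx, ly) in lpoints:
            for rid, (rx, ry) in candidates:
                if hypot(lx - rx, ly - ry) <= distance:
                    result.append((lid, rid))
    return sorted(result)


if __name__ == "__main__":
    left = {"a": (0.5, 0.5), "b": (5.0, 5.0)}
    right = {"x": (1.2, 0.5), "y": (5.0, 5.9), "z": (9.0, 9.0)}
    print(within_distance_join(left, right, 1.0))
```

Because each left point lives in exactly one cell, every matching pair is reported once even though right points are replicated. The cost of that replication is the main design trade-off: a larger cell size reduces copies but increases the number of comparisons inside each cell.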
Ever since next-generation convergence technology became one of the most important industries in the nation, computing professionals have encountered a growing number of challenges. Along with scholars and colleagues in related fields, they have gathered in a variety of forums and meetings over the last few decades to share their knowledge, experiences, and the outcomes of their research. These exchanges led to the founding of the International Next-generation Convergence technology Association (INCA) on December 1, 2015. INCA was registered as an incorporated association under the Ministry of Information and Communications. The main purpose of the organization is to improve our society by achieving the highest capability possible in next-generation convergence technology.
Publication
Publication title
차세대융합기술학회논문지 [The Journal of Next-generation Convergence Technology Association]