International Journal of Database Theory and Application

Publication information
  • Type
    Academic journal
  • Publisher
    Science & Engineering Research Support Center, Republic of Korea (IJDTA)
  • pISSN
    2005-4270
  • Frequency
    Bimonthly
  • Coverage
    2008 ~ 2016
  • Subject classification
    Engineering > Computer Science
  • Decimal classification
    KDC 505 DDC 605
Vol.8 No.2 (25 articles)
1

Improvements in Data Mining Association Rules Algorithm

Dai Li

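Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.1-10

※ Access to the full text may be restricted because the agreement period with the content provider has ended.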

The traditional Apriori algorithm runs for a long time during data mining and produces a large number of unrelated item sets, which wastes a great deal of storage space. This article therefore puts forward an improved Apriori algorithm based on SQL that adds an "increase degree" calculation for pruning association rules and is independent of the item sets.
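The abstract does not define its "increase degree" measure or give its SQL formulation. As a minimal sketch, assuming the measure behaves like the standard lift statistic (support of the whole rule divided by the product of the supports of its two sides), the following Python code runs a level-wise Apriori pass and then discards rules whose lift falls below a threshold. The function names and thresholds are illustrative, not taken from the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    level = {frozenset([item]) for t in tx for item in t}
    result = {}
    k = 1
    while level:
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        result.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return result


def rules_with_lift(freq, min_lift):
    """Keep rules lhs -> rhs whose lift reaches min_lift (assumed pruning measure)."""
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                rhs = itemset - lhs
                lift = supp / (freq[lhs] * freq[rhs])
                if lift >= min_lift:
                    rules.append((lhs, rhs, lift))
    return rules
```

A lift close to 1 means the two sides of a rule occur together about as often as independent items would, which is why such rules are natural candidates for pruning.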

2

Collecting large spatio-temporal data sets in the real world presents a very difficult technical problem. First, the process is very expensive, requiring many high-technology software instruments and modern hardware infrastructure installations (sensors, servers, GPS infrastructure, etc.); second, the process sometimes cannot reveal special traffic patterns, which we may characterize as patterned traffic trajectories. The Arena simulation framework introduced in this paper uses our suggested random linear interpolation algorithm and spatio-temporal prediction algorithm, which can be used to visualize, handle, and predict movement data with various time resolutions.

3

A Method for Building Naxi Language Dependency Treebank Based on Chinese-Naxi Language Relationship Alignment

Gao Sheng-Xiang, An Ming-Jia, Mao Cun-Li, Xian Yan-Tuan, Yu Zheng-Tao

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.25-32

Compared with Chinese, Naxi has a very scarce corpus that is also difficult to annotate, and these factors make its syntactic analysis very difficult. To address this problem, this paper proposes a method for building a Naxi Language Dependency Treebank based on Chinese-Naxi language relationship alignment. First, the corresponding words of Chinese-Naxi sentence pairs are aligned; then, dependency parsing is performed on the Chinese sentences; finally, taking into account some characteristics and rules of the Naxi language itself, the generated Chinese dependency tree is mapped to the Naxi sentence by using the Chinese-Naxi relationship alignment, which generates a Naxi dependency parse tree. Experimental results show that this approach can simplify the manual collection and annotation of a Naxi treebank and save the manpower and time needed to build the dependency treebank of the Naxi language.
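The mapping step can be illustrated with a small Python sketch. It assumes a one-to-one word alignment stored as a dictionary and represents a dependency tree as a list of head indices (-1 for the root). These data structures and the function name are hypothetical, and the Naxi-specific rules mentioned in the abstract are not modeled.

```python
def project_dependencies(src_heads, alignment):
    """Project source-side heads onto target tokens through a word alignment.

    src_heads[i] is the head index of source token i, or -1 for the root.
    alignment maps a source token index to its aligned target token index.
    Returns {target_index: target_head_index}, with -1 marking the root.
    """
    tgt_heads = {}
    for s, t in alignment.items():
        h = src_heads[s]
        # Skip over unaligned source heads by climbing towards the root.
        while h != -1 and h not in alignment:
            h = src_heads[h]
        tgt_heads[t] = -1 if h == -1 else alignment[h]
    return tgt_heads
```

For a three-word source sentence whose second word is the root, `project_dependencies([1, -1, 1], {0: 0, 1: 2, 2: 1})` attaches target tokens 0 and 1 to target token 2, so the tree survives a change in word order.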

4

Efficient Information Extraction Based on Signature Index

Canghong Jin, Minghui Wu, Zemin Liu, Shiwen Cheng

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.33-42

Information Extraction (IE) is an essential tool for retrieving structured information from flat text (including web content). Given a pattern query, an ideal IE application should be able to extract the matched target effectively and efficiently. However, as far as we know, efficiency and flexibility are major concerns for typical IE tools, since they either use brute-force document parsing for each query off-line or support on-line queries only over pre-extracted elements. To improve the accuracy and efficiency of extraction, in this paper we propose a novel framework, iExtractor, that leverages Information Retrieval (IR) indexes to speed up IE processes. We index text blocks with their signatures (represented as bit-strings) and propose efficient IE algorithms based on the signature index. Hence, iExtractor can validate a query pattern in the signature index without the original text. The framework also supports on-line extraction through a general and flexible pattern extraction language. Our extensive experimental results on diverse real datasets show that our approach delivers stable efficiency and outperforms baselines in terms of extraction accuracy.
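The abstract does not say how iExtractor builds its bit-string signatures. As an illustration of the general idea only, the sketch below uses superimposed coding: each word sets a few hashed bits, a block's signature is the bitwise OR over its words, and a block is a candidate when its signature covers every bit of the query. The constant `SIG_BITS` and all function names are assumptions made for this example.

```python
import hashlib

SIG_BITS = 64  # signature width in bits (assumed for this sketch)


def word_bits(word, k=2):
    """Set k hashed bit positions for one word."""
    bits = 0
    for seed in range(k):
        digest = hashlib.sha256(f"{seed}:{word.lower()}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return bits


def signature(text):
    """Signature of a text block: bitwise OR of the bits of all its words."""
    sig = 0
    for word in text.split():
        sig |= word_bits(word)
    return sig


def candidate_blocks(index, query_terms):
    """Return ids of blocks whose signature covers all bits of the query terms."""
    query = 0
    for term in query_terms:
        query |= word_bits(term)
    return [bid for bid, sig in index.items() if (sig & query) == query]
```

A filter of this kind never misses a block that contains all query terms, but it can return false positives when different words share bits, so candidates would still need to be verified.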

5

Named Entity Recognition by Using Maximum Entropy

Imran Ahmed, Sathyaraj R

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.43-50

Named Entity Recognition (NER) is responsible for extracting and classifying designators in a given text, which can be names, locations, organizations, etc. Over the last decade or so, researchers have been greatly involved in this area. It is an important procedure for extracting the entities in a specified text written in a natural language. Such a language consists of various entities, and the collection of such entities is called an entity set. These entity sets are maintained in a uniform database called a gazetteer. In this paper we present a methodology called maximum entropy to retrieve the entity sets from the database. The machine is trained in such a way that it retrieves the words which have the maximum entropy among all, and it has proved to be the fastest method to extract and classify the entity sets from the database. The advantages of the proposed method include sequence tagging, which means this method has increased the freedom of choosing features to represent observations.

6

The Processing Technology in Mobile Database Transaction System

Hongli Su

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.51-60

7

Generating ER Diagrams from Requirement Specifications Based On Natural Language Processing

Eman S. Btoush, Mustafa M. Hammad

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.61-70

An Entity Relationship (ER) data model is a high-level conceptual model that describes information as entities, attributes, and relationships. Entity relationship modeling is designed to facilitate database design. The abstract nature of Entity Relationship diagrams can be discouraging to designers and students alike. This paper deals with the problem of extracting ER elements from natural language specifications using Natural Language Processing (NLP). The approach provides the opportunity to use natural language documents as a source of knowledge for generating an ER data model. The structural approach is used to parse specifications syntactically based on a predefined set of heuristic rules. Extracted words, with their Part Of Speech (POS) tags, are mapped into entities, attributes, and relationships, which are the basic elements of ER diagrams.

8

Large-Scale Data Classification Method Based on Machine Learning Model

Hao Jia

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.71-80

Classification maps data items in a database into given classes. It is an important research direction in data mining. To address the shortcomings of traditional classification methods such as decision trees, K nearest neighbor, Bayes, fuzzy logic, genetic algorithms, and neural networks, the support vector machine (SVM), with its sound theory, strong adaptability, global optimization, short training time, and good generalization performance, is introduced into classification. This paper proposes a machine learning model based on the SMO algorithm and the RBF kernel function of the SVM to realize a classification method. This method transforms the nonlinear classification problem into a linear classification problem by raising the data dimension. It can better solve the problem in traditional algorithms of minimal error on the training set but larger error on the test set. A classification experiment on UCI data shows that the proposed method achieves better convergence, faster training speed, and higher classification accuracy.
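For reference, the RBF kernel named in the abstract is commonly written as K(x, y) = exp(-gamma * ||x - y||^2). The short Python sketch below computes it for two vectors. The function name and the default `gamma` are illustrative, and the paper's SMO training procedure is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The kernel equals 1 for identical points and decays towards 0 as points move apart. It acts as an inner product in an implicit higher-dimensional feature space, which is how a linear separator there can correspond to a nonlinear boundary in the original space.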

9

Spatial Approximate Keyword Query Processing in Cloud Computing System

Zuping Liu

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.81-94

The paper proposes spatial approximate keyword query algorithms for cloud systems. Existing work targets single-server solutions: an exact algorithm is given for in-memory data, while an approximate algorithm is given for disk-resident datasets. However, a single server fails to provide reasonable throughput due to limited CPU time and disk bandwidth. To face these challenges, this paper gives a two-layered index consisting of a global index and a local index, which works in a shared-nothing cluster for larger query throughput. The paper designs a novel external memory index as the local index, which returns exact answers from disk efficiently. It is equipped with keyword set signatures and multiple optimization strategies to reduce I/O cost. The global index partitions the entire space, and each computing node in the system maintains a partition. A global index selection algorithm is given. The paper also provides spatial approximate keyword query algorithms based on edit distance, covering range and nearest neighbor spatial conditions. An experiment in a shared-nothing cluster illustrates the efficiency and effectiveness of the proposed index and query algorithms.
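Approximate keyword matching based on edit distance can be sketched as follows. The dynamic program is the standard Levenshtein distance; the `approx_match` helper, its threshold `tau`, and the dictionary layout are assumptions for illustration and do not represent the paper's index structures.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def approx_match(objects, keyword, tau):
    """Return ids of objects having a keyword within edit distance tau of the query."""
    return [
        oid for oid, words in objects.items()
        if any(edit_distance(keyword, w) <= tau for w in words)
    ]
```

A spatial query would combine such a keyword predicate with a range or nearest neighbor condition on the object locations.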

10

A Comparative Study on Filtered, Vertical and Horizontal Inheritance Mapping in Database

Ala Ahmad Lasasmeh

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.95-106

Inheritance is one of the most basic ideas of object technology that can be useful for databases. This paper discusses and elaborates on one of the most common mapping algorithms, the mapping of an inheritance structure to a relational database. Inheritance mapping in databases comprises three approaches: filtered, vertical, and horizontal inheritance mapping. This paper reviews the fundamental measures used to compare the three inheritance mappings by determining the strengths and weaknesses of each in terms of ease of access to the data, speed of data access, ad hoc reporting, ease of implementation, coupling, and support for polymorphism, so that users and developers of modern commercial applications can save time and effort when working with them. The comparison uses two methods: one based on the algorithms and rules, and one using an Object Relational Mapping (ORM) tool.

11

Content-Based Social Network User Interest Tag Extraction

Mei Yu, Xu Han, Xiaolu Gou, Jian Yu, Fang Lv, Jingyu Li

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.107-118

12

MRG-DBSCAN: An Improved DBSCAN Clustering Method Based on Map Reduce and Grid

Li Ma, Lei Gu, Bo Li, Shouyi Qiao, Jin Wang

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.119-128

DBSCAN is a density-based clustering algorithm that clusters data of high density. When the traditional DBSCAN algorithm finds a core object, it uses this object as the center and extends outwards continuously. As the number of core objects grows, unprocessed objects are retained in memory, which consumes a lot of memory and I/O overhead, so the efficiency of the algorithm is not high. To ensure the high efficiency of the DBSCAN clustering algorithm and reduce its memory footprint, this paper improves the original DBSCAN algorithm and proposes the G-DBSCAN algorithm. G-DBSCAN reduces the number of query objects used as starting points: the data are put into a grid, and the center point of the data in each grid cell replaces all the points in that cell as the algorithm input. The number of query objects is drastically reduced, which improves the efficiency of the algorithm and reduces the memory footprint. To adapt G-DBSCAN to large-scale data processing, we parallelize it and combine it with the MapReduce framework. The results show that the G-DBSCAN and MRG-DBSCAN algorithms are feasible and effective.
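The grid reduction step can be sketched in Python as follows. The sketch assumes two-dimensional points, square cells of side `cell_size`, and takes the "center point of the data" in a cell to be the centroid of the points that fall into it. These choices and the function name are assumptions, since the abstract gives no further detail.

```python
from collections import defaultdict


def grid_representatives(points, cell_size):
    """Replace the points in each grid cell by their centroid.

    Returns {(col, row): (centroid_x, centroid_y)} for every non-empty cell.
    """
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return {
        key: (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
        for key, pts in cells.items()
    }
```

The representatives, rather than the raw points, would then be passed to the clustering step, so the number of query objects is bounded by the number of non-empty cells.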

13

Universities have been trying hard to help students learn more effectively. However, students' learning outcomes do not appear to be increasing accordingly. From observing how students learn, we presume the main reason is not a lack of professors' teaching skills or of students' learning abilities, but rather students' attitudes to learning. To verify whether our assumption holds, we have been investigating students' attitudes by analyzing lecture data to find useful knowledge. In this paper, we investigate correlations between students' achievements and their learning attitudes by analyzing the words students use in the answer texts of their retrospective self/class-evaluation questionnaires. Specifically, we classify the words into 4 types based on the attitudes to learning that the words represent. As a result of the study, we found that the students in the middle-achievement group differ in their word usage, whereas the high- and low-achievement students tend to use ordinary words.

14

Measurement and Analysis of Burst Topic in Microblog

Guozhong Dong, Xin Zou, Wei Wang, Yaxue Hu, Guowei Shen, Korawit Orkphol, Wu Yang

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.145-156

Microblogs provide the first communication platform for burst events because of their immediacy and interactivity. In this paper, we study user-oriented and message-oriented measurements of burst topics in Sina microblog. The measurements and analysis on a large-scale Sina microblog data set show that our proposed measurement method can capture the characteristics of users and message propagation in burst topics. The measurement results can describe the formation and diffusion mechanism of burst topics, which will contribute to better research on relevant issues and to the healthy development of microblogs.

15

A Novel Combination Forecasting Algorithm Based on Time Series

Lihua Yang, Baolin Li, Xuetao Li, Lvjiang Yin

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.157-170

To effectively predict cigarette sales and improve the competitiveness of tobacco business enterprises, the characteristics of actual cigarette sales were analyzed in detail. Given the long-term growth trends, seasonal fluctuations, and nonlinearity of monthly sales, we established three single forecasting models: Exponential Smoothing (ES), Seasonal Decomposition (SD), and a Radial Basis Function (RBF) neural network. After obtaining the predicted values of the three single models, a combination forecasting model was proposed. The weights of the three single models were computed using the mean absolute error and the mean relative error respectively; the results show that the relative error is more effective. A dynamic weight combination forecasting method based on RBF is proposed and compared with the fixed weight method. Finally, the prediction accuracy of the different models was compared based on the MAPE and RMSE criteria, and the effectiveness of the combination method was demonstrated. The proposed model can take advantage of the strengths of the three single models, and the results indicate that the combination forecasting model suited to cigarette sales has higher prediction accuracy. In some cases, the fixed weight combination model predicts more accurately than the dynamic weight combination model. The results can provide a reference for cigarette sales forecasting.
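The abstract does not state how the errors are turned into weights. One common fixed-weight scheme, used here purely as an assumption, gives each model a weight proportional to the inverse of its error, so that more accurate models contribute more. The Python sketch below shows that scheme together with the mean relative error; all function names are illustrative.

```python
def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all periods."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to 1 (assumed scheme)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(forecasts, weights):
    """Weighted sum of several forecast series of equal length."""
    horizon = len(forecasts[0])
    return [
        sum(w * series[t] for w, series in zip(weights, forecasts))
        for t in range(horizon)
    ]
```

With errors of 1.0, 1.0, and 2.0 for three models, this scheme yields weights of 0.4, 0.4, and 0.2. A dynamic scheme would recompute the weights for each period instead of fixing them once.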

16

A Method of Description on the Data Association Based on Granulation Trees

Yan Shuo, Yan Lin

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.171-184

To investigate the association of data with other data in reality, the research begins with data sets that are divided into different partitions. Because each partition consists of granules and has a level, all the partitions constitute a granulation set whose elements are the granules. As a hierarchical system, the granulation set together with the inclusion relation gives rise to a structure called a granulation tree. The research establishes a method to describe the associations between the data in one granulation tree and the data in another. The method involves a necessary and sufficient condition used to check the data associations. Because this condition is bound up with the upper approximation, the study also develops a way of investigating rough sets. As an example, a practical problem is modeled by granulation trees, and the associations between the data in the two trees are examined using the necessary and sufficient condition. Because the study is closely linked to granules and changes of granularity, the process can also be viewed as an approach to research on granular computing.

17

Research on Operation Management under the Environment of Cloud Computing Data Center

Wei Bai, Wenli Geng

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.185-192

18

Exploiting Historical Diffusion Data to Maximize Information Spread in Social Networks

Donghao Zhou, Wenbao Han, Yongjun Wang

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.193-204

Information spread maximization aims to find a small subset of nodes in a social network that maximizes the expected spread of information. In this paper, we attempt to harness historical information cascade data to learn how information propagates in social networks and how to maximize its spread. In particular, we propose a voting algorithm to learn the diffusion probabilities of edges from cascade data. A pruning method is then developed to remove trivial edges whose weights are smaller than a threshold. Moreover, motivated by social influence locality, we propose a Local Influence Model to evaluate a node's influence within a local area instead of the whole network, which can effectively reduce the computational complexity. Based on the Local Influence Model, we use a greedy algorithm to find an approximately optimal solution. Experimental results show that our method significantly outperforms state-of-the-art models in terms of both information spread and algorithm runtime.

19

Research on the Improved Shuffled Frog Leaping Algorithm in Cloud Computing Resources

Li Yong-Qiang, PanJin

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.205-214

20

FOCCX: An Optimistic Concurrency Control Protocol over XML

Weifeng Shan, Husheng Liao

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.215-222

An XML concurrency control protocol (CCP) is used to guard the consistency and isolation of transactions in native XML databases. Experiments show that the overhead of existing locking-based approaches may be huge, especially in applications with few or no conflicts. Optimistic concurrency control (OCC) is an alternative to locking. This paper presents a new optimistic approach to concurrency control over XML documents, named FOCCX (Forward oriented Optimistic Concurrency Control over XML), which targets XPath-based APIs. FOCCX increases the degree of transaction concurrency. This is achieved by aborting the current transaction as early as possible when a potential UPDATE-UPDATE conflict takes place, and by reducing the number of comparisons, since only a small write set is checked against the read sets of a limited number of concurrent transactions. Experimental results show that our protocol has superior performance to approaches based on the Backward Oriented mechanism (BOCC).
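The forward-oriented check can be illustrated with a deliberately simplified Python sketch. It treats read and write sets as plain sets of node paths and reports which active transactions have read something the validating transaction wants to write. The flat path representation and the function name are assumptions; the real protocol would also have to handle XPath semantics such as ancestor and descendant relationships, which are not modeled here.

```python
def forward_validate(write_set, active_read_sets):
    """Return ids of active transactions whose read set overlaps write_set.

    write_set: set of node paths the validating transaction intends to write.
    active_read_sets: {transaction_id: set of node paths read so far}.
    """
    return sorted(
        tid for tid, reads in active_read_sets.items() if write_set & reads
    )
```

An empty result means the validating transaction can commit without invalidating any concurrent reader. A non-empty result identifies the conflicts that the protocol must resolve, for example by aborting one of the transactions involved.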

21

The Research of Data Mining Classification Algorithm that Based on SJEP

Liang Zhao, Deng-Feng Chen, Sheng-Jun Xu, Jun Lu

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.223-234

22

Extracting Entity Relationship Diagram (ERD) from English Sentences

Amani Abdel-Salam Al-Btoush

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.235-244

The Entity Relationship Diagram (ERD) is the first step in database design; it is an important step for database designers, users, analysts, managers, and software engineers. Since English is a universal language, this paper describes a methodology that extracts an ERD from English sentences. The methodology is based on a predefined set of heuristic rules that aim to extract the elements of the ERD; these rules are then mapped into a diagram. A diagram generator automatically converts the rules into the ERD according to the generation rules. The proposed methodology is explained with examples to show how it provides a quick and easy way to extract the ERD.

23

The Research on Measure Method of Association Rules Mining

Gao Yongmei, Bao Fuguang

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.245-258

Data mining receives much attention from the artificial intelligence and database communities, and association rules are one of its most important research fields. This paper discusses the advantages and disadvantages of specific indicators for objective measures, subjective measures, and association rule measures based on a statistical perspective. Some statistical indicators are adopted to measure the association rules, which can effectively solve the problems of association rules. Next, the advantages and disadvantages of the indicators are further verified by combining theory and application, and a new measurement framework is put forward. Then, dynamic association rules are analyzed through a comparison of four cases: traditional association analysis without the life cycle, association rules with the life cycle, weighted dynamic association rules, and dynamic association rules weighted by the consumption amount. The comparison shows the influence of timeliness on association rule analysis and thus effectively mines rules with low support over the global period but high support in a certain period.

24

Relational Database’s Transaction Operation and the Concurrent Control

Na Liu, Jianfei Zhou

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.259-266

25

Dynamic Cost-Sensitive Fussy Clustering for Uncertain Data Based on the Genetic Algorithm

Yuwen Huang

Science & Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.8 No.2 2015.04 pp.267-274

Existing fuzzy clustering algorithms for uncertain data do not consider dynamic cost, and their treatment effect is poor, so this paper proposes a dynamic cost-sensitive fuzzy clustering approach for uncertain data based on the genetic algorithm (GADCSFA). First, the paper defines dynamic cost and adjacent interval, and the uncertain attributes are handled as interval numbers. Second, we give a fuzzy c-means clustering method based on interval data, in which the interval numbers of the fuzzy clustering solution and the cost space are coded by their centre and radius. Finally, the dynamic fuzzy clustering approach for uncertain data based on the genetic algorithm is constructed; it uses the genetic algorithm to search for the optimal clustering centre and cost through hybridization (crossover), mutation, and selection. The experiments show that, compared with other fuzzy clustering algorithms for uncertain data, GADCSFA has higher classification accuracy and performance, and its total expenditure is lower.

 