International Journal of Database Theory and Application Vol.7 No.1::Security Engineering Research Support Center (IJDTA)

1

Yongbo Yu, Lizhou Zheng, Kaifang Yang, Peiquan Jin

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.1-10

※ Access may be restricted because the agreement period with the original content provider has ended.

In this paper, we present ideas and methodologies for labeling mentioned entities using the Wikipedia dataset. We describe a system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection, Wikipedia. We focus on maximizing the similarity between the contextual information extracted from Wikipedia and the context of a document, as well as the similarity among the category tags associated with the candidate entities. Our experimental results show that the proposed methods are effective and efficient for complex named entity disambiguation over the Wikipedia dataset.

2

An Intelligent Method for Test Data Generation Based on Optimized Interval Arithmetic

Ying Xing, Yun-Zhan Gong, Ya-Wen Wang, Xu-Zhou Zhang

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.11-24

Path-oriented test data generation is in essence a Constraint Satisfaction Problem solved by search strategies, among which backtracking algorithms are widely used. In this paper, the backtracking algorithm Branch & Bound is introduced to generate path-oriented test data automatically. A model based on state space search is proposed to construct the search tree dynamically. For programs containing constraints over strongly related variables, including equalities, the static analysis technique of interval arithmetic is optimized to judge precisely the assignment to each variable. Conflict analysis is made accurate via a distance measure for further domain reduction, thus ensuring a precise direction for the next search step. Experiments show that the proposed method outperforms other methods used in static test data generation. Specifically, it produces excellent results when variables are strongly related, even when they appear in equalities, and generation time increases steadily and linearly with the number of expressions, including both equalities and inequalities.
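
As a rough illustration of how interval arithmetic can shrink variable domains under an equality constraint, the following minimal sketch narrows two integer domains for a constraint of the form x + y == total. It is not the optimized interval arithmetic of the paper; the function names (intersect, narrow_sum_equals, is_empty) and the choice of constraint are assumptions made only for this example.

```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi]


def intersect(a: Interval, b: Interval) -> Interval:
    """Intersect two closed intervals; a result with lo > hi is empty."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def is_empty(d: Interval) -> bool:
    """An empty domain signals a conflict, so the search must backtrack."""
    return d[0] > d[1]


def narrow_sum_equals(x: Interval, y: Interval, total: int) -> Tuple[Interval, Interval]:
    """Narrow the domains of x and y under the constraint x + y == total."""
    # Since x = total - y, x must lie in [total - y.hi, total - y.lo].
    new_x = intersect(x, (total - y[1], total - y[0]))
    # Since y = total - x, reuse the already narrowed domain of x.
    new_y = intersect(y, (total - new_x[1], total - new_x[0]))
    return new_x, new_y


# Hypothetical domains: x in [0, 10], y in [7, 20], constraint x + y == 10.
x_dom, y_dom = narrow_sum_equals((0, 10), (7, 20), 10)
```

In a search of the kind the abstract describes, a narrowed domain reduces the values left to try, and an empty domain indicates that the current partial assignment cannot satisfy the path constraint.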

3

A Case Study of Applying SOM in Market Segmentation of Automobile Insurance Customers

Vahid Golmah

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.25-36

Over the last decade, automobile insurers have faced a new challenge characterized by increased competition, higher requirements for automobile insurance quality, and an increasing emphasis on time-to-market. Furthermore, knowledge of what customers think, what they want, and how to serve them is quite useful for insurance organizations wishing to develop suitable strategies in competitive markets.

4

Enhancing Fault Tolerance based on Hadoop Cluster

Peng Hu, Wei Dai

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.37-48

5

A MapReduce Implementation of C4.5 Decision Tree Algorithm

Wei Dai, Wei Ji

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.49-60

6

KNN based Machine Learning Approach for Text and Document Mining

Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari, Jordan Pascual

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.61-70

Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents.
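
To make the single-label KNN idea concrete, here is a minimal sketch of a k-nearest-neighbor text classifier using bag-of-words counts and cosine similarity. The abstract does not specify the features or the similarity measure, so both are assumptions here, as are the function names and the toy training data.

```python
import math
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Turn a text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query: str, labeled_docs: List[Tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k documents most similar to query."""
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in labeled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
training_docs = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores winning goal", "sports"),
]
predicted = knn_classify("football team scores goal", training_docs, k=3)
```

The same similarity scores could also be used to rank documents within the predicted category, which is one plausible reading of "return the most relevant documents".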

7

Identifying the Fraudulent Financial Information Based on Data Classification Method

Zhang Chen

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.71-82

Corporate financial fraud is a difficult international problem with a long history, and it concerns many people. Researchers have carried out a great deal of qualitative and quantitative research and reached some valuable conclusions. In this article, we mainly apply the empirical research method, combined with the normative research method. First, this paper reviews the relevant literature on detecting financial fraud in listed companies and describes existing research results in terms of motives, signs, and detection methods. We appraise these results according to national conditions and characteristics, and analyze the definition of financial fraud. We then establish a new method that combines partial least squares (PLS) and support vector regression (SVR) to address this problem. PLS can reduce dimensionality effectively and acquire a nonlinear factor matrix, while SVR has many advantages, such as a high degree of fit, effective classification, and strong robustness. The model that combines PLS and SVR achieves a strong recognition effect.

8

Research on XML Data Mining Model Based on Multi-level Technology

Jie Ma

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.83-92

The era of Web 2.0 has arrived, and more and more Web 2.0 applications, such as social networks and Wikipedia, have emerged. As an industrial standard of Web 2.0, XML has also attracted more and more researchers. However, research on how to mine valuable information from massive XML documents is still in its infancy. In this paper, we study a basic problem of XML data mining: the XML data mining model. We design a multi-level XML data mining model, propose a multi-level data mining method, and list some research issues in the implementation of XML data mining systems.

9

Discovering Database Replication Techniques in RDBMS

Anees Hussain, M. N. A. Khan

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.93-102

Data replication is a key factor in achieving scalability and fault tolerance in databases, as it maintains several clones of data objects. A change made to the data automatically triggers similar changes in each replica. A number of data replication techniques have been proposed in the contemporary literature owing to the large-scale application of replication in real-world domains such as astronomy, high-energy physics, and biology. In this study, we provide a critical analysis of these techniques.

10

Research on Learning Evidence Improvement for kNN Based Classification Algorithm

Ming Yao

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.7 No.1 2014.02 pp.103-110

Text classification (TC) is a classic research topic in computer applications. In this paper, we first explore the widely used distance metrics (such as Euclidean distance) in TC problems, and we find that these metrics may not be appropriate for highly skewed datasets such as those in text categorization. Therefore, a novel method of learning evidence from multiple distance metrics is proposed. Based on DS theory, the evidence learnt from these distance metrics is combined to improve the effectiveness of the kNN-based text classifier, because the computed neighbors for a given query pattern may come from heterogeneous neighborhood sources and usually have different influence on predicting the class label. The ensemble of distance metrics is tested on three standard benchmark data sets. Finally, we demonstrate the robustness of the proposed approach through a series of experiments.
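
Assuming "DS theory" here refers to Dempster-Shafer evidence theory, the combination step can be sketched with Dempster's rule of combination. The example below is only an illustration under that assumption: the mass values, the two-class frame, and the names combine, from_euclidean and from_cosine are hypothetical, and how the paper derives masses from each distance metric is not stated in the abstract.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # mass function over subsets of the class labels


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule of combination."""
    combined: Mass = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence supports the intersection of the two sets.
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Disjoint sets contribute to the conflict mass.
                conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so that the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


SPORTS = frozenset({"sports"})
FINANCE = frozenset({"finance"})
EITHER = SPORTS | FINANCE  # mass left on the whole frame expresses ignorance

# Hypothetical masses, as if derived from neighbors under two distance metrics.
from_euclidean: Mass = {SPORTS: 0.6, FINANCE: 0.1, EITHER: 0.3}
from_cosine: Mass = {SPORTS: 0.5, FINANCE: 0.2, EITHER: 0.3}

fused = combine(from_euclidean, from_cosine)
best_label = max((s for s in fused if len(s) == 1), key=lambda s: fused[s])
```

With these illustrative inputs, the two sources agree mostly on one class, so the fused mass concentrates on it more strongly than either source alone.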
