Earticle

International Journal of Database Theory and Application

Journal Information
  • Material type
    Academic journal
  • Publisher
    Science & Engineering Research Support Center, Republic of Korea (IJDTA)
  • pISSN
    2005-4270
  • Frequency
    Bimonthly
  • Coverage
    2008 ~ 2016
  • Subject classification
    Engineering > Computer Science
  • Decimal classification
    KDC 505, DDC 605
Vol.9 No.5 (27 articles)
1

Research of Decision Tree Classification Algorithm in Data Mining

Qing-yun Dai, Chun-ping Zhang, Hao Wu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.1-8

※ Access may be restricted because the agreement with the original content provider has expired.

2

Navigation through Citation Network Based on Content Similarity Using Cosine Similarity Algorithm

Abdul Ahad, Muhammad Fayaz, Abdul Salam Shah

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.9-20

The volume of scientific literature has increased over the past few decades; new topics and information are added in the form of articles, papers, text documents, web logs, and patents. This rapid growth has produced a tremendous amount of additions to current and past knowledge; in the process, new topics have emerged, some topics have split into many sub-topics, and other topics have merged to form single topics. Manually selecting and searching for a topic in such a huge amount of information is an expensive and labor-intensive task. To meet the emerging need for an automatic process that locates, organizes, connects, and makes associations among these sources, researchers have proposed techniques that automatically extract components of information presented in various formats and organize or structure them. The targeted data to be processed for component extraction may be text, video, or audio. Various algorithms structure the information, group similar information into clusters, and weight the clusters by importance. The organized, structured, and weighted data are then compared with other structures to find similarity. Semantic patterns can be found by employing visualization techniques that show similarity or relations between topics over time or with respect to a specific event. In this paper, we propose a model based on the cosine similarity algorithm for citation networks that answers questions such as how to connect documents through citation and content similarity and how to visualize and navigate through the documents.
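The abstract does not spell out the exact similarity computation; as a hedged sketch, term-frequency cosine similarity between two documents could be computed as follows (whitespace tokenization is an assumption here, not the paper's method):

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two documents."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In a citation network, an edge between a citing and a cited paper could then be weighted by the similarity of their texts.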

3

Exact Dominance Querying Algorithm on CP-nets

Guanlin Xin, Jinglei Liu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.21-36

CP-nets (conditional preference networks) are a graphical model for representing qualitative conditional preference statements under a ceteris paribus semantics. Dominance querying, the problem of determining whether one outcome is preferred to another in this graphical model, is of fundamental importance in preference querying. However, to date, no exact dominance querying algorithm for arbitrary binary-valued CP-nets has been given, so designing one is quite necessary. Fortunately, we discover that the dominance querying problem is essentially a single-source shortest path problem, and the induced graph of a CP-net is sparse; as a result, the problem can be solved with the Johnson algorithm, which is suited to single-source shortest path problems on sparse graphs. As a byproduct, the algorithm can determine the least number of flips when one outcome dominates another. Finally, we present experimental results that demonstrate the feasibility of our approach to dominance querying on acyclic CP-nets.
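The paper itself applies the Johnson algorithm; since each improving flip can be treated as a unit-weight edge, the least-flip count between two outcomes reduces to breadth-first search on the induced flip graph. A hedged sketch (the `flip_graph` adjacency dictionary is a hypothetical input format, not the paper's data structure):

```python
from collections import deque

def least_flips(flip_graph: dict, source: str, target: str) -> int:
    """BFS over an improving-flip graph: least number of flips from
    `source` to `target`, or -1 if `target` is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        outcome = queue.popleft()
        if outcome == target:
            return dist[outcome]
        for nxt in flip_graph.get(outcome, []):
            if nxt not in dist:
                dist[nxt] = dist[outcome] + 1
                queue.append(nxt)
    return -1
```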

4

A Novel Model of Stock Data Mining with M/G/1 Queue for Evaluation of Stock Crash

Qingzhen Xu, Feifei Zhang

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.37-44

Data mining is the process of searching for information in a large amount of data. To evaluate stock crashes, this paper proposes a general decrementing-service M/G/1 queueing system with multiple adaptive vacations to find crash-related information in Shanghai Composite Index data. We use the probability generating function (P.G.F.) of the stationary queue length, the LST of the waiting time, and their stochastic decompositions to calculate the existing money flow, and we improve the existing money flow calculation model on that basis. We implement the existing money flow algorithm and compute the amount of existing money flow; the improved algorithm can give early warning of a stock market crash. The empirical results show that prices rise before a crash while the existing money inflow begins to decrease; a crash lasts at least six months; prices fall by at least fifty-five percent, and in most crashes by over seventy percent. The duration of a crash is inversely proportional to the magnitude of the decline: if the duration is short, the decline is large, and if the duration is long, the decline is small. Stock market crashes do great harm to investors.
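The paper's decrementing-service M/G/1 queue with multiple adaptive vacations is more elaborate than the textbook model, but for orientation, the classical Pollaczek-Khinchine mean waiting time of a plain M/G/1 queue can be computed as:

```python
def mg1_mean_wait(lam: float, es: float, es2: float) -> float:
    """Pollaczek-Khinchine mean waiting time in queue for M/G/1.

    lam: arrival rate; es: E[S]; es2: E[S^2] of the service time.
    """
    rho = lam * es                     # server utilisation
    if rho >= 1:
        raise ValueError("unstable queue: utilisation must be < 1")
    return lam * es2 / (2 * (1 - rho))
```

With exponential service at rate 2 (E[S] = 0.5, E[S^2] = 0.5) and arrival rate 1, this reproduces the familiar M/M/1 queueing delay of 0.5.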

5

Research on Modern Uyghur Common Word Extraction

Azragul, Alim Murat, Li xiao

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.45-54

The key techniques and methods for the construction of a modern Uyghur language (MUL) corpus are presented, including MUL corpus construction, pre-processing, statistics, stemming, and data analysis. On the basis of related work, we then developed an enhanced modern Uyghur Common Words (UCW) glossary. We inspected the words from two perspectives, namely usage frequency and distribution. In developing the enhanced glossary, we considered the number of word types, word frequency, word length, and the number of texts in which a word is used as the major factors.

6

Scalable Distributed Real-Time Processing for Large-Scale Data Streams

Binlei Cai, Qin Guo, Shiwei Zhu, Jiadong Ren

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.55-64

7

With the development of computers and network communication, the economic and social modes of production have changed profoundly, gradually forming a global market based on knowledge, competition, and cooperation. In this paper, using the analytic hierarchy process, the author studies knowledge innovation and a human resource management mode based on it. Data were collected from network big data; the results show that work autonomy is the most important motivation factor for knowledge workers, accounting for 50.51%, while individual growth accounts for 33.74%. Enterprises need to pay attention to four important talent-incentive factors: attractive development prospects, personal growth opportunities, a good working environment, and a comprehensive compensation strategy.

8

A Novel Soft Set Approach for Feature Selection

Daoli Yang, Zhi Xiao, Wei Xu, Xianning Wang, Yuchen Pana

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.77-90

Feature selection is an important preprocessing step for data mining. Soft set theory is a new mathematical tool for dealing with uncertainty. Merging soft set theory into the rough-set-based feature selection process facilitates computation with equivalence classes and improves efficiency. We propose a paired relation soft set model based on the equivalence classes of the information system. We then use it to present the lower approximation set in the form of soft sets and to calculate the degree of dependency between relations. Furthermore, we give a new mapping to obtain the equivalence classes of indiscernibility relations and propose a feature selection algorithm based on the paired relation soft set model. Compared with the NSS-based algorithm, this algorithm shows an 18.17% improvement on average. Meanwhile, both algorithms show good scalability.
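The paper's soft-set machinery is not reproduced here, but the underlying rough-set quantity it accelerates, the degree of dependency gamma(C, D), can be sketched directly from equivalence classes (the dictionary-per-row data format is an assumption for illustration):

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Partition row indices by their values on the attributes `attrs`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def dependency_degree(rows, cond_attrs, dec_attrs):
    """gamma(C, D): fraction of rows whose C-class lies wholly inside a D-class."""
    dec_classes = [set(c) for c in equivalence_classes(rows, dec_attrs)]
    pos = 0
    for block in equivalence_classes(rows, cond_attrs):
        if any(set(block) <= d for d in dec_classes):
            pos += len(block)
    return pos / len(rows)
```

A feature-selection loop would then drop attributes whose removal leaves the dependency degree unchanged.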

9

A Method for Tracking Flu Trends through Weibo

Yang Li, Changjun Hu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.91-100

Real-time monitoring of the spread of disease and rapid response are necessary. Traditional public health reports are accurate but require a lot of manpower and resources; their main drawback is the time lag, as notifiable infectious disease reports generally lag behind medical diagnosis by about 4-5 weeks. In this paper, social network data are used to detect disease and track its rapidly changing trends. We take flu data on Sina Weibo as an example and analyze flu-related weibos along temporal and spatial dimensions. Whereas most previous studies filter out non-infection weibo noise directly, this noise is closely associated with flu activity; we do not simply discard these data but use them to capture nuanced changes in public attitude. Flu-related weibos are divided into four categories representing four states of public concern, gradually escalating from concern about news to anxiety about illness, which helps capture nuanced public attitude changes toward flu trends. A distribution map of flu weibo concern and an influenza activity curve are drawn to show the analysis results, and the accuracy of multiple classification systems is investigated. The proposed twice-iterative classification method raises system accuracy to 89.50%.

10

A Survey Report On Current Research And Development of Data Processing In Web Usage Data Mining

Nandita Agrawal, Anand Jawdekar

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.101-110

Web usage mining (WUM) is the part of web mining that identifies usage data from web server logs in order to understand and better serve the requirements of web applications. WUM involves three phases: data preprocessing, pattern discovery, and pattern analysis. In the pattern discovery phase, many web data mining methods are applied to process the data so as to discover patterns; once patterns are discovered, analysis is performed using various operations on unique sessions and unique users. This paper gives a brief overview of WUM and its phases.

11

Professional Competence Evaluation of Information Management Undergraduates Based on Rough Set and D-S Evidence Theory

Di Li, Hu Wang, Rui Wang, Yazhou Xiong

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.111-120

Based on an analysis of the knowledge structure and the factors that influence the professional competence of IMIS undergraduates, this paper selects 7 courses as evaluation indexes. First, it obtains 3 indexes, database theory, data-oriented programming (C#), and Management Information Systems, as evidence through information-entropy reduction. Second, it improves the BPA derived from subjective experience by using rough set theory and obtains an objective BPA. Finally, it uses D-S evidence theory to synthesize the evidence and evaluate the professional competence of IMIS undergraduates.
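The BPA construction from rough sets is specific to the paper, but the D-S combination step itself is standard. A sketch of Dempster's rule, with focal elements represented as frozensets of hypotheses:

```python
def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: combine two BPAs whose keys are frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for a, p in m1.items():
        for b, q in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: BPAs cannot be combined")
    # normalise away the conflicting mass
    return {k: v / (1 - conflict) for k, v in combined.items()}
```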

12

Word Sense Disambiguation Based on Perceptron Model

Zhang Chun-Xiang, Gao Xue-Yao, Lu Zhi-Mao

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.121-128

Word sense disambiguation (WSD) is an important research topic in the natural language processing field and is very useful for machine translation and information retrieval. In this paper, a linear combination model based on multiple discriminative features is proposed to determine the correct sense of an ambiguous word; the morphology and part of speech of the words to the left and right of the ambiguous word are used as features. The perceptron algorithm is then applied to optimize the WSD model. Experiments show that WSD performance improves after the proposed method is applied.
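The paper's features (morphology and part of speech of the surrounding words) would be encoded as a vector; the perceptron update itself, which the paper uses to optimize the model, can be sketched for the binary case as:

```python
def perceptron_train(samples, labels, epochs=10):
    """Binary perceptron: labels in {-1, +1}, samples as feature vectors."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # update only on a misclassified (or boundary) sample
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b
```

Multi-sense words would need the multiclass extension (one weight vector per sense), which follows the same update rule.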

13

Cloud Data Migration Method Based on ABC Algorithm

Geng Yushui, Yuan Jiaheng, Sun Tao

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.141-148

Cloud storage systems play an important role in supporting large-scale, high-performance cloud applications. For a cloud storage system, data migration is the key technology for elastic load balancing. In this paper, we consider data migration in load-balancing scenarios, propose a data migration method based on the ABC algorithm, and validate the method. In this method, we compute the load of each storage node through a comprehensive evaluation system covering four aspects: available CPU, available memory, data access heat, and system response time. The cloud storage system performs data migration according to the data obtained from the comprehensive evaluation system. The results show that this method satisfies application needs, reflects the integrated load of each node, and achieves optimal performance of the cloud storage system.

14

With the development of internet big data, the impact of new technology has led to educational reform. University humanities education not only imparts knowledge and trains students' skills; more importantly, it spreads sports philosophy, spirit, and ethics. Sports humanities education pays attention to shaping students' personality and cultivating ideological and moral character, which gives it a unique educational value and significance. In this paper, we analyze the college sports culture index system and test students' physical fitness using a network platform system. The results show that during the implementation of humanistic education in college, the qualification rate of students' physical fitness tests is maintained at 90%. Therefore, colleges should pay attention to university sports humanities education, fully mobilize students' enthusiasm and creativity, and promote the coordinated development of physical exercise and social adaptation.

15

The author studies the mining of implication-type data, addressing the implication, uncertainty, nonlinearity, dynamics, and complexity present in the data mining process, and establishes a multi-factor extension mining model for implication-type data based on extension theory. The model first carries out implication analysis on the data and builds the corresponding implication set. The author then performs extension classification on the hypogynous factors in the implication set and builds the classical field and segment field of the epigynous factors according to the divided extension types. The author also builds the correlation function and an extension goodness-of-fit model between the targeted mining object and the classical field of the epigynous factors, and obtains a comprehensive extension goodness-of-fit that accounts for the weights of the epigynous factors; in other words, this determines the degree of closeness between the targeted mining object and an extension type and thus achieves the mining of implication-type data. Finally, the author demonstrates the feasibility of the model by explaining and verifying an enterprise venture capital case.

16

Topic Discovery Algorithm Based on Mutual Information and Label Clustering under Dynamic Social Networks

Lin Cui, Dechang Pi, Caiyin Wang

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.169-180

In recent years, topic detection has become a hot research area in social networks, since it can find the key factors in massive information and thus discover topics. The traditional label-propagation-based topic discovery algorithm (LPA) has attracted wide attention because of its approximately linear time complexity and because it needs no objective function. However, the LPA algorithm suffers from uncertainty and randomness, which affect the accuracy and stability of topic discovery. In this paper, a method for clustering label words based on mutual information analysis is presented to find current topics. First, by filtering stop words and extracting keywords with TF-IDF, topic words are extracted and a common word matrix is built; a topic discovery algorithm based on mutual information and label clustering is then put forward. Finally, extensive experiments on two real datasets validate the effectiveness of the proposed MI-LC (Mutual Information-Label Clustering) algorithm against the well-established LPA and LDA methods in terms of running time, NMI value, and perplexity.
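The keyword-extraction step the abstract mentions can be sketched with plain TF-IDF over tokenized documents (smoothing conventions vary; this version uses the raw log ratio, which may differ from the paper's):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weight for every term of each tokenized document in `docs`."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return weights
```

Terms that occur in every document get weight zero, so the surviving high-weight terms are the candidate topic words.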

17

Character based ASCII Encryption & Decryption on Cloud System

Richa Sharma, K. K. Parashar, Jitendra Singh Sengar

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.181-186

This paper proposes an algorithm for automatic key generation from an integer value to create more secure communication over a cloud architecture. A character-based ASCII encryption scheme is used to satisfy the current requirement of establishing secure communication. In this work we provide byte-level security at each level of data transmission, which helps create better security for encryption and decryption over the cloud infrastructure: the technique generates a private key that helps establish secure communication between two or more nodes while they communicate over the cloud system.
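The paper does not give its key-generation formula, so purely as an illustrative sketch (not the authors' scheme), a character-level cipher that offsets each byte by an integer key plus its position could look like:

```python
def encrypt(plaintext: str, key: int) -> str:
    """Shift each character's code point by a key- and position-derived offset."""
    return "".join(chr((ord(c) + key + i) % 256) for i, c in enumerate(plaintext))

def decrypt(ciphertext: str, key: int) -> str:
    """Invert encrypt() by subtracting the same offsets modulo 256."""
    return "".join(chr((ord(c) - key - i) % 256) for i, c in enumerate(ciphertext))
```

A toy like this illustrates the byte-level round trip only; a real deployment would use a vetted primitive such as AES for the actual encryption.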

18

Extracting Attributes of Named Entity from Unstructured Text with Deep Belief Network

Bei Zhong, Jin Liu, Yuanda Du, Yunlu Liaozheng, Jiachen Pu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.187-196

Entity attribute extraction is a challenging research topic with broad application prospects. Many researchers have proposed rule-based or statistics-based approaches to the extraction task in a variety of application areas. Recently, deep learning has shown its capacity to model high-level abstractions in data by using networks with multiple processing layers and complex structures; however, no research has been reported that conducts entity attribute extraction with deep learning. In this paper, we propose a new approach to extract entity attributes from an unstructured text corpus gathered from the Web. The proposed method is an unsupervised machine learning method that extracts entity attributes using a deep belief network (DBN). Experimental results show that, with our method, entity attributes can be extracted accurately and manual intervention can be reduced compared with traditional methods.

19

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm

Yonqiang Li, Jin Pan

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.197-202

20

Analysis and Review the Data Using Big Data Hadoop

Ankit Jain, Subbulakshmi T.

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.203-212

Big data are pools of huge and complicated data sets, so processing them with conventional management tools becomes difficult. The term "Big Data" describes innovative methods and technologies to capture, store, distribute, manage, and evaluate petabyte-scale or larger datasets of high velocity and varied structure. Big data may be structured, unstructured, or semi-structured, defeating standard data management methods. With the rapid evolution of data, data storage, and network collection capacity, big data are quickly growing in all science and engineering domains. Data are generated from many different sources and can arrive in the system at various rates; to process these large amounts of data in a cheap and efficient way, parallelism is employed. Big data are data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage them and to extract value and hidden knowledge from them. The analysis of big data is often difficult because it frequently involves collections of mixed data based on different patterns or rules; the challenges include capture, storage, search, sharing, analysis, and visualization. The trend toward massive data sets is due to the additional information that can be drawn from analyzing one large set of related data, compared with separate smaller sets of the same total size. Big data mining is the ability to extract useful information from streams of data or datasets characterized by their velocity, variability, and volume. This paper discusses applications of the big data processing model together with big data mining. Hadoop is the core platform for structuring big data and solves the problem of making it useful for analytics.
Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
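Hadoop's programming model is MapReduce; the canonical word-count example can be sketched in Python in the style of Hadoop Streaming (the real framework performs the shuffle-and-sort between the two phases across a cluster):

```python
from itertools import groupby

def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word after a sort-based shuffle."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)
```

Running `reduce_phase(map_phase(lines))` on a local iterable mimics what Hadoop distributes over many mappers and reducers.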

21

Domain Knowledge Actively Recommendation System Based by Process-Driven and Rough Set

XinGang Wang, ChengHao Li, Tao Sun

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.213-220

Recommending knowledge to staff who need it in their work can improve work efficiency, knowledge application, and innovation. Enterprise knowledge is described in three dimensions: knowledge attribute, process, and domain. On this basis, a domain knowledge active recommendation architecture based on process-driven methods and rough sets is constructed, and an active recommendation method combining domain knowledge and rough sets is proposed. We use this architecture to analyze employees, domains, and processes, and then use rough sets to derive rules from knowledge usage logs. By combining employees' requirements, process data, and the rules, the system recommends accurate knowledge.

22

Evaluation of Input Output Efficiency in Higher Education Based on Data Envelope Analysis

Li Zhang, Yu Luo

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.221-230

The level of development of higher education is a sign of a country's comprehensive national strength and its level of economic and social development. In this paper, the author makes a comprehensive evaluation of resource utilization efficiency in higher education using large-scale network data. The results show that the main causes of non-DEA-effectiveness in colleges and universities are low scale efficiency and low technical efficiency. Therefore, universities should pay special attention to the effective use of invested resources, gradually build an effective performance appraisal system, and implement the organic combination of teaching and scientific research; they should also reduce the cost of educational investment so as to effectively improve resource utilization and operating benefits.

23

Research on Parallel Computing Model and Classification Algorithm Based on Data Mining Process

Qiongshuai Lv, Haifeng Hu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.231-240

In the big data era, with the parallel evolution of computer architecture, changes in computing, and the expansion of industrial application modes and resources, we need to explore new parallel computing models that reflect the properties of current parallel machines and the forms of big data applications, and that give the various mainstream big data processing systems a unified theoretical analysis to guide application tuning. Although the study of big data programming models has produced many achievements and is widely applied to TB-scale and even PB-scale data processing and analysis, the study of the corresponding computational models has only just begun. Starting from traditional parallel computing models, this paper studies big data programming models and big data computation models, and summarizes three basic problems that a big data model needs to address in theory: the three elements of the model, scalability and fault tolerance, and performance optimization. Around these three questions, the paper on the one hand studies performance optimization models theoretically from a big data perspective, and on the other hand applies these performance optimization methods to an actual big data case.

24

An Improved ID3 Decision Tree Algorithm on Imbalance Datasets Using Strategic Oversampling

L. Surya Prasanthi, R. Kiran Kumar, Kudipudi Srinivas

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.241-250

Data mining is the process of extracting useful information from vast and complex databases. In real scenarios, data sources contain many kinds of data, including imbalanced data. Imbalanced data sets contain a large percentage of instances from one class and a very small percentage from the other class. The traditional decision tree algorithm Iterative Dichotomiser 3 (ID3) was not built to handle imbalanced datasets, so improved algorithms are needed to overcome this drawback. In this paper, we propose an extension of the ID3 algorithm called Over Sampled ID3 (OSID3) for imbalanced data learning. The proposed OSID3 approach uses an oversampling technique with a unique statistical oversampling strategy, removing less privileged instances at an early stage and later oversampling the highly privileged instances to approximately balance the data. Experimental observations suggest that the proposed approach improves on the benchmark ID3 in terms of Accuracy, Area Under Curve (AUC), and Root Mean Square Error (RMSE) on 15 imbalanced datasets from the University of California, Irvine (UCI) repository.
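OSID3's particular statistical strategy is the paper's contribution; as a baseline sketch, plain random oversampling of the minority class until the classes balance looks like this (the `seed` parameter is only for reproducibility):

```python
import random
from collections import Counter

def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes reach
    the majority class's count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(majority - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

The balanced rows would then be fed to an ordinary ID3 learner.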

25

An Improved Eclat Algorithm for Mining Association Rules Based on Increased Search Strategy

Zhiyong Ma, Juncheng Yang, Taixia Zhang, Fan Liu

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.251-266

Although the Eclat algorithm is an efficient algorithm for mining association rules, some disadvantages limit its efficiency. In this paper, we propose an improved Eclat algorithm called Eclat_growth, based on an increased search strategy. The Eclat_growth algorithm has three main steps. First, it scans the database and stores it in a table in vertical data format. Then, it builds an increased two-dimensional pattern tree and adds the TID-sets of itemsets from the vertical-format table into the pattern tree row by row; new frequent itemsets are generated by combining the newly added item data with the existing frequent itemsets in the pattern tree. Finally, all frequent itemsets are found by collecting all nodes of the pattern tree. In generating new frequent itemsets, prior knowledge is used to fully prune the candidate itemsets. For generating the intersection of two itemsets and calculating the support, we propose a new method called BSRI (Boolean array setting and retrieval by indexes of transactions) to reduce the running time. Comparisons of Eclat_growth with Eclat, Eclat-diffsets, Eclat-opt, and hEclat indicate that Eclat_growth has the highest performance in mining association rules from various databases.
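The pattern tree and BSRI details are specific to Eclat_growth, but baseline Eclat's depth-first TID-set intersection, which it builds on, can be sketched as:

```python
def eclat(tidsets, min_support):
    """Depth-first Eclat: grow itemsets by intersecting TID-sets.

    tidsets: {item: set of transaction ids}; returns {itemset tuple: support}.
    """
    frequent = {}

    def recurse(prefix, items):
        for i, (item, tids) in enumerate(items):
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(tids)
                # extend only with later items, intersecting TID-sets
                suffix = [(it2, tids & t2) for it2, t2 in items[i + 1:]]
                recurse(itemset, suffix)

    recurse((), sorted(tidsets.items()))
    return frequent
```

Because support only shrinks under intersection, an infrequent branch can be skipped without losing any frequent itemset.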

26

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

Chunhua Wang, Dong Han

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.267-274

27

Filtered Clustering Based on Local Outlier Factor in Data Mining

Vishal Bhatt, Mradul Dhakar, Brijesh Kumar Chaurasia

Science & Engineering Research Support Center (IJDTA), International Journal of Database Theory and Application Vol.9 No.5 2016.05 pp.275-282

In this paper, the impact of k-means and the local outlier factor on data sets is studied. An outlier is an observation that is different from or inconsistent with the rest of the data. The main challenges of outlier detection are the increasing complexity due to the variety of datasets and the size of the datasets; evaluating outlierness and catching similar outliers as a group are further issues. The concept of the LOF (Local Outlier Factor) is presented in this work. The paper describes a comparative study of five methodologies that use k-means as the base algorithm along with various distance measures for finding dissimilarities between objects, in order to analyze the effects of outliers on the cluster analysis of datasets in data mining.
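A brute-force sketch of the LOF computation the paper builds on (k nearest neighbours by Euclidean distance; production code would use a spatial index instead of the quadratic scan):

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor for a list of points; scores near 1 are inliers."""
    def dist(p, q):
        return math.dist(p, q)

    n = len(points)
    # indices of each point's k nearest neighbours
    neigh = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        neigh.append(order[:k])

    def k_distance(j):
        return dist(points[j], points[neigh[j][-1]])

    def reach_dist(i, j):   # reachability distance of i from neighbour j
        return max(k_distance(j), dist(points[i], points[j]))

    def lrd(i):             # local reachability density
        return k / sum(reach_dist(i, j) for j in neigh[i])

    return [sum(lrd(j) for j in neigh[i]) / (k * lrd(i)) for i in range(n)]
```

A point far from its neighbours has a much lower density than they do, so its LOF rises well above 1.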

 