International Journal of Database Theory and Application Vol.6 No.6 :: Security Engineering Research Support Center (IJDTA)

1

Implementation of the Fuzzy C-Means Clustering Algorithm in Meteorological Data

Yinghua Lu, Tinghuai Ma, Changhong Yin, Xiaoyu Xie, Wei Tian, ShuiMing Zhong

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.1-18

※ Access may be restricted because the agreement period with the full-text provider has ended.

This paper proposes an improved fuzzy c-means (FCM) algorithm and applies it to meteorological data. The classical FCM algorithm has difficulty selecting initial cluster centers; the proposed algorithm addresses this by adopting a novel strategy for choosing them. The paper also introduces the features and the mining process of the open-source data mining platform WEKA, which does not implement the FCM algorithm. To address this shortcoming, we implement both the traditional FCM algorithm and the improved FCM algorithm on top of the basic classes in WEKA. Finally, experimental clustering results on meteorological data are given, which show that the proposed algorithm produces better clustering results than the K-Means algorithm and the traditional FCM algorithm.
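
For orientation, the classical FCM iteration that the abstract builds on can be sketched as follows. This is a minimal pure-Python sketch of the standard algorithm only, not the paper's improved variant or its WEKA implementation. The names `memberships` and `fcm`, their parameters, and the caller-supplied `init_centers` are assumptions made for illustration, since the abstract does not describe the proposed center-selection strategy.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        # Euclidean distance to each center, clamped to avoid division by zero.
        d = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
             for c in centers]
        # Standard FCM update: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
        rows.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                     for dj in d])
    return rows


def fcm(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Classical fuzzy c-means; init_centers is supplied by the caller (assumption)."""
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u^m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(dim)))
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:  # stop once no center coordinate moves noticeably
            break
    return centers, memberships(points, centers, m)
```

Because the result depends on `init_centers`, different starting points can lead to different clusterings, which is the sensitivity the paper sets out to reduce.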

2

Mining Pairs-Trading Patterns: A Framework

Ghazi Al-Naymat

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.19-28

Pairs trading is an investment strategy that depends on the price divergence between a pair of stocks. Essentially, this strategy involves choosing a pair of stocks that historically move together, then taking a long-short position if the pair's prices diverge, and finally reversing the previous position when prices converge. The rationale of pairs trading is to make a profit while avoiding market risk. This review presents researchers with the state-of-the-art techniques used to find trading pairs. In addition, it highlights the key issues that researchers need to consider when studying financial data to find pairs.
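
The diverge-then-converge rule described above can be illustrated with a small sketch. It assumes a z-score of the price spread as the divergence measure, which is a common heuristic and not something the abstract specifies; the function name `pairs_signal` and the thresholds `entry_z` and `exit_z` are hypothetical.

```python
from statistics import mean, stdev


def pairs_signal(prices_a, prices_b, entry_z=2.0, exit_z=0.5):
    """Return a trading action for the latest prices of stocks A and B.

    Assumption: divergence is measured as the z-score of the latest spread
    (A minus B) against the spread's own history.
    """
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    if len(spread) < 3:
        raise ValueError("need at least three price observations")
    history, latest = spread[:-1], spread[-1]
    sigma = stdev(history)
    if sigma == 0:
        return "hold"  # no historical variation to compare against
    z = (latest - mean(history)) / sigma
    if z > entry_z:
        return "short A, long B"  # A looks expensive relative to B
    if z < -entry_z:
        return "long A, short B"  # A looks cheap relative to B
    if abs(z) < exit_z:
        return "close"  # prices have converged: reverse the position
    return "hold"
```

The sketch leaves out how a pair that "historically moves together" is chosen in the first place, which is the search problem the review surveys.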

3

Base a EMD-Grey Model for Textile Export Time Series Prediction

Hua quanping, Yang xiaoyi

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.29-38

4

Research of Least Privilege for Database Administrators

Mou Shen, Mengdong Chen, Min Li, Lianzhong Liu

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.39-50

Traditional database administrator (DBA) privileges are too broad, which creates an insider security threat. To solve this problem, this paper presents an extended Role-Based Access Control (RBAC) rights management model for DBAs. Combined with the principle of least privilege, the paper proposes a scheme built on the separation of three management roles and on dynamic constraints. The scheme solves the problem of excessive system administrator privileges and avoids insider threats. Practice shows that the model is versatile, flexible, and highly secure.

5

Semi-supervised Sentiment Classification using Ranked Opinion Words

Suke Li, Yanbing Jiang

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.51-62

This work proposes a semi-supervised sentiment classification method based on the co-training framework. The proposed method constructs three sentiment classifiers. We use common text features to construct the first classifier. We extract opinion words from consumer reviews and then rank these opinion words according to their importance. For the second sentiment classifier, we use the extracted opinion words of each review together with their ranked co-occurring opinion words. The third sentiment classifier is built from the non-opinion text features of each review. Following the co-training semi-supervised learning framework, we use the three sentiment classifiers iteratively to obtain the final sentiment classifier. Experimental results show that the proposed method performs better than the Self-learning SVM method and the Naive co-training SVM method.

6

Mining Periodic Workload Patterns in Database Audit Trails

Marcin Zimniak, Janusz R. Getta, Wolfgang Benn

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.63-74

Information about the periodic processing of database operations is of pivotal importance for continuous physical database design and automated performance tuning of database systems. This work shows how to detect the oscillations of database workloads caused by the periodic invocation of user applications. In particular, we present an algorithm for discovering periodic patterns in the processing histories of complex and elementary database operations. In our approach, information collected from the database audit trails is transformed into a sequence of syntax trees and later compressed into a syntax tree table. The periodic patterns are discovered through nested iterations over a four-dimensional space of syntax trees and positional parameters of the patterns. Transformations of the patterns are used to discover overlapping periodic patterns.

7

Research on Multi-Input Complex System based on Phase Reconstruction

Jianming Sun

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.75-84

This paper uses phase space reconstruction as the basis of a multi-input nonlinear method. For a system with multiple variables, completing the basic reconstruction requires first choosing the variables involved and then, once the reconstructed variables are determined, identifying the reconstruction parameters. Accordingly, the paper first introduces a method for choosing the correct input variables based on nonlinear correlation, and then introduces some commonly used methods for identifying the reconstruction parameters, such as the mutual information method, the auto-correlation method, and the average displacement method. It gives particular attention to the C-C method. By carrying out a multivariate combination forecasting simulation for time series of the Lorenz equations and comparing the reconstructed phase diagrams of the multivariate phase space, the paper verifies the accuracy of selecting the reconstructed input vector based on nonlinear correlation and the effectiveness of using the C-C method to determine the reconstruction parameters.
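
As a rough illustration of what the reconstruction step produces, the sketch below builds time-delay vectors from scalar series, assuming the usual delay-coordinate embedding with an embedding dimension and a time delay as the reconstruction parameters. The function names `delay_embed` and `multivariate_embed` are hypothetical, all input series are assumed to have the same length, and the parameter-selection methods named in the abstract (mutual information, C-C, and so on) are not implemented here.

```python
def delay_embed(series, dim, tau):
    """Delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def multivariate_embed(series_list, dims, taus):
    """Concatenate per-variable delay vectors into one state vector per time step.

    Assumption: every series has the same length, so keeping the last n
    vectors of each variable aligns them on the same most recent sample.
    """
    parts = [delay_embed(s, d, t) for s, d, t in zip(series_list, dims, taus)]
    n = min(len(p) for p in parts)
    return [sum((p[len(p) - n + i] for p in parts), ()) for i in range(n)]
```

For example, `delay_embed([0, 1, 2, 3, 4, 5], 3, 2)` yields the two vectors `(0, 2, 4)` and `(1, 3, 5)`; the quality of such a reconstruction depends on how `dim` and `tau` are chosen.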

8

Predicting Age Range of Users over Microblog Dataset

Lizhou Zheng, Kaifan Yang, Yongbo Yu, Peiquan Jin

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.85-94

In this paper, we present an approach and methodology for predicting the age range of users from a microblog dataset. Given a user's personal information, such as user tags, job, education, self-description, and gender, as well as the content of his/her microblogs, we automatically classify the user's age into one of four predefined ranges. In particular, we extract a set of features from the given information about the user and employ a statistics-based framework to solve this problem. Measurements show that the proposed method, incorporating the selected features, has an accuracy of around 71% on average over the training dataset.

9

An Algorithm of Association Rules Mining in Large Databases Based on Sampling

Zhi Liu, Tianhong Sun, Guoming Sang

Security Engineering Research Support Center (IJDTA) International Journal of Database Theory and Application Vol.6 No.6 2013.12 pp.95-104

In recent years, the geometric growth in the amount of data has placed higher demands on data mining algorithms. In the traditional Apriori algorithm, generating and storing frequent itemsets wastes considerable time and space. In this paper, we put forward a new hash table and use it to improve the algorithm, obtaining the SamplingHT algorithm. Extensive comparative experiments show that the new algorithm improves performance when generating frequent itemsets and effectively reduces the number of database scans, in order to achieve better optimization.
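
To make the cost being discussed concrete, the sketch below shows plain level-wise Apriori together with a generic sample-then-verify step. It is not the SamplingHT algorithm: the abstract does not describe the hash table, so none is reproduced here, and the names `apriori` and `sample_then_verify` and their parameters are assumptions for illustration.

```python
import random
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise frequent itemset mining; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while level:
        # One pass over the data per level: this is the repeated scanning cost.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(kept)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have
        # any infrequent (k-1)-subset.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent


def sample_then_verify(transactions, min_frac, sample_size, seed=0):
    """Mine a random sample, then confirm the candidates on the full data."""
    rng = random.Random(seed)
    sample = rng.sample(list(transactions), sample_size)
    candidates = apriori(sample, max(1, int(min_frac * sample_size)))
    full = [frozenset(t) for t in transactions]
    verified = {}
    for c in candidates:
        n = sum(1 for t in full if c <= t)  # count against the full database
        if n >= min_frac * len(full):
            verified[c] = n
    return verified
```

Mining only the sample keeps the expensive level-wise passes off the full database, at the price that an itemset which happens to be rare in the sample can be missed.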
