Earticle


International Journal of Database Theory and Application

Publication Information
  • Material type
    Academic journal
  • Publisher
    Science & Engineering Research Support Center, Republic of Korea (IJDTA)
  • pISSN
    2005-4270
  • Frequency
    Bimonthly
  • Coverage
    2008 ~ 2016
  • Subject classification
    Engineering > Computer Science
  • Decimal classification
    KDC 505, DDC 605
Vol.8 No.3 (33 articles)
No
1

Study of Multi-attribute Comprehensive Evaluation Method Based on Attribute Theory

Xu Guanglin, Liu Nianzu, Duan Xueyan

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.1-14

※ Access to the full text may be restricted because the agreement period with the original content provider has ended.

The study of the comprehensive evaluation model based on attribute theory has yielded fruitful results in both theory and practice. However, the evaluator's preference curve is currently assumed to be a smooth quadratic equation, which fails to adequately reflect how the evaluator's preference changes as indicators increase. After analyzing the principle of the comprehensive evaluation model, this paper conducts simulation experiments to test the rationality of the preference curve. By increasing the number of interpolation points and comparing the evaluator's preference curves generated by polynomial interpolation and cubic spline interpolation, it concludes that cubic spline interpolation is better than polynomial interpolation, and that four full-score hyperplanes should be adopted to obtain the curve that most rationally reflects the change in the evaluator's preference. The main contributions of this paper are an analysis of the rationality of different preference curves under the attribute-theory-based comprehensive evaluation model and the finding that the most reasonable curve depends on the selection of the hyperplane S2.
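The polynomial-versus-spline comparison in this abstract is easy to illustrate with a small experiment. The sketch below is not the authors' code: it interpolates the Runge function (our stand-in for a preference curve) at equally spaced points, and shows that a global polynomial oscillates near the interval ends while a natural cubic spline stays close to the data.

```python
import bisect

def lagrange(xs, ys, x):
    """Evaluate the global interpolating polynomial at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives M_1..M_{n-1}
    a = [h[i - 1] for i in range(1, n)]
    b = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
    c = [h[i] for i in range(1, n)]
    d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
         for i in range(1, n)]
    for k in range(1, n - 1):            # Thomas algorithm: forward sweep
        m = a[k] / b[k - 1]
        b[k] -= m * c[k - 1]
        d[k] -= m * d[k - 1]
    M = [0.0] * (n + 1)                  # natural boundary: M_0 = M_n = 0
    if n > 1:
        M[n - 1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):   # back substitution
            M[k + 1] = (d[k] - c[k] * M[k + 2]) / b[k]
    def s(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        hi, t0, t1 = h[i], xs[i + 1] - x, x - xs[i]
        return ((M[i] * t0 ** 3 + M[i + 1] * t1 ** 3) / (6 * hi)
                + (ys[i] / hi - M[i] * hi / 6) * t0
                + (ys[i + 1] / hi - M[i + 1] * hi / 6) * t1)
    return s

f = lambda x: 1.0 / (1.0 + 25.0 * x * x)   # Runge function
xs = [-1.0 + 0.2 * i for i in range(11)]   # 11 equally spaced nodes
ys = [f(x) for x in xs]
spline = natural_cubic_spline(xs, ys)
# Near the interval ends the polynomial oscillates far more than the spline.
poly_err = abs(lagrange(xs, ys, 0.95) - f(0.95))
spl_err = abs(spline(0.95) - f(0.95))
```

The oscillation of the global polynomial (Runge's phenomenon) is the same failure mode the abstract attributes to a single smooth fitted curve, and the piecewise spline avoids it.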

2

Extracting Entity Relationship Diagram (ERD) From Relational Database Schema

Hala Khaled Al-Masree

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.15-26


Database Reverse Engineering (DBRE) is used to extract requirements from an existing system, and is applied to facilitate understanding of systems that have little documentation about their design and architecture. DBRE is a very important process when database designers want to extend a system or migrate it to newer technology. In the relational database model, DBRE tries to extract an Entity Relationship Diagram (ERD) from the relational database schema; database designers often find that the content of many attributes is not related to their names. This paper proposes a methodology for extracting an ERD from a relational database schema, recovering attributes whose semantics are related to their names, both regular and weak entity types, relationships, and keys. The basic input of this approach is the relational database schema generated from the database. From this schema, the method obtains information containing keywords that help database designers extract entities and attributes whose semantics are related to their names, then determines the primary keys, foreign keys, and constraints of the database system. In the final step, the ERD is extracted.
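A toy illustration of the kind of input this approach works from (not the paper's actual methodology) is sketched below: it parses `CREATE TABLE` statements with regular expressions, recovers entities, primary keys, and foreign-key relationships, and flags an entity as weak when its primary key contains a foreign-key column. The schema text is invented for the example.

```python
import re

def extract_erd(sql):
    """Recover entities, keys and relationships from CREATE TABLE statements."""
    entities = {}
    for name, body in re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", sql,
                                 re.S | re.I):
        pk = re.search(r"PRIMARY KEY\s*\(([^)]*)\)", body, re.I)
        pk = [c.strip() for c in pk.group(1).split(",")] if pk else []
        fks = re.findall(r"FOREIGN KEY\s*\((\w+)\)\s*REFERENCES\s+(\w+)",
                         body, re.I)
        # Plain columns: drop the key-constraint clauses, keep each first token
        col_part = re.sub(r"(PRIMARY|FOREIGN) KEY\s*\([^)]*\)(\s*REFERENCES\s+\w+)?",
                          "", body, flags=re.I)
        cols = [c.strip().split()[0] for c in col_part.split(",") if c.strip()]
        fk_cols = {col for col, _ in fks}
        entities[name] = {
            "attributes": cols,
            "pk": pk,
            "references": fks,                      # relationships to other entities
            "weak": any(c in fk_cols for c in pk),  # weak entity: PK depends on an FK
        }
    return entities

schema = """
CREATE TABLE Dept (id INT, name VARCHAR(40), PRIMARY KEY (id));
CREATE TABLE Emp (eid INT, dept_id INT, PRIMARY KEY (eid),
                  FOREIGN KEY (dept_id) REFERENCES Dept);
CREATE TABLE Dependent (emp_id INT, dname VARCHAR(40),
                        PRIMARY KEY (emp_id, dname),
                        FOREIGN KEY (emp_id) REFERENCES Emp);
"""
erd = extract_erd(schema)
```

Here `Dependent` comes out as a weak entity because its composite primary key includes the foreign key `emp_id`, which mirrors the regular/weak distinction the abstract mentions.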

3

Improving Query Expansion for Information Retrieval Using Wikipedia

Lixin Gan, Huan Hong

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.27-40


Query expansion (QE) is one of the key technologies for improving retrieval efficiency. Many studies on query expansion that draw term relationships from a single local corpus suffer from two problems that result in low retrieval performance: the term relationships are limited, and unlisted query terms have no expansion terms. To address these problems, term relationships captured from Wikipedia are superimposed on a basic Markov network pre-built from the single local corpus, forming a new, larger Markov network with more and richer relationships for each term. Evaluation is performed on three standard information retrieval corpora: ADI, CISI, and CACM. Experimental results show that the proposed superimposed Markov network technique is effective at selecting more, and more confident, candidates for query expansion, and that it outperforms other state-of-the-art QE methods.

4

A Three-dimension Huge Data Extraction Algorithm for Visualization Based on Computational Meshes

Yuan Ye, Tian Zhongxu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.41-48


To accelerate the visualization of huge engineering datasets based on computational meshes, extraction of the data is necessary. A data extraction algorithm for mesh-based engineering visualization data is presented. For ordinary four-node tetrahedron elements, ten-node tetrahedron elements, and eight-node hexahedron elements, algorithms are studied that extract and simplify data for huge-scale engineering data visualization, including the judgment of relationships between target points and elements and an interpolation algorithm within elements. Experiments on large data transfers show that the algorithm is reliable and can be used to extract visualization data from huge engineering computation results.

5

Analytical Approach for Security of Sensitive Business Database

Anusha Gupta, Sanjay Kumar Dubey

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.49-56


Sensitive database security is integral to meeting a company's compliance requirements, and interest in securing sensitive data in mixed database environments has increased over the past few years. This paper reviews the literature on solutions for securing databases, which helps users understand what must be protected when they plan to secure a database. Some businesses require the highest level of security even at the cost of performance, so this paper helps users choose the security mechanism appropriate for their sensitive database according to their business requirements. A brief view of securing the network, server, and operating system is also provided. The aim of this research is to protect sensitive data from unauthorized users at all three levels: physical security, network security, and information security.

6

An Efficient Parallel Top-k Similarity Join for Massive Multidimensional Data Using Spark

Dehua Chen, Changgan Shen, Jieying Feng, Jiajin Le

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.57-68


Top-k similarity join is used in a wide range of applications that must compute the k most similar pairs of data records in a given database. However, time performance becomes challenging as a growing number of applications need to process massive data, and finding the top-k pairs in such vast amounts of data with traditional methods is awkward. In this paper, we propose an RDD-based algorithm that performs the top-k similarity join for massive multidimensional data over a large cluster of commodity machines using Spark. The algorithm consists of four steps: it loads a set of multidimensional records stored in HDFS and finally outputs an ordered list of the top-k closest pairs back into HDFS. First, we develop an efficient distance function based on LSH (Locality Sensitive Hashing) to improve the efficiency of pairwise similarity comparison. Second, to minimize the amount of data moved at RDD run time, we conceptually split all pairs of LSH signatures into partitions. We then compute all local top-k closest pairs in parallel, and finally all local top-k pairs, sorted by their Hamming distances, contribute to the global top-k pairs. A performance evaluation comparing Spark with Hadoop confirms the effectiveness and scalability of our RDD-based algorithm.
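The LSH-plus-Hamming idea in this abstract can be sketched sequentially (the Spark/RDD distribution and the paper's exact distance function are not reproduced here): random-hyperplane LSH turns each record into a bit signature, and the Hamming distance between signatures approximates angular closeness, so the top-k pairs can be ranked cheaply. The data points are invented.

```python
import heapq, itertools, random

def lsh_signatures(points, nbits=64, seed=7):
    """Random-hyperplane LSH: one sign bit per hyperplane per point."""
    rnd = random.Random(seed)
    dim = len(points[0])
    planes = [[rnd.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(nbits)]
    sigs = []
    for p in points:
        bits = 0
        for plane in planes:
            # Which side of the hyperplane the point falls on -> one bit
            bits = (bits << 1) | (sum(a * b for a, b in zip(plane, p)) >= 0)
        sigs.append(bits)
    return sigs

def topk_pairs(points, k, nbits=64):
    """Top-k closest pairs, ranked by Hamming distance between signatures."""
    sigs = lsh_signatures(points, nbits)
    hamming = lambda a, b: bin(a ^ b).count("1")
    return heapq.nsmallest(
        k, itertools.combinations(range(len(points)), 2),
        key=lambda ij: hamming(sigs[ij[0]], sigs[ij[1]]))

records = [(1.0, 0.0), (0.99, 0.01), (-1.0, 0.5), (0.0, 1.0)]
closest = topk_pairs(records, 1)
```

In a distributed setting each partition would compute its local top-k this way and a final merge over the sorted local lists would yield the global top-k, which matches the local-to-global step the abstract describes.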

7

Cloud-Enabled Data Center Organization using K-D Tree

Sandip Roy, Rajesh Bose, Tanaya Roy, Debabrata Sarddar

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.69-76


An efficient load balancing algorithm is absolutely essential in cloud computing to obtain a cloud network whose performance gracefully satisfies user expectations. Despite the existence of a handful of lightly loaded data centers, numerous heavily overloaded data centers may degrade the performance of the overall cloud network, so proper workload distribution can improve the overall performance of the cloud system. Cloud division rules are nowadays in high demand as algorithms for distributing workloads among the cloud server nodes deployed in cloud-enabled data centers scattered across geographical regions. For researchers, the cloud division rule and the search for the optimal cloud server node are the most demanding jobs in load balancing, leading toward a more efficient cloud network and improved user satisfaction. This paper presents an expeditious cloud division rule based on the geographical location of cloud-enabled data centers distributed over the Earth's surface and builds a two-dimensional space-partitioning k-d tree over them in order to search for the intended cloud server node efficiently. The proposed organization scheme can be utilized by active-monitoring load balancing algorithms to improve resource utilization for high performance in present cloud computing environments.
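The two-dimensional k-d tree idea can be sketched as follows (a generic textbook version, not the paper's scheme; the "data center" coordinates are invented): the tree partitions the plane by alternating split axes, and a nearest-neighbor search with half-plane pruning finds the intended node without scanning every center.

```python
def build(points, depth=0):
    """Build a 2-D k-d tree, alternating the split axis at each level."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"pt": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, target, best=None):
    """Return the stored point closest to `target` (squared-distance pruning)."""
    if node is None:
        return best
    dist2 = lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    if best is None or dist2(node["pt"]) < dist2(best):
        best = node["pt"]
    diff = target[node["axis"]] - node["pt"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    if diff * diff < dist2(best):   # the far half-plane may still hold a closer point
        best = nearest(far, target, best)
    return best

# Hypothetical data-center coordinates; the query finds the nearest center.
centers = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(centers)
```

For a user request at (9, 2), the search descends one branch and only re-visits a sibling subtree when the splitting line is closer than the best match so far, which is what makes the lookup sub-linear on average.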

8

Performance Comparison Between Hama and Hadoop

Shuo Li, Baomin Xu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.77-84


Massive scientific computations such as matrix, graph, and network algorithms are very attractive for modelling real-world data. Apache Hama is a pure BSP (Bulk Synchronous Parallel) distributed computing framework for massive scientific computations. In this paper, our experiments were conducted on a 4-node Hadoop cluster: we implemented a Monte Carlo algorithm for Pi in both Hama and Hadoop under the same software and hardware environment. The experimental results show that Hama achieves much higher performance than Hadoop on our testbed.
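The Monte Carlo estimator used in these experiments is easy to sketch (a generic version, not the paper's code). In a BSP setting each peer would run the sampling loop on its own share of n and a single superstep would sum the hit counts; here it runs sequentially:

```python
import random

def mc_pi(n, seed=42):
    """Estimate Pi by sampling points in the unit square and counting
    those that fall inside the quarter circle of radius 1."""
    rnd = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rnd.random() ** 2 + rnd.random() ** 2 <= 1.0)
    return 4.0 * hits / n

# Splitting n across peers and summing their hit counts gives the same
# estimator, which is why this algorithm parallelizes cleanly under BSP.
estimate = mc_pi(100_000)
```

Because each sample is independent, the only communication the BSP version needs is one global sum, which is exactly the workload shape where Hama's superstep model has little overhead.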

9

Design and Implementation of the Algorithm of QoS Virtual Queue

Jian Zhang

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.85-100


QoS is very important in the current Internet and the next-generation Internet, and as applications develop, QoS functionality becomes more and more complex. At present, multi-core processors are becoming increasingly popular, and because multi-core performance is stronger, QoS functions are now also implemented in software. This paper describes the background of QoS, discusses the traditional software implementation of the algorithm and the shortcomings of that implementation, then presents and analyses ideas for improvement, and finally proposes a new, improved algorithm to solve the problems encountered. The algorithm has been fully verified in a network controller.

10

Implementation of Star Schemas from ER Model

Gunjan Chandwani, Veepu Uppal

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.111-130


A star schema is a representation of a data warehouse that is used in strategic decision making and analysis. In this paper we present a method by which an ER diagram is converted into a star schema.

11

A Dependence Stability Bound based on the VC Dimension for Relational Classification

Xing Wang, Hui He, Bin-Xing Fang, Hong-Li Zhang

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.131-144


Relational classification (RC) is concerned with the application of statistical learning to relational data. RC models lack the stability to smooth out perturbations generated by variations in the correlations within relational data, and few studies have attempted to derive a bound and develop a stability learning framework for RC models. To solve this problem, we derive a learning bound with a new measure, dependence stability, and a limited Vapnik-Chervonenkis (VC) dimension. Based on this learning bound, we then design a stable learning framework that serves as a guideline for the development of new learning algorithms for a broad class of RC models. Applying a Markov logic network to synthesized and real-world datasets, our experimental results demonstrate that our bound can be tight if the RC model has appropriate dependence stability and limited VC dimension, and that our learning framework increases the stability of RC models while reducing the deviation between empirical risk and true risk.

12

A Hybrid Approach for Encrypting Data on Cloud to prevent DoS Attacks

Navdeep Singh, Pankaj Deep Kaur

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.145-154


No technology can be called perfect until it is free from vulnerabilities, so whenever a new technology is introduced, security is the first feature that counts. Many well-known technologies are used for online data storage, for accessing data from any location, and for providing online use of software. Cloud computing is one such technology: it provides online data storage and, most importantly, software on a lease basis. Considering the large-storage feature, if a user wants to store data in the cloud, the security of that data must be the user's first requirement. In this paper an integrated approach is introduced that encrypts and decrypts data before sending it to the cloud, using two different techniques, and a performance analysis is carried out on different parameters to achieve better performance and security.

13

A Metadata-based Method for Sharing Multiply Heterogeneous Information

Xiaotao Li, Xiaohui Hu, Weina Lu, Xi Liu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.155-166


As users' requirements for information integration grow, integrating multiply heterogeneous data in a globally shared system has become a particular challenge because of its large scale and diverse formats. To address this problem, this paper proposes an information sharing approach for multiply heterogeneous data based on a two-layer metadata scheme. First, the architecture of the two-layer metadata is introduced. Second, synchronization of distributed heterogeneous data between different users is realized by sharing table structures. Finally, the Lucene search engine, combined with the GM-description element of the two-layer metadata, is used to retrieve metadata, which reduces response time compared with other retrieval methods. Experimental results illustrate the effectiveness of our approach.

14

Density-Based Heterogeneous Data Stream Clustering Algorithm with Mixed Distance Measure Methods

Chen Jin-yin, He Hui-hao

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.167-178


Heterogeneous data stream clustering is an important issue in data stream mining: the accuracy of existing heterogeneous clustering algorithms is not high, and they lack a common distance measure. A density-based heterogeneous data stream clustering algorithm with a mixed distance measure, HDSDen, is therefore proposed. HDSDen adopts an online/offline two-stage processing framework. Depending on which attribute types dominate, the online stage uses the corresponding distance measure to identify core points among the arriving points; the purpose of the different distance calculations is to reduce the influence of non-dominant attributes on overall clustering accuracy. In the offline stage, all density-reachable points form a cluster and all unclustered points are put into a reservoir; when the size of the reservoir exceeds a threshold, the points are re-clustered to improve the accuracy of clustering. Experiments on real datasets show that the algorithm achieves better clustering results, can report clustering results at any time, and handles heterogeneous data streams efficiently.
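The abstract does not spell out the mixed distance measure; a common choice for records mixing numeric and categorical attributes, sketched below under that assumption (it is a Gower-style distance, not necessarily the paper's), range-normalizes numeric attributes and scores categorical attributes 0/1 on mismatch:

```python
def mixed_distance(a, b, numeric, ranges):
    """Gower-style distance over a record's numeric and categorical fields."""
    total = 0.0
    for key in a:
        if key in numeric:
            total += abs(a[key] - b[key]) / ranges[key]  # range-normalized numeric
        else:
            total += 0.0 if a[key] == b[key] else 1.0    # categorical mismatch
    return total / len(a)

# Invented records: x/y differ only numerically, x/z only categorically.
x = {"age": 30, "city": "NY"}
y = {"age": 40, "city": "NY"}
z = {"age": 30, "city": "LA"}
d_xy = mixed_distance(x, y, {"age"}, {"age": 50})
d_xz = mixed_distance(x, z, {"age"}, {"age": 50})
```

Normalizing by the attribute range keeps a dominant numeric attribute from swamping the categorical ones, which is the same concern the abstract raises about non-dominant properties.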

15

A Study on Software Metrics based Software Defect Prediction using Data Mining and Machine Learning Techniques

Manjula.C.M. Prasad, Lilly Florence, Arti Arya

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.179-190


Software quality is a field of study and practice that describes the desirable attributes of software products; ideally, performance is perfect, without any defects. Software quality metrics are the subset of software metrics that focus on the quality aspects of the product, process, and project. A software defect prediction model helps in the early detection of defects, contributing to their efficient removal and to producing a quality software system based on several metrics. The main objective of this paper is to help developers identify defects from existing software metrics using data mining techniques and thereby improve software quality. The paper revisits the various classification techniques that have been employed in the literature for software defect prediction using software metrics.

16

Tailoring Fuzzy C-Means Clustering Algorithm for Big Data Using Random Sampling and Particle Swarm Optimization

Yang Xianfeng, Liu Pengfei

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.191-202


17

An Efficient Distributed Data Management Method Based on Key-Column Partition Preprocessing

Xu Tao, Zhang Wei, Li Baolu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.203-214


With the development of the mobile Internet and social networks, the scale of structured data has been increasing rapidly to the petabyte level and beyond, while query performance has been greatly reduced. Efficient query optimization on large-scale datasets is currently a research focus in both academia and industry. In this paper, we present a distributed data management method, called KCSQ, designed to improve query performance. KCSQ analyses historical SQL commands, derives statistics from the frequency and coupling degree of tables and table columns, and confirms the key column based on this statistical evidence. When new tables are imported into HDFS, the data are divided into different blocks according to their key column; any query on these columns then reduces the amount of data to be scanned and the number of working nodes, and thus effectively improves the throughput rate of the system.
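The effect KCSQ aims for can be seen with plain hash partitioning on the chosen key column (a simplification of ours, not the paper's implementation): once rows are blocked by key, a query on that column only has to scan the single block that can contain the value.

```python
def partition_by_key(rows, key, nblocks):
    """Split rows into blocks by hashing the key column's value."""
    blocks = [[] for _ in range(nblocks)]
    for row in rows:
        blocks[hash(row[key]) % nblocks].append(row)
    return blocks

def lookup(blocks, key, value):
    """Scan only the single block that can contain `value`."""
    return [r for r in blocks[hash(value) % len(blocks)] if r[key] == value]

# Invented rows: 12 transactions from 4 users, partitioned on "user".
rows = [{"user": "u%d" % (i % 4), "amount": i} for i in range(12)]
blocks = partition_by_key(rows, "user", 3)
hits = lookup(blocks, "user", "u1")
```

In a distributed file system each block maps to a set of nodes, so pruning blocks also prunes working nodes, which is the throughput argument the abstract makes.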

18

Unified Modeling Language and Enhanced Entity Relationship : An Empirical Study

Manal Mahmoud Alkoshman

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.215-227


A good design for diagrammatic notations is necessary for communication between designer and user at the requirements analysis stage. The Unified Modeling Language (UML) and the Enhanced Entity Relationship (EER) model are typically used for designing large systems and applications. On paper, diagrammatic notations can be used to develop or maintain an application; whatever notation is used must represent business rules accurately while remaining understandable to managers, users, and programmers. ER and EER diagrams have been taught in colleges and universities for many years; in recent years, the UML has appeared as a representation for relational databases, and many articles in the literature discuss its efficacy for modeling relational database systems. The choice between the diagrammatic notations, however, rarely considers the understanding of the human reader. This study compares the EER model and the UML class diagram, examining the sources of common notation and their acceptance among system analysts and programmers. This was tested through an experiment with a sample of information technology students, who were asked which diagrammatic notation was best for them to use; these students will be the programmers and system analysts of the future. The results identify the notation the students favored and indicate which relational model is closer to the user and manager.

19

Document Similarity Search Algorithm Based On Hierarchy Model

Zhu Ge

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.227-234


Searching for similar documents among huge numbers of documents is an important and time-consuming problem. Although numerous precise models have been developed for the task, traditional search algorithms cannot meet users' need for quick search. Here, a new, highly efficient document similarity calculation and search method is proposed. The similarity calculation is based on a total probability model, and efficient search is achieved via the level-n nodes and paths of the citation graph. A branch-and-bound approach limits the search scope and provides the decision algorithm. As the number of documents increases, the efficiency of the proposed algorithm improves dramatically.

20

SVQL: A SQL Extended Query Language for Video Databases

Chenglang Lu, Mingyong Liu, Zongda Wu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.235-248


With the rapid growth of video data, video queries are becoming increasingly important, and developing a functional video query language that better describes users' video query requirements has become a promising and interesting task. In this paper, we present a novel query language for video databases called SVQL, developed as an extension of the traditional database query language SQL. SVQL retains the clear and concise grammatical framework of SQL, making it easy for traditional users to learn and use, i.e., giving it a user-friendly interface. Moreover, we extend the WHERE clause of SQL with new conditional expressions, such as variable declaration, structure specification, feature specification, and spatio-temporal specification, giving SVQL powerful expressiveness. We first present the formal definitions of SVQL and illustrate its basic query capabilities with examples, then discuss SVQL query processing techniques. Finally, we evaluate SVQL through comparison with other existing video query languages; the evaluation results demonstrate the practicality and effectiveness of our proposed query language for video databases.

21

μE – Automation Framework

Mohammad Almseidin, Khaled Alrfou’

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.249-258


The Entity-Relationship model and the relational database schema are the most popular methodologies for designing databases, and several tools have been developed to support drawing an ER diagram or a Relational Database Schema (RDB). This paper develops a framework called μE that helps database designers map an ER diagram to a relational database schema and vice versa. μE is an automation framework for this mapping; it improves reverse engineering techniques and is useful with legacy systems. The architecture and implementation of the μE framework are presented in this paper.

22

To enrich and improve the collection of library digital information resources, this paper puts forward an optimization method for library digital resources based on semantic information retrieval. The method automatically collects related information from the Internet using semantic information retrieval and selects the network information whose relevance value meets a preset threshold to expand and update the library's digital resources. Experimental results show that the method achieves the expected performance, dramatically optimizes the library's digital resources, and improves the efficiency of resource retrieval and utilization.

23

Resource Self-adaptive Allocation Method Based on Mixed Prediction Cloud Platform

Hong Qi, Honge Ren, Guanglei Zhang

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.269-278


Existing cloud platform resource allocation methods suffer from problems such as low resource utilization and the lack of accurate prediction of resource trends. To solve these problems, MPRA (Mixed Prediction Based Resource Allocation) is proposed. Based on the periodic and non-periodic characteristics of service resource demand, MPRA first applies the FFT (Fast Fourier Transform) to judge whether demand is periodic. For resource demand without periodic characteristics, it uses a Markov process for prediction, obtaining higher resource utilization and prediction accuracy and thus ensuring the user experience. The experimental results show that MPRA accurately predicts the trend of service resource requirements and can then allocate virtual machine resources self-adaptively according to the predictions. It improves virtual machine resource utilization, reduces the number of occupied physical machines, and effectively reduces the number of SLA (Service-Level Agreement) violations.
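The periodicity test at the heart of MPRA can be sketched with a discrete Fourier transform. The paper uses an FFT; the O(n²) DFT below is a readable stand-in, and the demand series is synthetic:

```python
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest frequency component."""
    n = len(series)
    mean = sum(series) / n
    xs = [v - mean for v in series]            # remove the DC component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(xs))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(xs))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n // best_k

# Synthetic demand with a 24-sample (e.g. daily) cycle, observed for 4 cycles.
demand = [100 + 10 * math.sin(2 * math.pi * t / 24) for t in range(96)]
period = dominant_period(demand)
```

If no frequency bin clearly dominates, the demand is treated as non-periodic, which is where MPRA falls back to the Markov-process predictor.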

24

An Efficient Algorithm for Approximate Frequent Itemset Mining

Veepu Uppal

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.279-288


Frequent itemset mining is a central theme in data mining research and an important step in the analysis of data arising in a broad range of applications. The traditional exact model for frequent itemsets requires every item to occur in each supporting transaction. Real application data, however, is usually subject to random noise, caused by human error and measurement error, which poses new challenges for the efficient discovery of frequent itemsets from noisy data. Approximate frequent itemset mining discovers itemsets that are present not exactly but approximately in transactions. Most known approximate frequent itemset mining algorithms work by explicitly stating an insertion penalty value and a weight threshold. This paper presents a new method for generating the insertion penalty value and weight threshold using the support count of an item.
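The notion of an insertion penalty can be illustrated with a toy approximate support count. The fixed penalty below is our placeholder; the paper's contribution is precisely to derive the penalty and weight threshold from the item's support count, which is not reproduced here.

```python
def support(transactions, itemset):
    """Exact support: transactions containing every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t))

def approx_support(transactions, itemset, penalty=0.5):
    """Approximate support: a transaction missing exactly one item still
    counts, discounted by the insertion penalty."""
    s = set(itemset)
    total = 0.0
    for t in transactions:
        missing = len(s - set(t))
        if missing == 0:
            total += 1.0            # exact match
        elif missing == 1:
            total += penalty        # one noisy omission tolerated, at a cost
    return total

baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
```

Under the exact model {a, b, c} is supported only by the first basket, but with the penalty the three near-miss baskets each contribute half a count, so a mildly noisy itemset can still cross a frequency threshold.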

25

A Novel Approach for Microblog Message Ranking Based on Trust Model and Content Similarity

Bei Li, Yanjie Liu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.289-296


With the development of social networks such as microblogs, the number of microblog users is increasing rapidly, and the problem of information overload caused by the large amount of user-generated data is becoming more and more serious. To mine the messages that specific users are interested in, we measure users' social relationships and interactive relationships and propose a trust model based on the user's direct trust and indirect trust. Using the trust model, we select a specific user's candidate user set from a large number of users, measure the content similarity of messages in the candidate user set, and propose a message ranking approach based on the user trust model and content similarity. We analyze and compare the ranking results with users' real behavior on a microblog platform; the experimental results show that the approach can accurately rank the microblog messages that specific users are interested in.
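The combination of trust and content similarity can be sketched as a weighted score (the trust values, the blend weight `alpha`, and the bag-of-words cosine below are our simplifications, not the paper's model):

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_messages(messages, interests, trust, alpha=0.6):
    """Score = alpha * author trust + (1 - alpha) * best similarity to the
    user's interest texts; higher scores rank first."""
    def score(m):
        sim = max(cosine(m["text"], q) for q in interests)
        return alpha * trust[m["author"]] + (1 - alpha) * sim
    return sorted(messages, key=score, reverse=True)

# Invented messages, trust values, and interest query.
msgs = [{"author": "alice", "text": "new results in big data mining"},
        {"author": "bob", "text": "my lunch photos"}]
ranked = rank_messages(msgs, ["data mining"], {"alice": 0.9, "bob": 0.2})
```

Blending the two signals means a trusted author's off-topic message and an unknown author's on-topic message both rank below a trusted, on-topic one.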

26

A Comparative Investigation on Implementation of RESTful versus SOAP based Web Services

Abhijit Bora, Tulshi Bezboruah

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.297-312


Investigating web service performance metrics for the RESTful architecture against the conventional SOAP-based architecture is important from the perspective of developers as well as end users. We have therefore developed and hosted two web services, one based on SOAP and the other on the RESTful architecture. Both services are based on Java technology, implemented with the Apache Tomcat web server and MySQL as the backend database server. A comparative evaluation of the two web services is carried out to study their scalability, efficiency, and feasibility. The load and stress testing tool Mercury LoadRunner is used to exercise both services and test the architectures. A statistical analysis of the recorded performance metrics is carried out to study the effectiveness of the services. This paper presents in detail the comparative analysis of the experimental results on the performance aspects of the services.

27

Web Data Extraction Based on Ensemble Learning

Yongquan Dong, Qiang Chu, Ping Ling

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.311-322


With the rapid development of Internet technology, the Web has become a huge information source with massive amounts of data, but these data are usually embedded in semi-structured pages. To use them effectively, the primary problem is to extract the data and store them in structured form. Most current approaches use a single classifier to extract web data, but a single classifier is not sufficient, and different classifiers perform differently on the same problem. In this paper, we apply ensemble learning to web data extraction. First, we parse the page into a DOM tree, identify the main data regions, and construct feature sets for the text nodes in each region. Second, we choose multiple kinds of base classifiers (SVM, KNN, and Random Forest) to build classification models and then use a linear method to integrate the results of each classification model. Finally, we combine the integrated results with heuristic rules to obtain the final extraction results. Experimental results show that our approach outperforms the baseline approaches and has good robustness.
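The linear integration step can be sketched as a weighted combination of per-class scores (the weights and scores below are made up for illustration; the paper trains SVM, KNN, and Random Forest base models on real node features):

```python
def linear_ensemble(per_classifier_scores, weights):
    """Combine each classifier's class-score vector with a weighted sum
    and return the index of the winning class."""
    n_classes = len(per_classifier_scores[0])
    combined = [sum(w * scores[c]
                    for w, scores in zip(weights, per_classifier_scores))
                for c in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__)

# Three base classifiers scoring one text node over two classes
# (class 0 = "not a data field", class 1 = "data field").
scores = [[0.6, 0.4],   # e.g. SVM
          [0.2, 0.8],   # e.g. KNN
          [0.3, 0.7]]   # e.g. Random Forest
label = linear_ensemble(scores, weights=[0.5, 0.25, 0.25])
```

Two of the three base models favor class 1 strongly enough to outvote the weighted SVM, which is the behavior that makes the ensemble more robust than any single classifier.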

28

The Study on the Impact of Data Storage from Accounting Information Processing Procedure

Hengchang Jing

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.323-332


With accounting informatization, the original vouchers, ledgers, and statements have been translated into various kinds of data stored in computers. The storage strategies for these data are affected by the accounting information processing procedure, including the temporary storage, translation storage, and backup of document data, the generation and output of ledger data, and the interfaces of other data transmission systems. We should consider not only the influence of relational normalization principles but also the influence of accounting information process design. This paper studies the impact of the accounting information processing procedure on data storage and points out the problems that arise during informatization.

29

Enhanced Extraction Clinical Data Technique to Improve Data Quality in Clinical Data Warehouse

AbubakerElrazi O. Mohammed, Samani A. Talab

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.333-342


The ETL process is a major part of clinical data warehouse development, and the efficiency of the data warehouse depends mainly on the ETL component and its architecture. In the medical field, huge amounts of clinical data are stored in several medical operational systems as medical services are delivered. Extracting these data is a complex, time-consuming, and labor-intensive task if high data quality is to be ensured before any kind of data analysis. Moreover, integrating clinical data from various sources is challenging: the data must be integrated from heterogeneous sources at multiple health institutions with incompatible structures, and heterogeneous clinical data are stored dispersed and isolated from one another. These clinical data therefore need to be extracted and integrated into the clinical data warehouse through a robust extraction technique. This paper introduces an enhanced ETL technique that integrates clinical data from heterogeneous data sources into a staging area.
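The staging-area step can be sketched as a field-mapping normalization (the field names, mappings, and records below are invented; the paper's enhanced technique is not reproduced):

```python
def to_staging(records, field_map, source_name):
    """Rename one source's fields to the staging schema and tag provenance,
    so rows from incompatible systems can be merged downstream."""
    staged = []
    for rec in records:
        row = {std: rec.get(src) for std, src in field_map.items()}
        row["source"] = source_name
        staged.append(row)
    return staged

# Two hypothetical hospitals exporting the same fact under different names.
a = to_staging([{"pat_id": 1, "dob": "1990"}],
               {"patient_id": "pat_id", "birth_year": "dob"}, "hospital_a")
b = to_staging([{"PatientNo": 2, "BirthYr": "1985"}],
               {"patient_id": "PatientNo", "birth_year": "BirthYr"}, "hospital_b")
staging_area = a + b
```

Once both sources share one schema in the staging area, quality checks and the load into the warehouse can run uniformly, which is the point of extracting to a staging area first.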

30

Cluster Analysis of E-Commerce Sites with Data Mining Approach

Yongyi Cheng, Yumian Yang, Jianhua Jiang, GaoChao Xu

보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.8 No.3 2015.06 pp.343-354


With the rapid development of e-commerce, accurately evaluating e-commerce sites has become an important issue. However, clustering e-commerce sites correctly and accurately is not easy, because the sites are characterized by high dimensionality and uneven density, which leads to poor clustering results. To analyze the 100 e-commerce demonstration enterprises named by the Ministry of Commerce of the People's Republic of China for 2013-2014, this paper adopts the DBSCAN data mining method. In the data preprocessing phase, it uses factor analysis to reduce dimensionality; in the clustering phase, it implements an improved DBSCAN algorithm to process the uneven-density data. Finally, the paper gives suggestions to these 100 e-commerce enterprises based on the experimental results.
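The clustering step can be illustrated with a minimal DBSCAN (the classic algorithm only; the paper's improvement for uneven density and its factor-analysis preprocessing are not reproduced, and the points are synthetic):

```python
def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns one label per point (-1 marks noise)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    neighbors = lambda i: [j for j in range(len(points))
                           if dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)      # None = not yet visited
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:       # not a core point
            labels[i] = -1
            continue
        cid += 1                       # start a new cluster from this core point
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:                   # expand the cluster
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid        # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nj = neighbors(j)
            if len(nj) >= min_pts:     # j is itself core: keep expanding
                queue.extend(n for n in nj if labels[n] is None)
    return labels

# Two tight groups of "sites" plus one isolated outlier.
sites = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
labels = dbscan(sites, eps=0.5, min_pts=2)
```

Because cluster membership is driven by local density rather than a fixed number of centroids, isolated sites fall out as noise instead of distorting a cluster, which is why DBSCAN suits the uneven-density data described in the abstract.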

 