The Big Data Applications in Film Industry Chain
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.1-8
※ Access to the full text may be restricted because the licensing agreement with the original content provider has expired.
Audiences' consumption attitudes, consumption patterns, and consumer demographics are all changing greatly, so improving a film's revenue requires careful script selection, accurate market positioning, effective product marketing, and accurate box-office forecasting. This paper introduces the applications and benefits of big data across the film industry chain, covering film making and investment, publicity and distribution, broadcasting, and film audiences; it then points out the challenges big data faces in China's film industry and offers useful suggestions for practitioners in all parts of the industry.
Applying Z-Curve Technique to Compute Skyline Set in Multi Criteria Decision Making System
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.9-22
Skyline queries are among the best tools for distributed multi-criteria decision making in web-based recommendation applications. However, as data dimensionality grows, the sizes of the dominance set and the skyline set grow as well, and increasing dimensionality becomes a major problem for real-world databases. In skyline computation, the dominant cost lies in the dominance tests between high-dimensional objects and the order in which those objects are accessed. The space-filling Z-curve is well suited to these challenges. In this work, we combine the Z-curve with an optimized skyline-boundary detection algorithm to achieve effective access and early pruning, and we propose an efficient hybrid index structure that exploits both sorting and partitioning to improve storage and search efficiency. Experimental results show that the proposed approach outperforms previous static skyline computation techniques in searching for and finding the skyline set.
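The abstract's key device, ordering points along a space-filling Z-curve before running dominance tests, can be illustrated with a minimal sketch (not the paper's actual algorithm). Because the Morton key is monotone with respect to componentwise dominance, every possible dominator of a point appears earlier on the curve, so one forward pass over the Z-sorted points suffices:

```python
def z_value(x, y, bits=8):
    """Morton (Z-order) key: interleave the bits of x and y.

    The key is monotone with respect to componentwise order, so any
    point that dominates another always appears earlier on the curve.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # even bit positions take x
        z |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions take y
    return z

def dominates(p, q):
    """p dominates q when p is no worse in every dimension and strictly
    better in at least one (smaller is better here)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """One forward pass over Z-sorted points: comparing each point only
    against the partial skyline built so far is enough."""
    result = []
    for p in sorted(points, key=lambda p: z_value(*p)):
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result
```

For example, `skyline([(1, 9), (3, 3), (9, 1), (5, 5), (2, 8)])` drops only the dominated point `(5, 5)`.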
Predicting Non Performing Loan of Business Bank with Data Mining Techniques
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.23-34
Predicting non-performing loans (NPL) plays an important role in commercial banking, yet a large gap remains between the required prediction performance and current techniques. This paper applies data mining approaches to NPL prediction. Both macroeconomic and bank-specific variables are collected to form the feature set. Based on the selected features, the study first applies single base classifiers such as decision trees, k-nearest neighbors, and support vector machines (SVM) to model the NPL problem, and then builds prediction models with two multiple-classifier fusion methods, bagging and AdaBoost. In the experiment, NPL data from a commercial bank with 96 features and 10,415 instances is collected, and the F-measure and the area under the ROC curve (AUC) are used as classification metrics. The results illustrate that the multiple-classifier fusion algorithms outperform single base classifiers and produce better prediction results; furthermore, the AdaBoost method performs much better than bagging on the NPL task.
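The two evaluation metrics named in the abstract, the F-measure and the AUC, can be computed from first principles. The following is an illustrative sketch, not the paper's code; the AUC uses the rank-sum formulation (probability that a random positive is scored above a random negative, ties counting half):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f_measure(labels, predictions):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, predictions))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, predictions))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, predictions))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```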
A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.35-44
Text plagiarism has increased with the growth of digital resources on the World Wide Web. Source retrieval and text alignment are the two core tasks of plagiarism detection. This paper describes a plagiarism source retrieval and text alignment system based on a relevance ranking model: both tasks are treated as information retrieval processes, and relevance ranking is used to search for plagiarism sources and obtain candidate plagiarism seeds. The BM25 model is used for source retrieval, while the vector space model is exploited for text alignment. Furthermore, a plagiarism detection system named HawkEyes has been developed on top of the proposed methods, and several demonstrations of HawkEyes are given.
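As a hedged illustration of the retrieval component, here is a minimal Okapi BM25 scorer over tokenized documents; the constants `k1` and `b` are common defaults, not values from the paper:

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = {}                                   # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] = df.get(term, 0) + 1
    scores = []
    for d in docs:
        score = 0.0
        for term in query:
            if term not in df:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            tf = d.count(term)
            norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores
```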
Internet Traffic Classification Using Machine Learning
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.45-54
Internet traffic classification is a popular research area because of its benefits for applications such as intrusion detection, congestion avoidance, and traffic prediction. Because port-based and payload-based techniques have well-known limitations, traffic is instead classified from statistical features using machine learning. The statistical feature set is large, so reducing it to an optimal subset is a key challenge, as a smaller feature set reduces the time complexity of the learning algorithm. This paper obtains an optimal feature set through a hybrid approach that combines an unsupervised clustering algorithm (k-means) with a supervised feature selection algorithm (best feature selection).
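A minimal Lloyd's k-means over numeric feature vectors illustrates the unsupervised half of the hybrid approach; for traffic classification the vectors would be per-flow statistics such as mean packet size and mean inter-arrival time, though that feature choice is an assumption here, not the paper's feature set:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means: alternate between assigning each point to
    its nearest center and recomputing each center as the cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Empty clusters keep their previous center.
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters
```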
Research on Apriori Algorithm Based on Mapreduce Model
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.55-66
As manufacturing technology develops, hardware costs keep falling, and more and more computers are equipped with multiple CPUs and enormous disks; yet existing programming models cannot make effective use of these growing computational resources, which is what motivated cloud computing. With the MapReduce parallel model, existing computing and storage capabilities are effectively integrated and powerful distributed computing is provided. This paper first transforms the Apriori algorithm into the MapReduce model to realize its parallelization; it then improves the performance of Apriori in the Hadoop framework by compressing the original transaction sets; finally, it realizes a MapReduce-Apriori algorithm that is highly scalable for running in cloud computing environments.
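The transformation of one Apriori level into map and reduce steps can be sketched in plain Python; this is a simulation of the dataflow, not actual Hadoop code, and for clarity the candidates are all k-item combinations with no pruning:

```python
from itertools import combinations

def map_phase(transactions, candidates):
    """Mapper: for each transaction, emit (itemset, 1) for every candidate
    itemset the transaction contains."""
    for t in transactions:
        for c in candidates:
            if set(c) <= set(t):
                yield c, 1

def reduce_phase(pairs, min_support):
    """Reducer: sum the emitted counts per itemset, keep the frequent ones."""
    counts = {}
    for itemset, one in pairs:
        counts[itemset] = counts.get(itemset, 0) + one
    return {k: v for k, v in counts.items() if v >= min_support}

def apriori_pass(transactions, k, min_support):
    """One Apriori level wired through the two phases above."""
    items = sorted({i for t in transactions for i in t})
    candidates = list(combinations(items, k))
    return reduce_phase(map_phase(transactions, candidates), min_support)
```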
Big Data Acquisition and Analysis Platform for Intermodal Transport
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.67-78
This paper aims at transparency and visualization of international intermodal cargo transportation throughout its whole process, achieving comprehensive monitoring across multiple transportation modes such as ocean, air, land, and rail. Based on Internet-of-Things distributed data acquisition technology and cloud-computing big data analysis technology, the paper presents a multimodal monitoring technology that uniformly addresses the comprehensive management of multiple transportation vehicles, comprising a service functionality model, a network hierarchy model, and a technology system model. By building a generic target monitoring system, it shows that the multimodal monitoring solution can effectively monitor the multiple transportation modes and provide a solid data platform for later analysis, distribution, and optimization of those vehicles.
Green Mining Algorithm for Big Data Based on Random Matrix
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.79-88
Because big data has correlated multi-dimensional characteristics, effective processing mechanisms and algorithms remain open problems; existing big data algorithms consume huge computing resources and time, wasting energy. To address this, the present study proposes a big data processing algorithm based on random matrix theory that can effectively improve processing efficiency and thereby increase energy utilization. Results show that the proposed algorithm can effectively reduce the amount of computation and thus the energy it requires.
The Design of the Multi-Scale Data Fusion Algorithm Based on Time Series Analysis
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.89-100
A time series is a set of values of an indicator at different times, arranged in chronological order. The basic idea of multi-scale analysis is orthogonal transformation, such as wavelet decomposition of a signal at different scales. Time-series analysis is carried out with parametric models: a parametric model is fitted to the observed dynamic data in the time domain and then used to analyze the observations and the system that produced them. This paper presents the design of a multi-scale data fusion algorithm based on time-series analysis. Finally, the advantages of the new algorithm in estimation accuracy are elaborated, and simulations demonstrate its effectiveness.
Trust Evaluation on Social Media based on Different Similarity Metrics
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.101-110
As the internet advances, the importance of social media increases day by day: it enables users to share their profile data, ideas, videos, and any other content. Along with these benefits come several issues, one of which is how to protect users from the after-effects of friendship on social media. This paper proposes a trust model to address this problem. The proposed model computes trust to assist end users in deciding whether to accept a friend request on social media. Trust evaluation is based on profile similarity analysis, and the trust computation uses preferred attributes among the profile attributes. The paper also analyzes different trust evaluation methods based on the proposed model.
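A hedged sketch of profile-similarity trust follows; the attribute names, weights, and acceptance threshold below are hypothetical, and the weighting scheme merely stands in for the paper's preferred-attribute mechanism:

```python
def profile_trust(profile_a, profile_b, weights):
    """Weighted profile similarity in [0, 1]: each attribute contributes
    its weight when the two profiles agree on a non-missing value."""
    matched = sum(w for attr, w in weights.items()
                  if profile_a.get(attr) is not None
                  and profile_a.get(attr) == profile_b.get(attr))
    return matched / sum(weights.values())

def accept_friend_request(profile_a, profile_b, weights, threshold=0.6):
    """Decision rule: recommend accepting when trust clears a threshold
    (the threshold value here is an illustrative assumption)."""
    return profile_trust(profile_a, profile_b, weights) >= threshold
```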
Discovery of Subject of Science and Technology Policy based on LDA Model
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.111-120
As science and technology policies continuously increase in number, finding valuable information in the massive collection of policies has become an urgent problem. This paper proposes a subject discovery method for large-scale science and technology policy sets. Using the LDA topic model for subject modeling of policy documents, the approach extracts time and geographical labels from the policies, computes the intensity of subjects under different time and geographical conditions, and obtains the important subjects together with an analysis of trends in subject intensity under those constraints. Experimental results demonstrate that the method can mine and analyze subjects from large-scale policy collections quickly and effectively.
A Text Clustering Algorithm based on Weeds and Differential Optimization
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.121-130
Invasive weed optimization (IWO) is a swarm optimization algorithm with both explorative and exploitative power, in which population diversity is maintained by allowing individuals with poor fitness to reproduce and mutate. Differential evolution (DE) is a randomized parallel algorithm that perturbs individuals with vector differences, moving them toward outstanding individuals with global convergence. The traditional k-means algorithm is prone to getting stuck at local optima and is sensitive to random initialization. Against this background, a novel optimization algorithm hybridizing DE and IWO, denoted IWODE-KM, is employed to optimize the parameters of k-means and is applied to Chinese text clustering. Experimental results show that the proposed method outperforms both of its ancestors.
Probability Based Virtual Machines Placement for Green Data Center
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.131-140
Virtual machine placement (VMP) is an important lever for improving resource utilization and reducing energy consumption in cloud data centers. Existing VMP schemes simply fix the VM resource requirements at constant values and ignore their fluctuation. Assuming normally distributed resource requirements, we first present a data-center model built on a more accurate energy consumption model for a single machine. An effective genetic algorithm is then adopted to solve this model, and important issues such as population size, the fitness function, and the method of calculating energy consumption are discussed. Finally, we validate the method with experiments.
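The per-host energy model that VMP papers commonly assume, idle power plus a term linear in CPU utilization, can be written as a fitness component for the genetic algorithm. This is a sketch under that common assumption; the power constants are illustrative, not the paper's:

```python
def energy(placement, vm_cpu, host_capacity, p_idle=100.0, p_full=250.0):
    """Total power draw of a placement: an active host draws its idle power
    plus a share of (p_full - p_idle) linear in CPU utilization; hosts with
    no VMs are assumed switched off and draw nothing."""
    total = 0.0
    for host, vms in placement.items():
        if not vms:
            continue
        util = sum(vm_cpu[v] for v in vms) / host_capacity[host]
        assert util <= 1.0, "placement exceeds host capacity"
        total += p_idle + (p_full - p_idle) * util
    return total
```

A genetic algorithm would minimize this value over candidate placements while respecting the capacity constraint.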
Research on the E-commerce Platform Performance and Green Supply Chain based on Data Mining and SVM
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.141-150
In the network environment, supply chain management has greatly shortened product development cycles and reduced inventory. With the continuous development of information technology, e-commerce logistics platforms have become the main factor affecting the development of the logistics industry. In this paper, the authors study e-commerce platform performance and the green supply chain based on data mining and SVM. The green supply chain considers environmental problems at every link of the supply chain and promotes the coordinated development of economy and environment. The results show that the most critical factor affecting consumer satisfaction with B2C e-commerce platforms is accurate, complete, and reliable logistics service.
Introduction to Global Educational Database
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.151-172
Educational Data Mining (EDM) is one of the major ongoing research areas. Students' records need to be maintained and analyzed so that they can be used to predict students' behavior and learning methods. Although student records clearly must be processed and analyzed, the primary challenge is gathering individual academic details. This paper proposes a global database of students irrespective of geographical boundaries: the academic performance of every student from every country is updated on the platform, performance on major examinations is stored in the database, and supporting documents and performance details are readily accessible to evaluators from any location. This helps standardize the evaluation process and analyze a student's performance regardless of geography. The paper also discusses the available EDM tools and how the data can be analyzed to extract information.
Improved PSO Research for Solving the Inverse Problem of Parabolic Equation
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.173-184
Parameter identification has an important research background and research value, and has in recent years become a top priority among inverse problems of heat conduction. This paper studies the parameter identification problem for inverse problems of parabolic equations and applies particle swarm optimization (PSO) to solve it. First, the model of the inverse problem of partial differential equations is established, and the content and classification of such inverse problems are explained. The construction and solution of finite difference methods for parabolic equations are then studied: two stable schemes for the one-dimensional parabolic equation are given, together with two numerical simulations; the partial differential equation is discretized by replacing partial derivatives with difference quotients, which turns the initial-boundary value problem into a system of algebraic equations that is then solved. Next, the basic principles of PSO and its improved variants are studied and compared, and the PSO algorithm is implemented. Finally, three simulations of the PSO algorithm for inverse parabolic problems are performed: a set of basis functions gradually approximates the true solution and supplies the initial values; the inverse problem is converted into a direct problem solved by the difference method; the solution is compared with the additional conditions; and the resulting optimization problem is solved by PSO. The simulations verify the correctness and applicability of the PSO algorithm for inverse problems of parabolic equations.
On Uncertain Probabilistic Data Modeling
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.185-194
Uncertainty in data arises for various reasons, including the data itself, data mapping, and data policy. Data themselves are often uncertain: for example, readings from a sensor network, the Internet of Things, or radio-frequency identification are frequently inaccurate because of device or environmental factors. With data mapping, data integrated from heterogeneous sources is commonly uncertain because of uncertain mappings, data inconsistency, missing data, and dirty data. With data policy, data may be modified or hidden to enforce privacy and confidentiality policies within an organization. Traditional deterministic data management deals mainly with precise, certain data and cannot process uncertain data. Modeling uncertain data is the foundation of further processing technologies such as indexing, querying, searching, mapping, integrating, and mining. Probabilistic models of relational databases, XML data, and graph data are widely used today in many applications and areas, such as the World Wide Web, the semantic web, sensor networks, the Internet of Things, mobile ad-hoc networks, social networks, traffic networks, biological networks, genome databases, and medical records. This paper presents a survey of the different probabilistic models of uncertain data in relational databases, XML data, and graph data, analyzing and comparing the advantages and disadvantages of each kind of model. Open topics in modeling uncertain probabilistic data, such as semantic and computational aspects, are discussed, and criteria for modeling uncertain data, including expressive power, complexity, efficiency, and extensibility, are proposed.
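The simplest of the surveyed probabilistic relational models, a tuple-independent relation, can be made concrete by enumerating its possible worlds. This is an illustrative sketch only; real probabilistic databases avoid this exponential enumeration:

```python
def possible_worlds(prob_tuples):
    """Enumerate the possible worlds of a tuple-independent probabilistic
    relation: every subset of tuples, weighted by the product of each
    tuple's presence or absence probability."""
    items = list(prob_tuples.items())
    worlds = []
    for mask in range(2 ** len(items)):
        world, p = [], 1.0
        for i, (t, prob) in enumerate(items):
            if mask >> i & 1:
                world.append(t)   # tuple present in this world
                p *= prob
            else:
                p *= 1 - prob     # tuple absent in this world
        worlds.append((tuple(world), p))
    return worlds
```

The world probabilities always sum to 1, which is the sanity check that makes the model a proper distribution over deterministic relations.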
A Survey on Ontology based Web Usage Mining
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.195-202
The exponential increase of information, users, and websites on the WWW has given rise to a number of challenges, the most important being the effective and systematic management of this massive web data. Web users find it very difficult to access relevant information quickly and efficiently, while website owners find it very difficult to satisfy their users' information needs effectively. Web usage mining has been used to deal with these issues, but its techniques are based solely on knowledge acquired from analyzing users' navigational behavior, so the quality of the discovered patterns is low. Recent studies show that semantically enriched web usage mining enhances the quality of discovered patterns. The semantically enriched web is called the Semantic Web, and this new form of web usage mining is called semantic web usage mining, or ontology-based web usage mining, since ontologies act as the backbone for the conceptual description of semantic knowledge in the Semantic Web. In this paper, we present a brief overview of conventional web usage mining and an extensive survey of research on ontology-based web usage mining.
Research on an Improved Decision Tree Classification Algorithm
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.203-216
This paper introduces classification algorithms in data mining in detail and then, combining classification with incremental learning techniques, proposes an incremental decision tree algorithm. The ID3 and C4.5 algorithms are studied in detail; building on these two algorithms and on the incremental learning characteristic of Bayesian classification, the incremental decision tree algorithm is derived and evaluated on experimental data. The analysis shows that the algorithm solves the incremental learning problem of decision tree algorithms very well.
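The ID3 split criterion used in such decision tree work, information gain, can be sketched directly from its definition (entropy before the split minus the weighted entropy of the branches):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(rows, labels, attr_index):
    """ID3 criterion: how much splitting on one attribute reduces entropy."""
    n = len(labels)
    remainder = 0.0
    for v in set(r[attr_index] for r in rows):
        branch = [y for r, y in zip(rows, labels) if r[attr_index] == v]
        remainder += len(branch) / n * entropy(branch)
    return entropy(labels) - remainder
```

C4.5 differs mainly in normalizing this gain by the split information (gain ratio) and in handling continuous attributes.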
Research on Spatial Clustering Algorithm based on Data Mining
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.217-230
We extend the online learning strategy and scalable clustering techniques to soft subspace clustering and propose two online soft subspace clustering methods, OFWSC and OEWSC. The proposed evolving soft subspace clustering algorithms not only reveal the important local subspace characteristics of high-dimensional data, but also leverage the effectiveness of the online learning scheme and the ability of scalable clustering methods to handle large or streaming data. Furthermore, we apply the proposed algorithms to text clustering for information retrieval, gene expression data clustering, face image classification, and the problem of predicting disulfide connectivity.
Hybrid Intrusion Detection Method to Increase Anomaly Detection by Using Data Mining Techniques
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.231-240
An intrusion detection system is an application that observes activity on a network and examines it for any kind of harmful behavior that could violate the computer security policy. As internet usage grows, the number of internet attacks increases as well, and new attack methods pose new challenges to network security. To classify these attacks, a new hybrid data mining method based on the C4.5 decision tree and a meta-algorithm is proposed; it yields a classifier that improves overall detection accuracy. Many data mining techniques have been developed for intrusion detection; for the recognition of anomalies, the offered hybrid of C4.5 with a meta-algorithm provides better accuracy and reduces the problem of a high false alarm ratio. The given approach is evaluated against other data mining techniques, and with it the detection rate improves significantly. The KDD Cup 1999 dataset is used for the experimental work.
Research On Mobile Medical Integration System for Children
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.241-252
In view of the scarcity of medical resources at home and abroad, and especially the fact that children's medical resources cannot meet demand, the authors designed a mobile medical integration system on top of a mobile internet platform and a cloud computing platform. The system is divided into two parts, the mobile terminal and the cloud platform, used by the guardian and the doctor respectively. The health monitoring terminal designed for children is a wearable watch, while an app is developed for guardians and doctors. The cloud platform provides modules for data storage, message processing, functional applications, and more, forming a "cloud + client" service model. The system makes disease prevention, emergency treatment, and medical care for children more convenient and fast, protects the healthy growth of children, and provides a reference solution for the future development of medical care.
Design of Intelligent Stadiums Management System Based on ASP.NET
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.253-266
This paper analyzes the application requirements of stadiums and uses a B/S architecture to design an intelligent stadium management system comprising five modules: system management, site management, membership management, sparring-partner management, and device management. The SQL Server 2005 database management system is used to build the database so that it combines seamlessly and efficiently with the ASP.NET application. Finally, the basic functions of each module were tested; the results show that the system achieves the desired goals, performs well, and can greatly improve efficiency.
Pig Vs. Hive Use Case Analysis
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.267-276
Corporations are shifting their practices toward data-driven big data initiatives, as big data analytics gives companies the ability to grow their businesses and increase their competitiveness. As the importance of data analytics grew, so did the size of the data to analyze, demanding a more powerful data platform. This paper presents a case study of two high-level query languages built on top of Hadoop MapReduce: Pig and Hive. By writing a query in each language that produces identical output, and running each query 30 times on two files of different sizes (120 runs in total), the comparison reaches a statistically significant conclusion.
Research on Online Sports Metadata Extraction System based on Video Processing Technology
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.277-288
A content-based sports video metadata extraction system uses automated or semi-automated interactive means to obtain the complete features and attributes of video data for an efficient retrieval mechanism, creating the conditions for fast access to the video information needed when watching sports video. Starting from a layered metadata description model for video, we discuss the structure of the video processing technology and, on this basis, add temporal and spatial motion information for video objects. For the low-level visual features and high-level semantic features of video, a hierarchical division method for the implicit video information of a particular domain is presented: visual features are extracted automatically, while semantic features are marked through human-computer interaction. The work focuses on sports information descriptors, visual content descriptors, and the descriptor structure of the video, yielding a hierarchical video data structure model and a standard video content description model based on video features.
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.289-298
Big Data is a term used for datasets that, due to their large size, are very difficult to manage with traditional techniques; such data may be on the order of petabytes. It is found easily on the web, especially on social media in the form of customer blogs, reviews, and comments, and is generally unstructured or semi-structured. One can use this big data to generate value by computing sentiment scores, and MapReduce is one of the most popular approaches in the Hadoop environment for such a task. The objective of the present research is to automate the process of extracting the sentiments expressed about specific features of a product. For this purpose, three Amazon datasets of electronics product reviews are used, covering the Nikon Coolpix 4300 camera, the Nokia 6601 mobile phone, and the Canon G3 camera. The MapReduce algorithm on Hadoop, which is considered fast, reliable, and fault-tolerant for processing big amounts of data in parallel on large clusters, is used to extract the sentiment scores.
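The mapper/reducer pair for feature-level sentiment can be simulated in plain Python. The tiny lexicon and the sentence-level co-occurrence rule below are illustrative assumptions, not the paper's exact method:

```python
POSITIVE = {"great", "sharp", "excellent"}   # illustrative opinion lexicon
NEGATIVE = {"blurry", "poor", "slow"}

def map_review(review, features):
    """Mapper: emit (feature, score) whenever opinion words occur in the
    same sentence as a product feature."""
    for sentence in review.lower().split("."):
        words = sentence.split()
        for feature in features:
            if feature in words:
                score = (sum(w in POSITIVE for w in words)
                         - sum(w in NEGATIVE for w in words))
                if score:
                    yield feature, score

def reduce_scores(pairs):
    """Reducer: sum the per-feature scores emitted by all mappers."""
    totals = {}
    for feature, score in pairs:
        totals[feature] = totals.get(feature, 0) + score
    return totals
```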
Implementation of Basketball Training Management System Based on Big Data Technology
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.299-310
Big data analysis has important practical significance for player scouting, tactics, and training monitoring. To improve the performance of basketball training, big data technology is applied in the training management system. The reform of basketball training is being carried out, and the combined selection mode of basketball training is discussed: this moves physical education from traditional technique drills toward combined training and also cultivates students' tactical thinking. The system can improve performance and the quality of training interaction in basketball sports training.
FastMap Projection for High-Dimensional Data: A Cluster Ensemble Approach
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.311-330
High-dimensional data with many features present a significant challenge to current clustering algorithms. Sparsity, noise, and correlation of features are common properties of high-dimensional data, and clusters in such data often exist in various subspaces. Ensemble clustering is emerging as a leading technique for improving the robustness, stability, and accuracy of high-dimensional data clusterings. In this paper, we propose FastMap projection for generating subspace component data sets from high-dimensional data. Using these component data sets, we create component clusterings and provide a new objective function that ensembles them by maximizing the average similarity between the component clusterings and the final clustering. Compared with random sampling and random projection, the component clusterings produced by FastMap projection showed high average clustering accuracy without sacrificing clustering diversity in synthetic data analysis. We conducted a series of experiments on real-world data sets from the microarray, text, and image domains, employing three subspace component data generation methods, three consensus functions, and the proposed objective function for ensemble clustering. The results consistently demonstrate that the FastMap projection method with the proposed objective function provides the best ensemble clustering results on all data sets.
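The core FastMap step, projecting an object onto the line through two pivot objects using only pairwise distances (via the cosine law), is compact enough to sketch; this shows one coordinate of the projection, not the paper's full ensemble pipeline:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fastmap_coordinate(p, pivot_a, pivot_b, dist=euclidean):
    """First FastMap coordinate of p: its projection onto the line through
    the two pivots, computed from pairwise distances alone."""
    d_ab = dist(pivot_a, pivot_b)
    return (dist(pivot_a, p) ** 2 + d_ab ** 2 - dist(pivot_b, p) ** 2) / (2 * d_ab)
```

Repeating this with fresh pivots on residual distances yields successive coordinates, which is how FastMap builds low-dimensional component data sets from high-dimensional input.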
Design and Implementation of Hadoop-based Customer Marketing Big Data Processing System
보안공학연구지원센터(IJDTA) International Journal of Database Theory and Application Vol.9 No.12 2016.12 pp.331-340
In the era of big data, data has become a core competitive asset of enterprises and important business capital, and companies that can analyze consumer behavior from consumer data can truly benefit; big data also brings change to culture, organizational models, and even the business itself. This article studies and implements a big data application platform on a distributed Hadoop architecture (a web data mining and processing platform), using Hadoop's massive data processing capacity and its strong, elastically expandable computing capacity to raise data processing and analysis to new heights, with a focus on distributed cluster data storage, processing capacity optimization, and the related key technologies. The reliability and validity of the data collected in this study were analyzed, confirming that the questionnaire is effective and can be used in subsequent studies; correlation analysis and regression analysis were then applied, and the model and hypotheses were tested. Finally, recommendations for a precision marketing strategy are put forward based on the model.