Module/Course Title: Data Mining

Module/Course Code

KOMS120504

Student Workload
119 hours

Credits

3 / 4.5 ECTS

Semester

Frequency

Odd Semester

Duration

Type of course

Field of Study Courses

Contact hours

37.50 hours of face-to-face (theoretical) class activity
8.50 hours of lab activities

Independent Study

45 hours of independent activity
45 hours of structured activities

Class Size

Prerequisites for participation (if applicable)

Learning Outcomes

Students can demonstrate systematic thinking in analyzing and designing intelligent system solutions
Students can apply effective methods in developing intelligent systems
Students can create and evaluate intelligent systems
Students can explain major issues in data mining
Students can apply machine learning, pattern recognition, statistics, visualization, algorithms, database technology, and high-performance computing in data mining applications
Students can apply data mining techniques on datasets of realistic sizes using modern data analysis frameworks

Subject aims/Content

The Data Mining course discusses the data mining process, which includes data selection and cleaning, machine learning techniques to "learn" knowledge that is "hidden" in data, and the reporting and visualization of the resulting knowledge. The course covers these issues and illustrates the whole process with examples of practical applications from the life sciences, computer science, and commerce. Several machine learning topics are covered, including classification, prediction, and clustering.

Study Material

Introduction to Data Mining

Definition of data mining
Purpose of data mining
Data mining stages

Data

Data Type and Quality
Preprocessing
Data measurement techniques

Data Exploration

Data Statistics
Data Visualization
Multi-dimensional data analysis & OLAP

Classification Method:

Basic concepts of classification
Decision Tree and Model Overfitting

Classification Technique:

K-Nearest Neighbor
Comparison with Decision Tree

Classification Technique:

Naive Bayes
Comparison with Decision Tree, and K-Nearest Neighbor

Association Method:

Association Analysis
FP-Growth Algorithm
Techniques for evaluating association patterns
Frequent itemset generation
Rule generation, compact representation of frequent itemset

Association Technique

Handling categorical attributes and continuous attributes in association analysis
Sequential, subgraph and infrequent patterns

Clustering

Definition and basic concepts of clustering
K-Means Algorithm

Clustering:

Hierarchical Clustering
DBSCAN algorithm

Data anomaly

Definition of data anomalies and statistical approaches to address data anomalies
Proximity-based outlier detection, density-based outlier detection, and clustering-based techniques

Data Mining Apps and Trends

Spatial & Multimedia Data Mining
Text & Web Mining

Data Mining Apps and Trends

Applications of data mining in finance, the retail industry, telecommunications, biology, and science
Data mining system products

Teaching methods

Synchronous:

Face-to-face meetings/online meetings

Assessment Methods

Attendance and participation

This module/course is used in the following study programme/s as well

Computer Science Study Programme

Responsibility for module/course

I Nyoman Saputra Wahyu Wijaya, S.Kom., M.Cs
NIDN : 0826108901

Other Information

Introduction to Data Mining, 2nd Edition, Tan, Pang-Ning; Steinbach, Michael; Kumar, Vipin, Pearson Education, Inc., 2015
Data Mining: Concepts and Techniques, 3rd Edition, Han, Jiawei; Kamber, Micheline; Pei, Jian, Morgan Kaufmann, 2011
Data Mining and Knowledge Discovery Handbook, 2nd Edition, Maimon, Oded; Rokach, Lior, Springer, 2010
I. N. S. W. Wijaya, K. A. Seputra, and W. G. S. Parwita, “Comparison of the BM25 and rabinkarp algorithm for plagiarism detection,” J. Phys. Conf. Ser., vol. 1810, no. 1, 2021, doi: 10.1088/1742-6596/1810/1/012032.
S. V. Pandey and A. V. Deorankar, “A Study of Sentiment Analysis Task and It’s Challenges,” Proc. 2019 3rd IEEE Int. Conf. Electr. Comput. Commun. Technol. ICECCT 2019, pp. 1–5, 2019, doi: 10.1109/ICECCT.2019.8869160.
J. Oyelade et al., “Data Clustering: Algorithms and Its Applications,” Proc. - 2019 19th Int. Conf. Comput. Sci. Its Appl. ICCSA 2019, no. ii, pp. 71–81, 2019, doi: 10.1109/ICCSA.2019.000-1.
N. Besimi, B. Çiço, and A. Besimi, “Overview of data mining classification techniques: Traditional vs. parallel/distributed programming models,” 2017 6th Mediterr. Conf. Embed. Comput. MECO 2017 - Incl. ECYPS 2017, Proc., no. June, pp. 2–5, 2017, doi: 10.1109/MECO.2017.7977126.