Module/Course Title: Data Mining

Module/course code

KOMS120504

Student Workload
119 hours

Credits

3 / 4.5 ECTS

Semester

5

Frequency

Odd Semester

Duration

16

1

Type of course

Field of Study Courses

Contact hours


37.50 hours of face-to-face (theoretical) class activity
8.50 hours of lab activities

Independent Study


45 hours of independent activity
45 hours of structured activities

Class Size

30

2

Prerequisites for participation (if applicable)

-

3

Learning Outcomes

  1. Students can demonstrate systematic thinking in analyzing and designing intelligent system solutions
  2. Students can apply effective methods in developing intelligent systems
  3. Students can create and evaluate intelligent systems
  4. Students can explain major issues in data mining
  5. Students can apply machine learning, pattern recognition, statistics, visualization, algorithms, database technology, and high-performance computing in data mining applications
  6. Students can apply data mining techniques on datasets of realistic sizes using modern data analysis frameworks

4

Subject aims/Content

The Data Mining course covers the data mining process, which includes data selection and cleaning, machine learning techniques to "learn" knowledge that is "hidden" in data, and the reporting and visualization of the resulting knowledge. The course illustrates the whole process with examples of practical applications from the life sciences, computer science, and commerce. It covers several machine learning topics, including classification, prediction, and clustering.
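As an informal illustration of the process described above (cleaning, learning, reporting), the following minimal sketch runs the three stages on a toy dataset. The records, attribute meanings, and function names are hypothetical and are not taken from the course material; K-Nearest Neighbor is used only because it is one of the classification techniques listed in the study material.

```python
from collections import Counter
import math

# Hypothetical toy records: (height_cm, weight_kg, label); None marks a missing value.
raw = [
    (150.0, 50.0, "small"),
    (152.0, None, "small"),   # incomplete record, dropped during cleaning
    (155.0, 54.0, "small"),
    (180.0, 82.0, "large"),
    (185.0, 90.0, "large"),
    (178.0, 79.0, "large"),
]

def clean(records):
    """Data selection and cleaning: keep only complete records."""
    return [r for r in records if None not in r]

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda r: math.dist(r[:2], point))[:k]
    votes = Counter(label for *_, label in nearest)
    return votes.most_common(1)[0][0]

def report(train):
    """Reporting: count how many records each class contributes."""
    return dict(Counter(label for *_, label in train))

train = clean(raw)
prediction = knn_predict(train, (182.0, 85.0))
summary = report(train)
print(summary, prediction)
```

A real application would replace the toy list with a dataset of realistic size and would evaluate the classifier on held-out data rather than on a single query point.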

Study Material

Introduction to Data Mining

  • Definition of data mining
  • Purpose of data mining
  • Data mining stages

Data

  • Data Type and Quality
  • Preprocessing
  • Data measurement technique

Data Exploration

  • Data Statistics
  • Data Visualization
  • Multi-dimensional data analysis & OLAP

Classification Method:

  • Basic concepts of classification
  • Decision Tree and Model Overfitting

Classification Technique:

  • K-Nearest Neighbor
  • Comparison with Decision Tree

Classification Technique:

  • Naive Bayes
  • Comparison with Decision Tree, and K-Nearest Neighbor

Association Method:

  • Association Analysis
  • FP-Growth algorithm
  • Techniques for evaluating association patterns
  • Frequent itemset generation
  • Rule generation, compact representation of frequent itemset

-

Association Technique

  • Handling categorical attributes and continuous attributes in association analysis
  • Sequential, subgraph and infrequent patterns

Clustering

  • Definition and basic concepts of clustering
  • K-Means algorithm

Clustering:

  • Hierarchical Clustering
  • DBSCAN algorithm

Data anomaly

  • Definition of data anomalies and statistical approaches to detecting them
  • Proximity-based, density-based, and clustering-based outlier detection techniques

Data Mining Applications and Trends

  • Spatial & Multimedia Data Mining
  • Text & Web Mining

Data Mining Applications and Trends

  • Applications of data mining in finance, the retail industry, telecommunications, biology, and science
  • Data mining system products

-

5

Teaching methods

Synchronous:

Face-to-face meetings/online meetings

6

Assessment Methods

Attendance and participation

7

This module/course is used in the following study programme/s as well

Computer Science Study Programme

8

Responsibility for module/course

  • I Nyoman Saputra Wahyu Wijaya, S.Kom., M.Cs
  • NIDN : 0826108901

9

Other Information

  1. Introduction to Data Mining, 2nd Edition, Tan, Pang-Ning; Steinbach, Michael; Kumar, Vipin, Pearson Education, Inc., 2015
  2. Data Mining: Concepts and Techniques, 3rd Edition, Han, Jiawei; Kamber, Micheline; Pei, Jian, Morgan Kaufmann, 2011
  3. Data Mining and Knowledge Discovery Handbook, 2nd Edition, Maimon, Oded; Rokach, Lior, Springer, 2010
  4. I. N. S. W. Wijaya, K. A. Seputra, and W. G. S. Parwita, “Comparison of the BM25 and rabinkarp algorithm for plagiarism detection,” J. Phys. Conf. Ser., vol. 1810, no. 1, 2021, doi: 10.1088/1742-6596/1810/1/012032.
  5. S. V. Pandey and A. V. Deorankar, “A Study of Sentiment Analysis Task and It’s Challenges,” Proc. 2019 3rd IEEE Int. Conf. Electr. Comput. Commun. Technol. ICECCT 2019, pp. 1–5, 2019, doi: 10.1109/ICECCT.2019.8869160.
  6. J. Oyelade et al., “Data Clustering: Algorithms and Its Applications,” Proc. - 2019 19th Int. Conf. Comput. Sci. Its Appl. ICCSA 2019, no. ii, pp. 71–81, 2019, doi: 10.1109/ICCSA.2019.000-1.
  7. N. Besimi, B. Çiço, and A. Besimi, “Overview of data mining classification techniques: Traditional vs. parallel/distributed programming models,” 2017 6th Mediterr. Conf. Embed. Comput. MECO 2017 - Incl. ECYPS 2017, Proc., no. June, pp. 2–5, 2017, doi: 10.1109/MECO.2017.7977126.