This course covers the basics of the knowledge discovery process and data mining, and provides a basic introduction to data science. The course focuses on two main aspects of knowledge discovery: mathematical tools and a well-defined, structured knowledge discovery process consisting of a number of interactive and iterative steps. The accompanying project (KU) concentrates on programming infrastructure for manipulating large-scale data. The examples used in the course come mainly from text mining and recommender systems.

In recent years the amount of data that we produce has increased dramatically. We already produce more data than we are able to store with current technological solutions. Therefore, making sense of this huge amount of data, that is, extracting useful, valid, understandable, and novel patterns from it, is of crucial importance. Knowledge discovery, data mining, and data science are among the approaches to tackling this problem. Other similar, but somewhat different, approaches include database technology, machine learning, and statistics.

In this course we will investigate, analyze, and discuss a well-defined process for knowledge discovery in such large data sets. Apart from the process, we will also discuss the mathematics needed for data mining. The main online resource for this course is the TeachCenter page. Here we provide just a few additional resources.

Course topics include:

- Review of mathematics needed in data mining
- Knowledge discovery process
- Text classification and clustering
- Semantic analysis of text documents
- Recommender systems

In this course the students will:

- Learn about the mathematical basics of data mining algorithms
- Learn about the steps of a knowledge discovery process
- Learn about selected data mining algorithms

At the end of this course the students will know how to:

- Analyze and design a typical knowledge discovery project

- Feature Extraction
- Feature Engineering
- Data Matrices
- Review of linear algebra
- Principal Component Analysis
- SVD and Latent Semantic Analysis
- Recommender Systems: Matrix Factorization
- Non-negative Matrix Factorization
- Clustering
- Classification
- Evaluation

- TeachCenter Course
- Review of probability theory, linear algebra, and eigenvectors
- Mining massive datasets
- Advanced Data Analysis from an Elementary Point of View
- Probability primer YouTube series
- Lecture slides "Mining Massive Datasets" from Stanford University
- Introduction to Information Retrieval
- Machine Learning Course by Andrew Ng
- Lecture Notes from 2017

- Probability Essentials by Jacod and Protter
- Machine Learning by Tom Mitchell