The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Hastie, Tibshirani, and Friedman.
Matlab: A Practical Introduction to Programming and Problem Solving, 3rd Ed., Stormy Attaway.
Course Information
Data mining is the process of discovering patterns in big data using techniques from mathematics, computer science, and statistics, with applications ranging from biology and neuroscience to history and economics. The goal of the course is to teach students fundamental data mining techniques that are commonly used in practice. Students will also learn advanced data mining techniques, including linear classifiers, support vector machines, clustering, dimension reduction, transductive learning, and topic modeling. The course is designed to be applicable and fulfilling both for strongly motivated students who have taken Linear Algebra and for advanced mathematics or computer science majors. Prerequisite: MATH 60 (Linear Algebra); and a proof-based course above MATH 100 or CSCI 62 (Data Structures and Advanced Programming); or consent of the instructor.
Note: Familiarity with a computer programming language (C++, Java, Matlab, Python, or R) or CSCI 51 (Intro to Computer Science) is suggested but not required.
Topic modeling and non-negative matrix factorization
More on topic modeling / natural language processing
Makeup; if time allows: transductive learning, diffusion, motion by mean curvature (MBO)
Student presentations
Final Exam
Final Projects
Each student will work in a group of 3-5 students on a final project related to the material covered in the course. The goal of the final project is for each student to demonstrate their knowledge of the material by applying and extending the algorithms presented in class to an application of their choice (in consultation with the course instructors). Each group will submit a written proposal, a progress report, and a final report, and will deliver a 20-minute oral presentation of its results. Students are expected to meet regularly with the course instructor to discuss project development.
Final Project Topics
Enron email - email text mining and time series analysis
Yahoo finance - stock and market analysis, clustering, visualization, auto-summary
Hyperspectral Imaging - pixel classification and object detection
Audio - voice command classification (OK Google / Siri)
Enron Social Network - Email network, social network analysis
MRI brain images - image noise statistical analysis and reduction
Video - Scene detection and object recognition
LAPD field interview cards - community detection
Robot survey data - object classification (Xbox Kinect data)
Google election polling data - trend detection and topic model fitting
Data acquired from outside the course - from other classes, labs, the internet, or industry
Homework
There will be 8 bi-weekly homework assignments. Each assignment will consist of three parts:
- a math component
- a computational component
- an advanced theory component. You can choose between
---- a theoretical math option (requiring MATH 100+)
---- or an advanced computer science option (requiring CSCI 62).