Science Based Data Science

Course Number: 
Hours - Lecture: 
Hours - Lab: 
Hours - Recitation: 
Hours - Total Credit: 
Typical Scheduling: 
Not typically scheduled

The lectures will introduce modern data science techniques together with the foundational mathematical concepts in linear algebra, probability, and basic optimization on which these techniques rely. Case studies with real-world datasets will illustrate how to apply the techniques and how to choose an appropriate model. To gain hands-on experience analyzing real-world datasets, students will complete programming exercises that apply the newly acquired knowledge while developing their programming skills. Homework assignments (given during the first 9 weeks) will test students' ability to generalize the techniques covered in each lecture to new examples. Projects will be assigned in Week 10, and students will work in groups of 3-4. Each project will have weekly milestones in Weeks 11-14, culminating in a final presentation and report due at the end of the course.


Prerequisites: Calculus I and II (Math 1551 and Math 1552) and linear algebra (Math 1553, Math 1554, or Math 1564).

Course Text: 
  • The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
  • Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning by Alan J. Izenman.
  • A Tutorial on Spectral Clustering by Ulrike von Luxburg, Statistics and Computing 17(4) (2007): 395-416.

Topic Outline: 

1. Linear and logistic regression;  

2. Kernel methods;

3. Regression trees and ensemble methods;  

4. Clustering methods, such as k-means, k-flats and spectral clustering;  

5. Dimension reduction techniques, such as principal component analysis, multidimensional scaling and graph-based dimension reduction.