

Course: MATH 3080

Department: Natural Science and Mathematics
Title: Foundations of Data Science

Semester Approved: Fall 2020
Five-Year Review Semester: Fall 2025

Catalog Description: Students will get an introduction to Python programming, data analysis tools, and the statistics needed to acquire, clean, analyze, explore, and visualize real-life data sets. Using statistics, students will learn to make data-driven inferences and decisions, and to communicate those results effectively.

Semesters Offered: Spring
Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0

Prerequisites: MATH 1210 and either MATH 2040 or MATH 3040, with a C or better in each course

Justification: Data collection and data analysis are ubiquitous and are fast becoming a prerequisite for economic success in business. This course provides a subset of the tools necessary to leverage data for prediction. It will support the bachelor's degree in software engineering by providing relevant mathematics coursework.

Student Learning Outcomes:
Students will acquire data through web scraping and data APIs. Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will clean and reshape messy datasets. Students will be assessed through assignments, quizzes, exams and/or class discussion, and projects – instructor will provide feedback.

Students will learn to use statistical software to deploy statistical methods including generalized linear regression, cluster analysis, and classification. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will apply dimensionality reduction and perform basic analysis of network data. Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will evaluate outcomes, make decisions based on data, and effectively communicate those results. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will understand and be able to apply the theoretical foundations underlying the methods used throughout the course. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

This course will include an introduction to data analysis tools in Python; descriptive statistics; data structures with NumPy and pandas; introductory hypothesis testing and statistical inference; web scraping and data acquisition via APIs; generalized linear regression; classification methods, including logistic regression, k-nearest neighbors, decision trees, support vector machines, and neural networks; data visualization; clustering methods; dimensionality reduction, including principal component analysis; network analysis; rating, ranking, and voting; cleaning and reformatting messy datasets using regular expressions or dedicated tools such as OpenRefine; natural language processing; and the ethics of big data.

This course supports an inclusive learning environment where diverse perspectives are recognized, respected and seen as a source of strength. The consideration of a diverse set of problems using real data will help support this goal.

Key Performance Indicators:
Student learning will be evaluated through:

Attendance / Participation 0 to 15%

Class Group Activities 10 to 15%

Computer Projects 20 to 50%

Quizzes 0 to 20%

Homework 5 to 25%

Midterm Exams / Tests 20 to 40%

Final Exam 15 to 35%

Representative Text and/or Supplies:
McKinney, W. (current edition). Python for data analysis: Data wrangling with pandas, NumPy, and IPython. Sebastopol, CA: O'Reilly Media.

Geron,. (current edition). Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Beijing; Boston; Farnham; Sebastopol; Tokyo: O'Reilly.

A computer and statistical software are required for this course. Free software such as Python or R is recommended, but subscription software (e.g., SAS, SPSS) may be used at the discretion of the instructor.

Pedagogy Statement:
John Dewey stated that “education should not revolve around the acquisition of a pre-determined set of skills, but rather the realization of one’s full potential and the ability to use those skills for the greater good.” Applying this idea to the pedagogy of this course, the teacher will help students learn both theory and application in a modern curriculum. By the end of the course, students should know how to use technology to apply specific skills and to analyze the results of their work.

This course supports an inclusive learning environment where diverse perspectives are recognized, respected and seen as a source of strength. This environment is supported by activities that consider data from a diverse set of sources. In addition, students will work in groups and will be encouraged to think critically when faced with data that may contradict their own beliefs.

Instructional Mediums:


Maximum Class Size: 25
Optimum Class Size: 20