Course Descriptions (Spring 2024)


Intermediate Python

Andy Bowman

Summary

Learning Python is important for any aspiring data scientist who is interested in programming as a skill, a discipline, and a profession. For students who attended the “Introduction to Programming in Python” class on Monday and Tuesday, this course continues with hands-on programming and the management of large, complex datasets.

Why Take This Course?

This course moves beyond the introductory concepts presented in the two-day course, “Introduction to Programming in Python,” and presents more programming-focused content with data applications. Students will get hands-on practice creating and manipulating datasets, handling errors, and storing and retrieving data using relational databases.

What Will Participants Learn?

Students will learn how to:

  • create programs that manipulate data
  • perform error handling
  • store and retrieve data using relational databases
  • use the pandas library, the de facto standard for working with tabular data in Python

Prerequisites and Requirements

An introductory knowledge of Python is required. You will also need to be comfortable with basic mathematics and algebra, as well as with e-mail and the Web. Please bring a laptop to the course.


Basics in R for Data Science and Statistics

Santiago Olivella

Summary

This course introduces the popular R statistical programming language for conducting and sharing data science projects. R is a mature and vibrant language and one of the major platforms for data science. The course covers how to use RStudio effectively, frequently used data structures, importing data, common data manipulations, summary statistics, data visualization, and basic regression modeling through the suite of packages called the “tidyverse.”

Why Take This Course?

R is an extremely versatile programming language that can fit a wide array of statistical and machine learning models. It is open-source and free, and it makes your analyses easy to share widely. The “tidyverse” offers an ecosystem of tools that help you harness R’s versatility and power, allowing you to easily import data, wrangle it, and use statistics to gain insights from it. RStudio, in turn, offers a one-stop development environment for your data analysis projects. By introducing you to some of the most useful features of RStudio and the tidyverse, this course will help you build a solid foundation in using R for the common tasks of a typical data science pipeline.

What Will Participants Learn?

The course provides a modern introduction to using R for data science through the tidyverse. More specifically, it will cover:

Day 1:

  • Working with the RStudio IDE
  • R object types and data storage
  • Reading data from common formats into R
  • Piping multiple operations
  • Common data manipulations and reshaping  

Day 2:

  • Numerical data summaries
  • Graphical exploratory data analysis
  • Basic regression and classification models
  • Sharing your work in reproducible ways using R Markdown

Prerequisites and Requirements

This course makes heavy use of hands-on programming in R, so some prior familiarity with the basics of the R language is required. We will generally introduce a topic and then work through hands-on exercises to practice and explore it. Participants must have internet access and the ability to install programs and download files. The course assumes a strong working knowledge of computers; although not required, past experience with the logic of programming and/or with conducting statistical analyses would be beneficial. Participants will receive instructions and resources in advance to help them install R, RStudio, and the tidyverse.


Overview to AI and Deep Learning

Amy Hemmeter

Summary

There has been tremendous growth in AI over the past 10 years. Every day we hear of challenging problems being solved using AI. At the same time, we hear about the lack of explainability of decisions made by AI and about biases in AI models. Many of the key advances in AI are due to advances in machine learning, especially deep learning. Natural language processing, computer vision, speech translation, biomedical imaging, and robotics are some of the areas that have benefited from deep learning methods. This course is designed to provide an overview of AI and, in particular, deep learning. We will look at the history of neural networks, how advances in data collection and computing have driven the revival of neural networks, the different types of deep learning networks and their applications, and the tools and software available to design and deploy deep networks. The objective is to provide you with an overview, not necessarily to start coding and creating AI systems.

Why Take This Course?

This course is for those who are interested in understanding more about AI and deep learning, how they are used in different applications, and their advantages and disadvantages. It is not meant to teach any particular deep learning framework or how to develop deep learning models.

What Will Participants Learn?

The course will focus on the following topics:

  • History of neural networks
  • Neural networks as universal approximators
  • Training neural networks as an optimization problem
  • Deep neural networks
  • Why deep learning now?
  • Types of neural networks
  • Applications of deep learning and neural networks in image processing, natural language processing, robotics, computer vision, biomedicine, and health care

Prerequisites and Requirements

This course does not have any prerequisites.


Intermediate Data Visualization

Eric Monson

Summary

Participants will see and experience how commonly available software can be used to create compelling visualizations, and they will learn effective visualization principles so they can go beyond the defaults to create eye-catching and impactful figures!

Why Take This Course?

Visualization is a powerful way to reveal patterns in data, attract attention, and get your message across to an audience quickly and clearly. However, there are many steps on the journey from information to influence, and many choices to make when creating visuals to tell your story. This course will start with a presentation of effective visualization principles. It will then walk participants through a variety of chart types to help expand their imaginations when deciding how to plot their data. At each stage we’ll practice a group critique process and see how changes to charts affect not only their visual form but also their message. We will also push past the initial stage of chart creation to annotate and polish figures, giving them a professional look and feel that most people don’t know how to achieve with simple tools.

What Will Participants Learn?

Students will learn and practice:

  • Effective data visualization principles
  • Chart types and how to match them to the type of data and the story you’re trying to tell
  • Critiques of existing visualizations to guide choices and revisions
  • A general, useful, but slightly advanced technique for making horizontal symbol plots (dumbbell, lollipop, or forest plots) in Excel
  • How to give your visualization a professional polish by adding graphical elements and annotations in PowerPoint

Prerequisites and Requirements

This course is designed at an intermediate level, which means we will go into a few topics in more depth than I typically would in an introductory session, but it will still be accessible to beginners. It assumes some comfort with spreadsheets, as well as some basic experience making charts in Excel and creating slides in PowerPoint. No programming will be necessary.

If you want to participate in the hands-on, computer-based activities, please install fairly recent (no more than a year or two old) versions of Microsoft Excel and PowerPoint before class; we will not spend class time on installation.


Deep Learning in Python

Edgar Lobaton

Summary

In the past few years, deep learning (DL) has emerged as a powerful machine learning method with applications in areas such as object recognition, image classification, video analysis, and natural language processing. This course will discuss where and how deep learning is used and how to get people to use it. The approach will be to minimize the math and concentrate instead on the underlying ideas and principles. We will use TensorFlow/Keras as the underlying computational platform and write the DL code in Python. Much of the course will be driven by hands-on exercises that help you build simple networks in Keras/TensorFlow. These exercises will cover the analysis of tabular data, images, videos, and text. By the end of the course you will have a basic understanding of DL, of TensorFlow/Keras as a DL platform, and of example applications.

Why Take This Course?

Deep learning can appear to be a magical algorithm that solves difficult problems in a variety of domains, but its success rests on the convergence of several trends. In this course, you will get a chance to look under the hood of the DL hype and see the underlying architectures, and you will be exposed to a set of tools that you can use to create your own deep learning models.

What Will Participants Learn?

The course will focus on the following topics:

  • Neural networks
  • Deep neural networks
  • Training of deep neural networks
  • Tensors, TensorFlow, and Keras
  • Convolutional neural networks
  • Transformers
  • Autoencoders
  • Generative adversarial networks
  • Transfer learning
  • Text as data

Participants will complete a number of computer exercises using Python and Keras/TensorFlow.

Prerequisites and Requirements

To get the full benefit of the class, participants should have an undergraduate-level understanding of statistics and calculus, as well as programming experience with Python.


Introduction to Programming in R

Jonathan Duggins

Summary

Statistical programming is an integral part of many data-intensive careers, and data literacy and programming skills have become a necessary component of employment in many industries. This course begins with concepts that new programmers need, both general and statistical, and explores programming topics essential to any job that uses data. The course uses R, one of the most popular programs in statistics and data science, and all R programming is done in the RStudio integrated development environment.

Why Take This Course?

Data is everywhere. Every industry, and most careers, now involve working with data at some point. As data-driven decisions become the norm in many careers, individuals need to add to their professional skills by learning the essentials of programming.

Whether their goal is to better communicate with coworkers who program or to integrate programming in their day-to-day jobs, this course benefits individuals who are looking to understand general principles behind statistical programming and to develop programming skills necessary to work in data-related areas.

Even though computers are “literal,” programming offers many techniques for solving a particular problem. The combination of necessary precision and freedom of choice regarding techniques can be frustrating for beginners.

What Will Participants Learn?

The course covers both general computing concepts and an introduction to R. A general outline is given below:

Day 1:

  • General computing concepts
  • Fundamental programming concepts
  • Introduction to R and RStudio
  • R functions
  • Common R data structures

Day 2:

  • R packages
  • Reading and writing data files (readr package)
  • Subsetting data by rows, columns, or both (dplyr package)
  • Deriving new variables unconditionally and conditionally (dplyr package)
  • Introduction to summarizing data

Prerequisites and Requirements

This course aims to support individuals with little to no previous programming experience as they gain the fundamentals necessary to be successful programmers. No prior programming experience is expected in R or any other language. However, the course relies on participants’ active engagement through hands-on programming. R and RStudio should be installed in advance so that participants can devote their time to practicing the skills from the course. Participants will receive instructions and resources in advance to help with the installation process.