Introduction to Effective Information Visualization
Eric Monson
Summary
Participants will see how freely and commonly available software can be used to create effective visualizations, learn how to clean and re-structure data, and learn basic principles of effective visualization so they can go beyond the defaults and create eye-catching, impactful figures!
Why Take This Course?
Visualization is a powerful way to reveal patterns in data, attract attention, and get your message across to an audience quickly and clearly. However, there are many steps in the journey from information to influence, and many questions along the way: which visualization tools to use, how to get data into the right format, and which choices to make when putting it all together to tell your story. This course will quickly walk participants through a wide variety of data and chart types to help even beginners feel comfortable embarking on a new visualization project.
What Will Participants Learn?
The course will cover five major topic areas during the eight time blocks over two days. Sessions will combine lecture and hands-on activities in data cleaning, transformation, and visualization.
- Tips for effective data visualization and graphic design (2 course modules)
- Charts, maps, and interactive dashboards with Tableau (3 course modules)
- Cleaning and re-structuring data using OpenRefine (1 course module)
- Free web-based tools such as RAW for less-common visualization types
- Critique of existing visualizations and information graphics
Prerequisites and Requirements
This course is designed at an introductory level but will assume a basic understanding of spreadsheets as a way of storing and processing data. No programming will be necessary. Note that the Tableau training starts at a very introductory level, so if you are already an advanced user, that particular content may feel basic to you.
Before class, please install recent versions of Tableau (Public or Desktop), a web browser, and OpenRefine so that you can participate in the hands-on computer-based activities. We will not spend class time on software installation.
Basics of R for Data Science and Statistics
Justin Post
Summary
This course introduces the use of the popular R statistical programming language for conducting and sharing your data science projects. R is a mature and vibrant programming language, and one of the major platforms for doing data science. This course covers how to use RStudio effectively, frequently used data structures, importing data, common data manipulations, summary statistics, data visualizations, and basic regression modeling through the suite of packages called the “tidyverse.”
Why Take This Course?
R is an extremely versatile programming language that has the capability to fit a fantastic array of statistical and machine learning models. It is open-source, free, and gives you the ability to easily and widely share your analyses. The “tidyverse” offers an ecosystem of tools that help you harness R’s versatility and power, allowing you to easily import data, wrangle it, and use statistics to gain insights from it. In turn, RStudio offers a one-stop-shop development environment for your data analysis projects. By introducing you to some of the most useful features of RStudio and the tidyverse, this course will help you build a solid foundation in the use of R to complete the common tasks associated with a typical data science pipeline.
What Will Participants Learn?
The course provides a modern introduction to using R for data science through the extremely popular suite of packages called the “tidyverse.” More specifically, the course will cover:
Day 1:
- Working with the RStudio IDE
- R object types and data storage
- Reading data from common formats into R
- Piping multiple operations
- Common data manipulations and reshaping
Day 2:
- Numerical data summaries
- Graphical exploratory data analysis
- Basic regression and classification models
- Sharing your work in reproducible ways using R Markdown
Prerequisites and Requirements
This course will make heavy use of hands-on programming in R. Thus, some prior familiarity with the basics of the R language is required. We will generally introduce a topic and then have hands-on exercises to practice and explore that topic. Participants must have access to the internet and the ability to install programs and download files. This course assumes a strong working knowledge of computers; past experience with the logic of programming and/or conducting statistical analyses is beneficial but not required. Participants will receive instructions and resources in advance to assist with installing R, RStudio, and the tidyverse.
Introduction to Python
Laura Tateosian
Summary
Learning Python is important for any aspiring data scientist. This course is designed for students with some prior exposure to computer programming, but no Python experience. Participants will be introduced to core Python elements for working with data.
Why Take This Course?
Python consistently ranks among the top programming languages. Its syntax is easy to learn, and the language is well suited to rapid data exploration as well as larger data science projects. This course will help you add basic Python skills to your data science tool belt, so that you can then go on to explore some of the vast number of libraries written in Python.
What Will Participants Learn?
The course presents essential foundational Python elements for manipulating and exploring data. Participants will gain hands-on practice working in notebooks and stand-alone scripts. Topics will include:
- Python data structures and built-ins
- Pythonic process flow
- File handling for exploring and manipulating data with Python’s pandas library
- Error handling and debugging
- Organizing code with user-defined functions and modules
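As a rough illustration of how several of these topics fit together (this is not actual course material), the sketch below reads tabular data, groups it with a dictionary, skips bad values with error handling, and wraps the logic in a user-defined function. It assumes only Python's standard library, using the `csv` module in place of pandas so it runs without extra installs. The sample data, the column names, and the function name `mean_by_key` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; in practice this would come from a file on disk.
SAMPLE_CSV = """city,temp_c
Raleigh,21.5
Durham,19.0
Raleigh,23.5
Durham,not_recorded
"""


def mean_by_key(rows, key, value):
    """Return the mean of `value` for each distinct `key`, skipping bad values."""
    groups = defaultdict(list)  # data structure: dictionary of lists
    for row in rows:  # process flow: loop over the records
        try:
            groups[row[key]].append(float(row[value]))
        except ValueError:  # error handling: skip non-numeric entries
            continue
    # Drop any key whose values were all unusable.
    return {k: sum(v) / len(v) for k, v in groups.items() if v}


# File handling: io.StringIO stands in for open("some_file.csv").
with io.StringIO(SAMPLE_CSV) as handle:
    records = list(csv.DictReader(handle))

means = mean_by_key(records, "city", "temp_c")
print(means)  # {'Raleigh': 22.5, 'Durham': 19.0}
```

With pandas installed, the same grouping would typically be done with a data frame rather than a hand-written loop.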
Prerequisites and Requirements
This course assumes some experience with programming, but no experience with Python. Participants will need a computer to participate in the hands-on exercises. Additional instructions for software installation will be provided in advance of the course.
Visualization for Data Science in R
Angela Zoss
Summary
This course is designed for two audiences: experienced visualization designers looking to apply open data science techniques to their work, and data science professionals who have limited experience with visualization. Participants will develop skills in visualization design using R, a tool commonly used for data science. Basic familiarity with R is required.
Why Take This Course?
Data science skills are increasingly important for research and industry projects. With complex data science projects, however, come complex needs for understanding and communicating analysis processes and results. Ultimately, an analyst’s data science toolbox is incomplete without visualization skills. Incorporating effective visualizations directly into the analysis tool you are using can facilitate quick data exploration, streamline your research process, and improve the reproducibility of your research.
What Will Participants Learn?
The course will take a project-based approach to learning best practices for visualization for data science. Participants will be guided through several sample analysis and visualization projects that will highlight different types of visualization, different features of R and its visualization capabilities, and different challenges that arise when trying to apply an open data science philosophy to visualization.
- Introduction to visualization in R
- Using ggplot2 for publication-ready graphics
- Applying common graphic design principles to ggplot2 visualizations
- Adding interactivity to visualizations through R Markdown and HTML widgets
Prerequisites and Requirements
As indicated above, this course assumes basic familiarity with R (e.g., R syntax, data structures, and development environments). Participants with no knowledge of R should consider taking an introductory R short course prior to this class.
We will use RStudio to interact with R, and all exercises will be distributed in R Markdown files (rather than simple R script files). This allows us to combine R code with non-code elements and promotes a literate programming approach to research.
A significant portion of the course will use ggplot2 and other tidyverse packages to create visualizations, but prior experience with those packages is not required. In order to participate in class exercises, participants should have installed current versions of R, RStudio, and the following packages: tidyverse, markdown, knitr, readxl, plotly, maps, mapproj, and sf. Permissions to install packages on the fly will be useful.