Data Manipulation With Dplyr in R
Data manipulation is a vital data analysis skill – actually, it is the foundation of data analysis. This course is about the most effective data manipulation tool in R – dplyr!
As a data analyst, you will spend a vast amount of your time preparing or processing your data. The goal of data preparation is to convert your raw data into a high-quality data source, suitable for analysis. More often than not, this process involves a lot of work. The dplyr package contains the tools that can make this work much easier.
dplyr has a few important advantages over other data manipulation tools or functions:
- it’s much faster (25-30 times faster)
- its code is easier to write and understand
- it can use chaining to build sequences of commands, thus making the code even cleaner and faster to execute
For these reasons, dplyr quickly became the most popular data manipulation tool among R data scientists. When you finish this course, you will be able to
It is a short course, but it is focused on the most essential commands and functions of the dplyr package, those commands that you will likely use most often.
So let’s see what you are going to learn in this course.
The first section covers the five core dplyr commands: filter, select, mutate, arrange and summarise. You will need these commands practically every time you work with dplyr. They are used to subset data frames, compute new variables, sort data frames, compute statistical indicators and so on. Here are a few real-life scenarios where you would use them:
- you need to extract from your respondents data set the male subjects with an income greater than $30,000
- you need to compute each respondent’s income per family member, knowing the total income and the number of family members
- you have a data set with 27 variables, but you only need 6 for your analysis (so you want to remove the extra variables)
- you have to sort your employees data set by salary
- you need to compute the average satisfaction with a product, knowing each individual customer’s satisfaction, and so on.
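To make these scenarios concrete, here is a minimal sketch of the five core verbs at work. The `respondents` data frame, its column names and all of its values are made up for illustration; they are assumptions, not data from the course.

```r
library(dplyr)

# Hypothetical respondents data set (names and values are invented)
respondents <- data.frame(
  id             = 1:6,
  gender         = c("male", "female", "male", "male", "female", "male"),
  income         = c(25000, 48000, 36000, 52000, 31000, 30000),
  family_members = c(2, 4, 3, 4, 1, 5),
  satisfaction   = c(3, 5, 4, 2, 4, 5)
)

# filter(): male subjects with an income greater than $30,000
high_income_men <- filter(respondents, gender == "male", income > 30000)

# mutate(): income per family member, computed from two existing variables
with_per_member <- mutate(respondents,
                          income_per_member = income / family_members)

# select(): keep only the variables needed for the analysis
slim <- select(respondents, id, gender, income)

# arrange(): sort by income, highest first
sorted <- arrange(respondents, desc(income))

# summarise(): average satisfaction across all respondents
avg <- summarise(respondents, mean_satisfaction = mean(satisfaction))
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what later makes chaining possible.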
The second section covers other important dplyr commands and functions. In this section you’ll learn:
- how to count the observations in a certain group
- how to extract a random sample from your data frame
- how to extract the top entries from your data frame, based on a given variable
- how to visualize the structure of your data set
- how to use the set operations in dplyr (if you have used these operations in base R, you’ll see that dplyr takes them to a whole new level).
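A short sketch of what these tasks can look like in practice. The `employees` data frame is hypothetical. The sketch uses `sample_n()` and `top_n()`, which still work but have been superseded by `slice_sample()` and `slice_max()` in recent dplyr versions, so treat the exact function choice as an assumption about your installed version.

```r
library(dplyr)

# Hypothetical employees data set (names and values are invented)
employees <- data.frame(
  name   = c("Ana", "Ben", "Cara", "Dan", "Eva"),
  dept   = c("sales", "sales", "it", "it", "it"),
  salary = c(3100, 2800, 4200, 3900, 4500)
)

# Count the observations in each group
dept_counts <- count(employees, dept)

# Extract a random sample of two rows
set.seed(1)
random_two <- sample_n(employees, 2)

# Extract the two top entries by salary
top_two <- top_n(employees, 2, salary)

# Take a quick look at the structure of the data set
glimpse(employees)

# Set operations on data frames with the same columns
first_three <- slice(employees, 1:3)
last_three  <- slice(employees, 3:5)

in_either  <- union(first_three, last_three)      # rows found in either one
in_both    <- intersect(first_three, last_three)  # rows found in both
only_first <- setdiff(first_three, last_three)    # rows only in the first
```

In base R these set functions work on vectors; once dplyr is loaded they compare whole rows of data frames.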
In the third section you’ll start to take advantage of the true power of dplyr. Here we’ll talk about chaining: creating sequences of dplyr commands that accomplish multiple tasks in a single statement.
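As a rough illustration of chaining, the sketch below pipes one hypothetical data frame through several verbs with the `%>%` operator. The data and column names are invented for the example.

```r
library(dplyr)

# Hypothetical respondents data set (names and values are invented)
respondents <- data.frame(
  gender         = c("male", "female", "male", "male", "female", "male"),
  income         = c(25000, 48000, 36000, 52000, 31000, 30000),
  family_members = c(2, 4, 3, 4, 1, 5)
)

# Each step receives the result of the previous one as its first argument
result <- respondents %>%
  filter(income > 30000) %>%                              # keep higher incomes
  mutate(income_per_member = income / family_members) %>% # add a new variable
  group_by(gender) %>%                                    # work by group
  summarise(mean_per_member = mean(income_per_member)) %>%
  arrange(desc(mean_per_member))                          # largest mean first
```

Written without chaining, the same task would need either nested function calls or a series of intermediate variables, both of which are harder to read.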
The fourth section is about joining data frames with dplyr. This is a very important topic, because your data will often be spread across several data frames, so you will need to join them into a single one, suitable for your analyses. We are going to look at five join types available in dplyr: inner_join, semi_join, left_join, anti_join and full_join. We are going to examine the output of each join type using a simple example.
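The five join types can be compared on a small example like the one below. The `customers` and `orders` data frames are hypothetical, and `customer_id` is an assumed name for the common variable.

```r
library(dplyr)

# Two hypothetical data frames that share the customer_id variable
customers <- data.frame(
  customer_id = c(1, 2, 3, 4),
  name        = c("Ana", "Ben", "Cara", "Dan")
)
orders <- data.frame(
  customer_id = c(2, 3, 3, 5),
  amount      = c(50, 20, 75, 10)
)

# inner_join(): only the rows that have a match in both data frames
inner <- inner_join(customers, orders, by = "customer_id")

# left_join(): every customer, with NA where there is no order
left <- left_join(customers, orders, by = "customer_id")

# full_join(): every row from both data frames
full <- full_join(customers, orders, by = "customer_id")

# semi_join(): customers that have at least one order (no columns added)
semi <- semi_join(customers, orders, by = "customer_id")

# anti_join(): customers that have no order at all
anti <- anti_join(customers, orders, by = "customer_id")
```

Note that `semi_join()` and `anti_join()` only filter the first data frame; they never add columns from the second one.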
In the fifth section we’ll learn how to combine dplyr and ggplot2 commands (using chaining) to build expressive charts and graphs. For example, if you want to represent the income distribution for the subjects with a higher education only, or the relationship between income and education level for the female subjects only, in this section you will learn exactly how to do it.
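Here is one possible way to chain a dplyr step into a ggplot2 chart, using the income distribution example. The data frame, its values and the choice of five bins are assumptions made for the sketch.

```r
library(dplyr)
library(ggplot2)

# Hypothetical respondents data set (names and values are invented)
respondents <- data.frame(
  education = c("higher", "secondary", "higher", "higher",
                "secondary", "higher"),
  income    = c(48000, 25000, 52000, 36000, 30000, 41000)
)

# Filter with dplyr, then pass the result straight to ggplot().
# Note the switch from %>% to + once the ggplot2 part begins.
p <- respondents %>%
  filter(education == "higher") %>%
  ggplot(aes(x = income)) +
  geom_histogram(bins = 5)
```

Calling `print(p)` draws the histogram for the subjects with a higher education only, without creating an intermediate data frame.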
Every command is illustrated with video, with both the syntax and the output explained in detail. At the end of the course, a large number of practical exercises are provided. By doing these exercises you’ll put into practice what you have learned.
Join this course right now and acquire a critical data analysis ability – data manipulation!
- 2. Overview of the Basic Commands (Video lesson)
  The main dplyr commands (or verbs) and what they are used for.
- 3. The filter() Command (Video lesson)
  How to select entries (observations) in your data frame using various filtering conditions.
- 4. The select() Command (Video lesson)
  Select or remove columns (variables) in your data frame.
- 5. The mutate() Command (Video lesson)
  Add new variables to your data frame, either from scratch or using existing variables.
- 6. The arrange() Command (Video lesson)
  Sort your data frame by variable values.
- 7. The summarise() Command (Video lesson)
  Compute various statistical indicators for the numeric variables in your data frame.
- 8. The group_by() Command (Video lesson)
  You can apply the dplyr commands to your data frame by groups or segments. In this lecture you will learn how.
- 9. The count() Command (Video lesson)
  Let's see how you can count entries in your data frame, by group.
- 10. The tally() Command (Video lesson)
  Another useful way to count observations in your data frame.
- 11. The n_distinct() Command (Video lesson)
  A quick way to count the unique values in a variable.
- 12. The sample() Command (Video lesson)
  More often than not, you may need to extract a random sample from your data. Let's see how to do that using the sample() command.
- 13. The top_n() Command (Video lesson)
  How to select the top entries in your data frame, based on variable values.
- 14. The bind() Command (Video lesson)
  How to easily merge two data frames that have the same number of rows or columns.
- 15. The glimpse() Command (Video lesson)
  Do you need to take a quick look at your data frame? This command is exactly what you need.
- 16. Set Operations (1) (Video lesson)
  If you already know the set operations in base R, let me tell you that dplyr contains powerful extensions of these operations that allow their use for data frames (not only for vectors). In this lecture we learn two set operations, union() and intersect(). We will demonstrate them on data frames, of course.
- 17. Set Operations (2) (Video lesson)
  Two other useful set operations for data frames: setdiff() and setequal().
- 18. What Is Chaining? (Video lesson)
  The concept of chaining (or piping) in a nutshell.
- 19. Simple Chaining Examples (Video lesson)
  Let's start with a few easy-to-understand examples of chaining, using the main dplyr verbs, to get a better view of the procedure.
- 20. More Chaining Examples (Video lesson)
  Getting to more challenging examples, using other dplyr commands as well.
- 21. Even More Chaining Examples (Video lesson)
  An example is worth 1000 words, so let's examine a few more complicated chaining examples.
- 22. The Main Joining Commands in dplyr (Video lesson)
  A short presentation of five joining functions available in the dplyr package.
- 23. The inner_join() Command (Video lesson)
  Joining two data frames that share a common variable using the inner_join() command.
- 24. The semi_join() Command (Video lesson)
  Joining two data frames that share a common variable using the semi_join() command.
- 25. The left_join() Command (Video lesson)
  Joining two data frames that share a common variable using the left_join() command.
- 26. The anti_join() Command (Video lesson)
  Joining two data frames that share a common variable using the anti_join() command.
- 27. The full_join() Command (Video lesson)
  Joining two data frames that share a common variable using the full_join() command.
- 28. How It Works (Video lesson)
  A few explanations on how (and why) to combine dplyr commands with ggplot2 commands through chaining.
- 29. Building Column Charts (Video lesson)
  How to create a column chart on a subset of your data using both dplyr and ggplot2 commands.
- 30. Building Mean Plot Charts (Video lesson)
  How to create a mean plot chart on a subset of your data using both dplyr and ggplot2 commands.
- 31. Building Scatterplot Charts (Video lesson)
  How to create a scatterplot on a subset of your data using both dplyr and ggplot2 commands.
- 32. Building Histograms (Video lesson)
  How to create a histogram on a subset of your data using both dplyr and ggplot2 commands.
- 33. Building Boxplot Charts (Video lesson)
  How to create a boxplot on a subset of your data using both dplyr and ggplot2 commands.