Following my previous post, we have another few functions on dplyr package which will be covered in this post.
As part of data modelling, we need to sort the data to analyze further.
In T-SQL we can easily perform the sort on the data as like below. It uses “Order by” clause to sort the data.
In R, we need to use arrange function.
#Arrange
DF_Arrange<- DF_select %>%
group_by(State) %>%
summarise(Total = sum(Price)) %>%
arrange(desc(Total))
DF_Arrange
The next, very common task is to build the calculated column to satisfy the business logic. This can be easily done with the help of “mutate” function.
#mutate
DF_mutate <- DF_select %>%
group_by(State) %>%
summarise(Total = sum(Price)) %>%
mutate(“10%of Total” = Total/10) %>%
arrange(desc(Total))
DF_mutate
R is one of the very famous tools to handle the data science projects because it has all the capabilities right from the extracting the data from different sources, data modelling and transformation, data visualization and finally building machine learning models using the data.
This post explains how the data modelling can be done with R using “dplyr” package.
To make it easy, let me compare this dplyr function with T-SQL.
First, let us load the data into R studio.
fulldata<-read.csv(“D:\\Projects 2018\\DSKA\\DataWranglingDemo.csv”)
fulldata
When we have a large dataset with more than 1000 columns, if we need only certain columns then we can use “SELECT” option to choose the specific columns. Check the below example,
The same option is available in R with dplyr package.
#install.packages(“dplyr”)
require(“dplyr”)
#select
DF_select <- fulldata %>% select (State,Price)
DF_select
In T-SQL, we have where condition to filter any data with different conditions.
The same filter option is available with dplyr functions.
#filter
DF_select %>% filter(State==”Alabama”)
Same as T-SQL, we have group by, summarise functions are available in R
#Group by
DF_Group <- DF_select %>% group_by(State)
DF_Group
#Summarise
DF_Sum<- DF_select %>% group_by(State) %>%
summarise(Total = sum(Price))
DF_Sum