Tag Archives: dplyr

ByHariharan Rajendran

R Package dplyr Function vs T-SQL – Part 2

Following my previous post, we have another few functions on dplyr package which will be covered in this post.

Arrange Function

As part of data modelling, we need to sort the data to analyze further.

In T-SQL we can easily perform the sort on the data as like below. It uses “Order by” clause to sort the data.

In R, we need to use arrange function.

#Arrange

DF_Arrange<- DF_select %>%

group_by(State) %>%

summarise(Total = sum(Price)) %>%

arrange(desc(Total))

DF_Arrange

Mutate Function

The next, very common task is to build the calculated column to satisfy the business logic. This can be easily done with the help of “mutate” function.

#mutate

DF_mutate <- DF_select %>%

group_by(State) %>%

summarise(Total = sum(Price)) %>%

mutate(“10%of Total” = Total/10) %>%

arrange(desc(Total))

DF_mutate

ByHariharan Rajendran

R Package dplyr Function vs T-SQL

R is one of the very famous tools to handle the data science projects because it has all the capabilities right from the extracting the data from different sources, data modelling and transformation, data visualization and finally building machine learning models using the data.

This post explains how the data modelling can be done with R using “dplyr” package.

To make it easy, let me compare this dplyr function with T-SQL.

First, let us load the data into R studio.

fulldata<-read.csv(“D:\\Projects 2018\\DSKA\\DataWranglingDemo.csv”)

fulldata

Select Function

When we have a large dataset with more than 1000 columns, if we need only certain columns then we can use “SELECT” option to choose the specific columns. Check the below example,

 

The same option is available in R with dplyr package.

#install.packages(“dplyr”)

require(“dplyr”)

#select

DF_select <- fulldata %>% select (State,Price)

DF_select

Filter Function

In T-SQL, we have where condition to filter any data with different conditions.

The same filter option is available with dplyr functions.

#filter

DF_select %>% filter(State==”Alabama”)

Same as T-SQL, we have group by, summarise functions are available in R

Group by Function

#Group by

DF_Group <- DF_select %>% group_by(State)

DF_Group

Summarise Function

#Summarise

DF_Sum<- DF_select %>% group_by(State) %>%

summarise(Total = sum(Price))

DF_Sum

1