R is one of the most popular tools for data science projects because it covers the whole workflow: extracting data from different sources, modelling and transforming it, visualizing it, and finally building machine learning models on it.
This post explains how to do data modelling in R using the “dplyr” package.
To make it easy, let me compare each dplyr function with its T-SQL counterpart.
First, let us load the data into RStudio.
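The post does not show where its data comes from, so here is only a minimal sketch of the loading step. The rows, the Category column and the csv_text variable are made up for illustration; only the names fulldata, State and Price come from the snippets in this post. In practice read.csv() would usually be given a file path instead of the text argument.

```r
library(dplyr)  # provides %>%, select(), filter(), group_by(), summarise()

# Hypothetical sample rows standing in for the real dataset.
csv_text <- "State,Price,Category
Alabama,100,A
Alabama,250,B
Alaska,300,A
Arizona,150,B"

# The `text` argument lets read.csv() parse a string, so the sketch runs without a file.
fulldata <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect column names and types before transforming anything.
str(fulldata)
```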
When a dataset is wide, say more than 1000 columns, and we need only a few of them, T-SQL lets us name just those columns in the SELECT list, for example SELECT State, Price FROM fulldata (assuming the data sits in a table named fulldata).
The same option is available in R through select() in the dplyr package.
DF_select <- fulldata %>% select(State, Price)
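To try the column selection without the original dataset, the sketch below rebuilds a small stand-in for fulldata. The rows and the extra Category column are hypothetical; select() simply keeps the named columns in the order given.

```r
library(dplyr)

# Hypothetical stand-in for the post's fulldata.
fulldata <- data.frame(
  State    = c("Alabama", "Alabama", "Alaska", "Arizona"),
  Price    = c(100, 250, 300, 150),
  Category = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# T-SQL analogue: SELECT State, Price FROM fulldata;
DF_select <- fulldata %>% select(State, Price)

# Only the two requested columns remain; the row count is unchanged.
names(DF_select)
nrow(DF_select)
```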
In T-SQL, the WHERE clause filters rows on one or more conditions.
The same filtering is available in dplyr through filter().
DF_select %>% filter(State == "Alabama")
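As a small runnable sketch of the filter step, the block below uses made-up rows and two illustrative result names (DF_Alabama and DF_Alabama_high, not from the post). It also shows that several conditions separated by commas are combined with AND, much like chaining conditions in a WHERE clause.

```r
library(dplyr)

# Hypothetical stand-in for DF_select (State and Price only).
DF_select <- data.frame(
  State = c("Alabama", "Alabama", "Alaska", "Arizona"),
  Price = c(100, 250, 300, 150),
  stringsAsFactors = FALSE
)

# T-SQL analogue: ... WHERE State = 'Alabama'
DF_Alabama <- DF_select %>% filter(State == "Alabama")

# T-SQL analogue: ... WHERE State = 'Alabama' AND Price > 150
DF_Alabama_high <- DF_select %>% filter(State == "Alabama", Price > 150)

DF_Alabama
DF_Alabama_high
```

Note that filter() returns a new data frame; DF_select itself is unchanged unless the result is assigned back to it.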
As in T-SQL, grouping (GROUP BY) and aggregate functions are available in R through group_by() and summarise().
DF_Group <- DF_select %>% group_by(State)
DF_Sum <- DF_select %>%
  group_by(State) %>%
  summarise(Total = sum(Price))
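The grouping and aggregation can be tried end to end with the sketch below, again on made-up rows. group_by() on its own does not change the data; it only marks the groups, and summarise() then collapses each group to one row.

```r
library(dplyr)

# Hypothetical stand-in for DF_select (State and Price only).
DF_select <- data.frame(
  State = c("Alabama", "Alabama", "Alaska", "Arizona"),
  Price = c(100, 250, 300, 150),
  stringsAsFactors = FALSE
)

# Grouping alone keeps every row and only attaches grouping metadata.
DF_Group <- DF_select %>% group_by(State)

# T-SQL analogue: SELECT State, SUM(Price) AS Total FROM ... GROUP BY State;
DF_Sum <- DF_select %>%
  group_by(State) %>%
  summarise(Total = sum(Price))

DF_Sum
```

With these sample rows, the two Alabama prices (100 and 250) collapse into a single row with Total 350, while Alaska and Arizona keep their single values.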