Splitting Text in R

If you’ve worked in a spreadsheet application before, you’re likely familiar with the "text-to-columns" tool, which splits one column of data into multiple columns based on a delimiter. The same functionality is available in R through functions such as the "separate" function from the "tidyr" package.

To test this function out, let’s first load the "tidyr" package and then create a test dataframe to work with.

library(tidyr)
df <- data.frame(person = c("John_Doe", "Jane_Doe"))

We now have a dataframe with one column, "person", in which each value is a first name and a last name joined by an underscore. Let’s now split the two names into their own separate columns.

df <- df %>% separate(person, c("first_name", "last_name"), "_")

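Let’s break down what just happened. Typing df <- assigns the output of the expression that follows back to "df", replacing the original dataframe. The pipe operator in "df %>%" then passes "df" to the separate function as its first argument, which is the dataframe to operate on. The function does not modify "df" in place. It returns a new dataframe, which is why the result has to be assigned.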
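To make the role of the pipe concrete, here is a small sketch comparing the piped call with a direct call that passes the dataframe explicitly. The names "piped" and "direct" are made up for this illustration and are not part of the original example.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

# Piped form: df is handed to separate() as its first argument
piped <- df %>% separate(person, c("first_name", "last_name"), "_")

# Direct form: the same call with the dataframe passed explicitly
direct <- separate(df, person, c("first_name", "last_name"), "_")

# Check whether both forms produce the same result
print(identical(piped, direct))

# df itself still has only the "person" column, because neither
# result was assigned back to df
print(names(df))
```

Both forms do the same work. The piped version simply reads left to right, which becomes more useful once several steps are chained together.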

We then gave the separate function three further arguments. The first was the column to split, "person". The second was a character vector with the names of our two new columns, "first_name" and "last_name". The third was our desired delimiter, "_", which the separate function interprets as a regular expression. By default, the original "person" column is dropped from the result.
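The sketch below repeats the split with the arguments named (col, into, sep), which can make the call easier to read. It also shows two variations that are not part of the original example: keeping the original column with remove = FALSE, and escaping a delimiter that has a special meaning in regular expressions. The "dotted" dataframe and the result names are hypothetical and used only for illustration.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

# Same split as before, with the arguments named for readability
split_df <- separate(df, col = person,
                     into = c("first_name", "last_name"), sep = "_")

# remove = FALSE keeps the original "person" column alongside the new ones
kept_df <- separate(df, col = person,
                    into = c("first_name", "last_name"),
                    sep = "_", remove = FALSE)

# sep is treated as a regular expression, so a literal dot must be escaped
# (dotted is a made-up dataframe for illustration)
dotted <- data.frame(person = c("John.Doe", "Jane.Doe"))
dotted_split <- separate(dotted, col = person,
                         into = c("first_name", "last_name"),
                         sep = "\\.")

print(split_df)
print(kept_df)
print(dotted_split)
```

Naming the arguments is optional here, since the positional order (column, new names, delimiter) matches the call shown earlier.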

© Trevor French