R Variables

Creating Variables in R

Variables are containers for storing data values.

R does not have a command for declaring a variable. A variable is created the moment you first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value, just type the variable name:
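The original example code is not included in this text. The following is a minimal sketch of it, reusing the names and values from the next paragraph (name, age, "John" and 40):

```r
# A variable is created the first time a value is assigned to it
name <- "John"
age <- 40

# Typing the variable name on its own line prints its value
name   # [1] "John"
age    # [1] 40
```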

From the example above, name and age are variables, while "John" and 40 are values.

In other programming languages, it is common to use = as an assignment operator. In R, we can use both = and <- as assignment operators.

However, <- is preferred in most cases because = has a different meaning in some contexts in R. For example, inside a function call, = matches an argument name instead of assigning a value.
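The sketch below illustrates the difference. At the top level, = and <- behave the same. Inside a function call, <- still creates a variable, but = only names an argument. The variable names are illustrative:

```r
# At the top level, both forms assign
a = 5
b <- 5
identical(a, b)            # TRUE

# Inside a call, `<-` assigns as a side effect before sum() runs...
total1 <- sum(created_here <- c(1, 2, 3))
exists("created_here")     # TRUE

# ...whereas `=` only names an argument passed to sum(); nothing is assigned
total2 <- sum(not_created = c(1, 2, 3))
exists("not_created")      # FALSE
```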

Character values can be written with either single or double quotes:
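Here is a short sketch showing that both kinds of quotes produce the same character value:

```r
name1 <- "John"   # double quotes
name2 <- 'John'   # single quotes

identical(name1, name2)   # TRUE
class(name1)              # "character"
```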

Print / Output Variables

Unlike many other programming languages, R does not require a function to print/output variables. You can just type the name of the variable:

However, R does have a print() function available if you want to use it. This might be useful if you are familiar with other programming languages, such as Python, which often use a print() function to output variables.

And there are times when you must use the print() function to output values, for example when working with for loops (which you will learn more about in a later chapter):
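A minimal sketch of the loop case follows. The loop variable and range are illustrative:

```r
# Inside a for loop, values are not printed automatically,
# so print() is needed to display each one
for (x in 1:3) {
  print(x)
}
# [1] 1
# [1] 2
# [1] 3

# This loop runs without error but displays nothing
for (x in 1:3) {
  x
}
```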

Conclusion: It is up to you whether to use the print() function to output values. However, when your code is inside an R expression (for example, inside curly braces {} like in the example above), use the print() function if you want the result to be displayed.


Statology

Statistics Made Easy

How to Use the assign() Function in R (3 Examples)

The assign() function in R can be used to assign values to variables.

This function uses the following basic syntax:

assign(x, value)

  • x: A variable name, given as a character string.
  • value: The value(s) to be assigned to x.

The following examples show how to use this function in practice.

Example 1: Assign One Value to One Variable

The following code shows how to use the assign() function to assign the value of 5 to a variable called new_variable:

When we print the variable called new_variable, we can see that a value of 5 appears.
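The example code is not included here. A sketch consistent with the description:

```r
# The variable name is given as a character string
assign("new_variable", 5)

# Print the variable
new_variable
# [1] 5
```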

Example 2: Assign Vector of Values to One Variable

The following code shows how to use the assign() function to assign a vector of values to a variable called new_variable:

When we print the variable called new_variable, we can see that a vector of values appears.
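A sketch of the vector case follows. The specific values are assumptions chosen for illustration:

```r
# Assign a whole vector to a single variable
assign("new_variable", c(5, 6, 10, 12, 20))

new_variable
# [1]  5  6 10 12 20
```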

Example 3: Assign Values to Several Variables

The following code shows how to use the assign() function within a for loop to assign specific values to several new variables:

By using the assign() function with a for loop, we were able to create four new variables.
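A minimal sketch of the loop pattern follows. It assumes the hypothetical names var1 to var4, each holding its loop index, since the original values are not shown:

```r
# Build each variable name as a string, then assign a value to it
for (i in 1:4) {
  assign(paste0("var", i), i)
}

var1   # [1] 1
var4   # [1] 4
```

Because the name is built with paste0(), the same loop scales to any number of variables without writing each assignment by hand.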

Additional Resources

The following tutorials explain how to use other common functions in R:

  • How to Use the dim() Function in R
  • How to Use the table() Function in R
  • How to Use sign() Function in R


Published by Zach


Stats and R

Data manipulation in R


Note that this article is inspired by a workshop entitled “Introduction to data analysis with R”, given by UCLouvain’s Statistical Methodology and Computing Service. See all their workshops on their website.

Not all data frames are as clean and tidy as you would expect. Therefore, after importing your data frame into RStudio, most of the time you will need to prepare it before performing any statistical analyses. Data manipulation can even sometimes take longer than the actual analyses when the quality of the data is poor.

Data manipulation includes a broad range of tools and techniques. We present here in detail the manipulations that you will most likely need for your projects in R. Do not hesitate to let me know (as a comment at the end of this article, for example) if you find other data manipulations essential so that I can add them.

In this article we show the main functions to manipulate data in R. We first illustrate these functions on vectors, factors and lists. We then illustrate the main functions to manipulate data frames and dates/times in R.

For those who are interested in going further, see also an introduction to data manipulation in R with the {dplyr} package.

We can concatenate (i.e., combine) numbers or strings with c() :

Note that by default R displays 7 significant digits. You can change this with options(digits = 2) (two significant digits).
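The sketch below shows concatenation with c() and the effect of the digits option. The values are illustrative:

```r
# Combine numbers into a numeric vector
x <- c(2.1, 5, -4, 1, 5)
x

# Combine strings into a character vector
fruits <- c("apple", "banana", "cherry")

# 7 significant digits by default
pi                    # [1] 3.141593
options(digits = 2)   # switch to 2 significant digits
pi                    # [1] 3.1
options(digits = 7)   # restore the default
```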

It is also possible to create a sequence of consecutive integers :

seq() creates a vector defined by a sequence. You can choose either the increment:

or its length:

On the other hand, rep() creates a vector which is the repetition of numbers or strings:

You can also create a vector which is the repetition of numbers and strings:

but in that case, the number 2 will be treated as a string too (and not as a number) since there is at least one string in the vector.
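The following sketch covers integer sequences, seq() and rep(), including the coercion to character described above. The arguments are illustrative:

```r
# Consecutive integers
1:10

# seq() with a given increment...
seq(from = 2, to = 11, by = 3)          # 2 5 8 11
# ...or with a given length
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# rep() repeats values
rep(1, times = 3)                       # 1 1 1
rep(c("A", "B"), times = 2)             # "A" "B" "A" "B"
rep(c("A", "B"), each = 2)              # "A" "A" "B" "B"

# Mixing numbers and strings turns everything into character
mixed <- rep(c("A", 2), times = 2)
mixed                                   # "A" "2" "A" "2"
class(mixed)                            # "character"
```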

There are three ways to assign an object in R:

You can also assign a vector to another vector, for example:
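The three ways are not listed here. This sketch assumes they are <-, = and ->, the three assignment operators R provides. It also shows a vector assigned to, and combined into, another vector:

```r
x <- c(2.1, 5, -4, 1, 5)    # 1. leftward arrow (preferred)
x = c(2.1, 5, -4, 1, 5)     # 2. equals sign
c(2.1, 5, -4, 1, 5) -> x    # 3. rightward arrow

# Assigning a vector to another name copies it
y <- x
y[1] <- 100                 # changing y leaves x untouched
x[1]                        # 2.1

# A vector can also be combined into a longer one
z <- c(x, 10)
length(z)                   # 6
```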

We can select one or several elements of a vector by specifying its position between square brackets:

Note that in R the numbering of the indices starts at 1 (and not 0 as in some other programming languages), so x[1] gives the first element of the vector x.

We can also use booleans (i.e., TRUE or FALSE ) to select some elements of a vector. This method selects only the elements corresponding to TRUE :

Or we can specify the elements to remove, using negative indices:
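A sketch of the three selection methods on an illustrative vector:

```r
x <- c(2.1, 5, -4, 1, 5)

# By position (indices start at 1)
x[1]            # 2.1
x[c(2, 4)]      # 5 1

# With a logical vector: only elements matching TRUE are kept
x[c(TRUE, FALSE, TRUE, FALSE, FALSE)]   # 2.1 -4
x[x > 0]                                # 2.1 5 1 5

# With negative indices: the listed elements are removed
x[-1]           # 5 -4 1 5
x[-c(2, 3)]     # 2.1 1 5
```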

The main types of a vector are numeric, logical and character. For more details on each type, see the different data types in R.

class() gives the vector type:

As you can see above, the class of a vector will be numeric only if all of its elements are numeric. As soon as one element is a character string, the class of the vector will be character.

length() gives the length of a vector:

So to select the last element of a vector (in a dynamic way), we can use a combination of length() and [] :

We can check the type of a vector with the is.*() family of functions (is.numeric(), is.character(), etc.):

Or in a more generic way with the is() function:

We can change the type of a vector with the as.numeric() , as.logical() and as.character() functions:

It is also possible to change its length:

As you can see, the first elements of the vector are kept while all others are removed: in this case, the first 4, since we specified a length of 4.
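This sketch gathers the type and length functions from the paragraphs above, applied to an illustrative vector:

```r
x <- c(2.1, 5, -4, 1, 5)

class(x)                   # "numeric"
class(c(x, "a"))           # "character": one string makes the whole vector character

length(x)                  # 5
x[length(x)]               # 5, the last element whatever the length of x

is.numeric(x)              # TRUE
is.character(x)            # FALSE

as.character(x)            # "2.1" "5" "-4" "1" "5"
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE
as.numeric(c("1", "2.5"))  # 1 2.5

# Shortening a vector keeps its first elements
length(x) <- 4
x                          # 2.1 5 -4 1
```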

The basic numerical operators such as + , - , * , / and ^ can be applied to vectors:

It is also possible to compute the minimum, maximum, sum, product, cumulative sum and cumulative product of a vector:

The following mathematical operations can be applied too:

  • sqrt() (square root)
  • cos() (cosine)
  • sin() (sine)
  • tan() (tangent)
  • log() (logarithm)
  • log10() (base 10 logarithm)
  • exp() (exponential)
  • abs() (absolute value)

If you need to round a number, you can use the round() , floor() and ceiling() functions:
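The following sketch shows element-wise operators, summary functions, some of the listed mathematical functions and the three rounding functions. All values are illustrative:

```r
x <- c(1, 4, 9)
y <- c(2, 2, 3)

x + y          # 3 6 12   (element by element)
x * 2          # 2 8 18
x ^ 2          # 1 16 81

min(x); max(x); sum(x); prod(x)   # 1, 9, 14, 36
cumsum(x)                         # 1 5 14
cumprod(x)                        # 1 4 36

sqrt(x)        # 1 2 3
log10(100)     # 2
abs(c(-2, 3))  # 2 3

z <- c(-1.56, 2.34, 3.5)
round(z, 1)    # -1.6 2.3 3.5
floor(z)       # -2 2 3   (round down)
ceiling(z)     # -1 3 4   (round up)
```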

The most common logical operators in R are:

  • Negation: !
  • Comparisons: < , <= , >= , > , == (equality), != (difference)

As the names suggest, all() returns TRUE if a condition is met for all elements, whereas any() returns TRUE if it is met for at least one element of a vector:
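A sketch of comparisons, negation, all() and any() on an illustrative vector:

```r
x <- c(2.1, 5, -4, 1, 5)

x > 0          # TRUE TRUE FALSE TRUE TRUE
x == 5         # FALSE TRUE FALSE FALSE TRUE
!(x == 5)      # negation of the previous line
x != 5         # same result as the negation

all(x > 0)     # FALSE: -4 is not positive
any(x > 0)     # TRUE: at least one element is positive
all(x > -10)   # TRUE
```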

You can paste two vectors (or more) together:

The argument sep stands for separator and specifies the character(s) or symbol(s) used to separate each character string.

If you do not want any separator, you can use sep = "" or the paste0() function:
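A sketch of paste() and paste0() on two illustrative character vectors, pasted element by element:

```r
first  <- c("Data", "Stats")
second <- c("science", "and R")

paste(first, second)              # "Data science" "Stats and R"  (default sep = " ")
paste(first, second, sep = "-")   # "Data-science" "Stats-and R"
paste(first, second, sep = "")    # "Datascience"  "Statsand R"
paste0(first, second)             # same as sep = ""
```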

To find the positions of the elements containing a given string, use the grep() function:

To extract a character string based on the beginning and the end positions, we can use the substr() function:

Replace the first occurrence of a pattern in each element of a vector with the sub() function (gsub() replaces all occurrences):

Split a character string based on a specific symbol with the strsplit() function:

To transform a character vector to uppercase and lowercase:
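A sketch of the string functions above, applied to illustrative strings:

```r
words <- c("apple pie", "banana split", "cherry pie")

grep("pie", words)                      # 1 3: positions of elements containing "pie"
substr("apple pie", start = 1, stop = 5)   # "apple"
sub("pie", "tart", words)               # "apple tart" "banana split" "cherry tart"
strsplit("2016-10-01", split = "-")     # a list holding "2016" "10" "01"
toupper("apple")                        # "APPLE"
tolower("APPLE")                        # "apple"
```

Note that strsplit() always returns a list (one element per input string), so [[1]] is needed to get the pieces of a single string as a vector.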

We can sort the elements of a vector from smallest to largest, or from largest to smallest:

order() gives the permutation to apply to the vector in order to sort its elements:

As you can see, the third element of the vector is the smallest and the second element is the largest. This is indicated by the 3 at the beginning of the output, and the 2 at the end of the output.

As with sort(), the decreasing = TRUE argument can also be added:

In this case, the 2 in the output indicates that the second element of the vector is the largest, while the 3 indicates that the third element is the smallest.

rank() gives the ranks of the elements:

The last two elements of the vector have a rank of 2.5 because they are tied: they share ranks 2 and 3, so each receives the average rank, (2 + 3) / 2 = 2.5.

We can also reverse the elements (from the last one to the first one):
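The vector used in the outputs described above is not shown. This sketch uses a different, illustrative vector that contains a tie:

```r
x <- c(4, 9, 1, 6, 6)

sort(x)                      # 1 4 6 6 9
sort(x, decreasing = TRUE)   # 9 6 6 4 1

order(x)                     # 3 1 4 5 2: x[3] is the smallest, x[2] the largest
x[order(x)]                  # same as sort(x)

rank(x)                      # 2 5 1 3.5 3.5: the two 6s share ranks 3 and 4
rev(x)                       # 6 6 1 9 4
```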

Factors in R are vectors with a set of levels, also referred to as categories. Factors are useful for qualitative data such as gender, civil status, eye color, etc.

We create factors with the factor() function (do not forget the c()):

We can of course create a factor from an existing vector:

We can also specify that the levels are ordered by adding the ordered = TRUE argument:

Note that the order of the levels follows the order in which they are specified in the levels argument; labels, if given, are assigned to the levels in that same order.

To know the names of the levels:

For the number of levels:

In R, the first level is always the reference level. This reference level can be modified with relevel() :

You can see that “T3” is now the first level and thus the reference level. Changing the reference level affects the order in which the levels are displayed and how they are treated in statistical analyses. Compare, for instance, boxplots with different reference levels.
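This sketch shows factor creation, ordered levels and relevel(). The level names T1 to T3 follow the text, but the data values are illustrative:

```r
treatment <- factor(c("T1", "T3", "T1", "T2"))
treatment
# [1] T1 T3 T1 T2
# Levels: T1 T2 T3

# Ordered factor with an explicit order of levels
dose <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
dose < "high"                  # TRUE FALSE TRUE TRUE

levels(treatment)              # "T1" "T2" "T3"
nlevels(treatment)             # 3

# Make T3 the reference (first) level
treatment <- relevel(treatment, ref = "T3")
levels(treatment)              # "T3" "T1" "T2"
```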

To know the frequencies for each level:

Note that the relative frequencies (i.e., the proportions) can be found with the combination of prop.table() and table() or summary() :

Remember that a factor is stored in R as a vector of integer codes, even though it looks like a character vector. We can obtain these codes with the as.numeric() function:

And a numeric vector can be transformed into a factor with the as.factor() or factor() function:

The advantage of factor() is that it is possible to specify a name for each level:
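This sketch covers frequencies, proportions and conversions between numeric vectors and factors. The data values and labels are illustrative:

```r
treatment <- factor(c("T1", "T3", "T1", "T2"))

table(treatment)               # T1: 2, T2: 1, T3: 1
prop.table(table(treatment))   # T1: 0.50, T2: 0.25, T3: 0.25

# The integer codes, i.e. positions in levels(treatment)
as.numeric(treatment)          # 1 3 1 2

# Numeric vector -> factor, naming each level with `labels`
grade <- c(1, 2, 2, 1)
graded <- factor(grade, labels = c("fail", "pass"))
graded                         # fail pass pass fail

# Caution: for a factor whose levels look like numbers, as.numeric()
# returns the codes, not the values; go through as.character() first
f <- factor(c(10, 20, 10))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```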

A list is a vector whose elements can be of different types: a vector, a list, a factor, numeric or character values, etc.

The function list() creates lists:

There are several methods to extract elements from a list:

To transform a list into a vector:

attributes() gives the names of the elements (it can be used on any R object):

str() gives a short description of the elements (it can also be used on any R object):
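A sketch of creating a list, extracting its elements and inspecting it. The element names and contents are illustrative:

```r
my_list <- list(numbers = c(1, 2, 3),
                letter  = "a",
                group   = factor(c("x", "y")))

# Ways to extract an element
my_list$numbers        # by name with $
my_list[["numbers"]]   # by name with [[ ]]
my_list[[1]]           # by position with [[ ]]
my_list[1]             # note: [ ] returns a sub-list, not the element itself

unlist(my_list[1:2])   # flattened into one (character) vector
attributes(my_list)    # $names: "numbers" "letter" "group"
str(my_list)           # compact description of each element
```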

Data frames

Every imported file in R is a data frame (at least if you do not use a package to import your data in R). A data frame is a mix of a list and a matrix: it has the shape of a matrix, but the columns can have different classes.

Remember that the gold standard for a data frame is that:

  • columns represent variables
  • rows correspond to observations, and
  • each value must have its own cell

Structure of a data frame. Source: R for Data Science by Hadley Wickham & Garrett Grolemund

In this article, we use the data frame cars to illustrate the main data manipulation techniques. Note that this data frame is available by default in R (it comes with the built-in datasets package, so you do not need to import it), and I use the generic name dat as the name of the data frame throughout the article (see here why I always use a generic name instead of more specific names).

Here is the whole data frame:

This data frame has 50 observations with 2 variables (speed and dist).

You can check the number of observations and variables with nrow() and ncol() respectively, or both at the same time with dim() :

Before manipulating a data frame, it is useful to know the row and column names:

To know only the column names:

And to know only the row names:
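The following sketch covers dimensions and names of the cars data, copied under the generic name dat as in the text:

```r
dat <- cars

nrow(dat)      # 50 observations
ncol(dat)      # 2 variables
dim(dat)       # 50 2

dimnames(dat)  # list with the row names and the column names
names(dat)     # "speed" "dist"
colnames(dat)  # same as names() for a data frame
rownames(dat)  # "1" "2" ... "50"
```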

Subset a data frame

  • To keep only the first 10 observations:
  • To keep only the last 5 observations:
  • To draw a sample of 4 observations without replacement:
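A sketch of the three operations in the list above. The seed is an arbitrary value, set only so the random sample is reproducible:

```r
dat <- cars

first10 <- head(dat, n = 10)   # first 10 observations
last5   <- tail(dat, n = 5)    # last 5 observations

# 4 observations drawn at random without replacement
set.seed(42)
sampled <- dat[sample(nrow(dat), size = 4, replace = FALSE), ]
sampled
```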

If you know what observation(s) or column(s) you want to keep, you can use the row or column number(s) to subset your data frame. We illustrate this with several examples:

  • keep all the variables for the \(3^{rd}\) observation:
  • keep the \(2^{nd}\) variable for all observations:
  • You can mix the two above methods to keep only the \(2^{nd}\) variable of the \(3^{rd}\) observation:
  • keep several observations; for example observations \(1\) to \(5\) , the \(10^{th}\) and the \(15^{th}\) observation for all variables:
  • remove observations 5 to 45:
  • tip: to keep only the last observation, use nrow() instead of the row number:
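The bullet points above translate into the following indexing calls on cars, again named dat:

```r
dat <- cars

dat[3, ]                    # all variables for the 3rd observation
dat[, 2]                    # 2nd variable for all observations
dat[3, 2]                   # 2nd variable of the 3rd observation
dat[c(1:5, 10, 15), ]       # observations 1 to 5, 10 and 15
kept <- dat[-c(5:45), ]     # remove observations 5 to 45
last <- dat[nrow(dat), ]    # last observation, whatever the number of rows
```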

This way, no matter the number of observations, you will always select the last one. This technique of using a piece of code instead of a specific value is to avoid “hard coding”. Hard coding is generally not recommended (unless you want to specify a parameter that you are sure will never change) because if your data frame changes, you will need to manually edit your code.

As you probably figured out by now, you can select observations and/or variables of a dataset by running dataset_name[row_number, column_number] . When the row (column) number is left empty, the entire row (column) is selected.

Note that all examples presented above also work for matrices:

To select one variable of the dataset based on its name rather than on its column number, use dataset_name$variable_name :

Accessing variables by name with this second method is strongly recommended over using column numbers if you intend to modify the structure of your data frame. Indeed, if a column is added or removed, the numbering changes. Therefore, variables are generally referred to by their names rather than by their positions (column numbers). In addition, code is easier to read and interpret when the variable names are written out (another reason to give variables concise but clear names). There is only one case in which I would still use column numbers: if the variable names are expected to change while the structure of the data frame will not.

To select variables, it is also possible to use the select() function from the powerful dplyr package (for compactness, only the first 6 observations are displayed with the head() function):

This is equivalent to removing the dist variable:

Instead of subsetting a data frame based on row/column numbers or variable names, you can also subset it based on one or multiple criteria:

  • keep only observations with speed larger than 20. The first argument refers to the name of the data frame, while the second argument refers to the subset criteria:
  • keep only observations with distance smaller than or equal to 50 and speed equal to 10. Note the == (and not =) for the equality criterion:
  • use | to keep only observations with distance smaller than 20 or speed equal to 10:
  • to filter out some observations, use != . For instance, to keep observations with speed not equal to 24 and distance not equal to 120 (for compactness only the last 6 observations are displayed thanks to the tail() command):

Note that it is also possible to subset a data frame with split() :

The above code splits your data frame into a list of data frames, one for each level of the factor variable.
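This sketch shows the subset() criteria listed above, followed by split(). Since cars has no factor variable, the grouping variable speed_group (and its threshold of 15) is a hypothetical addition for illustration:

```r
dat <- cars

fast  <- subset(dat, speed > 20)                # speed larger than 20
both  <- subset(dat, dist <= 50 & speed == 10)  # note ==, not =
or_   <- subset(dat, dist < 20 | speed == 10)   # | means "or"
tail(subset(dat, speed != 24 & dist != 120))    # last 6 rows after filtering

# split() returns a list of data frames, one per level of a factor
dat$speed_group <- factor(ifelse(dat$speed > 15, "fast", "slow"))  # hypothetical grouping
groups <- split(dat, dat$speed_group)
names(groups)           # "fast" "slow"
sapply(groups, nrow)    # number of observations in each group
```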

Create a new variable

Often, a data frame can be enhanced by creating new variables based on other variables from the initial data frame, or simply by adding a new variable manually.

In this example, we create two new variables; one being the speed times the distance (which we call speed_dist ) and the other being a categorization of the speed (which we call speed_cat ). We then display the first 6 observations of this new data frame with the 4 variables:

Note that in programming, a character string is generally surrounded by quotes (e.g., "character string"), and R is no exception.

To transform a continuous variable into a categorical variable (also known as qualitative variable ):

This transformation is for example often done on age, when the age (a continuous variable) is transformed into a qualitative variable representing different age groups.
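A sketch of both operations. The speed threshold (15) and the cut() breaks and labels are assumptions, because the original values are not shown:

```r
dat <- cars

# A new variable computed from two existing ones
dat$speed_dist <- dat$speed * dat$dist

# A two-level categorization of speed (threshold chosen for illustration)
dat$speed_cat <- factor(ifelse(dat$speed > 15, "high speed", "low speed"))
head(dat)

# cut() turns a continuous variable into intervals: (0,25], (25,50], (50,Inf]
dat$dist_group <- cut(dat$dist,
                      breaks = c(0, 25, 50, Inf),
                      labels = c("short", "medium", "long"))
table(dat$dist_group)
```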

In surveys using a Likert scale (used in psychology, among others), we often need to compute a score for each respondent based on multiple questions. The score is usually the mean or the sum of all the questions of interest.

This can be done with rowMeans() and rowSums() . For instance, let’s compute the mean and the sum of the variables speed , dist and speed_dist (variables must be numeric of course as a sum and a mean cannot be computed on qualitative variables!) for each row and store them under the variables mean_score and total_score :

It is also possible to compute the mean and sum by column with colMeans() and colSums() :

This is equivalent to:

but it handles several variables at once.
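A sketch of the row-wise and column-wise computations on cars, using the variable names from the text:

```r
dat <- cars
dat$speed_dist <- dat$speed * dat$dist

vars <- c("speed", "dist", "speed_dist")

# One score per row (observation)
dat$mean_score  <- rowMeans(dat[, vars])
dat$total_score <- rowSums(dat[, vars])

# One value per column (variable)
colMeans(dat[, vars])
colSums(dat[, vars])

# For a single variable, this matches mean()
all.equal(colMeans(dat[, vars])[["speed"]], mean(dat$speed))   # TRUE
```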

Categorical variables and labels management

For categorical variables, it is a good practice to use the factor format and to name the different levels of the variables.

  • for this example, let’s create another new variable called dist_cat based on the distance and then change its format from numeric to factor (while also specifying the labels of the levels):
  • to check the format of a variable:

This will be sufficient if you need to format only a limited number of variables. However, if you need to do it for a large number of categorical variables, it quickly becomes time-consuming to write the same code many times. As you can imagine, it is possible to format many variables without writing the entire code for each variable one by one, by using the within() command:
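This sketch converts numeric codes to labeled factors, first one variable at a time and then several at once with within(). The thresholds used to create the codes are hypothetical:

```r
dat <- cars
# Hypothetical numeric codes (thresholds chosen only for illustration)
dat$dist_cat  <- ifelse(dat$dist < 25, 1, 2)
dat$speed_cat <- ifelse(dat$speed < 15, 1, 2)

# One variable at a time
one <- dat
one$dist_cat <- factor(one$dist_cat, levels = c(1, 2),
                       labels = c("small distance", "big distance"))
class(one$dist_cat)   # "factor"

# Several variables in a single within() call
dat <- within(dat, {
  dist_cat  <- factor(dist_cat,  levels = c(1, 2),
                      labels = c("small distance", "big distance"))
  speed_cat <- factor(speed_cat, levels = c(1, 2),
                      labels = c("low speed", "high speed"))
})
str(dat)
```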

Alternatively, if you want to transform several numeric variables into categorical variables without changing the labels, it is best to use the transform() function. We illustrate this function with the mpg data frame from the {ggplot2} package:

It is possible to recode labels of a categorical variable if you are not satisfied with the current labels. In this example, we change the labels as follows:

  • “small distance” becomes “short distance”
  • “big distance” becomes “large distance”

For some analyses, you might want to change the order of the levels. For example, if you are analyzing data about a control group and a treatment group, you may want to set the control group as the reference group. By default, levels are ordered alphabetically, or by numeric value if the variable was converted from numeric to factor.

  • to check the current order of the levels (the first level being the reference):

In this case, “short distance” is the first level, so it is the reference level. It is first because it was initially coded as 1 when the variable was created.

  • to change the reference level:

Large distance is now the first and thus the reference level.
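A sketch of recoding the labels and changing the reference level. The dist_cat variable is rebuilt here with a hypothetical threshold so the block is self-contained:

```r
dat <- cars
dat$dist_cat <- factor(ifelse(dat$dist < 25, 1, 2), levels = c(1, 2),
                       labels = c("small distance", "big distance"))

# Recode the labels (new names are given in the current order of the levels)
levels(dat$dist_cat) <- c("short distance", "large distance")

# The first level is the reference
levels(dat$dist_cat)   # "short distance" "large distance"

# Change the reference level
dat$dist_cat <- relevel(dat$dist_cat, ref = "large distance")
levels(dat$dist_cat)   # "large distance" "short distance"
```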

To rename variable names as follows:

  • dist \(\rightarrow\) distance
  • speed_dist \(\rightarrow\) speed_distance
  • dist_cat \(\rightarrow\) distance_cat

use the rename() command from the dplyr package:

Although most analyses are performed on an imported data frame, it is also possible to create a data frame directly in R:

By default, the merge is done on the common variables (variables that have the same name). However, if they do not have the same name, it is still possible to merge the two data frames by specifying their names:

We want to merge the two data frames by the subject number, but this number is referred to as person in the first data frame and patient in the second data frame, so we need to indicate it:
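The original data frames are not shown. This sketch uses two small hypothetical data frames, created manually, whose keys are named person and patient:

```r
# Two data frames created directly in R (values are illustrative)
dat1 <- data.frame(
  person = c(1, 2, 3, 4),
  sex    = c("F", "M", "M", "F"),
  age    = c(25, 32, 41, 28)
)
dat2 <- data.frame(
  patient = c(4, 2, 1, 3),
  weight  = c(61, 80, 57, 75)
)

# The key has a different name in each data frame, so both names are given
merged <- merge(dat1, dat2, by.x = "person", by.y = "patient")
merged   # one row per person, with sex, age and weight side by side
```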

In order to add new observations from another data frame, the two data frames need to have the same column names (but they can be in a different order):

As you can see, data for persons 5 to 8 have been added at the end of the data frame dat1 (because dat1 comes before dat3 in the rbind() function).

It is also possible to add new variables to a data frame with the cbind() function. Unlike rbind() , column names do not have to be the same since they are added next to each other:

If you want to add only a specific variable from another data frame:

or more simply with the data.frame() function:
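A sketch of rbind(), cbind() and adding a single variable, using small hypothetical data frames. For data frames, rbind() matches columns by name, so their order may differ:

```r
dat1 <- data.frame(person = 1:4, sex = c("F", "M", "M", "F"))
dat3 <- data.frame(sex = c("M", "F", "F", "M"), person = 5:8)   # same names, other order

# New observations: rows of dat3 are added after those of dat1
both <- rbind(dat1, dat3)
both$person                  # 1 2 3 4 5 6 7 8

# New variables: columns are placed side by side (same number of rows needed)
extra <- data.frame(height = c(165, 180, 175, 160), weight = c(60, 82, 70, 55))
wide <- cbind(dat1, extra)
names(wide)                  # "person" "sex" "height" "weight"

# Only one variable from another data frame
dat1_h <- data.frame(dat1, height = extra$height)
names(dat1_h)                # "person" "sex" "height"
```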

Missing values

Missing values (represented by NA in R, for “Not Available”) are problematic for many analyses because most computations involving a missing value return a missing value.

For instance, the mean of a series or variable with at least one NA returns NA. The data frame dat created in the previous section is used for this example:

The na.omit() function avoids the NA result by dropping the missing values before the computation:

Moreover, most basic functions include an argument to deal with missing values:

is.na() indicates if an element is a missing value or not:

Note that “NA” as a string is not considered as a missing value:

To check whether there is at least one missing value in a vector or data frame:

Nonetheless, data frames with NAs are still problematic for some types of analysis. Several alternatives exist to remove or impute missing values.

A simple solution is to remove all observations (i.e., rows) containing at least one missing value. This is done by keeping only observations with complete cases:
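The data frame from the previous section is not reproduced here. This sketch applies the same missing-value functions to an illustrative vector and data frame:

```r
x <- c(3, NA, 7, 5)

mean(x)                  # NA
mean(na.omit(x))         # 5
mean(x, na.rm = TRUE)    # 5

is.na(x)                 # FALSE TRUE FALSE FALSE
is.na("NA")              # FALSE: the string "NA" is not a missing value
anyNA(x)                 # TRUE: at least one missing value

# Keep only complete observations (rows without any NA)
dat <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
dat[complete.cases(dat), ]   # only the first row remains
na.omit(dat)                 # same rows
```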

Be careful when removing observations with missing values, especially if missing values are not “missing at random”. Just because it is possible (and easy) to remove them does not mean you should do so in all cases. This is, however, beyond the scope of the present article.

Instead of removing observations with at least one NA, it is possible to impute them, that is, replace them by some values such as the median or the mode of the variable. This can be done easily with the command impute() from the package Hmisc :

When the median/mode method is used (the default), character vectors and factors are imputed with the mode, and numeric and integer vectors are imputed with the median. Again, use imputation carefully. Other packages offer more advanced imputation techniques. However, we keep it simple and straightforward for this article, as advanced imputation is beyond the scope of introductory data manipulation in R.

Scaling (also referred to as standardizing) a variable is often done before a Principal Component Analysis (PCA)[1] when the variables of a data frame have different units. Scaling a variable means computing its mean and standard deviation, then transforming each value (so each row) of that variable by subtracting the mean and dividing by the standard deviation. Formally:

\[z = \frac{x - \bar{x}}{s}\]

where \(\bar{x}\) and \(s\) are the mean and the standard deviation of the variable, respectively.

To scale one or more variables in R use scale() :
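The sketch below checks that scale() matches the formula above, z = (x - mean) / s, on an illustrative vector, then scales both variables of cars:

```r
x <- c(2, 4, 6, 8)

z_manual <- (x - mean(x)) / sd(x)   # the formula, written out
z_scale  <- scale(x)                # returns a one-column matrix with attributes

all.equal(as.vector(z_scale), z_manual)   # TRUE

# Several variables at once: each column ends up with mean 0 and sd 1
scaled_cars <- scale(cars)
colMeans(scaled_cars)            # both (numerically) 0
apply(scaled_cars, 2, sd)        # 1 1
```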

Dates and times

In R the default date format follows the rules of the ISO 8601 international standard, which expresses a day as “2001-02-13” (yyyy-mm-dd).[2]

A date can be defined by a string of characters or a number. For example, October 1st, 2016:

An example with date and time vectors:

Find more information on how to express a date and time format with help(strptime) .

We can extract:
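The list of extractable components is not included here. This sketch shows date creation and a few common extractions, assuming base R's format(), weekdays() and months():

```r
# From a string in ISO 8601 format (yyyy-mm-dd)
d1 <- as.Date("2016-10-01")
# From another format, described with strptime()-style codes
d2 <- as.Date("01/10/2016", format = "%d/%m/%Y")
# From a number of days since 1970-01-01
d3 <- as.Date(17075, origin = "1970-01-01")
d1 == d2; d1 == d3                 # TRUE TRUE

# A date and time
dt <- as.POSIXct("2016-10-01 14:30:00", tz = "UTC")

# Extracting components
format(d1, "%Y")                   # "2016"
format(d1, "%m")                   # "10"
format(d1, "%d")                   # "01"
weekdays(d1)                       # day name (language depends on the locale)
months(d1)                         # month name (language depends on the locale)
format(dt, "%H:%M")                # "14:30"

# Difference between two dates
d1 - as.Date("2016-09-01")         # Time difference of 30 days
```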

If a copy-paste is not sufficient, you can save an object in R format with save() :

or using write.table(), write.csv() or write.xlsx() (write.xlsx() is not part of base R; it comes from a package such as {openxlsx} or {xlsx}):

If you need to send all results to a file instead of the console:

(Don’t forget to stop it with sink() .)
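A sketch of the base R export functions mentioned above. The files are written to a temporary directory as a placeholder for your own paths, and write.xlsx() is left out because it requires an extra package:

```r
dat <- cars

# Save an R object in R format, then load it back
path_rdata <- file.path(tempdir(), "dat.RData")
save(dat, file = path_rdata)
rm(dat)
load(path_rdata)                      # restores the object named dat

# Export as a CSV text file
path_csv <- file.path(tempdir(), "dat.csv")
write.csv(dat, path_csv, row.names = FALSE)

# Redirect console output to a file, then stop redirecting with sink()
path_txt <- file.path(tempdir(), "output.txt")
sink(path_txt)
print(summary(dat))
sink()
```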

You can always find some help about:

  • a function: ?function or help(function)
  • a package: help(package = packagename)
  • a concept: help.search("concept") or apropos("concept")

Otherwise, Google is your best friend!

Thanks for reading.

I hope this article helped you to manipulate your data in RStudio. For those who are interested in going further, see also an introduction to data manipulation in R with the {dplyr} package.

Now that you know how to import a data frame into R and how to manipulate it, the next step would probably be to learn how to perform descriptive statistics in R. If you are looking for more advanced statistical analyses using R, see all articles about R.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

[1] Principal Component Analysis (PCA) is a useful technique for exploratory data analysis, allowing a better visualization of the variation present in a data frame with a large number of variables. When there are many variables, the data cannot easily be illustrated in their raw format. To address this, PCA takes a data frame with many variables and simplifies it by transforming the original variables into a smaller number of “principal components”. The first component captures the most variance in the data, the second captures the most of the remaining variance, and so on, and the components are uncorrelated. Note that PCA is done on quantitative variables.

[2] For your information, note that this date format is not the same in every software! Excel, for instance, uses a different format.


Working with Data in R

Chapter 5 Creating Variables

This section focuses on the creation of new variables for your analysis as part of an overall strategy of cleaning your data. Often, data will come to you coded in a certain way, but you want to transform it to make it easier to work with. Our analyses often focus on linear changes in a given variable, but data is typically not coded in a way that represents those types of differences. We’ll explore that more below.

Some of the tools we need to practice with aren’t directly related to creating new variables or cleaning the data we have, but to understanding what issues there might be. Before you start analysis, it’s necessary to step back and observe your data. Understand how the different values are coded and what you might need to do. So, each step below focuses both on figuring out what to do with our data and on doing it.

We’ll use the same file I was using in the earlier chapter on loading data, but we’ll load it in from GitHub so that it’s straightforward for anyone to access and use. Notice that we’re using the same read.csv() command, because that is how the file is saved, even though it’s located on the interwebs.

5.1 Using ifelse

We’ll save the file as a new object called ‘dat’ and start by taking a look at the top few lines with head() to get a feel for what’s in the data.

The first few columns are what could be called administrative. They’re unique identifiers for the different observations, the timing of the survey they’ve taken, and a bit of other information. So for now we don’t need to pay much attention to MONTH, HWTFINL, CPSID, PERNUM, WTFINL, or CPSIDP.

The next few columns are concerned with different geographies we have for the observations. This data is for individuals, but we also know each individual’s region (REGION), state (STATEFIP and STATECENSUS), and metropolitan area (METRO and METAREA). We’ll talk about those more in a later chapter on AGGREGATION, but we don’t need to worry about them at the moment.

So let’s start by looking at AGE. Age is the rare variable that typically comes “finished” for us. Sometimes, we don’t need to make any changes to get it ready for analysis.

We can use the command is.numeric() to check whether the column AGE is numeric, and TRUE tells us that it is. Age is numeric, which is what we would expect. We can check the summary statistics for it just to make sure everything looks as we expect.

That looks right to me. So AGE is ready to go if we want to use it as a numeric variable. We’ll talk more later about what we can do if we want to look at people based on their generations or categories of their age, rather than just using the years they’ve been alive as a variable.

What about the variable SEX? Let’s see what values we have in the column using the table() command. Table gives us a count of how many observations of each type we have.

Okay, so we have 1’s and 2’s. One issue is that we don’t know what those mean. We have more 2’s than 1’s, but we don’t know whether males were coded as 1’s or 2’s, so these values have no meaning to us at the moment. We need to look at the code book for the data to understand what those values represent about respondents. It’s really useful to look through the code book before you start using your data, because most of the questions we’re asking in this section could be answered by just reading it… but reading instructions is boring, so I usually skip it as long as I can. What does the code book tell us about the variable SEX?


1’s are Male and 2’s are Female. We can leave the variable as is and remember those values, or we can also create a new variable that’s named either Male or Female so that we know the gender of respondents a little more easily. To do that we can use the command ifelse().

Great, now anyone that was coded as a 1 in the column SEX is now coded as a 1 in the column Male. That isn’t a huge change, but now we can more intuitively know the gender of respondents by just looking at the column Male. We could also create a variable called Female for respondents that were 2’s for SEX.
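The survey file itself is not reproduced here. This sketch runs the same ifelse() logic on a small hypothetical data frame coded like SEX (1 = Male, 2 = Female):

```r
# Hypothetical stand-in for the survey data
dat <- data.frame(SEX = c(1, 2, 2, 1, 2))

table(dat$SEX)   # counts of 1's and 2's

# ifelse(test, value if TRUE, value if FALSE)
dat$Male   <- ifelse(dat$SEX == 1, 1, 0)
dat$Female <- ifelse(dat$SEX == 2, 1, 0)
dat
```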

It really doesn’t matter whether we create a column called Male or Female. They have the same information, just coded a little differently. And we wouldn’t need to create both, since if we have one, we know everything the other would tell us.

We’re going to use ifelse() statements a lot to get the data into the exact structure we want, so let’s take a closer look at it.


An ifelse statement reads a bit like a sentence. We’re asking R to determine which of two things is true, either the variable SEX equals 2 or it doesn’t. If it does equal 2 (dat$SEX==2), then we want the new variable we’re creating named Female to take the value 1 (the first of two options we’re giving). If the column SEX doesn’t equal 2, then the new variable should take the value 0. As a result, our new variable Female is a series of 0’s and 1’s.

An ifelse command has 3 essential parts: 1. what it should check to see if it is true, 2. what it should do if that is true, and 3. what it should do if that isn’t true. We need to supply the command with something to check, and two things to do depending on whether that is true or not.
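The three parts map directly onto the three arguments of ifelse(). A small sketch using the same hypothetical dat as a stand-in for the survey:

```r
# Hypothetical stand-in for the survey data
dat <- data.frame(SEX = c(1, 2, 2, 1, 2))

# ifelse(test, value if TRUE, value if FALSE)
dat$Female <- ifelse(dat$SEX == 2,  # 1. what to check
                     1,             # 2. value when the check is TRUE
                     0)             # 3. value when the check is FALSE
dat$Female
```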

The numerical (comparison) operator in an ifelse statement is really important. Here we wanted to determine if the value of SEX was 2, but we can use any of the common operators: == (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or equal to) and <= (less than or equal to). I’ll demonstrate more of these below.

So we used two examples of equals above. Let’s see how ‘doesn’t equal’ (!=) works out.

Exactly the same. We can either ask R to code all of the observations that are 2’s, or all of the observations that aren’t 1’s and we get the same essential result.
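A quick way to confirm that both versions agree, again on a hypothetical five-row stand-in for the survey. The equivalence only holds because SEX takes no values other than 1 and 2; a missing or third code would break it.

```r
# Hypothetical stand-in for the survey data
dat <- data.frame(SEX = c(1, 2, 2, 1, 2))

dat$Female  <- ifelse(dat$SEX == 2, 1, 0)  # "equals 2"
dat$Female2 <- ifelse(dat$SEX != 1, 1, 0)  # "does not equal 1"

# TRUE here, because SEX only contains the codes 1 and 2
identical(dat$Female, dat$Female2)
```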

We can also use the greater than/lesser than with SEX because it’s a numerical variable.

If the value is equal to or greater than 2, we code the observation as a 1 in Female, or we could code the observations that are equal to or less than 1 as 1 if the variable is Male. That’s a little silly, though, since SEX only takes two values. Above we’ve been practicing creating dichotomous or dummy variables, which are variables that take two values: the new variable is either a 1 or a 0 depending on the value of something else. We can also create categorical variables with words as our new values. Below we’re going to ask whether the column SEX is equal to 1, and if so make the new variable take the value “Male”, and otherwise give it the value “Female”.
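A sketch of both ideas on a hypothetical five-row stand-in: greater-than/less-than comparisons (which only make sense because SEX is numeric), and word labels instead of 0/1.

```r
# Hypothetical stand-in for the survey data
dat <- data.frame(SEX = c(1, 2, 2, 1, 2))

dat$Female <- ifelse(dat$SEX >= 2, 1, 0)  # 2 and above -> 1
dat$Male   <- ifelse(dat$SEX <= 1, 1, 0)  # 1 and below -> 1

# Words instead of 0/1
dat$Gender <- ifelse(dat$SEX == 1, "Male", "Female")
table(dat$Gender)
```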

If our original data was coded as words, we can use the equal to operator to create new variables as well.

And round and round we go. If the data is coded as words, we can only use the equal to/not equal to operators, because it wouldn’t make sense to be “greater than” a word like male or blue or fish. You’ll also need to be really careful with spelling. R will only code a value as 1 if the values in the column match what you write exactly.

Below I’m going to make a small typo and write “Mal” instead of “Male”. What do you think is going to happen?

All of the observations are coded as 0. Why? I asked R to code anything that had the value “Mal” as a 1. Because nothing in the column Gender is coded “Mal”, everything is coded as 0. That’s why it’s important to check the values your column has before using ifelse, and to check the values your new variable has afterwards.
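The checks described above could look like this. The Gender column and its values are a hypothetical stand-in; unique() shows the exact spellings before recoding, and table() exposes the all-zero result of the typo afterwards.

```r
# Hypothetical stand-in for a column coded as words
dat <- data.frame(Gender = c("Male", "Female", "Female", "Male", "Female"))

unique(dat$Gender)  # check the exact spelling of the values first

dat$Male_ok   <- ifelse(dat$Gender == "Male", 1, 0)  # correct spelling
dat$Male_typo <- ifelse(dat$Gender == "Mal", 1, 0)   # typo: nothing matches

table(dat$Male_ok)
table(dat$Male_typo)  # only 0's, a sign something went wrong
```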

5.2 Lesser and Greater Than

Let’s go back to AGE to better show the utility of the “greater than” and “less than” operators. Let’s say we want to create a variable that takes the value of 1 if the person is over the age of 65. We’re not interested in just linear changes in age as a person gets older; we’re interested in differences between generations, maybe.

It looks like 22235 respondents were over that age. What if we wanted to create a variable that takes the value of 1 if they’re under the age of 18?
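A sketch of both age dummies. The ages are invented, and the names over65 and under18 are hypothetical choices. Note that > 65 excludes someone who is exactly 65; use >= if they should count.

```r
# Hypothetical ages; in the survey the column is AGE
dat <- data.frame(AGE = c(8, 17, 18, 30, 45, 65, 66, 80))

dat$over65  <- ifelse(dat$AGE > 65, 1, 0)  # strictly older than 65
dat$under18 <- ifelse(dat$AGE < 18, 1, 0)  # strictly younger than 18

sum(dat$over65)   # number of respondents over 65
sum(dat$under18)  # number of respondents under 18
```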

We can also use multiple numerical operators in the same ifelse. Let’s say we want to create a variable that specifies if a person is a millennial. Millennials were born between 1981 and 1996, so for a survey from 2018 they would be between the ages of 22 and 37. We can combine multiple numerical operators with either an and (&) or an or (|). (Tip: the straight line | is near the enter key on your keyboard, and the ampersand (&) is above the 7.) The ifelse statement for millennials has to check whether a respondent is BOTH 37 years old or younger AND 22 years old or older.
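A sketch of the millennial dummy with &, using invented ages and a hypothetical variable name. Both conditions must be TRUE for a 1.

```r
# Hypothetical ages for a 2018 survey
dat <- data.frame(AGE = c(8, 17, 21, 22, 30, 37, 38, 66))

# At least 22 AND at most 37
dat$millennial <- ifelse(dat$AGE >= 22 & dat$AGE <= 37, 1, 0)
dat$millennial
```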

If we wanted to create a variable for someone who was either under 18 or over 65, we can do that with the or (|) operator.

You’ll notice that this new variable youngORold has exactly as many 1’s as the two variables we created earlier for over 65 and under 18 combined. That should be correct, since we asked if each observation is EITHER under 18 OR over 65.
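A sketch of the | version and of the count check, on invented ages. The counts add up only because the two groups cannot overlap: nobody is both under 18 and over 65.

```r
# Hypothetical ages
dat <- data.frame(AGE = c(8, 17, 18, 30, 45, 65, 66, 80))

dat$over65     <- ifelse(dat$AGE > 65, 1, 0)
dat$under18    <- ifelse(dat$AGE < 18, 1, 0)
dat$youngORold <- ifelse(dat$AGE < 18 | dat$AGE > 65, 1, 0)  # either condition

sum(dat$youngORold) == sum(dat$under18) + sum(dat$over65)  # TRUE
```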

Let’s make it a little more complicated and move to our next variable: RACE. What values do we have there?

So we’re going to have to turn back to our code book to figure out what any of that means.


We wouldn’t want to use this variable as is. It’s great that it offers the level of detail that it does, but looking at the table above and the code book we don’t want to leave people in as narrow of categories as it offers. So what we want to do typically is take the information in that column and create a few new variables with broader categories.

Here we face a challenge that you’ll encounter often in working with individual survey responses. How narrow or broad do you want to be in coding their race/ethnicity? That question can obviously be fraught with ethical challenges, and I don’t want to dismiss them in setting them aside. Right now we’re focused on figuring out how to get our data into a condition where it’s ready for analysis.

We don’t want people to be individually identifiable by any of the values we have in the survey. For instance, there’s only one person that is coded as “White-American Indian-Asian-Hawaiian/Pacific Islander”; that becomes a really narrow category then.

Generally, individuals are grouped into larger categories, and most typically those would be White, Black, Asian, Latino/Latina (if it’s available) and Other/Mixed Race. Sometimes American Indians are included as a separate category, other times they’re grouped in with “Other”. When you have small numbers of a given category it becomes really difficult to estimate differences between them and larger groups, and so sometimes you’re just forced to combine with another category. Let’s start by coding Whites, Blacks, and American Indians below since those are the first three categories shown in our code book.
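A sketch of the first three dummies. The codes 100 (White) and 200 (Black) match the stacking example later in this chapter; 300 for American Indian is an assumption here, so check it against your own code book. The RACE values are invented.

```r
# Hypothetical RACE codes (300 for American Indian is assumed)
dat <- data.frame(RACE = c(100, 200, 300, 651, 652, 801, 100, 200))

dat$White <- ifelse(dat$RACE == 100, 1, 0)
dat$Black <- ifelse(dat$RACE == 200, 1, 0)
dat$AmInd <- ifelse(dat$RACE == 300, 1, 0)

colSums(dat[, c("White", "Black", "AmInd")])
```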

This is a good moment to say that in creating new variable names you want to do two things: make them recognizable to yourself and keep them short. The more characters you include, the more you will have to type later when you use it. But if I just name the variable for Whites “W” I might not remember what that is later. It’s a balancing act. Some names just become intuitive from practice. I didn’t make up the name AmInd for American Indians; it’s something I’ve seen elsewhere in data, so I learned to use it myself. So make sure your new variable names are concise and clear.

What do we want to do with those that are coded as “Asian only” (651) and “Hawaiian/Pacific Islander only” (652). Do we want to combine them or leave them separate? Honestly, that is going to be determined by your research question and how central differences between different racial groups are for your analysis, as well as the amount of data you have. Here, I’ll code them both ways just for extra practice.
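Coding the two groups "both ways" could look like this, with invented RACE values. The names HawPI and AsianPI are hypothetical.

```r
# Hypothetical RACE codes
dat <- data.frame(RACE = c(100, 200, 300, 651, 652, 801, 651))

# Separate categories
dat$Asian   <- ifelse(dat$RACE == 651, 1, 0)  # Asian only
dat$HawPI   <- ifelse(dat$RACE == 652, 1, 0)  # Hawaiian/Pacific Islander only
# Combined category: either code counts
dat$AsianPI <- ifelse(dat$RACE == 651 | dat$RACE == 652, 1, 0)
```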

What that leaves is a lot of categorizations for either Other or combinations of different races. Again, it’s possible that your research will dictate that these should be left separate, but often we’ll want to combine those into a single category. Here we can do that by just asking whether the variable RACE is greater than 800.
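A sketch of the "greater than 800" shortcut on invented codes. As written, it only captures codes above 800, so any other code below that threshold stays outside this group.

```r
# Hypothetical RACE codes
dat <- data.frame(RACE = c(100, 200, 651, 801, 802, 830))

dat$Other <- ifelse(dat$RACE > 800, 1, 0)
dat$Other
```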

5.3 Stacking ifelse

Above we have created 7 new variables denoting people’s races (although we created Asian in a few different ways) as a series of 0’s and 1’s. We can also create a new variable using words for the categories like we did in the variable Gender above by stacking ifelse() statements. Let’s jump into creating the first two values so we can talk in more detail about what we’re doing.

In the first statement we ask if RACE equals 100, and set those values equal to “White” in our new variable. For the values that aren’t 100 we set them equal to NA, because we’ll fill their values in later. With the second ifelse() we set the values of 200 equal to “Black”. We don’t want to overwrite the ones we made “White” in the first command, so for those that don’t equal 200 we leave them with their previous value of Race2. That becomes more important as we stack on more values, since we want to create all of this within a single variable (Race2).
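A sketch of the stacking pattern on invented data. The first two steps follow the text; the "Asian" and "Other" steps are hypothetical continuations of the same idea. Each later ifelse() passes the current Race2 as its "otherwise" value so earlier labels survive.

```r
# Hypothetical RACE codes
dat <- data.frame(RACE = c(100, 200, 651, 100, 801))

# Step 1: 100 -> "White"; everything else NA for now
dat$Race2 <- ifelse(dat$RACE == 100, "White", NA)
# Step 2: 200 -> "Black"; otherwise keep what Race2 already holds
dat$Race2 <- ifelse(dat$RACE == 200, "Black", dat$Race2)
# Further categories stack the same way
dat$Race2 <- ifelse(dat$RACE == 651, "Asian", dat$Race2)
dat$Race2 <- ifelse(dat$RACE > 800, "Other", dat$Race2)

table(dat$Race2, useNA = "ifany")
```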

We would probably rather have 5 separate dummy variables if we were using this for a regression. But if we’re just creating a graph for this variable, it’s easier if they’re all just included in one column.


Thus, you might use one strategy for the graph you include in a paper, and then use a different set of variables (with equivalent information) later in your analyses. The strategies you use in creating new variables will always be dictated by the analysis you’re attempting to conduct.

R - Variables


A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors or a combination of many R objects. A valid variable name consists of letters, numbers and the dot or underscore characters. The name must start with a letter, or with a dot that is not followed by a number.
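One way to test these rules is base R's make.names(), which returns a syntactically valid version of each name; a name that comes back unchanged is already valid. The candidate names are made up for illustration.

```r
candidates <- c("var_name", "var.name", ".var_name", "var_name2",
                "2var_name", ".2var_name", "_var_name", "var%")

# A name is valid if make.names() leaves it unchanged
valid <- make.names(candidates) == candidates
names(valid) <- candidates
valid
```

The last four fail: they start with a number, a dot followed by a number, an underscore, or contain an invalid character (%).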

Variable Assignment

Variables can be assigned values using the leftward (<-), rightward (->) and equal to (=) operators. The values of the variables can be printed using the print() or cat() functions. The cat() function combines multiple items into a continuous print output.

Note − A vector such as c(TRUE,1) mixes the logical and numeric classes, so the logical value is coerced to numeric and TRUE becomes 1.
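A sketch of all three operators plus print() and cat(). The names var.1 to var.3 and their values are illustrative; var.3 shows the coercion described in the note.

```r
var.1 = c(0, 1, 2, 3)     # assignment with the equal operator
var.2 <- c("learn", "R")  # assignment with the leftward operator
c(TRUE, 1) -> var.3       # assignment with the rightward operator

print(var.1)
# cat() joins several items into one line of output
cat("var.1 is ", var.1, "\n")
cat("var.2 is ", var.2, "\n")
cat("var.3 is ", var.3, "\n")  # TRUE has been coerced to 1
```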

Data Type of a Variable

In R, a variable is not declared with a data type; rather, it takes the data type of the R object assigned to it. This is why R is called a dynamically typed language: the same variable can hold values of different data types at different points in a program.
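A short sketch of one variable changing type as new values are assigned (the name var_x is illustrative):

```r
var_x <- "Hello"
cat("The class of var_x is", class(var_x), "\n")          # character
var_x <- 34.5
cat("Now the class of var_x is", class(var_x), "\n")      # numeric
var_x <- 27L
cat("Next the class of var_x becomes", class(var_x), "\n") # integer
```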

Finding Variables

To list all the variables currently available in the workspace, use the ls() function. Its output depends on which variables exist in your environment, and it can also use a pattern to match variable names.

Variables whose names start with a dot (.) are hidden; they can be listed by passing the argument all.names = TRUE to ls().
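A sketch of the pattern and all.names arguments. To keep the output independent of whatever is already in your workspace, it uses a separate environment e (an assumption of this example); at the top level, a plain ls() lists the global workspace. The variable names are made up.

```r
e <- new.env()
assign("var.1", 1, envir = e)
assign("var.2", "learn", envir = e)
assign("total", 10, envir = e)
assign(".hidden", TRUE, envir = e)

ls(envir = e)                     # "total" "var.1" "var.2"
ls(envir = e, pattern = "^var")   # "var.1" "var.2"
ls(envir = e, all.names = TRUE)   # also lists ".hidden"
```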

Deleting Variables

Variables can be deleted by using the rm() function. For example, after rm(var.3), printing var.3 throws an error because the object no longer exists.

All the variables can be deleted by using the rm() and ls() functions together.
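A sketch of both cases. The clear-everything step runs inside a separate environment e so the example does not wipe your own workspace; at the top level the equivalent call is rm(list = ls()).

```r
var.3 <- c(1, 1)
rm(var.3)
exists("var.3")   # FALSE; print(var.3) would now give an "object not found" error

# rm() + ls(): delete every object in an environment
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
rm(list = ls(envir = e), envir = e)
ls(envir = e)     # character(0)
```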

Introduction to R

Chapter 3 Variables in R

3.1 Introduction

In the previous chapter, we learnt to install RStudio. In this chapter, we will learn about variables and data types. You can skip this chapter if you have prior experience in any other programming language.

3.2 What is a variable?

  • variables are the fundamental elements of any programming language
  • they are used to represent values that are likely to change
  • they reference memory locations that store information/data

Let us use a simple case study to understand variables. Suppose you are computing the area of a circle whose radius is 3. In R, you can do this straight away by typing the computation directly, without storing anything.
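A minimal sketch of that direct computation, using R's built-in constant pi. The result is printed but not kept anywhere.

```r
pi * 3^2   # about 28.27
```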

But you cannot reuse the radius or the area computed in any other analysis or computation. Let us see how variables can change the above scenario and help us in reusing values and computations.

3.3 Creating Variables

A variable consists of 3 components:

  • variable name
  • assignment operator
  • variable value

We can store the value of the radius by creating a variable and assigning it the value. In this case, we create a variable called radius and assign it the value 3 using the assignment operator <- .

Now that we have learnt to create variables, let us see how we can use them for other computations. For our case study, we will use the radius variable to compute the area of a circle.

3.4 Using Variables

We will create two variables, radius and pi , and use them to compute the area of a circle and store it in another variable area .
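A sketch of the same computation with variables. One deliberate deviation: instead of creating a variable named pi, it uses R's built-in pi constant, since assigning to pi would mask that constant (the naming conventions below advise against reusing built-in names).

```r
radius <- 3               # store the radius
area <- pi * radius^2     # pi is R's built-in constant
area                      # 28.27433

radius <- 5               # change the radius...
area <- pi * radius^2     # ...and reuse the same computation
area                      # 78.53982
```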

3.5 Components of a Variable


3.6 Naming Conventions

  • Name must begin with a letter. Do not start it with a number, dollar sign ( $ ) or underscore ( _ ).
  • The name can contain numbers or underscores. Do not use a dash ( - ), and avoid periods ( . ) even though R accepts them in names.
  • Do not use the names of keywords and avoid using the names of built in functions.
  • Variables are case sensitive; average and Average would be different variables.
  • Use names that are descriptive. Generally, variable names should be nouns.
  • If the name is made of more than one word, use underscore to separate the words.
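A small sketch of the case-sensitivity rule and an underscore-separated, descriptive name (all names are illustrative):

```r
average <- 10
Average <- 20
average == Average      # FALSE: two different variables

total_score <- 85       # descriptive noun, words joined by an underscore
exists("TOTAL_SCORE")   # FALSE: case matters
```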

Creating new variables

Use the assignment operator <- to create new variables. A wide array of operators and functions is available for computing their values.
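A sketch of new variables computed from existing columns; the data frame mydata and its columns x1 and x2 are hypothetical.

```r
mydata <- data.frame(x1 = c(2, 4, 6), x2 = c(1, 3, 5))

mydata$sum  <- mydata$x1 + mydata$x2        # 3 7 11
mydata$mean <- (mydata$x1 + mydata$x2) / 2  # 1.5 3.5 5.5
mydata
```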


Recoding variables

In order to recode data, you will probably use one or more of R's control structures.

Renaming variables

You can rename variables programmatically or interactively.

Variable types in R

R supports a diverse range of variable types, each tailored to handle specific data forms:

  • Numeric: These represent numbers and can be either whole numbers or decimals.
  • Character: This type is for textual data or strings.
  • Logical: These are binary and can take on values of TRUE or FALSE.
  • Factor: Ideal for categorical data, factors can help in representing distinct categories within a dataset.
  • Date: As the name suggests, this type is used for date values.
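A sketch with one illustrative value of each type, checked with class(). It also shows integer, a separate class R uses for whole numbers written with an L suffix; a plain 42 is stored as numeric.

```r
num <- 42.5
int <- 42L                                # the L suffix makes an integer
chr <- "hello"
lgl <- TRUE
fct <- factor(c("low", "high", "low"))
dt  <- as.Date("2024-01-15")

types <- sapply(list(num = num, int = int, chr = chr,
                     lgl = lgl, fct = fct, dt = dt), class)
types
```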

When creating new variables, it's essential to ensure they are of the appropriate type for your analysis. If unsure, you can use the class() function to check a variable's type.

Checking and changing variable types

Ensuring your variables are of the correct type is crucial for accurate analysis:

  • Checking Variable Type: The class() function can help you determine the type of a variable.
  • Changing Variable Type: If you need to convert a variable from one type to another, R provides functions like as.numeric(), as.character(), and as.logical().
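A sketch of the conversion functions, including two common surprises: text that cannot be converted becomes NA (R normally warns; the warning is suppressed here), and as.numeric() on a factor returns its internal level codes rather than the labels.

```r
as.numeric("3.14")                        # 3.14
as.character(100)                         # "100"
as.logical(c("TRUE", "false", "yes"))     # TRUE FALSE NA
suppressWarnings(as.numeric("abc"))       # NA

f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1: level codes
as.numeric(as.character(f))    # 10 20 10: the actual values
```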

Variable scope

Understanding the scope of a variable is essential:

  • Global Variables: These are accessible throughout your entire script or session.
  • Local Variables: These are confined to the function or environment they are created in and can't be accessed outside of it. When creating new variables, especially within functions, always be mindful of their scope to avoid unexpected behaviors.
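A sketch of global versus local scope. The function add_one() and its variables are hypothetical: counter lives in the global environment, while local_value only exists while the function runs.

```r
counter <- 1                    # global variable

add_one <- function() {
  local_value <- counter + 1    # local; counter is read from the global environment
  local_value
}

result <- add_one()
result                  # 2
exists("local_value")   # FALSE: it only existed inside the function
counter                 # still 1
```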

Using variables with functions

Variables play a central role when working with functions:

  • Passing Variables: You can provide variables as arguments to functions, allowing for dynamic computations based on variable values.
  • Storing Function Outputs: Functions can return values, and you can assign these values to new or existing variables for further analysis.
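A sketch of both points with a hypothetical helper, celsius_to_fahrenheit(): a variable is passed in as the argument, and the returned value is stored in a new variable.

```r
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

temp_c <- 25                              # passed as an argument
temp_f <- celsius_to_fahrenheit(temp_c)   # output stored in a new variable
temp_f                                    # 77
```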

Variable operations

Depending on their type, you can perform various operations on variables:

  • Arithmetic Operations: For numeric variables, you can carry out standard mathematical operations like addition, subtraction, multiplication, and division.
  • String Operations: For character variables, operations like concatenation allow you to combine multiple strings into one.
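A sketch of both kinds of operation on illustrative values; paste() and paste0() are base R's concatenation functions.

```r
a <- 10
b <- 3
a + b    # 13
a - b    # 7
a * b    # 30
a / b    # 3.333333
a %/% b  # 3 (integer division)
a %% b   # 1 (remainder)

first  <- "Data"
second <- "Science"
paste(first, second)          # "Data Science"
paste0(first, "-", second)    # "Data-Science"
```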

Recoding involves changing the values of a variable based on certain conditions. For instance, you might want to group ages into categories like "young", "middle-aged", and "senior". R offers various control structures to facilitate this process. When recoding, always ensure that the new categories or values make logical sense and serve the purpose of your analysis.
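A sketch of the age example with nested ifelse() calls, and the same grouping with base R's cut(). The cut points (under 30, 30 to 59, 60 and over) are assumptions for illustration, and the two versions only agree for whole-number ages.

```r
ages <- c(15, 29, 30, 45, 59, 60, 82)

age_group <- ifelse(ages < 30, "young",
                    ifelse(ages < 60, "middle-aged", "senior"))

# Same grouping with cut(), which returns a factor
age_group2 <- cut(ages, breaks = c(-Inf, 29, 59, Inf),
                  labels = c("young", "middle-aged", "senior"))

table(age_group)
identical(age_group, as.character(age_group2))  # TRUE
```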

There might be instances where you'd want to rename variables for clarity or consistency. R provides two primary ways to rename variables:

  • Interactively: You can use the fix() function to open a data editor where you can rename variables directly.
  • Programmatically: There are various packages and functions in R that allow you to rename variables within your script. When renaming, ensure that the new names are descriptive and adhere to R's variable naming conventions.
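A base R sketch of programmatic renaming, with no extra packages; the data frame and column names are hypothetical.

```r
df <- data.frame(old_name = 1:3, y = c(2.5, 3.1, 4.0))

# Rename one column by matching its current name
names(df)[names(df) == "old_name"] <- "new_name"
names(df)   # "new_name" "y"
```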

Frequently Asked Questions (FAQs) about Variables in R

A: Both <- and = can be used for assignment in R. However, <- is the more traditional and preferred method, especially in scripts and functions. The = operator is often used within function calls to specify named arguments.

A: You can use the class() function to determine the type or class of a variable. This function will return values like "numeric", "character", "factor", and so on, depending on the variable's type.

A: R provides type conversion functions like as.numeric(), as.character(), and as.logical(). You can use these functions to convert a variable to the desired type.

A: Recoding refers to the process of changing or transforming the values of a variable based on certain criteria. For instance, converting a continuous age variable into age categories (e.g., "young", "middle-aged", "senior") is an example of recoding.

A: R offers multiple ways to rename variables. You can do it interactively using the fix() function, which opens a data editor. Alternatively, there are various R packages and functions that allow for programmatic renaming of variables.

A: Yes, variable names in R are case-sensitive. This means that myVariable, MyVariable, and myvariable would be treated as three distinct variables.

A: Syntactic variable names in R cannot contain spaces; such a name only works when wrapped in backticks, which is awkward to type. Instead, you can use underscores (_) or periods (.) to separate words in variable names, like my_variable or my.variable.

A: You can use the rm() function followed by the variable name to remove it from your workspace. It's a good practice to clear unnecessary variables to free up memory.

A: Local variables are confined to the function or environment they are created in and can't be accessed outside of it. In contrast, global variables are accessible throughout your entire script or R session.

A: You can use the ls() function to list all the variables currently present in your workspace.

Variable Assignment in R

In R we operate with variables. A variable can be seen as a container for a value. To get a better conceptual understanding of this, you can go through the following and code-along in your own R -session.

Assigning a Value to a Variable

  • In R, we state values directly in the chunk or the console, e.g.:

Here, we just state 3, so R simply “throws” that right back at you!

Now, if we want to “catch” that 3, we have to assign it to a variable, e.g.:

  • Notice how now we “catch” the 3 and nothing is “thrown” back to you, because we now have the 3 stored in x:

Updating the Value of a Variable

  • Now, we can of course use x moving forward, e.g. by adding 2:
  • Notice how this does not change x; the result is simply “thrown” right-back-at-ya
  • If we wanted to update x by adding 2, we would have to “catch” the result as before:
  • Now, we have updated x:
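The steps above, written out as a sketch you can code along with:

```r
3              # "thrown" back: [1] 3
x <- 3         # "caught" in x: nothing is printed
x              # [1] 3
x + 2          # [1] 5, but x itself is unchanged
x              # [1] 3
x <- x + 2     # catch the result to update x
x              # [1] 5
```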

Use one Variable in the Creation of Another

  • Analogously, we can create a new variable using x:
  • Again, this does not change x
  • But rather the result is now stored in y
  • In R, we use the assignment operator <- to perform assignment
  • Variables are not changed in place; the result needs to be stored
  • Note, this also applies to running e.g. a dplyr pipeline: running the pipeline does not change the dataset, so we must store the result of the pipeline
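A sketch of creating y from x. The formula x * 2 is a hypothetical choice; any expression involving x works the same way.

```r
x <- 5
y <- x * 2   # new variable built from x
x            # still 5
y            # 10
```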

Before continuing, make sure that you are on track with the above concepts!

  • Create a new variable my_age containing… You guessed it!
  • Add 0.5 to the variable (i.e. your age when you’re done with this course)
  • Check the value of my_age: did you remember to assign, thereby updating?

Pipeline Example

  • Let us create some example sequence data:
  • Notice that our data creation is just “thrown” back at us: we forgot something!
  • Now, we have stored the data in the variable my_dna_data

Note here that a variable can store a single value, e.g. 2, as we saw before with x and y, but here we are storing a tibble object in the variable my_dna_data, and in that tibble object we have a variable sequence, which contains some randomly generated DNA.

But what if we wanted to add a new variable to the tibble object containing the length of each of the DNA sequences?

Nice! Let’s see that data again then:

Wait! What? Where is the variable we literally just created?

We forgot something… We did not update my_dna_data, so let’s fix that:

  • Note, nothing is “thrown” back at us! Let’s verify that we did indeed update my_dna_data:
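The tutorial works with a tibble, dplyr's mutate() and stringr's str_length(). To stay dependency-free, this sketch swaps in base R's data.frame(), transform() and nchar(), which follow the same rule: a new column only survives if the result is assigned back. The helper random_dna() and the column name seq_length are hypothetical.

```r
set.seed(42)
# Hypothetical helper: one random DNA sequence of length n
random_dna <- function(n) paste(sample(c("A", "C", "G", "T"), n, replace = TRUE), collapse = "")
my_dna_data <- data.frame(sequence = sapply(c(5, 8, 12), random_dna),
                          stringsAsFactors = FALSE)

# Not assigned: the result is printed, then lost
transform(my_dna_data, seq_length = nchar(sequence))
names(my_dna_data)  # "sequence" only

# Assigned back: my_dna_data is updated
my_dna_data <- transform(my_dna_data, seq_length = nchar(sequence))
names(my_dna_data)  # "sequence" "seq_length"
```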

Did it make sense? Check yourself, add a new variable to my_dna_data called sequence_capital by using the function str_to_upper()

That’s it - Hope it helped and remember… Bio data science in R is really fun!


Variables in R Programming: A Comprehensive Guide to Handling Data

Explore the world of "Variables in R Programming" with this comprehensive resource on managing data. Learn about the data types in R, covering numerics, characters, logical values, factors, date and time, and data frames. Whether you're a beginner or an experienced programmer, this guide will enhance your understanding of R's data-handling capabilities.


Table of Contents  

1) What are Variables in R Programming? 

2) Declaring and assigning variables 

3) Data types in R Programming 

      a) Numeric 

      b) Character 

      c) Logical 

      d) Factor 

      e) Date and time 

      f) Complex 

      g) Raw 

      h) Data frames 

4) Conclusion 

What are Variables in R Programming?    

R Programming variables act as essential containers that hold data or values. They serve as named references to specific memory locations, allowing programmers to access and manipulate data efficiently. Variables play a fundamental role in storing information, performing calculations, and facilitating data analysis.  

When working with variables in R, programmers need to declare and assign values to them using the appropriate syntax. These variables can represent various data types, such as numeric, character, logical, and more. By understanding the concept of variables in R, programmers gain the ability to handle data effectively and derive meaningful insights from datasets in the world of statistical computing. 


Declaring and assigning variables    

In R Programming, declaring and assigning variables is a fundamental process that allows programmers to store data and perform operations on it. Unlike many other languages, R does not require a variable to be declared, or its type to be stated, before use: a variable is created the first time a value is assigned to it. Because R is dynamically typed, it automatically determines the data type from the assigned value.

To declare a variable, programmers use the assignment operator, which can be either "<-" or "=". The variable name is placed on the left-hand side of the operator, followed by the value to be assigned on the right-hand side. For example: 

# Declare and assign a numeric variable
age <- 30
# Declare and assign a character variable
name <- "John Doe"
# Declare and assign a logical variable
is_student <- TRUE

In the above examples, we declared three variables: "age," "name," and "is_student." The variable "age" is of numeric data type and holds the value 30. The variable "name" is of character data type and stores the text "John Doe." Lastly, the variable "is_student" is of logical data type and is assigned the value TRUE. 

Programmers can use the "=" operator for variable assignment instead of "<-", though the latter is more commonly used in R Programming conventions. For example: 

# Using "=" for variable assignment
age = 30

Additionally, R lets you store several values in a single variable using the "c()" function, which creates a vector containing the specified values.

# Store several values in a single variable (a vector)  x
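A sketch of both ideas with illustrative values: c() puts several values into one variable, while chaining <- is how the same value can be assigned to several variables at once.

```r
x <- c(2, 4, 6)   # one variable holding a vector of three values
length(x)         # 3

a <- b <- 0       # chained assignment: both a and b now hold 0
```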

Moreover, variables can be updated and reassigned with new values at any point during the program's execution. 

# Updating a variable's value  x x # Now, x holds the value 8 

It is important to note that variable names in R are case-sensitive, meaning "age" and "Age" would be treated as distinct variables. Moreover, variable names must begin with a letter (or a dot not followed by a number) and can include letters, numbers, periods and underscores.


Data types in R Programming    


Numeric  

The numeric data type in R represents numerical values, which can be either integers or decimals. These numbers are used for mathematical calculations and are fundamental for statistical analysis. For instance, if we want to store the ages of a group of individuals or the scores they obtained in an exam, we would use the numeric data type. For example: 

# Example of numeric data type  age score

Character  

The character data type is used to represent textual or alphanumeric information. Any data enclosed within single or double quotes is considered a character in R. This data type is essential for storing names, addresses, and other textual data. For example: 

# Example of character data type  name city

Logical  

The logical data type in R consists of two possible values: TRUE or FALSE. These logical values are used for conditional statements and logical operations. They are crucial for decision-making processes in programming. For example: 

# Example of logical data type  is_student is_adult

Factor  

The factor data type is specifically designed to handle categorical data. It is prevalent in statistical analysis, where we categorise data into groups or levels. Factors help in efficient data visualisation and modelling. For example: 

# Example of factor data type  gender
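The example above is truncated in the source; here is a sketch with invented values for gender:

```r
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
levels(gender)       # "Female" "Male" (sorted alphabetically by default)
table(gender)        # counts per level
as.integer(gender)   # 2 1 1 2 1: the internal codes
```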

Date and time  

R provides specialised data types to handle dates and times. These data types are essential when dealing with temporal information, such as timestamps, durations, or scheduling tasks. For example: 

# Example of date and time data types  today current_time
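The example above is truncated in the source. A sketch, assuming today and current_time come from base R's Sys.Date() and Sys.time(); subtracting two dates gives a difference in days.

```r
today <- Sys.Date()          # class "Date"
current_time <- Sys.time()   # class "POSIXct" "POSIXt"
class(today)
class(current_time)

as.Date("2024-03-01") - as.Date("2024-02-01")   # Time difference of 29 days
```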

Complex  

Complex data type comprises numbers with real and imaginary parts. While it may not be as commonly used as other data types, it is essential for specialised mathematical computations. For example:  

# Example of complex data type  z
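The example above is truncated in the source; a sketch with an illustrative value for z:

```r
z <- 3 + 2i
class(z)   # "complex"
Re(z)      # 3 (real part)
Im(z)      # 2 (imaginary part)
Mod(z)     # 3.605551, i.e. sqrt(13)
```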

Raw  

The raw data type allows the storage of raw bytes or binary data. It is used for low-level manipulations and is not as frequently used in everyday programming. For example: 

# Example of raw data type  binary_data
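The example above is truncated in the source. A sketch, assuming binary_data holds the bytes of a short string (base R's charToRaw() and rawToChar() convert between text and bytes):

```r
binary_data <- charToRaw("Hi")
binary_data               # 48 69 (hexadecimal bytes for "H" and "i")
class(binary_data)        # "raw"
rawToChar(binary_data)    # "Hi"
```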

Data frames  

Data frames are not a basic data type but rather a unique structure in R that combines multiple data types. A data frame is a two-dimensional tabular data structure, resembling a table, where columns can be of different data types. For example: 

# Example of a data frame  student_data

Age = c(23, 21, 22),
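The data frame example above is truncated in the source; only the Age column survives. A sketch that keeps that column and adds hypothetical Name and Enrolled columns to show that each column keeps its own type:

```r
student_data <- data.frame(
  Name = c("Alice", "Bob", "Cara"),    # hypothetical names
  Age = c(23, 21, 22),
  Enrolled = c(TRUE, FALSE, TRUE)      # hypothetical column
)
str(student_data)
sapply(student_data, class)   # "character" "numeric" "logical"
```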


Conclusion    

Variables are the building blocks of R Programming and play a crucial role in data manipulation and analysis. Understanding the different types of Variables of R Programming, their usage, and best practices for handling them is essential for any data analyst or programmer. By grasping the concepts discussed in this guide, you'll be better equipped to harness the power of R Programming and derive meaningful insights from your data. 


Joachim Schork

Assignment Operators in R (3 Examples) | Comparing = vs. <- vs. <<-

On this page you’ll learn how to apply the different assignment operators in the R programming language .

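The content of the article is structured as follows:

  1) Example 1: Why You Should Use <- Instead of = in R
  2) Example 2: When <- is Really Different Compared to =
  3) Example 3: The Difference Between <- and <<-
  4) Video & Further Resources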

Let’s dive right into the exemplifying R syntax!

Example 1: Why You Should Use <- Instead of = in R

Generally speaking, there is a preference in the R programming community to use an arrow (i.e. <-) instead of an equal sign (i.e. =) for assignment.

In my opinion, it makes a lot of sense to stick to this convention to produce scripts that are easy to read for other R programmers.

However, you should also pay attention to spacing when assigning in R. Incorrect spacing can silently change what your code does, or even lead to error messages.

For instance, the following R code checks whether x is smaller than minus five, because of the misplaced blank between < and -:
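A minimal sketch of this pitfall, assuming x already holds the value 3 (the value is only an illustration):

```r
x <- 3    # give x a value so that the comparison below can be evaluated
x < - 5   # the blank splits the arrow, so R reads this as x < (-5): a comparison
# [1] FALSE
x         # x still holds 3, because no assignment took place
# [1] 3
```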

A properly working assignment could look as follows:
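A sketch of such an assignment written without any blanks:

```r
x<-5   # works, but the symbols run together and are hard to tell apart
x
# [1] 5
```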

However, this code is hard to read, since the missing space makes it difficult to differentiate between the different symbols and numbers.

In my opinion, the best way to assign in R is to put a blank before and after the assignment arrow:
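The same assignment, sketched with one blank on each side of the arrow:

```r
x <- 5   # the arrow is clearly separated from the name and the value
x
# [1] 5
```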

As mentioned before, the difference between <- and = in this situation is mainly a matter of programming style. However, the following R code using an equal sign would also work:
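A sketch of the equal-sign version, typed at the top level of a script or the console:

```r
x = 5   # at the top level, = assigns just like <-
x
# [1] 5
```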

In the following example, I’ll show a situation where <- and = do not lead to the same result. So keep on reading!

Example 2: When <- is Really Different Compared to =

In this example, I’ll illustrate some substantial differences between assignment arrows and equal signs.

Let’s assume that we want to compute the mean of a vector ranging from 1 to 5. Then, we could call mean(x = 1:5), handing the vector to the argument x of mean() with an equal sign.

However, if we want to have a look at the vector x that we have used within the mean function, we get an error message:
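A minimal sketch of both steps (computing the mean with an equal sign, then looking for x), assuming no object x should exist beforehand; the first line removes any x left over from earlier examples:

```r
rm(list = intersect("x", ls()))  # start without an object x in the workspace
mean(x = 1:5)                    # = only matches 1:5 to the argument x of mean()
# [1] 3
exists("x", inherits = FALSE)    # no object x was created in the workspace
# [1] FALSE
# Typing x at the console at this point fails with: Error: object 'x' not found
```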

Let’s compare this to exactly the same R code, but with an assignment arrow instead of an equal sign, i.e. mean(x <- 1:5).

The output of the mean function is the same. However, the assignment arrow also stored the values in a new data object x:
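The arrow version, sketched the same way (again starting without an object x):

```r
rm(list = intersect("x", ls()))  # make sure x does not exist beforehand
mean(x <- 1:5)                   # <- assigns 1:5 to x, and that value is passed to mean()
# [1] 3
x                                # the object x now exists in the workspace
# [1] 1 2 3 4 5
```

One caveat: the assignment only happens because mean() actually evaluates its argument. R evaluates function arguments lazily, so an assignment written inside an argument that a function never uses is never carried out.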

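This example shows a meaningful difference between = and <-. Inside a function call, the equal sign only matches a value to a function argument (here, the argument x of mean()), so nothing is stored outside of the function. The assignment arrow, in contrast, performs a real assignment and saves the values in a new data object x that can be used outside the function.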

Example 3: The Difference Between <- and <<-

So far, we have only compared <- and =. However, there is another assignment method we have to discuss: The double assignment arrow <<- (also called scoping assignment).

The following code illustrates the difference between <- and <<- in R. This difference mainly becomes visible when applying user-defined functions.

Let’s manually create a function that contains a single assignment arrow and apply it in R.

The data object x_fun1, to which we have assigned the value 5 within the function, does not exist:
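A sketch of the single-arrow version; the function name my_fun1 is hypothetical, since the tutorial’s original code is not reproduced here:

```r
rm(list = intersect("x_fun1", ls()))   # start without an object x_fun1

# my_fun1 is a hypothetical name; the body follows the description above
my_fun1 <- function() {
  x_fun1 <- 5   # single arrow: x_fun1 lives only in the function's own environment
}

my_fun1()                             # returns 5 invisibly, so nothing is printed
exists("x_fun1", inherits = FALSE)    # the global environment has no x_fun1
# [1] FALSE
# Typing x_fun1 at the console at this point fails with: Error: object 'x_fun1' not found
```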

Let’s do the same with a double assignment arrow and apply the function.

And now let’s return the data object x_fun2:
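The double-arrow counterpart, again with a hypothetical function name (my_fun2), defined at the top level so that its enclosing environment is the global environment:

```r
rm(list = intersect("x_fun2", ls()))   # start without an object x_fun2

# my_fun2 is a hypothetical name
my_fun2 <- function() {
  x_fun2 <<- 5   # double arrow: assigns outside the function's own environment
}

my_fun2()
x_fun2           # the object now exists in the global environment
# [1] 5
```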

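As you can see based on the previous output of the RStudio console, the assignment via <<- saved the data object in the global environment, outside of the user-defined function. More precisely, <<- first searches the enclosing environments for an existing x_fun2 and, since none exists, creates it in the global environment.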

Video & Further Resources

I have recently released a video on my YouTube channel, which explains the R syntax of this tutorial.

In addition to the video, I recommend having a look at the other articles on this website:

  • R Programming Examples

In summary: You learned on this page how to use assignment operators in the R programming language. If you have further questions, please let me know in the comments.


Joachim Schork Statistician Programmer

I’m Joachim Schork. On this website, I provide statistics tutorials as well as code in Python and R programming.


Related Tutorials

Convert Vector from RcppArmadillo to Rcpp & Vice Versa in R (2 Examples)

replicate Function in R (2 Examples)

Dynamic variable assignment in R

Evan C. Mascitti

Assigning multiple objects at once

This section demonstrates how to assign each element of a list as a new object.

Generally speaking, I think this should be discouraged, as most tasks can be accomplished inside a data frame or list (or the tidyverse versions: a tibble or nested list). Keeping everything in one place ensures the data lives together through an entire operation and prevents one from making mistakes (see Jenny Bryan’s tutorial, “Thinking inside the box: you can do that inside a data frame?!”).

Why you would want to do this, sometimes

With that disclaimer in mind, here is one case where it helps: you want to refer to computed values in an R Markdown document using inline R expressions. Let’s say you have already created a long summary of data, for example the high and low points of the stock market indices over the past 100 years. In this case, it’s a pain to type a long expression that requires filtering or indexing into a data frame; you just want to refer to the value of one “thing” by its name. Here it is useful to have each element of the list available as a named object. For example, typing

“Black Friday was a precipitous crash due to computer-generated trades. Just a few months before, in late 1986, the S&P 500 hit a then-record high of $ `r sp1986hi` .”

is a lot simpler and less prone to mistakes than:

“Black Friday was a precipitous crash due to computer-generated trades. Just a few months before, in late 1986, the S&P 500 hit a then-record high of $ `r stocks_history %>% filter(year == 1986, type == "hi") %>% map_dbl(1)` .”

Minimal example

The list elements must be named (otherwise how could you assign them to objects?). Here I create a named list, where the name of each element is a lowercase letter and each element contains a vector of random numbers. All the elements have different lengths, so this data set cannot be stored as a data frame.

Let’s verify the contents of the list.
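A sketch of such a list; the seed and the lengths of b, c and d are assumptions (the text only states that a has 8 elements and that all lengths differ):

```r
set.seed(42)   # assumption: a seed so that the random numbers are reproducible
q <- list(
  a = rnorm(8),  # the longest element, with 8 values as stated in the text
  b = rnorm(3),  # the remaining lengths are illustrative assumptions
  c = rnorm(5),
  d = rnorm(2)
)
str(q)           # shows each element's name, type and length
```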

To assign each element of q to a new object, simply call the function list2env(), whose first argument is (not surprisingly) the list whose elements you will be assigning as new objects. In this case, specify the envir argument as the global environment. If you don’t want the objects to clutter your global environment, you could assign them to an alternative environment that you create on the fly with new.env(), but that’s not the point here.

We can verify that the objects have been assigned by noting their presence in the Environment pane.

If we call a couple of the new objects, their contents are printed in the console:
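The steps described above can be sketched as follows. The block rebuilds a list of the same shape as q (seed and lengths are assumptions) so that it runs on its own, and it also shows the new.env() alternative with a hypothetical environment name, stash:

```r
set.seed(42)
q <- list(a = rnorm(8), b = rnorm(3), c = rnorm(5), d = rnorm(2))  # same shape as q above

invisible(list2env(q, envir = globalenv()))  # one new object per named element
a   # each element is now an ordinary object in the workspace
b

# Alternative: keep the objects out of the global environment
stash <- new.env()                    # hypothetical name for a fresh environment
invisible(list2env(q, envir = stash))
ls(stash)
# [1] "a" "b" "c" "d"
```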

These are simple objects; they are just atomic vectors, but you could definitely do the same thing with more complicated lists.

list2env() has a much sloppier cousin called list2DF(). list2DF() converts a list to a data frame (which, of course, is a special type of list…). Unfortunately, this function uses vector recycling without any complaint and without regard to the lengths of the vectors involved. So list2DF() will “work” even if the list elements are not of the same length: it will simply extend each shorter element by as many “extra” entries as it needs to fill out the data frame. For example, if we call this on our list q from above:

The shorter elements are just filled out to match the length of a, which has 8 elements. 😱 😱 😱

There is no direct equivalent for this function in the tidyverse. You could try to coerce the list to a tbl_df with tibble::as_tibble(), which fails because the elements have different lengths (tibbles only recycle vectors of length 1).

The equivalent coercion statement in base R, as.data.frame(), will also prevent this behavior:
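A sketch of the base R check, rebuilding a list like q (lengths assumed) and catching the error so that the code keeps running:

```r
set.seed(42)
q <- list(a = rnorm(8), b = rnorm(3), c = rnorm(5), d = rnorm(2))  # lengths differ

res <- tryCatch(as.data.frame(q), error = function(e) e)
inherits(res, "error")    # the mismatched lengths are rejected
# [1] TRUE
conditionMessage(res)     # explains which row counts clash

# Caveat: base R still recycles silently when a length divides the longest one
nrow(as.data.frame(list(a = 1:8, d = 1:2)))
# [1] 8
```

The last line shows the limit of this protection: base R’s data frame construction recycles a shorter vector whenever its length divides the number of rows evenly, so only mismatches such as 8 versus 3 are rejected.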

Strict limits on vector recycling are a key principle of the tidyverse. In this particular case, though, there’s not much difference between base R and tibble.

“Gathering” an arbitrary number of objects with mget()

The base function mget() is nearly the inverse of assigning a list to an environment: it collects any objects specified by the user and compiles them into a named list. Since it takes a character vector of object names, you can leverage a call to ls() to construct a character vector of all existing objects in a local or global environment.

Here’s an example using the objects we created above: both the individual numeric vectors (a, b, c, and d) and the list object q:
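A self-contained sketch: it rebuilds the objects inside local() so that ls() sees only them (at the top level, mget(ls()) would also pick up anything else in your workspace). The seed and vector lengths are assumptions:

```r
whole_list <- local({
  set.seed(42)
  a <- rnorm(8); b <- rnorm(3); c <- rnorm(5); d <- rnorm(2)
  q <- list(a = a, b = b, c = c, d = d)
  # ls() returns the names of every object in this local environment;
  # mget() fetches them all and returns a list named after them
  mget(ls(envir = environment()), envir = environment())
})
names(whole_list)
# [1] "a" "b" "c" "d" "q"
```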

The object whole_list now contains all the other objects which previously existed in the environment. This makes it easy to scoop up all the existing objects in a given environment and stash them in a list while preserving their names. I use this inside functions when I want to export all defined objects, without relying on manual assignment.

Summary and words of caution

These are some simple and powerful functions. They can make things simpler and less error-prone by replacing copy-paste and manual input with dynamically-generated variables. They should be used carefully, because it is easy to pollute an environment or cause unintentional side effects. mget() and list2env() might be best suited for local environments rather than the global workspace.


Assigning values to variables in R programming – assign() Function

In R programming, the assign() function is used to assign a value to a variable in an environment.

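Syntax: assign(variable, value), where variable is the name of the variable given as a character string

Return: the assigned value (invisibly); the function is mainly called for its side effect of creating or updating the variable.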
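A short sketch of assign() in a loop, where the variable names are built as character strings; var_1 to var_3 and the environment e are hypothetical names:

```r
# Create var_1, var_2 and var_3 in the current (here: global) environment
for (i in 1:3) {
  assign(paste0("var_", i), i^2)   # the name is built as a character string
}
var_2
# [1] 4

# The envir argument chooses where the variable is created; get() reads it back
e <- new.env()
assign("x", 10, envir = e)
get("x", envir = e)
# [1] 10
```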


COMMENTS

  1. Creating, Naming and Using Variables in R

    R Variables - Creating, Naming and Using Variables in R. A variable is a memory allocated for the storage of specific data and the name associated with the variable is used to work around this reserved block. The name given to a variable is known as its variable name. Usually a single variable stores only the data belonging to a certain data ...

  2. r

    In general you assign a value exactly the way you've shown, using variable = value. However, you are dealing with the result of a t-test, where the result is a more complex value. You can still assign the result of the t-test though: result = t.test(a) Now the question becomes: how to extract the confidence interval (and its lower bound)?

  3. R Variables

    Creating Variables in R. Variables are containers for storing data values. R does not have a command for declaring a variable. A variable is created the moment you first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value, just type the variable name:

  4. How to Use the assign() Function in R (3 Examples)

    The assign() function in R can be used to assign values to variables.. This function uses the following basic syntax: assign(x, value) where: x: A variable name, given as a character string.; value: The value(s) to be assigned to x.; The following examples show how to use this function in practice.

  5. Variable assignment

    Variable assignment. A basic concept in (statistical) programming is called a variable. A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable's name to easily access the value or the object that is stored within this variable. You can assign a value 4 to a variable ...

  6. Data manipulation in R

    See the main functions to manipulate data in R such as how to subset a data frame, create a new variable, recode categorical variables and rename a variable ... There are three ways to assign an object in R: <-, = and assign(). # 1st method x <- c(2.1, 5, -4, 1, 5) x ... To scale one or more variables in R use scale(): dat_scaled <- scale(dat_imputed ...

  7. Chapter 5 Creating Variables

    Chapter 5. Creating Variables. This section focuses on the creation of new variables for your analysis as part of an overall strategy of cleaning your data. Often times, data will come to you coded in a certain way, but you want to transform it to make it easier to work with. Our analyses often focus on linear changes in a given variable, but ...

  8. R

    R - Variables - A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, group of atomic vectors or a combination of many R objects. ... Variable Assignment. The variables can be assigned values using leftward, rightward and equal to operator. The values of the variables can be ...

  9. Chapter 3 Variables in R

    3.3 Creating Variables. A variable consists of 3 components: variable name. assignment operator. variable value. We can store the value of the radius by creating a variable and assigning it the value. In this case, we create a variable called radius and assign it the value 3 using the assignment operator <-. radius <- 3 radius.

  10. R

    A variable is used to store data that can be accessed later by subsequent code. In R, there are no variable "declaration" commands. Instead, they are created with the assignment operator, <- (The more familiar assignment operator, =, can be used instead, but is discouraged). The assignment operator can be chained together to initialize ...

  11. Quick-R: Creating New Variables

    When renaming, ensure that the new names are descriptive and adhere to R's variable naming conventions. Frequently Asked Questions (FAQs) about Variables in R Q: What's the difference between <- and = for assignment in R? A: Both <- and = can be used for assignment in R. However, <- is the more traditional and preferred method, especially in ...

  12. R for Bio Data Science

    Assigning a Value to a Variable. In R, we state values directly in the chunk or the console, e.g.: 3. [1] 3. Here, we just state 3, so R simply "throws" that right back at you! Now, if want to "catch" that 3 we have to assign it to a variable, e.g.: x <- 3. Notice how now we "catch" the 3 and nothing is "thrown" back to you ...

  13. Variables in R Programming: A Comprehensive Overview

    In the above examples, we declared three variables: "age," "name," and "is_student." The variable "age" is of numeric data type and holds the value 30. The variable "name" is of character data type and stores the text "John Doe." Lastly, the variable "is_student" is of logical data type and is assigned the value TRUE.

  14. Assignment Operators in R (3 Examples)

    On this page you'll learn how to apply the different assignment operators in the R programming language. The content of the article is structured as follows: 1) Example 1: Why You Should Use <- Instead of = in R. 2) Example 2: When <- is Really Different Compared to =. 3) Example 3: The Difference Between <- and <<-. 4) Video ...

  15. Dynamic variable assignment in R

    To assign each element of q to a list, simply call the function list2env(), whose first argument is (not surprisingly) the list whose elements you will be assigning as new objects. In this case, specify the environment argument as the global environment. If you don't want the objects to clutter your global environment, you could assign it to ...

  16. How to create variables and assign data in R

    In this video, you will learn how to create variables and assign data in R programming. #RStudio #variables

  17. assignment operator

    The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions. answered Feb 16, 2010 at 8:56.

  18. Assigning values to variables in R programming

    R Variables - Creating, Naming and Using Variables in R Accessing variables of a data frame in R Programming - attach() and detach() function How to Assign Colors to Categorical Variable in ggplot2 Plot in R ?

  19. Variable assignment I

    Variable assignment I. A basic concept in R programming is the variable. It allows you to store a value or an object in R. You can then later use this variable's name to easily access the value or the object that is stored within this variable. You use <- to assign a variable:

  20. How do you use "<<-" (scoping assignment) in R?

    Instead, it seems that R looks outside of the fortest function, can't find a mySum variable to assign to, so creates one and assigns the value 1, the first time through the loop. On subsequent iterations, the RHS in the assignment must be referring to the (unchanged) inner mySum variable whereas the LHS refers to the global variable.
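The pattern described in the last answer can be reconstructed as follows. The names fortest and mySum come from the snippet, while the loop body is an assumption, since the original code is truncated:

```r
rm(list = intersect("mySum", ls()))   # no global mySum before the call

fortest <- function() {
  mySum <- 0                  # local variable inside fortest()
  for (i in c(1, 2, 3)) {
    # RHS: R finds the local mySum, which stays 0
    # LHS: <<- starts in the enclosing (global) environment and never looks at
    #      the local mySum, so it creates (first pass) or overwrites a global one
    mySum <<- mySum + i
  }
  mySum                       # the local mySum was never changed
}

fortest()
# [1] 0
mySum                         # global variable written by <<-: 0 + 3 on the last pass
# [1] 3
```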