Week 4 Simulations
In this chapter, we will touch on the fundamentals of programming simulations in R. In doing so, we will introduce the concept of nested loops and apply it to the accumulator model.
You are encouraged to open your R console next to your browser in order to copy, paste and manipulate the R code in this chapter while following along. Understanding what code does is the first step to writing your own code!
Code chunks will be printed in the following form; paste them into your R console and see what they do:
# I am a code chunk
cat("Hi\n\nI am a code chunk!\n\n2+2 is:", 2+2, "\n")
Note: In the top right corner of each code chunk there is a button that automatically copies the chunk’s content to your clipboard!
4.1 Simulation Routines in R
Generally, it is not best practice to use for-loops in R. Still, there is one instance where we commonly use them: simulations!
The function rnorm() randomly draws \(n\) values from a Gaussian (i.e., normal) distribution with specified \(\mu\) and \(\sigma\).
set.seed(123)
rnorm(n=1, mean=0, sd=1) # 1 random z-value
## [1] -0.5604756
Note that we use the function set.seed() so that functions employing random procedures produce the same output each time we run our script!
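For example, resetting the seed before a draw reproduces the same value we saw above (a minimal illustration; the seed value 123 is the same one used earlier):
set.seed(123)
rnorm(n=1, mean=0, sd=1) # -0.5604756, as above

set.seed(123)
rnorm(n=1, mean=0, sd=1) # the same value again, because the seed was reset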
Let’s imagine we want to estimate the mean from 1000 data points drawn out of this distribution. Sure, we could just call mean(rnorm(n=1000)), but, as a demonstration, simulations in R have the following form:
n <- 1000 # number of simulated data points
sample <- rep(NA, n) # empty vector with pre-allocated size

# For-loops featuring only one line do not need "{}"
for (i in 1:n)
  sample[i] <- rnorm(n=1, mean=2, sd=2)

mean(sample)
We might also use the replicate() function for the same task, which directly returns the vector with all sampled data points. The function essentially runs the expression passed as one of its arguments \(n\) times. For larger simulations, the replicate() approach is often more concise, but it is also less flexible, as we will see later.
n <- 1000 # number of simulated data points
sample <- replicate(n, rnorm(n=1, mean=2, sd=2))

mean(sample)
As an intermediate example, we might fit a linear regression \(n\) times on two randomly sampled variables and extract the p-value of the F-statistic each time (a linear regression is a predictive model, and the p-value here is the probability of observing an association at least this strong if the two variables were truly independent):
n <- 1000 # number of simulations

# Define function that should be replicated
get_random_pvalue <- function(n, mean, sd) {
  x <- rnorm(n=n, mean=mean, sd=sd) # create variables
  y <- rnorm(n=n, mean=mean, sd=sd)
  m <- lm(y ~ x) # linear regression fit
  # The extraction of p is quite complicated, just a demonstration
  f <- summary(m)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail=FALSE)
  return(p)
}

result <- replicate(n, get_random_pvalue(n=1000, mean=0, sd=1))

mean(result)

# Ideally, significant results should be obtained ~5% (alpha%) of the time
# under the null hypothesis
mean(result < .05) # logical vectors can be passed to sum() and mean()
4.2 Nested Loops
We have seen that simulations in R can be done both with for-loops and with the function replicate(). Ultimately, the advantage of for-loops in simulations is the ability to incorporate nested loops. Nested loops are essentially for-loops inside of for-loops. As an example, the nested for-loop (inside the topmost for-loop) might represent a sub-routine in a simulation. A simple example of a nested for-loop is demonstrated here (paste the code into your R console and follow along):
ii <- 10 # number of iterations in topmost for-loop
jj <- 5  # number of iterations in each sub-routine

for (i in 1:ii) {
  if (i>1)
    cat("We reached iteration", i, "in the *topmost* for-loop!\n")
  # Begin sub-routine for iteration i
  for (j in 1:jj) {
    cat("We reached iteration", j, "in the sub-routine for i =", i, "!\n")
  }
}
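As a sketch of how the inner loop can act as a sub-routine in a simulation, the nested loop below estimates the mean for several true means and sample sizes (the parameter values are made up for illustration):
means <- c(0, 1, 2)       # parameter values handled by the topmost for-loop
sizes <- c(10, 100, 1000) # sample sizes handled by the sub-routine

for (m in means) {
  for (s in sizes) {
    estimate <- mean(rnorm(n=s, mean=m, sd=1))
    cat("True mean:", m, "- sample size:", s, "- estimate:", round(estimate, 3), "\n")
  }
}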
Naturally, you could also imagine for-loops with three or more layers, or with more complicated rules such as if-statements, break clauses (stop the current for-loop [e.g., sub-routine]), and next clauses (finish the current iteration and go to the next iteration; called continue in many programming languages). If you are not familiar with these concepts, you are encouraged to look them up.
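A minimal illustration of both clauses (the loop bounds and the threshold of 7 are made up for demonstration):
for (i in 1:10) {
  if (i %% 2 == 0) next # skip even iterations and continue with the next one
  if (i > 7) break      # stop the whole loop once i exceeds 7
  cat("Odd iteration:", i, "\n")
}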
Sometimes, you will also come across while-loops, which loop as long as a logical condition is met. Other than that, they function just like for-loops and are responsive to break and next.
check <- 10
special_boolean <- TRUE

while (check > 0) {
  cat("Currently, $check still is larger than 0 since", check, "is > 0...\n")
  if (check == 3 & special_boolean) {
    cat("$check is", check, "right now, let me re-run this once without updating $check...\n")
    special_boolean <- FALSE # mark that this if-clause has been fulfilled once
    next
  }
  check <- check - 1 # ensure the condition is affected in order not to produce endless looping
}

cat("$check is now", check, "and not larger than 0 anymore...\n")
4.3 Simulating an Accumulator Model
The accumulator model is not a single model but rather a class of models. This tutorial will briefly touch on the Linear Ballistic Accumulator (LBA) model, a speeded-choice model presupposing that evidence for two choices is accumulated in a linear and independent fashion until a certain response threshold is met. For more information on the LBA model, you can study chapter 14.2 of the supplementary book to these tutorials, Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky.
Now, imagine we have the following task:
Generate synthetic data (100 simulations of 100 data points) from a two-choice motion-detection task assuming a normal distribution. Set the upper boundary of the starting point to 1 unit below the response threshold. Also use a non-decision time of 0.1 and make the drift rate of one response larger than that of the other, both in terms of the mean and the standard deviation of the drift rate. Plot the distribution and/or quantiles of the reaction times for both responses. What can you infer from that plot?
What do we need to solve this task? When simulating a complex model, it’s best to start with a solid understanding of the model and then think of all the necessary pieces, variables, and sub-routines needed to simulate it. Try to gain such an understanding, including what the mentioned parameters represent.
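To build such an intuition, it can help to simulate a single LBA trial by hand. The following sketch assumes the standard LBA mechanics (a starting point drawn uniformly between 0 and A, one normally distributed drift rate per accumulator, and a decision time of (b - start) / drift for each accumulator); all parameter values are placeholders, not part of the task solution:
set.seed(42)

A  <- 2   # upper boundary of the uniform starting point
b  <- 3   # response threshold
t0 <- .1  # non-decision time
mean_v <- c(2, 7) # mean drift rate of each accumulator
sd_v   <- c(1, 3) # standard deviation of each drift rate

start <- runif(2, min=0, max=A)          # starting point of each accumulator
drift <- rnorm(2, mean=mean_v, sd=sd_v)  # drift rate of each accumulator
time_to_threshold <- (b - start) / drift # linear, ballistic accumulation

# The faster accumulator determines the response; negative drift draws
# (which would never reach the threshold) are ignored here for simplicity
response <- which.min(time_to_threshold)
rt <- t0 + min(time_to_threshold)
cat("Response:", response, "- RT:", round(rt, 3), "\n")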
Then, we need algorithms that help solve the task. In rare circumstances, we might need to write these algorithms ourselves (typically involving for-loops and while-loops as outlined above). Before that, it is a good idea to search the web for R packages or other software extensions that already implement algorithms for the task at hand. Ideally, these packages have been used by other researchers in the field (and featured in publications), such that there is a minimum amount of credibility that the implemented algorithms are correct and do what they are supposed to do (instead of containing errors and thus outputting false results). It makes sense to search for papers that have cited or explicitly mentioned the name of the package or software.
For simulating (and fitting) the LBA model, we can use the package {rtdists}. While the package might have some dependencies (required software) you still need to install on your computer (for example GSL, see the installation links in the code chunk below), it seems to be an established R package which is also used in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky.
# Potential dependencies:
# Try: install.packages('gsl')
# If installation of {gsl} does not work directly, then download GSL:
# Windows:
# http://gnuwin32.sourceforge.net/packages/gsl.htm
# OSX, Linux Ubuntu:
# https://gist.github.com/TysonRayJones/af7bedcdb8dc59868c7966232b4da903
# Finally, install the {rtdists} package
# install.packages('rtdists')
Then, we need some tutorial or sample code where the software is used (if we do not want to read the often verbose documentation of the software we obtained). Often, scientists employ open science practices that include sharing the code they used in their study publicly as supplementary material to the publication. There is some code for simulating the LBA model in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky; you can look up any functions in that R code you don’t understand and then play around with the code to develop an intuition for it.
Let’s look at the function rLBA() from the package {rtdists}, which is featured in the code we obtained and which generates random data. Write ?rtdists::rLBA into the R console to get more information on the function. It reads, among other things:
Density, distribution function, quantile function, and random generation for the LBA model with the following parameters: A (upper value of starting point), b (response threshold), t0 (non-decision time), and driftrate (v). All functions are available with different distributions underlying the drift rate: Normal (norm), Gamma (gamma), Frechet (frechet), and log normal (lnorm). The functions return their values conditional on the accumulator given in the response argument winning.
For an assumed normal distribution, we also see that we need additional arguments:
… two named drift rate parameters depending on distribution (e.g., mean_v and sd_v for distribution==“norm”)
Now we know which arguments we need to pass to the function rLBA() to simulate random data with an assumed normal distribution.
We can also look at some sample output of the function:
set.seed(1)
rtdists::rLBA(n=5, A=.7, b=.8, t0=.1, mean_v=c(.5, .5), sd_v=c(.4, .4)) %>%
  knitr::kable() # used just for more readable output
## Results based on 2 accumulators/drift rates.
| rt        | response |
|-----------|----------|
| 0.6841842 | 1        |
| 0.7946825 | 2        |
| 0.6442018 | 1        |
| 0.3404240 | 2        |
| 0.9392921 | 1        |
Now, we can get started with solving the task! We might want to break it into smaller steps first:
Define all parameters needed for the simulation, including the number of iterations and responses.
Write a loop that executes the simulation.
Present the results of the simulation in a plot as described in the task.
Let’s try to implement these steps in R:
- Define all parameters needed for the simulation, including the number of iterations and responses.
Click to view the solution to this step, but try to solve it yourself first!
library(rtdists)
# Notice that this is a sample solution, as the exercise
# leaves some degrees of freedom in terms of specifying the
# exact parameters
n_iterations <- 100
n_datapoints <- 100

b <- 3
A <- b-1

t0 <- .1

mean_v <- c(2, 7) # small and large
sd_v <- c(1, 3)   # small and large
- Write a loop that executes the simulation.
Click to view the solution to this step, but try to solve it yourself first!
# This solution is implemented to showcase a for-loop in this context
# One could also imagine defining n_datapoints as 100*100 for just one
# iteration (see the sketch below)

# Storing data from each iteration in a list with pre-allocated size
dat <- rep(list(NULL), n_iterations)

for (i in 1:n_iterations) {
  # `silent=TRUE` hides 'Results based on 2 accumulators/drift rates.'
  # and is an argument that several R functions offer
  dat[[i]] <- rLBA(n_datapoints, A, b, t0, mean_v=mean_v, sd_v=sd_v, silent=TRUE)
}

# Combine data
dat <- do.call(rbind, dat) # call rbind with all elements of dat passed as arguments
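As mentioned in the comment above, the same kind of data could plausibly be generated without a loop by requesting all 100 * 100 data points in a single call. A sketch of that alternative (not part of the sample solution; the object name dat_single is made up):
dat_single <- rLBA(n_iterations * n_datapoints, A, b, t0,
                   mean_v=mean_v, sd_v=sd_v, silent=TRUE)
Because the simulated trials are independent, both approaches draw from the same distribution; the for-loop version is shown here to practice the loop-based simulation workflow.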
- Present the results of the simulation in a plot as described in the task.
Click to view the solution to this step, but try to solve it yourself first!
library(ggplot2)
library(magrittr)
dat %>%
  ggplot(aes(x=rt)) +
  geom_density() +
  facet_wrap(~response) +
  labs(x='Reaction Time in s', y='Density',
       title='Reaction Time Density by Response')

# Alternatively: Get quantiles by response
qr1 <- dat$rt[dat$response == 1] %>% quantile()
qr2 <- dat$rt[dat$response == 2] %>% quantile()
# Can you plot another ggplot including quantiles? Maybe start here:
# https://ggplot2.tidyverse.org/reference/geom_point.html
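As one possible starting point for the quantile plot suggested above, here is a sketch using geom_point(); it reuses qr1 and qr2 from the chunk above, and the way the quantiles are arranged into a data frame is just one option:
# Arrange the quantiles of both responses in one data frame for plotting
quantile_dat <- data.frame(
  p        = rep(seq(0, 1, .25), 2), # default probabilities used by quantile()
  rt       = c(qr1, qr2),
  response = factor(rep(c(1, 2), each=5))
)

ggplot(quantile_dat, aes(x=p, y=rt, colour=response)) +
  geom_point(size=3) +
  labs(x='Quantile', y='Reaction Time in s',
       title='Reaction Time Quantiles by Response')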
There you have it! I will leave the interpretation of the graph (see task description) up to you. Since you have gained an intuition for the LBA model by now, the interpretation should come naturally.
Closing remarks: Writing your own simulation routines is a complicated task that requires a lot of patience. Hopefully, this chapter has made the skill of writing simulations a bit more approachable. Always try to research and learn from the code of others who have implemented the model you are trying to simulate. Work diligently to gain a firm understanding of the code, the model, and the task at hand, and try to break everything down into pieces.