Week 4 Simulations, May 10th

In this chapter, we will touch on the fundamentals of programming simulations in R. In doing so, we will introduce the concept of nested loops and apply it to the accumulator model.

You are encouraged to open your R console next to your browser in order to copy, paste, and manipulate the R code in this chapter while following along. Understanding what code does is the first step to writing your own code!

Code chunks will be printed in the following form; paste them into your R console and see what they do:

# I am a code chunk
cat("Hi\n\nI am a code chunk!\n\n2+2 is:", 2+2, "\n")

Note: There is a button that automatically copies the block’s content to your clipboard at the top right corner of code chunks!

4.1 Simulation Routines in R

Generally, it is not best practice to use for-loops in R (vectorised functions are usually faster and clearer). Still, there is one instance where we commonly use them: simulations!

The function rnorm() draws \(n\) random values from a Gaussian (i.e., normal) distribution with specified mean \(\mu\) and standard deviation \(\sigma\).

set.seed(123)
rnorm(n=1, mean=0, sd=1) # 1 random z-value
## [1] -0.5604756

Note that we use the function set.seed() so that functions employing random procedures produce the same output each time we run our script!

Let’s imagine we want to estimate the mean of 1000 data points drawn from a normal distribution. Sure, we could just call mean(rnorm(n=1000)), but, as a demonstration, simulations in R typically have the following form:

n <- 1000 # number of simulated data points

sample <- rep(NA, n) # empty vector with pre-allocated size (note: this name masks base R's sample() function)

# For-loops featuring only one line do not need "{}"
for (i in 1:n)  
    sample[i] <- rnorm(n=1, mean=2, sd=2)

mean(sample)

We might also use the replicate() function for the same task, which directly returns the vector with all sampled data points. The function essentially runs the expression passed as its second argument \(n\) times. replicate() is often the more concise approach, but it is also less flexible, as we will see later.

n <- 1000 # number of simulated data points

sample <- replicate(n, rnorm(n=1, mean=2, sd=2))

mean(sample)
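If you are curious about speed, system.time() lets you compare the approaches yourself. A quick sketch (absolute timings depend heavily on your machine; the fully vectorised call is included for contrast):

n <- 1e4 # number of simulated data points

# Pre-allocated for-loop
t_loop <- system.time({
    s1 <- rep(NA, n)
    for (i in 1:n) s1[i] <- rnorm(n=1, mean=2, sd=2)
})["elapsed"]

# replicate()
t_repl <- system.time({
    s2 <- replicate(n, rnorm(n=1, mean=2, sd=2))
})["elapsed"]

# A single vectorised call, avoiding the loop entirely
t_vect <- system.time({
    s3 <- rnorm(n=n, mean=2, sd=2)
})["elapsed"]

c(loop = unname(t_loop), replicate = unname(t_repl), vectorised = unname(t_vect))

Typically, the single vectorised call is by far the fastest, which is exactly why for-loops are reserved for cases (like simulations with sub-routines) where vectorisation is impractical.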

As an intermediate example, we might fit a linear regression \(n\) times for two randomly sampled variables and extract the p-values for the F-statistic (a linear regression is a predictive model, and the p-value, in this case, is the probability of observing data at least as extreme as ours if the two variables were truly independent):

n <- 1000 # number of simulation iterations

# Define function that should be replicated
get_random_pvalue <- function(n, mean, sd) {
    x <- rnorm(n=n, mean=mean, sd=sd) # create variables
    y <- rnorm(n=n, mean=mean, sd=sd)
    m <- lm(y ~ x)   # linear regression fit
    # The extraction of p is quite complicated, just a demonstration
    f <- summary(m)$fstatistic    
    p <- pf(f[1], f[2], f[3], lower.tail=FALSE)
    return(p)
}

result <- replicate(n, get_random_pvalue(n=1000, mean=0, sd=1))

mean(result)

# Ideally, significant results should be obtained ~5% (alpha%) of the time 
# under the null hypothesis
mean(result < .05) # logical vectors can be passed to sum() and mean()

4.2 Nested Loops

We have seen that simulations in R can be done both through for-loops and through the function replicate(). Ultimately, the advantage of for-loops in simulations is the ability to incorporate nested loops. Nested loops are essentially for-loops inside of for-loops. For example, the inner for-loop might represent a sub-routine in the simulation. A simple example of a nested for-loop is demonstrated here (paste the code into your R console and follow along):

ii <- 10  # number of iterations in topmost for-loop
jj <- 5   # number of iterations in each sub-routine

for (i in 1:ii) {
    cat("We reached iteration", i, "in the *topmost* for-loop!\n")
    # Begin sub-routine for iteration i
    for (j in 1:jj) {
        cat("We reached iteration", j, "in the sub-routine for i =", i, "!\n")
    }
}

Naturally, you could also imagine for-loops with three layers, or more complicated rules such as if-statements, break clauses (stop the current for-loop [e.g., the sub-routine]), and next clauses (finish the current iteration and jump to the next one; called continue in many programming languages). If you are not familiar with these concepts, you are encouraged to look them up, for example here.
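As a minimal illustration of break and next, the following loop collects only the odd numbers up to 7 (paste it into your R console):

collected <- c() # will hold the collected numbers

for (i in 1:10) {
    if (i %% 2 == 0) next # skip even numbers and continue with the next i
    if (i > 7) break      # leave the loop entirely once i exceeds 7
    collected <- c(collected, i)
}

collected
## [1] 1 3 5 7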

Sometimes, you will also come across while-loops, which loop as long as a logical condition is met. Other than that, they function just like for-loops and respond to break and next.

check <- 10
special_boolean <- TRUE

while(check > 0) {
  cat("Currently, $check is still larger than 0, namely", check, "...\n")
  if (check == 3 && special_boolean) {
    cat("$check is", check, "right now, let me re-run this once without updating $check...\n")
    special_boolean <- FALSE # mark that this if-clause has been fulfilled once
    next
  }
  check <- check - 1 # ensure the condition is affected in order not to produce endless looping
}

cat("$check is now", check, "and not larger than 0 anymore...\n")

4.3 Simulating an Accumulator Model

The accumulator model is not a single model but rather a class of models. This tutorial will briefly touch on the Linear Ballistic Accumulator (LBA) model, a speeded-choice model which presupposes that evidence for two choices is accumulated in a linear and independent fashion until a response threshold is met. For more information on the LBA model, you can study chapter 14.2 of the supplementary book to these tutorials, Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky.
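To build some intuition first, a single LBA trial can be sketched by hand. This is a simplified toy version (it simply resamples until both drift rates are positive, which real implementations handle more carefully), and the parameter names A, b, t0, mean_v, and sd_v mirror those used by the {rtdists} package later in this section:

# Toy sketch of one LBA trial: two accumulators race linearly to a threshold
simulate_lba_trial <- function(A, b, t0, mean_v, sd_v) {
    repeat {
        v <- rnorm(2, mean=mean_v, sd=sd_v) # one drift rate per accumulator
        if (all(v > 0)) break               # crude fix: resample negative rates
    }
    k <- runif(2, min=0, max=A)       # random starting points below A
    time_to_b <- (b - k) / v          # linear ("ballistic") rise to threshold b
    response <- which.min(time_to_b)  # the faster accumulator wins
    c(rt = t0 + min(time_to_b), response = response)
}

set.seed(123)
simulate_lba_trial(A=.7, b=.8, t0=.1, mean_v=c(.5, .5), sd_v=c(.4, .4))

Each accumulator starts somewhere below A and rises toward the threshold b at its drawn drift rate; the first to arrive determines the response, and the non-decision time t0 is added on top.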

Now, imagine we have the following task:

Generate synthetic data (100 simulations of 100 data points) from a two-choice motion-detection task assuming a normal distribution. Set the upper boundary of the starting point to 1 unit below the response threshold. Also use a non-decision time of 0.1 and make the drift rate of one response larger than the other in terms of both the mean and the standard deviation of the drift rate. Plot the distribution and/or quantiles of the reaction times for both responses. What can you infer from that plot?

What do we need to solve this task? When simulating a complex model, it is best to start with a solid understanding of the model and then think of all the necessary pieces, variables, and sub-routines needed to simulate it. Try to gain such an understanding, including what the mentioned parameters represent.

Then, we need algorithms that help solve the task. In rare circumstances, we might need to write these algorithms ourselves (typically involving for-loops and while-loops as outlined above). Before that, it is always clever to search the web for R packages or other software extensions that already implement algorithms for the task at hand. Ideally, these packages have been used by other researchers in the field (and featured in publications), providing a minimum amount of credibility that the implemented algorithms are correct and do what they are supposed to do (instead of containing errors and thus outputting false results). It makes sense to search for papers that have cited or explicitly mentioned the name of the package or software.

For simulating (and fitting) the LBA model, we can use the package {rtdists}. While the package might have some dependencies (required software) that you still need to install on your computer (for example “GSL”, see the installation links in the code chunk below), it is an established R package that is also used in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky.

# Potential dependencies:
# Try: install.packages('gsl')
# If installation of {gsl} does not work directly, then download GSL:
# Windows: 
# http://gnuwin32.sourceforge.net/packages/gsl.htm
# OSX, Linux Ubuntu: 
# https://gist.github.com/TysonRayJones/af7bedcdb8dc59868c7966232b4da903

# Finally, install the {rtdists} package
# install.packages('rtdists')

Then, we need some tutorial or sample code where the software is put to use (if we do not want to read the often verbose documentation of the software we obtained). Often, scientists employ open science practices that include publicly sharing the code they used in their study as supplementary material to the publication. I found some code for simulating the LBA model in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky, looked up all the functions in the R code I did not understand, and then played around with the code to develop an intuition for it.

Notably, I looked at the function rLBA() from the package {rtdists}, which was featured in the code I obtained and generates random data. I typed ?rtdists::rLBA into my R console to get more information on the function. It reads, among other things:

Density, distribution function, quantile function, and random generation for the LBA model with the following parameters: A (upper value of starting point), b (response threshold), t0 (non-decision time), and driftrate (v). All functions are available with different distributions underlying the drift rate: Normal (norm), Gamma (gamma), Frechet (frechet), and log normal (lnorm). The functions return their values conditional on the accumulator given in the response argument winning.

For an assumed normal distribution, we also see that we need additional arguments:

… two named drift rate parameters depending on distribution (e.g., mean_v and sd_v for distribution==“norm”)

Now we know which arguments we need to pass to the function rLBA() to simulate random data with an assumed normal distribution. We can also look at some sample output of the function:

set.seed(1)
# Note: the %>% pipe requires {magrittr} (or a package re-exporting it) to be loaded
rtdists::rLBA(n=5, A=.7, b=.8, t0=.1, mean_v=c(.5, .5), sd_v=c(.4, .4)) %>% 
  knitr::kable() # used just for more readable output
## Results based on 2 accumulators/drift rates.
|        rt| response|
|---------:|--------:|
| 0.6841842|        1|
| 0.7946825|        2|
| 0.6442018|        1|
| 0.3404240|        2|
| 0.9392921|        1|

Now, we can get started with solving the task! We might want to break the task into smaller steps first:

  1. Define all parameters needed for the simulation, including the number of iterations and responses.

  2. Write a loop that executes the simulation.

  3. Present the results of the simulation in a plot as described in the task.

Let’s try to implement these steps in R:

  1. Define all parameters needed for the simulation, including the number of iterations and responses.
Click to view the solution to this step, but try to solve it yourself first!
library(rtdists)

# Notice that this is a sample solution as the exercise
# leaves some degrees of freedom in terms of specifying the 
# exact parameters

n_iterations <- 100 
n_datapoints <- 100

b <- 3
A <- b-1

t0 <- .1

mean_v <- c(2, 7) # small and large
sd_v <- c(1, 3)   # small and large


  2. Write a loop that executes the simulation.
Click to view the solution to this step, but try to solve it yourself first!
# This solution is implemented to showcase a for-loop in this context
# One could also imagine defining n_datapoints as 100*100 for just one iteration

# Storing data from each iteration in a list with pre-allocated size
dat <- rep(list(NULL), n_iterations)

for (i in 1:n_iterations) {
  # `silent=TRUE` suppresses the message
  # 'Results based on 2 accumulators/drift rates.' that rLBA() prints otherwise
  dat[[i]] <- rLBA(n_datapoints, A, b, t0, mean_v=mean_v, sd_v=sd_v, silent=TRUE)
}

# Combine data
dat <- do.call(rbind, dat) # call rbind with all elements of dat passed as arguments


  3. Present the results of the simulation in a plot as described in the task.
Click to view the solution to this step, but try to solve it yourself first!
library(ggplot2)
library(magrittr)

dat %>% 
  ggplot(aes(x=rt)) + 
    geom_density() +
    facet_wrap(~response) +
    labs(x='Reaction Time in s', y='Density', 
         title='Reaction Time Density by Response')

# Alternatively: Get quantiles by response
qr1 <- dat$rt[dat$response == 1] %>% quantile()
qr2 <- dat$rt[dat$response == 2] %>% quantile()

# Can you plot another ggplot including quantiles? Maybe start here:
# https://ggplot2.tidyverse.org/reference/geom_point.html


There you have it! I will leave the interpretation of the graph (see the task description) up to you. Since you have gained an intuition for the LBA model by now, the interpretation should come naturally.

Closing remarks: Writing your own simulation routines is a complicated task that requires a lot of patience. Hopefully, this chapter has made the skill of writing simulations a bit more approachable. Always try to research and learn from the code of others who have implemented the model you are trying to simulate. Work diligently to gain a firm understanding of the code, model, and task at hand, and try to break everything down into pieces.