Integration Simulations

The present document shows that a per-condition model will often explain data generated from an integrated model better than the integrated model itself.

Simulating Data

The base simulation will contain no error, and will be based on a modulated effect on inequality-aversion following the instructions outlined in Galvan and Sanfey’s paper in Social Cognition (2025) and in the handbook. This is an illustrative example: consider the not-so outlandish claim that selfishness in an interpersonal interaction can be affected by both (a) generic distributive preferences and (b) the extent to one (dis)likes another person. In other words, the extent to which one likes another person may modulate their distributive preferences. First we need base functions that serve as input to a model.

Base Functions

payout_maximization = function(endowment, transferred){
  kept = endowment - transferred
  if(kept > 0) {value = log(kept)} else {value = 0}
  maxValue = log(endowment)
  return(value/maxValue)
}
inequality = function(endowment, transferred){
  Equality = (endowment) * 0.5
  maxInequality = Equality - 0
  
  choices = seq(0, endowment)
  Inequality = abs(Equality - choices)
  minInequality = choices[which(Inequality == min(Inequality))][1]
  
  violation = (minInequality - transferred)/maxInequality
  return(1 - (violation)**2)
}
modulator = function(liking){
  return((1 - liking**2))
}

Utility Equation

Now we need the utility equation: theta will be the baseline distributive preferences (low = selfless, high = selfish) and phi will be the weight placed on modulation (low = independent on relation to game partner, high = dependent).

utility = function(theta, phi, modulator, inequality, payout){
  thetaMod = theta * (1-phi) + modulator * phi
  return(thetaMod*payout + (1-thetaMod)*inequality)
}

Of course, this is extremely unlikely to represent all of the ways that people’s relation to another modulates their distributive preferences, but it serves as an intuitive example in this instance.

Generate Predictions

This will be the workhorse of our modeling: the function which generates predictions from a trial set and set of parameters.

generatePredictions = function(params, df){
  prediction = vector('numeric', nrow(df))
  for (k in 1:nrow(df)){
    E = df$Endowment[k]
    L = df$Liking[k]
    Choices = seq(0, E)
    Utility = vector('numeric', length(Choices))
    for (n in 1:length(Choices)){
      Utility[n] = utility(theta = params[1], 
                           phi = params[2], 
                           modulator = modulator(L), 
                           inequality = inequality(E, Choices[n]), 
                           payout = payout_maximization(E, Choices[n]))
    }
    correct_choice = which(Utility == max(Utility))
    if (length(correct_choice) > 1){
      correct_choice = correct_choice[sample(1:length(correct_choice), 1)]
    }
    prediction[k] = Choices[correct_choice]
  }
  return(prediction)
}

Set-Up

We now establish the parameters we want to simulate for.

freeParameters = data.frame(theta = rep(seq(0, 1, 0.01), each = 101),
                            phi = rep(seq(0, 1, 0.01), times = 101))

head(freeParameters)

##   theta  phi
## 1     0 0.00
## 2     0 0.01
## 3     0 0.02
## 4     0 0.03
## 5     0 0.04
## 6     0 0.05

And the trialset we want to use. In this example, we will assess the ability of the model to explain decisions in two conditions: where they are with a best-friend (Liking = 1) and someone they despise (Liking = 0). We will also put in a generic other person (0.5), making three conditions in total.

trialSet = data.frame(Endowment = rep(seq(10, 100, 10), times = 3),
                      Liking = rep(c(0, 0.5, 1), each = 10))
trialSet

##    Endowment Liking
## 1         10    0.0
## 2         20    0.0
## 3         30    0.0
## 4         40    0.0
## 5         50    0.0
## 6         60    0.0
## 7         70    0.0
## 8         80    0.0
## 9         90    0.0
## 10       100    0.0
## 11        10    0.5
## 12        20    0.5
## 13        30    0.5
## 14        40    0.5
## 15        50    0.5
## 16        60    0.5
## 17        70    0.5
## 18        80    0.5
## 19        90    0.5
## 20       100    0.5
## 21        10    1.0
## 22        20    1.0
## 23        30    1.0
## 24        40    1.0
## 25        50    1.0
## 26        60    1.0
## 27        70    1.0
## 28        80    1.0
## 29        90    1.0
## 30       100    1.0

Output the Predictions

predictions = data.frame()
for (i in 1:nrow(freeParameters)){
  pars = as.numeric(freeParameters[i, ])
  predictions[i, 1:nrow(trialSet)] = generatePredictions(pars, trialSet)
}
head(predictions)

##   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
## 1  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
## 2  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
## 3  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
## 4  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
## 5  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
## 6  5 10 15 20 25 30 35 40 45  50   5  10  15  20  25  30  35  40  45  50   5
##   V22 V23 V24 V25 V26 V27 V28 V29 V30
## 1  10  15  20  25  30  35  40  45  50
## 2  10  15  20  25  30  35  40  45  50
## 3  10  15  20  25  30  35  40  45  50
## 4  10  15  20  25  30  35  40  45  50
## 5  10  15  20  25  30  35  40  45  50
## 6  10  15  20  25  30  35  40  45  50

Objective Function

First let’s set up the objective function

obj_function = function(params, df, method = "OLS") {
  predicted_utility = vector('numeric', length(df[,1]))
  observed_utility = vector('numeric', length(df[,1]))
  
  Predictions = generatePredictions(params, df)
  
  for (k in 1:nrow(df)){
    E = df$Endowment[k]
    L = df$Liking[k]
    R = df$Transferred[k]
    P = Predictions[k]
    
    predicted_utility[k] = utility(theta = params[1], phi = params[2],
                                   modulator = modulator(L), 
                                   inequality = inequality(E, P), 
                                   payout = payout_maximization(E, P))
    observed_utility[k] =  utility(theta = params[1], phi = params[2],
                                   modulator = modulator(L), 
                                   inequality = inequality(E, R), 
                                   payout = payout_maximization(E, R))
  }
  
  return(sum((predicted_utility - observed_utility)**2))
}

Per Condition Model

We’ll call the per condition model the alternative model

utilityAlt = function(theta, inequality, payout){
  return(theta*payout + (1-theta)*inequality)
}
generatePredictionsAlt = function(par, df){
  prediction = vector('numeric', nrow(df))
  for (k in 1:nrow(df)){
    E = df$Endowment[k]
    
    Choices = seq(0, E)
    Utility = vector('numeric', length(Choices))
    for (n in 1:length(Choices)){
      Utility[n] = utilityAlt(theta = par,   
                              inequality = inequality(E, Choices[n]), 
                              payout = payout_maximization(E, Choices[n]))
    }
    correct_choice = which(Utility == max(Utility))
    if (length(correct_choice) > 1){
      correct_choice = correct_choice[sample(1:length(correct_choice), 1)]
    }
    prediction[k] = Choices[correct_choice]
  }
  return(prediction)
}
obj_functionAlt = function(par, df, method = "OLS") {
  predicted_utility = vector('numeric', length(df[,1]))
  observed_utility = vector('numeric', length(df[,1]))
  Predictions = generatePredictionsAlt(par, df)
  for (k in 1:nrow(df)){
    E = df$Endowment[k]
    R = df$Transferred[k]
    P = Predictions[k]
    
    predicted_utility[k] = utilityAlt(theta = par, 
                                      inequality = inequality(E, P), 
                                      payout = payout_maximization(E, P))
    observed_utility[k] =  utilityAlt(theta = par, 
                                      inequality = inequality(E, R), 
                                      payout = payout_maximization(E, R))
  }
  return(sum((predicted_utility - observed_utility)**2))
}

Now we need to optimize for the alternative model, per condition

optimizePerCondition = function(df){
  conditions = levels(factor(df$Liking))
  result = data.frame(par = vector('numeric', length(conditions)))
  for (i in 1:length(conditions)){
    condition = conditions[i]
    data = df[which(df$Liking == condition), ]
    result$par[i] = optim(fn = obj_functionAlt, 
                          par = 0.5, lower = 0, upper = 1, 
                          df = data, method = "Brent")$par
  }
  return(result)
}

And then generate predictions per condition

generateAltPerCondition = function(params, df){
  Prediction = vector('numeric', nrow(df))
  conditions = levels(factor(df$Liking))
  for (i in 1:length(conditions)){
    condition = conditions[i]
    trials = which(df$Liking == condition)
    Prediction[trials] = generatePredictionsAlt(params[i], df[trials, ])
  }
  return(Prediction)
}

Initialize for Parameter Recovery

We’re going to use an exhaustive search approach, which should be computational cheaper and more reliable

initial_params = c(0.5, 0.5)
lower_bounds = c(0, 0)
upper_bounds = c(1,1)

Proof Goals

At this point we have deduced the choices that the model predicts per each coordinate in parameter space. The goal in what follows is to demonstrate that a less-parsimonious, incorrect model which is fit to conditions will outperform the true data generation process.

Thus, we will compare the 2 parameter model which integrates across conditions to the model which segregates conditions in terms of their ability to explain the data generated by the 2 parameter model (predictions).

Data Functions

However, since the predictions do not contain noise, we will noise up the data. The default standard deviation will be 0.05.

library(ggplot2)
noise = function(Predictions, X, sd = 0.05){
  Ratio = Predictions/X
  Ratio = Ratio + rnorm(Ratio, mean = 0, sd = sd)
  Predictions = Ratio * X
  Predictions = round(Predictions)
  Predictions[which(Predictions < 0)] = 0
  return(Predictions)
}
qplot(x = rep(trialSet$Endowment, 1000), 
      y = noise(rep(t(predictions[1, 1:30]), 1000), 
                rep(trialSet$Endowment, 1000))/rep(trialSet$Endowment, 1000),
      geom = "jitter")

## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

We will also warp the data as one check

warp = function(Predictions, X, WarpFactor){
  Xmax = max(X)
  standardDeviation = vector('numeric', length(X))
  for (i in 1:length(Predictions)){
    standardDeviation[i] = 0.05 + 0.05*(((X[i] - Xmax)/Xmax) * WarpFactor)
    Predictions[i] = Predictions[i] + rnorm(1, mean = 0, sd = standardDeviation)
  }
  Ratio = Predictions/X
  Ratio = Ratio + rnorm(Ratio, mean = 0, sd = standardDeviation)
  Predictions = Ratio * X
  Predictions = round(Predictions)
  Predictions[which(Predictions < 0)] = 0
  return(Predictions)
}
qplot(x = rep(trialSet$Endowment, 1000), 
      y = warp(rep(t(predictions[1, 1:30]), 1000), 
               rep(trialSet$Endowment, 1000), 1)/rep(trialSet$Endowment, 1000),
      geom = "jitter")

Parameter Distributions

Also, a robust amount of research on these kinds of task indicates that preferences are not evenly distributed across the parameter space.

distribution_theta = rnorm(n = 1000000, mean = 0.7, sd = 0.3)
distribution_theta = distribution_theta[-which(distribution_theta < 0 | distribution_theta > 1)]
distribution_theta = round(distribution_theta, 2)
ggplot() + geom_histogram(aes(x = distribution_theta), bins = 15)

Around the typical return amount is 35%, corresponding to a theta ~ 0.7 in this equation. We want the same for phi as well, thinking that the majority of people will show a strong modulatory effect.

distribution_phi = rnorm(n = 1000000, mean = 0.7, sd = 0.3)
distribution_phi = distribution_phi[-which(distribution_phi < 0 | distribution_phi > 1)]
distribution_phi = round(distribution_phi, 2)
ggplot() + geom_histogram(aes(x = distribution_phi), bins = 15)

Sample the parameter space for the maximum number of observations: 1000

n1000 = data.frame(theta = sample(distribution_theta, 1000),
                   phi = sample(distribution_phi, 1000))
ggplot(data = n1000) + geom_point(aes(x = theta, y = phi))

And save this information so that it can be used in the lower child scripts

save.image(file = "C:/Users/DELL/Documents/integration-simulations/index.RData")

Home