Associative learning is a fundamental mechanism by which organisms form representations of relationships between stimuli and actions. In recent years, reformulations of well-established concepts of associative learning have shaped our understanding of how higher-order cognitive processes might emerge from simpler cognitive and learning mechanisms. This symposium comprises five presentations by early-career researchers that highlight recent developments in the investigation of associative learning with a variety of cognitive, computational, and neuroscientific methods, including virtual reality, cognitive-computational modeling, multivariate decoding of neural representations in fMRI data, and ecological momentary assessment using cross-platform online applications.
First, Stephan Nebe will introduce new experimental approaches and computational models for the laboratory assessment of habits that quantify the influence of past behavioral frequency on future actions.
Lennart Luettgau will present evidence for cortical reinstatement of outcome representations as a mechanism underlying associative learning transfer, applying cross-session, cross-modality multivariate pattern analyses on fMRI data.
Mona Garvert will highlight a study combining virtual reality, computational modeling, and fMRI to investigate how humans use relational knowledge organized in cognitive maps to generalize value to states that were previously not experienced.
Eric Schulz will then present computational modeling of a compositional bandit task in which humans entertain compositional representations and a grammar over these structures and show performance exceeding that of neural network models.
Finally, Monja Neuser will present longitudinal data from a novel reward learning task complemented by ecological momentary assessment, acquired with an open-source cross-platform application, to inform the creation of better models of human behavior.
Studying human habits in the lab: A novel experimental paradigm reveals inter-individual differences
University of Zurich, Switzerland
Habitual behavior is characterized by responses elicited by stimuli without deliberation or reliance on the predicted value of the outcome. Thus, habits reduce cognitive load in everyday life, but they also dominate behavior in psychopathologies such as substance use or obsessive-compulsive disorders. Due to the ubiquity and clinical importance of habits, it is essential to study them in the lab. Current operationalizations define behavior as habitual when it no longer takes outcome values and contingencies into account, but they neglect that habit strength should be proportional to the past frequency of performance.
We developed a new experimental task that realigns the empirical operationalization with the theoretical, frequency-based foundation of habits. This task assesses habit strength as a function of previous choice frequency while controlling for the impact of reinforcement. In two initial studies with 34 participants in total, we tested the influence of previous choice frequency on preferences in binary decisions. The development of habits was facilitated by five training sessions on consecutive days. Mixed-effects regression showed an effect of past choice frequency on behavior during test on the fifth study day. Computational modeling of participants’ behavior provided a more detailed picture, revealing inter-individual differences in choice strategies. Half of the participants combined reinforcement-based and frequency-based values to inform their choices during test. The other half ignored past choice frequency and relied solely on expected outcome values. Thus, our method quantifies the individual propensity to show habits and has the potential to identify subgroups of the population prone to a pathological overexpression of habits.
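The idea of combining reinforcement-based and frequency-based values can be illustrated with a minimal sketch. This is not the model used in the study, whose equations are not given in the abstract. It assumes a delta-rule update for expected outcome values, a reward-independent update that moves habit strength toward the chosen option, and a softmax over a weighted mixture of the two. All function names, parameters (`alpha_q`, `alpha_h`, `w_habit`, `temperature`), and default values are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Turn option values into choice probabilities."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_q(q, choice, reward, alpha_q=0.2):
    """Delta-rule update of the chosen option's expected outcome value."""
    q = list(q)
    q[choice] += alpha_q * (reward - q[choice])
    return q


def update_habit(h, choice, alpha_h=0.1):
    """Move habit strengths toward the one-hot choice vector.

    The update ignores reward, so habit strength tracks only how often
    (and how recently) each option was chosen.
    """
    return [hv + alpha_h * ((1.0 if i == choice else 0.0) - hv)
            for i, hv in enumerate(h)]


def simulate_training(choices, rewards, n_options=2):
    """Run both learners over a fixed sequence of choices and rewards."""
    q = [0.0] * n_options
    h = [0.0] * n_options
    for choice, reward in zip(choices, rewards):
        q = update_q(q, choice, reward)
        h = update_habit(h, choice)
    return q, h


def choice_probs(q, h, w_habit=0.5, temperature=1.0):
    """Softmax over a weighted mix of outcome values and habit strengths.

    w_habit = 0 corresponds to a purely value-based chooser (assumption).
    """
    mixed = [(1.0 - w_habit) * qv + w_habit * hv for qv, hv in zip(q, h)]
    return softmax(mixed, temperature)


if __name__ == "__main__":
    # Option 0 is chosen four times as often as option 1; both always pay 1.
    choices = [0, 0, 0, 0, 1] * 20
    rewards = [1.0] * len(choices)
    q, h = simulate_training(choices, rewards)
    print("mixed chooser:", choice_probs(q, h, w_habit=0.5))
    print("value-only chooser:", choice_probs(q, h, w_habit=0.0))
```

In this toy setting both options are reinforced equally, so a value-only chooser ends up close to indifferent, whereas a chooser with a non-zero habit weight prefers the more frequently chosen option. Fitting such a weight per participant is one conceivable way to express the two subgroups described above.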
Reinstatement of cortical outcome representations during second-order conditioning
1Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, United Kingdom; 2Biological Psychology of Decision Making, Institute of Experimental Psychology, Heinrich Heine University, Düsseldorf, Germany; 3Center for Behavioral Brain Sciences, Otto-von-Guericke University, Magdeburg, Germany; 4Department of Biological Psychology, Otto-von-Guericke University, Magdeburg, Germany; 5Department of Neurology, Otto-von-Guericke University, Magdeburg, Germany
Naturalistic learning scenarios are characterized by sparse and infrequent experience of external feedback to guide behavior. Higher-order learning mechanisms like second-order conditioning (SOC) may allow stimuli that were never experienced together with reinforcement to acquire motivational value. Despite its explanatory potential for real-world learning, surprisingly little is known about the neural mechanism underlying such associative transfer of value in SOC. Here, we propose that during SOC, cortical patterns representing outcomes are reinstated by first-order conditioned stimuli (CS) to establish associative links between second-order CS and outcomes. During functional magnetic resonance imaging (fMRI), we presented healthy human subjects with appetitive and aversive gustatory outcomes (orange juice and quinine solution). On a separate day, participants underwent first-order conditioning (outside fMRI), establishing associations between visual CS and gustatory outcomes, followed by SOC (during fMRI). Multivariate cross-session, cross-modality searchlight classification during SOC showed reinstatement of cortical patterns representing previously paired gustatory outcomes in the lateral orbitofrontal cortex (OFC) during presentation of the (visual) first-order CS. During SOC, this OFC region showed increased functional covariation with the amygdala, where neural pattern similarity between second-order CS and outcomes increased from early to late stages of SOC. Our data suggest a mechanism by which motivational value might be conferred to stimuli that were never paired with reinforcement.
Stimulus-reward learning and generalization in structured environments
1Max Planck Institute for Human Cognitive and Brain Sciences, Germany; 2Max Planck Institute for Biological Cybernetics, Germany; 3Max Planck Institute for Human Development, Germany
It has been suggested that the brain organizes knowledge about the relationships between positions in space and non-spatial regularities in a cognitive map. Such a representation of events and knowledge may facilitate goal-directed behavior by enabling the generalization of information across related states, but the neural and computational mechanisms underlying such map-based generalization are not known. Here, we combine a virtual reality task with computational modeling and functional magnetic resonance imaging (fMRI) to investigate how humans generalize across related states to infer reward values that were never directly experienced. In this task, spatial relationships between stimuli predict reward relationships in a subsequent choice task. We find that participants not only update the stimulus-reward associations they experience directly, but also use their knowledge about the relationships between stimuli to predict the values of stimuli that were not directly sampled. This behavior can be captured by a generalizing Gaussian process model that operates over a cognitive map emerging from individual exploration behavior rather than a cognitive map reflecting true Euclidean distances. Using fMRI adaptation, we further demonstrate that an experience-based, but not a Euclidean, cognitive map is represented in the hippocampal-entorhinal system. Together, these findings demonstrate that relational knowledge organized in hippocampal maps can be used to extrapolate across related states and thereby facilitate novel inference.
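To make the notion of map-based generalization concrete, the following is a minimal sketch of Gaussian process regression over an arbitrary distance matrix between states. It is not the authors' model. It assumes a zero prior mean, a radial basis function kernel, and Gaussian observation noise, and the distance matrix `dist` stands in for whichever map (experience-based or Euclidean) is being tested. All names and parameter values are hypothetical.

```python
import math


def rbf_kernel(distance, length_scale=1.0):
    """Similarity between two states that decays with their map distance."""
    return math.exp(-(distance ** 2) / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_predict_mean(dist, observed, rewards, target,
                    length_scale=1.0, noise=0.1):
    """Posterior mean reward at `target` given rewards at `observed` states.

    dist[i][j] is the distance between states i and j in the assumed map.
    """
    k_oo = [[rbf_kernel(dist[i][j], length_scale)
             + (noise ** 2 if a == b else 0.0)  # noise on the diagonal only
             for b, j in enumerate(observed)]
            for a, i in enumerate(observed)]
    weights = solve(k_oo, rewards)
    k_to = [rbf_kernel(dist[target][j], length_scale) for j in observed]
    return sum(k * w for k, w in zip(k_to, weights))


if __name__ == "__main__":
    # Four states on a line; rewards were sampled only at states 0 and 3.
    positions = [0.0, 1.0, 2.0, 5.0]
    dist = [[abs(p - q) for q in positions] for p in positions]
    for state in (1, 2):
        print(state, gp_predict_mean(dist, [0, 3], [1.0, -1.0], state))
```

Because the kernel depends only on `dist`, swapping in a different distance matrix changes which unsampled states inherit value from sampled ones. Comparing how well predictions under each matrix account for choices is one way the two candidate maps could be contrasted.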
Compositional generalization in multi-armed bandits
Max Planck Institute for Biological Cybernetics, Germany
To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, termed the compositionally-structured multi-armed bandit task, we found evidence that participants are able to learn representations of different latent reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make informed, compositional generalizations about rewards in complex contexts. We also provide a computational model that is able to generalize and compose knowledge of complex reward structures using a grammar over structures, and we show how such compositional inductive biases can be learned by meta-reinforcement learning agents.
Influenca: gamified smartphone-based assessment of reward learning and momentary states
1Eberhard Karls University Tübingen, Department of Psychiatry and Psychotherapy, Germany; 2Max Planck Institute of Psychiatry and International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany; 3Eberhard Karls University Tübingen, Department of Psychology, Tübingen, Germany
Reinforcement learning is a core facet of reward processing, and its alterations have been associated with various mental disorders. To build better models of individual behavior, repeated measurement of reward learning and value-based decision-making is crucial. However, the focus on lab-based assessments has limited the number of consecutive measurements, and the test-retest reliability of many learning parameters is therefore unknown.
Here, we present our open-source, cross-platform application Influenca that provides a reward learning task optimized for repeated testing and complemented by ecological momentary assessment (EMA) of metabolic and mood states for extended assessments over weeks (up to 31 runs).
Using an initial validation sample of 127 players (2904 runs), we found that parameters of reinforcement learning, such as the learning rate and reward sensitivity, show low to medium intra-class correlations (ICCs: 0.147-0.665; maximum likelihood estimation per run). Notably, most state items showed comparable ICCs, indicating substantial fluctuations of behavioral indices over time.
To conclude, our innovative app provides an open framework that facilitates repeated assessments of reward learning and value-based decision-making across various states. Parameter estimates from our online assessment showed reliabilities comparable to those of lab-based paradigms in the literature, suggesting that one run may not be sufficiently representative of typical behavior. The presented longitudinal format may help to better quantify intra- and inter-individual differences in value-based decision-making and enable early identification of risk factors such as reward-related alterations that characterize many mental disorders.
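For readers less familiar with intra-class correlations, the sketch below computes a one-way random-effects ICC, often written ICC(1,1), from per-run parameter estimates. The abstract does not state which ICC form was used, and the real data are unbalanced (players completed different numbers of runs), so this balanced-design version is only an illustration. The function name and the toy values are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for a balanced design.

    scores[i][j] is the parameter estimate (e.g. a learning rate)
    for player i in run j; every player needs the same number of runs.
    """
    n = len(scores)      # number of players
    k = len(scores[0])   # runs per player
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    player_means = [sum(row) / k for row in scores]

    # Between-player and within-player mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2
                         for m in player_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(scores, player_means)
                    for x in row) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Toy learning-rate estimates for three players over two runs each.
    stable = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]
    unstable = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
    print("stable players:", icc_oneway(stable))
    print("unstable players:", icc_oneway(unstable))
```

Values near 1 mean that differences between players dominate run-to-run fluctuation, while low values mean that a single run says little about a player's typical parameter. This is the sense in which low to medium ICCs argue for repeated assessment.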