Inverse Temperature | Tim Genewein

Research

For an up-to-date publication list, see my Google Scholar profile; all my research-related blog posts are tagged with #research. A short summary of my current research topics follows below on this page.

Keywords: (amortized) Bayesian inference, meta-learning, AI safety, information theory, agent analysis, bounded rationality, rate-distortion theory.

Blog highlights

For these topics I have written overview posts or series of posts on my blog:

Current research topics

Two problems that I am very interested in are:

Both questions draw on a rich and complex discourse in philosophy, and more recently also in mathematics, psychology, AI research, and sociology. The first question mainly falls into the realms of epistemology, logic, and induction, with compelling formalizations via Bayesian inference, Solomonoff induction, and sequential decision-making. The second question is about the philosophy of language, ethics, and perhaps also induction (with biases that follow social norms). Formalizations are less clear here (thanks, Wittgenstein), but the theory of computation, (algorithmic) information theory, and decision theory seem like helpful tools in this context.

More concretely, I am currently exploring how implicit meta-training leads to amortized Bayesian inference and, more generally, to algorithm distillation, and how this may be the main (computational) mechanism behind in-context learning. How to drive this mechanism to reliably specify tasks and desired behavior to a pre-trained model is an open question, as is how to reliably know when the system "understands" these instructions (meaning that the instructions are received without ambiguity). Related is the question of how to deal with ambiguity and epistemic uncertainty, and how to modify meta-learning schemes, or bring in other approaches (such as active learning on the agent's side), to do so.
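To make the connection between meta-training and amortized Bayesian inference concrete, here is a minimal sketch (the setup is my own toy choice, not code from any paper): tasks are biased coins with bias drawn from a Beta(1, 1) prior, and the log-loss-optimal predictor for sequences sampled this way is the exact Bayesian posterior predictive, Laplace's rule (heads + 1) / (n + 2). The Monte-Carlo check below confirms that the marginal next-symbol statistics a meta-trained predictor fits are exactly that posterior predictive.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_predictive(seq):
    """Exact Bayes posterior predictive P(next = 1 | seq) under a Beta(1, 1) prior."""
    return (np.sum(seq) + 1) / (len(seq) + 2)

# Implicit meta-training: each sequence comes from a freshly sampled task
# (a coin with bias theta ~ Beta(1, 1)). The empirical next-symbol
# frequency after a fixed prefix (the statistic a log-loss-trained
# predictor fits) matches the Bayesian posterior predictive.
prefix = np.array([1, 1, 0])
continuations = []
for _ in range(100_000):
    theta = rng.beta(1, 1)
    seq = rng.random(4) < theta          # four flips of this task's coin
    if np.array_equal(seq[:3], prefix):
        continuations.append(seq[3])

print("empirical P(next = 1 | 1, 1, 0):", np.mean(continuations))  # ~ 0.6
print("posterior predictive:", posterior_predictive(prefix))       # = 0.6
```

The same argument goes through for richer task families; the posterior predictive merely loses its closed form, which is exactly where amortization by a trained network becomes interesting.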

Past research

In the context of AI Safety I have worked on analysis (gaining a computational understanding) and interpretability of current AI agents, with an emphasis on fundamental questions and methodology that are likely to remain relevant for future, more advanced AI systems. This research was heavily influenced by Pedro Ortega's forward-looking thinking and by the members of the Safety Analysis team. It includes a formal understanding of meta-learning as amortized Bayesian inference, which we confirmed empirically in prediction and decision-making tasks and, more recently, on non-stationary distributions ("switching" sources) as well as general algorithmic sources. Using this understanding, we formulated alternative meta-learning schemes that lead to amortized systems that deal naturally with epistemic uncertainty. We advocated for causal interventions rather than purely observational analysis of agent behavior, and formulated the delusions problem: a problem of causality that arises when predictors are used for decision-making and falsely take their own actions as evidence for belief updates. Finally, we were interested in fundamental limitations of today's AI models, and found that formal language complexity (the Chomsky hierarchy) is highly predictive of what models (including transformers) can and cannot learn.
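Here is a toy version of the delusions problem (my own minimal construction, not code from the paper): a predictor trained on expert demonstrations samples an action from its own predictive distribution and then, incorrectly, conditions on that action as if an expert had produced it. Its belief over the latent task sharpens even though no information about the task has arrived.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: latent task theta in {0, 1} with uniform prior. An expert
# who knows theta picks action == theta with probability 0.9, so expert
# actions are genuine evidence about theta.
P_ACT = np.array([[0.9, 0.1],   # P(action | theta = 0)
                  [0.1, 0.9]])  # P(action | theta = 1)

belief = np.array([0.5, 0.5])
for step in range(10):
    # The predictor acts by sampling from its own predictive distribution.
    predictive = belief @ P_ACT              # P(action) under current belief
    action = rng.choice(2, p=predictive)
    # Deluded update: conditioning on the self-generated action as if it
    # were expert evidence sharpens the belief without any new information.
    belief = belief * P_ACT[:, action]
    belief /= belief.sum()
    print(step, action, belief.round(3))
```

The causally correct treatment handles self-generated actions as interventions, do(action), under which the belief over theta stays at (0.5, 0.5).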

During my time at Bosch I worked on neural network compression (reducing the computational footprint of neural nets) and Bayesian deep learning. Interestingly, the prior in a Bayesian neural net can be chosen to favor weight configurations that are easily compressible. We also found that ensembles of networks produce superior epistemic uncertainty estimates for active learning compared to computationally cheaper methods.
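A minimal sketch of the ensemble idea (my own notation, not the code from that work): score epistemic uncertainty as the disagreement between ensemble members, i.e. the mutual information between the prediction and the member identity, and use that score to rank candidate points in active learning.

```python
import numpy as np

def epistemic_score(probs):
    """probs: class probabilities of shape (K, N, C): members x inputs x classes."""
    mean = probs.mean(axis=0)                                          # ensemble predictive
    total = -(mean * np.log(mean + 1e-12)).sum(-1)                     # total uncertainty
    aleatoric = -(probs * np.log(probs + 1e-12)).sum(-1).mean(axis=0)  # mean member entropy
    return total - aleatoric                                           # epistemic part

# Members that agree score ~0, even if each is individually uncertain;
# members that confidently contradict each other score high.
agree = np.tile([[0.5, 0.5]], (5, 1, 1))
disagree = np.stack([[[0.95, 0.05]], [[0.05, 0.95]],
                     [[0.9, 0.1]], [[0.1, 0.9]], [[0.5, 0.5]]])
print(epistemic_score(agree))     # ~ [0.]
print(epistemic_score(disagree))  # clearly > 0
```

The score singles out inputs where the members are individually confident but mutually contradictory, which is exactly the disagreement that labeling a new point can resolve.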

In my PhD I was interested in computational mechanisms that allow for structure learning, that is, learning the higher-order statistical structure that is shared across a family of tasks and facilitates learning of new task instances. I investigated such learning in the human sensorimotor system (via virtual-reality experiments) and formulated an information-theoretic optimality principle based on hierarchical lossy compression as a candidate computational mechanism (a few-line sketch of the underlying principle follows below). Our lab worked on thermodynamically inspired theories of bounded-rational decision-making and on Bayesian models of human sensorimotor learning. See more on my PhD research pages.
📥 Download Thesis
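The bounded-rationality principle behind much of this work fits in a few lines: a decision-maker trades expected utility against the information cost of deviating from a default policy, maximizing E[U] - (1/beta) I(S; A), where the inverse temperature beta sets the resource budget (and gives this blog its name). The self-consistent optimum p(a|s) ∝ p(a) exp(beta U(s, a)) can be computed by a Blahut-Arimoto-style alternation; the utility values below are made up for illustration.

```python
import numpy as np

def bounded_rational_policy(U, p_s, beta, iters=200):
    """U: (S, A) utility matrix, p_s: state distribution, beta: inverse temperature."""
    p_a = np.full(U.shape[1], 1.0 / U.shape[1])    # initial action marginal
    for _ in range(iters):
        p_a_s = p_a * np.exp(beta * U)             # p(a|s) ∝ p(a) exp(beta U(s, a))
        p_a_s /= p_a_s.sum(axis=1, keepdims=True)
        p_a = p_s @ p_a_s                          # self-consistent action marginal
    return p_a_s

U = np.array([[1.0, 0.2],                          # toy 2-state, 2-action utilities
              [0.1, 0.9]])
p_s = np.array([0.5, 0.5])
print(bounded_rational_policy(U, p_s, beta=0.1))   # low beta: nearly uniform policy
print(bounded_rational_policy(U, p_s, beta=10.0))  # high beta: near-deterministic
```

In the limit beta → ∞ this recovers classical maximum expected utility; finite beta yields stochastic policies whose precision is set by the information budget.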