Reward-Based Learning: How Your Brain Forms Habits

Dr. Jud Brewer, MD, PhD

Psychiatrist • Neuroscientist • Brown University Professor

NYT bestselling author · 20M+ TED views · Featured on 60 Minutes


Every habit you have (good and bad) exists for one reason: your brain learned it was rewarding. That morning coffee. The urge to check your phone. The worry that kicks in before a big meeting. Your brain isn’t broken. It’s doing exactly what it was designed to do: repeat behaviors that get rewarded.

This process is called reward-based learning, and it’s the most powerful force shaping your behavior. It’s why you can’t just “decide” to stop worrying, scrolling, or stress-eating. It’s also why understanding this mechanism is the single most important step you can take toward changing any habit.

In my 20+ years of clinical and research work at Brown University, I’ve studied how reward-based learning drives everything from nicotine addiction to anxiety to emotional eating. What I’ve found has fundamentally changed how I think about habit change, and it’s the science behind the Three Gears framework that has helped thousands of people break habits without relying on willpower.

Let me walk you through how your brain actually learns habits, why most approaches to breaking them fail, and what to do instead.


What Is Reward-Based Learning?

Reward-based learning is the process by which your brain connects behaviors to outcomes. When you do something and it produces a positive result (or even the perception of one), your brain takes note: do that again.

This is one of the oldest learning systems in the brain. It evolved to keep us alive. Our ancestors needed to remember where food was, what was dangerous, and which behaviors led to survival. The brain solved this problem with a simple but elegant system:

  1. A trigger occurs (you see a berry bush)
  2. You take an action (you eat the berries)
  3. Your brain registers the outcome (calories, energy, survival)
  4. A memory is encoded (next time you see that bush, eat from it)

This is the same system B.F. Skinner described as operant conditioning. But modern neuroscience has revealed the specific brain circuits that make it work, and why it creates habits that feel impossible to break (Brewer et al., 2018).

The important thing to understand is this: reward-based learning doesn’t evaluate whether a behavior is good for you in the long run. It only tracks whether it was rewarding in the moment. And that distinction is the root of every bad habit.


How Does Your Brain Learn Habits Through Reward?

Here’s the sequence that creates every habit:

Step 1: Something triggers you. A stressful email. Boredom at 3 p.m. A notification sound. An anxious thought about tomorrow’s meeting.

Step 2: You do something. You check your phone. You eat a cookie. You start worrying. You light a cigarette.

Step 3: Your brain gets a “reward.” The cookie gives you a sugar hit. The phone gives you novelty. The worry gives you a temporary sense of control. The cigarette gives you a nicotine buzz.

Step 4: Your brain updates its records. “That behavior worked. File it under ‘things to do when triggered.’”

Step 5: The loop repeats, and becomes automatic. After enough repetitions, the trigger-behavior-reward loop no longer requires conscious decision-making. It runs on autopilot. You’re reaching for your phone before you’ve even registered the urge.

This is how a single rewarding experience becomes a habit. And the brain circuitry that drives this process isn’t what most people assume.


What Happens in Your Brain During Reward-Based Learning?

Three brain systems work together to create and maintain habits through reward-based learning.

Dopamine: The Teaching Signal (Not the Pleasure Chemical)

Most people think of dopamine as the “feel-good” chemical. It’s more accurate to call it the teaching signal.

In 1997, neuroscientist Wolfram Schultz and colleagues published a landmark paper in Science showing that midbrain dopamine neurons don’t simply fire when you experience something pleasurable. They fire when the outcome is better than expected, and they go quiet when the outcome is worse than expected (Schultz, Dayan & Montague, 1997).

This signal is called a reward prediction error. It’s the difference between what your brain predicted would happen and what actually happened:

  • Positive prediction error (better than expected): Dopamine spikes. Your brain says, “Pay attention: do this again.”
  • Negative prediction error (worse than expected): Dopamine dips. Your brain says, “That wasn’t worth it. Adjust.”
  • No prediction error (exactly as expected): No dopamine change. The behavior is learned; it runs on autopilot.
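The three cases above can be written compactly. This is a simplified textbook form (a delta rule in the style of Rescorla and Wagner), not the full temporal-difference model from the 1997 paper, which also accounts for predicted future reward. The symbols are introduced here for illustration: r is the reward actually received, V is the reward the brain expected, δ is the prediction error, and α is an assumed learning rate between 0 and 1.

```latex
\[
\delta = r - V, \qquad V \leftarrow V + \alpha\,\delta
\]
```

When δ is positive, the expectation V is nudged up; when δ is negative, V is nudged down; when δ is zero, nothing changes.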

This is why the first cigarette, the first cookie, or the first hit of a notification feels so rewarding: it’s unexpected. And it’s why the 10,000th cigarette barely feels rewarding, yet you still smoke it. By then, the habit is automated. The dopamine teaching signal has done its job and moved on. What remains is the conditioned response.
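To make the fading signal concrete, here is a minimal Python sketch. It assumes a simple delta-rule learner, which is a textbook simplification and not a model taken from the studies cited here. The function name, the reward of 1.0, and the learning rate of 0.3 are all illustrative assumptions.

```python
def simulate_repetitions(reward=1.0, learning_rate=0.3, repetitions=20):
    """Repeat the same behavior with the same reward and record each prediction error.

    Assumption: expectation moves a fixed fraction (learning_rate) toward
    the experienced reward after every repetition.
    """
    expected = 0.0  # before the first experience, no reward is expected
    errors = []
    for _ in range(repetitions):
        error = reward - expected          # prediction error: actual minus expected
        errors.append(error)
        expected += learning_rate * error  # update the expectation
    return errors


errors = simulate_repetitions()
print(f"first repetition: {errors[0]:.3f}, last repetition: {errors[-1]:.3f}")
```

The first repetition produces the largest error because nothing was expected. Each later repetition produces a smaller one, so in this toy model the teaching signal shrinks toward zero even though the reward itself never changes.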

The Orbitofrontal Cortex: Your Brain’s Reward Scoreboard

If dopamine is the teaching signal, the orbitofrontal cortex (OFC) is the scoreboard.

The OFC sits behind your eyes and serves as your brain’s reward value comparator. It constantly evaluates: “How rewarding is this behavior compared to my alternatives?” (Brewer, 2019).

Every time you engage in a behavior and experience an outcome, the OFC updates its internal ranking. Over time, it builds a comprehensive map of what’s “worth it” and what isn’t. This map drives your automatic behavior: the things you do without thinking.

Here’s the critical insight: the OFC doesn’t care about your conscious opinions. You can intellectually know that scrolling social media at midnight is bad for you. But if the OFC has logged that behavior as “rewarding” (because it relieves boredom, provides novelty, or reduces anxiety), it will keep driving you toward it.

This is why habit change through logic and information alone doesn’t work. Knowing smoking causes cancer doesn’t update the OFC. Only direct experience updates the OFC.
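One way to picture the scoreboard is as a table of stored values that only changes when an outcome is actually experienced. The Python sketch below is an illustration under stated assumptions, not a model of the OFC: the behaviors, the starting values, the experienced reward of 0.2, and the learning rate are all hypothetical.

```python
# Hypothetical stored reward values; the numbers are illustrative only.
reward_values = {"scroll phone": 0.8, "go to sleep": 0.5}


def pick_behavior(values):
    """Return the behavior with the highest stored reward value."""
    return max(values, key=values.get)


def update_from_experience(values, behavior, experienced_reward, learning_rate=0.3):
    """Move the stored value a fraction of the way toward the experienced reward."""
    error = experienced_reward - values[behavior]
    values[behavior] += learning_rate * error
    return error


before = pick_behavior(reward_values)

# Five direct experiences in which scrolling turns out less rewarding than recorded.
for _ in range(5):
    update_from_experience(reward_values, "scroll phone", experienced_reward=0.2)

after = pick_behavior(reward_values)
print(before, "->", after)
```

Nothing in this sketch lets an opinion about scrolling change the table. The only path to a different choice is through `update_from_experience`, which mirrors the point that information alone does not shift the stored value.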

The Basal Ganglia: Where Habits Get Stored

Once the dopamine teaching signal has trained a behavior and the OFC has assigned it a reward value, the habit gets encoded in the basal ganglia, specifically the striatum. This is the brain’s habit storage system.

The striatum converts learned reward associations into automatic motor programs. This is efficient: it frees up your conscious brain (the prefrontal cortex) for other tasks. But it also means that deeply encoded habits run below the level of awareness. You’re halfway through a bag of chips before you realize you started eating.


Why Can’t You Just Stop a Habit? The Willpower Problem

Here’s where most habit-change advice goes wrong.

When you try to resist a habit using willpower, you’re asking your prefrontal cortex (the part of your brain responsible for executive function, planning, and impulse control) to override a signal from the orbitofrontal cortex (the part that has catalogued this behavior as rewarding).

This is a structural mismatch. The prefrontal cortex is powerful, but it’s also resource-intensive and fatigue-prone. It’s the first thing to go offline when you’re stressed, sleep-deprived, hungry, or emotionally overwhelmed, which is exactly when habits are strongest.

The OFC, on the other hand, is always running. It doesn’t need willpower. It doesn’t need you to be well-rested or calm. It just follows the reward map it’s built through experience (Brewer, 2019).

This is why willpower-based approaches to behavior change feel like pushing a boulder uphill. You can do it for a while, but eventually the boulder rolls back. It’s not a character flaw. It’s a neural architecture problem.

The solution isn’t to push harder. It’s to update the reward map itself.


How Do You Actually Change a Habit Your Brain Thinks Is Rewarding?

This is the question I’ve spent my career studying. And the answer comes directly from understanding reward-based learning.

If habits are formed when the brain learns that a behavior is rewarding, habits can be changed when the brain learns that the behavior is no longer rewarding. The mechanism is the same: you’re just using it in reverse.

But here’s the key: you can’t think your way to this update. Telling yourself “smoking is bad” doesn’t update the OFC. Reading an article about the dangers of anxiety doesn’t update the OFC. Only direct, present-moment experience updates the OFC (Ludwig, Brown & Brewer, 2020).

This is where awareness comes in.

The Power of Paying Attention

When you bring careful, curious attention to the actual experience of your habit (not the idea of the habit, but the real-time, felt experience), you give your brain new data. And new data triggers reward prediction errors.

Here’s a real example from my smoking cessation research: I asked a participant who had smoked for 20 years to smoke a cigarette while paying very close attention to the taste, smell, and body sensations. No judgment. Just noticing.

She said it smelled like chemicals and tasted terrible.

She’d never noticed before, because the habit was running on autopilot. But when she paid attention, her OFC got new information: this behavior isn’t actually as rewarding as I recorded. That’s a negative prediction error. The reward value dropped. The habit weakened.
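As a worked illustration with made-up numbers (not data from the study), suppose the stored reward value of smoking is V = 0.8, the attended experience is worth r = 0.2, and the learning rate is α = 0.3. Under a simple delta rule, one attentive cigarette would give:

```latex
\[
\delta = r - V = 0.2 - 0.8 = -0.6, \qquad
V_{\text{new}} = V + \alpha\,\delta = 0.8 + 0.3 \times (-0.6) = 0.62
\]
```

The prediction error is negative, so the stored value drops. In this sketch, repeating the attentive experience would keep lowering it toward 0.2.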

In our studies, this kind of awareness-based approach led to significantly greater reductions in smoking compared to standard treatment (Brewer et al., 2011). In separate research on emotional eating, we found a 40% reduction in craving-related eating when participants brought awareness to the actual experience of eating (Brewer et al., 2018).

From Awareness to the Three Gears

This mechanism (using awareness to update reward values) is the foundation of what I call the Three Gears of habit change:

  • Gear 1 (Map): Identify the trigger-behavior-reward loop. Make the unconscious habit conscious.
  • Gear 2 (Curiosity): Get curious about the actual reward. What does this behavior really feel like? Is it as rewarding as your brain thinks?
  • Gear 3 (Bigger Better Offer): When the old reward loses its luster, your brain naturally seeks something better. Curiosity itself (the intrinsic reward of awareness and presence) becomes the replacement.

The Three Gears aren’t a workaround or a hack. They’re applied reward-based learning. You’re using the same brain mechanism that created the habit to change it (Ludwig, Brown & Brewer, 2020).

The difference between this approach and willpower is fundamental. Willpower fights the habit from the outside. Awareness changes it from the inside.


Reward-Based Learning in Action: Three Examples

Smoking: The First Cigarette vs. the 10,000th

How the habit forms: The first few cigarettes produce a powerful reward signal. Nicotine stimulates dopamine release in the nucleus accumbens. The brain registers: “This substance = reward.” The OFC updates. The striatum encodes the motor pattern. Within weeks, you’re reaching for a cigarette without thinking.

Why it persists: By cigarette #10,000, the actual experience is barely pleasurable. But the habit is encoded. The trigger (stress, boredom, social cue) fires the automatic response. You smoke not because it feels good, but because your brain expects it to.

How awareness changes it: When you smoke with full attention (noticing the taste, the smell, the feeling in your lungs), you create a gap between the brain’s expectation (“this will feel good”) and the actual experience (“this tastes like chemicals”). That gap is a negative reward prediction error. The OFC updates. The habit begins to dissolve.

Anxiety: Why Worry Seems to “Work”

How the habit forms: You face an uncertain situation (a presentation, a health concern, a relationship conflict). Your mind starts rehearsing worst-case scenarios. The worry feels productive, like you’re preparing for something. The brain registers this: worry = temporary sense of control = reward. The anxiety habit loop is born.

Why it persists: Worry becomes the default response to uncertainty. The trigger (any uncertain situation) automatically produces the behavior (worry) because the OFC has catalogued it as “rewarding” (it creates an illusion of control). You worry about worrying. The loop deepens.

How awareness changes it: When you bring curious attention to what worry actually feels like in your body (the tight chest, the racing thoughts, the exhaustion), you discover something: worry doesn’t feel like control. It feels like suffering. The OFC gets new data. The perceived reward value of worry drops. The habit weakens.

Phone Checking: The Notification Dopamine Loop

How the habit forms: A notification produces a small dopamine hit: new information, social validation, novelty. The brain registers: checking phone = reward. The trigger becomes broader over time: not just notifications, but boredom, anxiety, any moment of stillness.

Why it persists: Social media platforms are designed to maximize reward prediction errors through variable reinforcement schedules. Sometimes you get something interesting, sometimes you don’t. This unpredictability keeps dopamine firing and the OFC assigning high reward value.
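The effect of unpredictability can be sketched in a few lines of Python. This is a toy comparison under assumed numbers, not a model of any real platform: both schedules pay 0.5 on average, but one pays it every time and the other pays 1.0 or 0.0 at random. The function name, the learning rate, and the window size are hypothetical.

```python
import random


def mean_late_surprise(reward_fn, trials=200, learning_rate=0.2, window=50):
    """Average size of the prediction error over the last `window` trials."""
    expected = 0.0
    surprises = []
    for _ in range(trials):
        error = reward_fn() - expected     # prediction error for this check
        surprises.append(abs(error))
        expected += learning_rate * error  # delta-rule update
    late = surprises[-window:]
    return sum(late) / len(late)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Fixed schedule: every check pays the same modest reward.
fixed = mean_late_surprise(lambda: 0.5)

# Variable schedule: half of the checks pay 1.0, the other half pay nothing.
variable = mean_late_surprise(lambda: 1.0 if rng.random() < 0.5 else 0.0)

print(f"fixed: {fixed:.3f}, variable: {variable:.3f}")
```

With the fixed schedule, the learner's expectation settles on the reward and the late prediction errors fall close to zero. With the variable schedule, the expectation can never match the next outcome, so sizable prediction errors keep occurring no matter how many times the behavior is repeated.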

How awareness changes it: When you notice what scrolling actually feels like after 20 minutes (the glazed eyes, the mental fog, the vague guilt), you’re updating the reward value in real time. The OFC recalibrates: this isn’t as rewarding as expected. The urge to check becomes less automatic.


Understanding the Mechanism Changes Everything

Most habit-change advice starts with the behavior: “Stop doing that. Start doing this.” But behavior is the output, not the input. The input is reward-based learning: the brain mechanism that decides which behaviors are worth repeating.

When you understand reward-based learning, you stop fighting your habits and start working with your brain. You stop relying on willpower (which fights the reward signal) and start using awareness (which updates it).

This is the science behind the Three Gears and our clinical research at Brown University.

You don’t need to become a neuroscientist to use this. You just need to start paying attention.


What To Do Next

1. Start With Awareness

Pick one habit you want to change. The next time it fires, pay close attention to what the reward actually feels like. Not what you think it should feel like, but what it actually delivers.

2. Map Your Habit Loop

Identify the trigger, the behavior, and the reward your brain thinks it’s getting. Write it down. Making the unconscious conscious is the first step.

3. If You Want Structured Support

If reward-based habits are interfering with your daily life, consider working with a therapist experienced in habit change and reward-based learning.

And if you want a program that applies the Three Gears to anxiety and compulsive behaviors specifically, with daily guidance and community support, Going Beyond Anxiety was built on this exact framework.


This article is for educational purposes and does not constitute medical advice. If you are experiencing addiction, anxiety, or other mental health conditions, please consult a qualified healthcare provider.


