What Does the Internet Do to the Brain?

Mapping Cortical Activation Fingerprints Across Digital Content Modalities Using a Deep fMRI Encoder

Does scrolling news light up your brain like watching a video, or reading a story? We used an AI brain model to find out which parts of the cortex wake up for 13 kinds of online content.

Activation Cartography maps 3,008 natural language stimuli across 13 internet content categories against predictions from TRIBE v2 (a 177M-parameter deep neural encoder trained on real fMRI recordings), revealing statistically significant, category-level differences in predicted cortical recruitment.

3,008 stimuli evaluated

13 content categories

20,484 cortical surface points


Research Snapshot

Study design, encoder architecture, and key statistical findings.

F = 13.51 ANOVA main effect

96.9% variance in PC1

177M encoder parameters

4× semantic spread gain

Encoder: TRIBE v2, a deep neural fMRI encoder that predicts whole-cortex haemodynamic responses

Key result: significant main effect of content type, F(12, 2995) = 13.51, p < 10⁻²⁶, η² = 0.051; Cohen’s d = −0.82 between the highest- and lowest-ranked categories.

Theory: GWT receives the strongest support; FEP is weakly supported; DCT and IIT show mixed evidence.

Concept Overview

Abstract

Background

Despite extensive single-stimulus neuroscience on emotional, narrative, and threatening media, no large-scale study has compared how distinct categories of internet content differentially engage the cortex.

Past brain studies have looked at one thing at a time: a scary clip here, a sad story there, a single emotional jolt. Nobody has lined up the many kinds of stuff we actually scroll through online and compared them side by side. That is the gap this project fills.

Methods

Activation Cartography maps 3,008 natural language stimuli across 13 internet content categories against predictions from TRIBE v2, a 177M-parameter deep neural encoder trained on real fMRI recordings that predicts whole-cortex haemodynamic responses. Each stimulus yielded a predicted activation profile across 20,484 cortical surface points, summarised into six anatomical regions.

We collected 3,008 short text snippets (a mix of headlines, posts, stories, and more) sorted into 13 content categories. We then fed each snippet through TRIBE v2, an AI model with 177M tunable knobs that was trained on real fMRI brain scans. The model predicts how blood flow (the signal fMRI tracks) would shift across 20,484 points on the brain’s surface, which we group into six big regions.

Results

A one-way ANOVA revealed a significant main effect of content type (F(12, 2995) = 13.51, p < 10⁻²⁶, η² = 0.051). ThreatSafety content ranked highest and Narrative lowest (Cohen’s d = −0.82). A dominant cortical gradient (PC1 = 96.9% variance) contrasts sensory-language against executive-motor cortex across all categories.

A standard statistical test (a one-way ANOVA) confirms the content type really matters (F(12, 2995) = 13.51, p < 10⁻²⁶, η² = 0.051). Scary or threatening posts light up the brain the most. Stories light it up the least (Cohen’s d = −0.82, a big gap). One pattern dominates everything: a tug-of-war between the brain’s sense-and-language side and its planning-and-movement side, explaining 96.9% of the variation. That tug-of-war shows up no matter which category you look at.

Implications

Different internet content categories engage distinct brain circuits with statistically significant differences in predicted intensity. GWT’s prediction that threat-laden content drives broad cortical activation received the strongest support. The analysis pipeline and registered hypotheses are released with the project.

Different kinds of online content really do switch on different brain circuits, and the differences in intensity are not just noise. The theory that fits best is GWT, the idea that threatening content commands a brain-wide broadcast. The full code, data, and the predictions we made before running anything are public.

Introduction

Digital media consumption has become a defining feature of contemporary cognitive life. Recent estimates indicate the average adult consumes six to eight hours of digital content per day, a duration that exceeds sleep for many subpopulations. A fundamental empirical question follows: whether distinct categories of internet content engage the cerebral cortex equivalently, or whether systematic, category-level differences in predicted neural recruitment can be identified at scale.

Most of our waking mental life now runs through a screen. Estimates put the average adult at six to eight hours of digital content a day, more than many of us spend asleep. So a fair question is: do a news headline, a meme, and a short story all hit the brain the same way? Or do they reliably pull on different circuits in ways we can actually measure?

Activation Cartography

Rather than running a single-category fMRI study, Activation Cartography uses a validated deep neural encoder (TRIBE v2) to predict whole-brain responses at scale. This enables a 13-category comparative study with 3,008 stimuli that would be logistically impossible with real scanner time.

Instead of putting a few people in an fMRI machine and showing them one type of content, Activation Cartography uses TRIBE v2 (a tested AI model that predicts whole-brain responses) to do the work at scale. That lets us compare 13 categories across 3,008 snippets, which would never be possible with real scanner time and human volunteers.

The neuroscience of media consumption has historically been constrained by the throughput of fMRI acquisition: each participant yields a few hundred trials per session, making large comparative studies prohibitively expensive. TRIBE v2 breaks this barrier by predicting whole-cortex haemodynamic responses from text inputs, enabling population-scale analysis of content-type effects on predicted brain activation.

Brain science about media has always hit the same wall: fMRI is slow and expensive. A single person in a scanner gets through only a few hundred items per session, so big comparison studies are out of reach. TRIBE v2 jumps that wall. Feed it text, and it predicts how the whole brain would respond. That makes it possible to study content effects at a scale real scanners cannot reach.

Four neuroscientific frameworks are evaluated against the activation patterns: Global Workspace Theory (GWT), Free Energy Principle (FEP), Default-mode Circuit Theory (DCT), and Integrated Information Theory (IIT). Each makes distinct predictions about which content types should drive the broadest or most intense cortical recruitment.

We compare the results against four big ideas in brain science: Global Workspace Theory (GWT), the Free Energy Principle (FEP), Default-mode Circuit Theory (DCT), and Integrated Information Theory (IIT). Each one makes a different bet about which kind of content should set off the widest or strongest brain response.

Methods

Stimulus Construction

The 3,008 stimuli were drawn from established NLP benchmarks and live internet sources and distributed across 13 content categories: ThreatSafety, News, Social, Scientific, Narrative, Emotional, AudioText, ImageVisual, Educational, Persuasive, Humour, Instructional, and Commerce. Each stimulus is a short natural language passage (1–3 sentences).

We pulled 3,008 short text snippets from well-known language research datasets and from the live web, then sorted them into 13 content categories: ThreatSafety, News, Social, Scientific, Narrative, Emotional, AudioText, ImageVisual, Educational, Persuasive, Humour, Instructional, and Commerce. Each snippet is just 1–3 sentences long, roughly the length of a tweet or a caption.

3,008 Stimuli 13 Categories NLP Benchmarks

TRIBE v2 Encoder

TRIBE v2 is a 177-million-parameter deep neural encoder trained on real functional MRI recordings. Given a text input, it predicts a whole-cortex haemodynamic response across 20,484 cortical surface points. Two encoding modes are used: hash-mode (fast, token-level) and semantic-mode (LLaMA-3.2-3B embeddings, N = 390 replication sample).

TRIBE v2 is an AI model with 177 million tunable knobs, trained on real brain scans (fMRI) so it can guess how the brain would react to text. Hand it a sentence and it predicts a brain-wide response across 20,484 points on the cortex. We run it in two flavors: hash-mode, which is fast and looks at the words themselves, and semantic-mode, which uses a small language model (LLaMA-3.2-3B) to grasp the meaning, tested on 390 of the snippets as a double-check.

177M Parameters 20,484 Cortical Points LLaMA-3.2-3B
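The text describes hash-mode only as fast and token-level. As an intuition for why a token-level hashed representation would follow surface wording rather than meaning, the sketch below shows generic feature hashing (the "hashing trick"). It is an illustration under assumptions, not TRIBE v2's implementation: the function name `hash_features`, the 64-dimensional output, the BLAKE2b hash and the regex tokeniser are all hypothetical choices.

```python
from __future__ import annotations

import hashlib
import re


def hash_features(text: str, dim: int = 64) -> list[float]:
    """Map text to a fixed-length vector by hashing lower-cased word tokens.

    Illustrative 'hashing trick' only; NOT TRIBE v2's actual hash-mode.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        # First four bytes pick a bucket; one bit of byte 4 picks a sign,
        # so hash collisions do not push every bucket in the same direction.
        bucket = int.from_bytes(digest[:4], "little") % dim
        vec[bucket] += 1.0 if digest[4] & 1 else -1.0
    return vec


# Same words in a different order give the same vector:
# word order is invisible to this kind of representation.
print(hash_features("the storm is coming") == hash_features("coming is the storm"))  # True
```

Under a scheme like this, a paraphrase that uses different words ("a tempest approaches") lands in unrelated buckets, which is one plausible reason a lexical front end and a meaning-based one could rank the same categories differently.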

Statistical Analysis

Each stimulus’s 20,484-point activation profile is summarised into six anatomical regions. A one-way ANOVA tests the main effect of content type on mean global activation. Effect sizes are reported as η² and Cohen’s d. Principal component analysis of the category mean profiles identifies dominant cortical gradients.

For each snippet, we shrink the 20,484-point brain map into six big anatomical regions to keep things manageable. A one-way ANOVA (a basic test for “does the group really make a difference?”) checks whether content type changes the overall brain response. We report effect sizes (η² and Cohen’s d) so it’s clear how large the differences are, not just whether they exist. Principal component analysis (a way of finding the strongest underlying pattern) pulls out the dominant cortical gradient.

One-way ANOVA Cohen’s d PCA
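The region summary and ANOVA described above can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the project's released pipeline: `region_means` takes a hypothetical per-vertex integer label array for the six regions, and `one_way_anova` uses the textbook between/within sums of squares, with η² = SS_between / SS_total.

```python
from statistics import fmean


def region_means(vertex_values, vertex_region, n_regions=6):
    """Average a per-vertex activation map within each anatomical region.

    vertex_region[i] is a hypothetical integer region label (0..n_regions-1)
    for vertex i; the study's actual six-region parcellation is not shown here.
    """
    sums = [0.0] * n_regions
    counts = [0] * n_regions
    for value, region in zip(vertex_values, vertex_region):
        sums[region] += value
        counts[region] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def eta_squared_from_f(f_stat, df_between, df_within):
    """Eta squared implied by an F statistic and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)
```

With 3,008 stimuli in 13 categories the degrees of freedom are (13 − 1, 3,008 − 13) = (12, 2995), and `eta_squared_from_f(13.51, 12, 2995)` comes out at about 0.0514, in line with the reported η² = 0.051.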

Results

A one-way ANOVA on predicted global activation revealed a statistically significant main effect of content type across all 13 categories.

The headline finding: content type really does make a difference. The standard statistical test (a one-way ANOVA) showed a clear, reliable effect across all 13 categories.

ANOVA: F(12, 2995) = 13.51, p < 10⁻²⁶

Under hash-mode encoding, ThreatSafety content ranked highest and Narrative lowest (Cohen’s d = −0.82). The semantic replication (N = 390, LLaMA-3.2-3B) produced a 4× wider activation spread, with AudioText, ImageVisual, and Emotional leading, a ranking essentially uncorrelated with hash-mode ordering (r = 0.09).

In hash-mode (the fast, word-level version), ThreatSafety content tops the chart and Narrative sits at the bottom (Cohen’s d = −0.82, a hefty gap). The meaning-aware semantic run (N = 390, using LLaMA-3.2-3B) spreads things out 4× wider, and a different cast leads: AudioText, ImageVisual, and Emotional. The two rankings barely match (r = 0.09), so the two modes are picking up on very different things in the same text.
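The text reports Cohen’s d without saying how the standard deviation is pooled. A common choice, assumed here, pools the two sample variances with n − 1 weights. Argument order fixes the sign: passing the lower-scoring category first (for example hypothetical `narrative_scores` before `threat_scores`) gives a negative d, which is one way to read the reported −0.82.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (n - 1 weighting).

    Sign convention (an assumption): d < 0 when sample `a` has the lower mean.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled_var)
```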

A dominant cortical gradient (PC1, 96.9% of variance) contrasts sensory-language cortex (high loading: auditory, visual, language network) with executive-motor cortex (low loading: prefrontal, motor) across all 13 categories. This gradient is consistent across encoding modes despite their different category rankings.

One pattern dominates everything (PC1 = 96.9% variance): a tug-of-war between the brain’s sensing-and-language side (hearing, vision, language areas) and its planning-and-doing side (the prefrontal cortex, which weighs decisions, and the motor cortex, which moves the body). That tug-of-war shows up in all 13 categories. Even though the two encoding modes rank categories differently, this underlying pattern stays the same.
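PC1’s share of variance is its eigenvalue divided by the trace of the covariance matrix of the category-mean profiles (here 13 categories × 6 regions). The source does not say whether profiles were standardised before PCA, so this sketch only centres each region’s column (an assumption). It uses plain power iteration to avoid third-party dependencies; a library SVD would be the usual tool in practice.

```python
def pc1_variance_fraction(profiles, iterations=500):
    """Share of total variance on the first principal component.

    profiles: one row per category, one column per region (e.g. 13 x 6).
    Columns are mean-centred only (no standardisation is assumed).
    """
    n_rows, n_cols = len(profiles), len(profiles[0])
    col_means = [sum(row[j] for row in profiles) / n_rows for j in range(n_cols)]
    centred = [[row[j] - col_means[j] for j in range(n_cols)] for row in profiles]
    # Sample covariance matrix between regions (n_cols x n_cols).
    cov = [[sum(r[i] * r[j] for r in centred) / (n_rows - 1) for j in range(n_cols)]
           for i in range(n_cols)]
    total_variance = sum(cov[i][i] for i in range(n_cols))
    if total_variance == 0:
        return 0.0

    def multiply(vec):
        return [sum(cov[i][j] * vec[j] for j in range(n_cols)) for i in range(n_cols)]

    # Power iteration from an uneven start vector, which is less likely than a
    # constant vector to be orthogonal to the leading eigenvector.
    vec = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        product = multiply(vec)
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in product]
    # Rayleigh quotient of the unit vector gives the leading eigenvalue.
    leading = sum(v * mv for v, mv in zip(vec, multiply(vec)))
    return leading / total_variance
```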

Regional breakdown shows that ThreatSafety content activates the language network and prefrontal cortex most strongly under hash-mode, while AudioText and ImageVisual content drives the largest visual and auditory cortex responses under semantic encoding. This suggests hash-mode captures surface lexical features while semantic-mode captures deeper representational content.

Zooming in by brain region: in hash-mode, ThreatSafety content lights up the language areas and the prefrontal cortex (the brain’s decision-maker) the most. In semantic-mode, AudioText and ImageVisual content drive the biggest responses in the visual and auditory areas. The likely reason: hash-mode reacts to the words themselves, while semantic-mode reacts to what those words actually describe.

Robustness Checks: Ranking is Stable, Not Just Surface Text

Two checks indicate that the semantic-mode ranking is not an artefact. (1) seq_len sweep: the per-category ordering is essentially identical across temporal integration windows seq_len = 4, 8, 16 (Spearman ρ ≥ 0.956, all p < 0.001). (2) LSA Mantel test: TRIBE’s representational geometry is nearly uncorrelated with surface text similarity (Mantel r = 0.055, p = 0.340), indicating that the encoder’s stimulus representations are not simply tracking surface text overlap. Vertex-level analysis (20,484 separate ANOVAs, one per cortical vertex) further shows the discriminative effect is whole-brain: 99.8% of vertices are significant at p < 0.001, with the top vertex reaching F = 71.05 (5.3× the six-region aggregate F of 13.51).

Is the ranking real? Two checks say yes. First, we ran the model with different time windows and the category order barely budged (ρ ≥ 0.956). Second, we checked whether the ranking is just echoing how similar the texts look on the surface, using a completely separate text-similarity method called LSA. It is not (Mantel r = 0.055), so TRIBE is picking up on something beyond surface wording. We also tested each of the 20,484 brain points on its own: 99.8% show a significant category effect, and the strongest single point separates the categories 5.3× more sharply than the six-region summary.
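Both robustness checks reduce to correlations. The source gives no implementation details, so the following is a standard-library sketch under assumptions: Spearman’s ρ is the Pearson correlation of average-tie ranks (for example, the 13 per-category means under seq_len = 4 versus seq_len = 8), and the Mantel test correlates the upper triangles of two stimulus dissimilarity matrices (say, one from TRIBE predictions and one from LSA vectors), building a one-sided p-value by jointly permuting the rows and columns of one matrix. The permutation count and seed are arbitrary illustrative choices.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def upper_triangle(matrix):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Mantel r between two dissimilarity matrices and a one-sided p-value.

    The null distribution comes from jointly permuting rows and columns of
    dist_b, which keeps each matrix's internal structure intact.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    r_observed = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= r_observed:
            hits += 1
    return r_observed, (hits + 1) / (permutations + 1)
```

With 999 permutations and the +1 correction, the smallest attainable p-value is 0.001, so a result such as p = 0.340 means roughly a third of random relabellings matched or beat the observed r.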

Cross-source robustness confirms category effects are not dataset artefacts. For each category represented in two or more independent sources, an ICC-style reproducibility index was computed. The mean ICC across the 12 qualifying categories is 0.89; 10 of 12 fall in the good-to-excellent range. Narrative is the weakest (ICC = 0.67), consistent with its heterogeneous sourcing (HellaSwag completions vs TinyStories).

Different data sources, same story. We checked whether the results were just an accident of which dataset we used. For categories that come from multiple independent sources, we measured how well the results reproduce. The average score is 0.89 out of 1.0 (good reproducibility), and 10 of the 12 categories score well. Narrative is the shakiest (0.67), because the two types of stories we used behave quite differently.
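"ICC-style" is not defined further in the source. One common variant is ICC(3,1) from Shrout and Fleiss (two-way mixed effects, consistency, single measurement), sketched below. Treating a category’s regional means as the targets and its data sources as the raters is an assumption made purely for illustration.

```python
def icc_3_1(matrix):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    matrix[i][j] is the score of target i from rater j (at least 2 x 2).
    Constant rater offsets are removed, so only agreement in pattern counts.
    """
    n, k = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

A value of 1 means the sources agree up to a constant offset; values near 0 or below mean the differences between targets are swamped by source-specific noise.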

Extended multivariate analyses further confirm that content-type profiles are discriminable above chance: linear SVM classification achieves significant cross-validated accuracy, hierarchical clustering reveals a structured taxonomy (sensory categories cluster separately from semantic ones), and mutual information analysis shows distinct information carried by each cortical region. Together these results are consistent with the view that different content types engage specialised neural subsystems rather than a single undifferentiated response.

Machine learning confirms the pattern. We trained a simple classifier on the brain responses and it could tell which content category a snippet belonged to, well above chance. Clustering the response patterns reveals a sensible family tree: sensory categories (AudioText, ImageVisual) group together, and semantic categories form their own branches. Different brain regions carry distinct information about the content, which fits the idea that each type of content engages its own mix of specialised circuits.
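The text names a linear SVM but gives no settings. To keep the example dependency-free, the sketch below swaps in a simpler nearest-centroid classifier, scores it with k-fold cross-validation, and judges "above chance" with a label-permutation test. It illustrates the logic of the check, not the study’s actual classifier; the fold count, permutation count and seeds are arbitrary.

```python
import random


def nearest_centroid_cv_accuracy(features, labels, n_folds=5, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier.

    A dependency-free stand-in for the linear SVM named in the text.
    """
    rng = random.Random(seed)
    indices = list(range(len(features)))
    rng.shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        sums, counts = {}, {}
        for i in indices:
            if i in held_out:
                continue
            total = sums.setdefault(labels[i], [0.0] * len(features[i]))
            for d, value in enumerate(features[i]):
                total[d] += value
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        for i in test_idx:
            # Predict the class whose training centroid is closest (squared Euclidean).
            predicted = min(
                centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(features[i], centroids[lab])),
            )
            correct += predicted == labels[i]
    return correct / len(features)


def permutation_p_value(features, labels, n_permutations=200, seed=0):
    """Observed accuracy and the share of label-shuffled runs that match or beat it."""
    observed = nearest_centroid_cv_accuracy(features, labels, seed=seed)
    rng = random.Random(seed + 1)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if nearest_centroid_cv_accuracy(features, shuffled, seed=seed) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)
```

For 13 equally sized categories, chance accuracy would be 1/13 ≈ 7.7%; the permutation test avoids relying on that figure when class sizes differ.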

Theory Evaluation

Four neuroscientific frameworks were assessed against predicted activation patterns, each making distinct testable predictions about which content types should drive the broadest cortical recruitment.

We checked the results against four big theories of how the brain works. Each one makes a specific bet about which kind of content should set off the widest brain response, predictions we can actually test against our data.

Global Workspace Theory: Strongest Support

GWT predicts that threat-laden content should trigger a global broadcast, driving widespread cortical ignition. ThreatSafety content ranking highest under hash-mode (d = −0.82) directly supports this. The finding that a single dominant gradient accounts for 96.9% of between-category variance is also consistent with GWT’s single-workspace model.

GWT pictures the brain as a stage. When something matters (like a threat) it gets broadcast to the whole stage, lighting up many regions at once. Our results fit: ThreatSafety came out on top in hash-mode by a big margin (d = −0.82). And the fact that a single pattern explains 96.9% of the differences between categories also fits GWT’s idea of one central stage. Of the four theories, GWT comes out looking the strongest.

The Free Energy Principle predicts that prediction-error-rich content (novel, surprising, uncertain stimuli) should drive higher activation. Support is partial: ThreatSafety and News content (high surprise value) rank highly, but the correlation with uncertainty proxies is weak (r ≈ 0.3).

Free Energy Principle: this theory says the brain is constantly guessing what comes next, and content that surprises it (the unexpected or uncertain) should fire harder. Partial fit: ThreatSafety and News are both high on surprise, but the link between “surprising” and “brain lights up” is only modest (r ≈ 0.3).

Default-mode Circuit Theory predicts that narrative and self-referential content should activate the default mode network most strongly. The evidence is mixed: Narrative ranked lowest under hash-mode but higher under semantic encoding, which suggests that whether the narrative-DMN link appears depends on the encoding mode.

Default-mode Circuit Theory: the default mode network is the set of brain regions that hums along when you’re daydreaming or thinking about yourself. This theory says stories and self-focused content should turn it on the most. Our results are split: Narrative ranked dead last in hash-mode but climbed when we used semantic-mode. So the link only shows up if the model is reading for meaning, not just words.

Integrated Information Theory predicts that content with higher integrated information (Φ) should drive more cortical activation. IIT’s evidence is classed as mixed, chiefly because there is no reliable proxy for Φ in natural language stimuli, which makes this prediction untestable at current resolution.

Integrated Information Theory: IIT measures “richness of experience” with a number called Φ, and predicts that richer content should activate more of the brain. The honest verdict is: we cannot really tell yet. There is no good way to measure Φ in a short piece of text, so this prediction is essentially untestable with the tools we have today.

Quantitative Theory Scorecard

A formal numerical evaluation (1.0 = confirmed, 0.5 = partial, 0.0 = not confirmed) across six predictions produces the following aggregate scores: GWT scores highest (0.58), driven by ThreatSafety’s consistent top ranking. IIT and DCT tie at 0.33, while FEP scores 0.25. These scores establish a principled baseline against which future semantic-replication results can be compared.

Numbers confirm the story. We turned each theory prediction into a numerical score (1.0 for confirmed, 0.5 for partial, 0 for not confirmed). GWT leads at 0.58 out of 1.0, thanks to ThreatSafety persistently coming out on top. IIT and DCT tie at 0.33, and FEP trails at 0.25. These are not final grades, but a benchmark for the next round with meaning-aware encoding.
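Assuming each theory’s aggregate is the plain mean of its six prediction scores (the source gives only the totals), every aggregate must be a multiple of 1/12, and the reported values back out to whole numbers of half-points:

```latex
\begin{aligned}
S_T &= \frac{1}{6}\sum_{i=1}^{6} s_{T,i}, \qquad s_{T,i} \in \{0,\ 0.5,\ 1\} \\
S_{\mathrm{GWT}} &= 0.58 \approx \tfrac{7}{12} &&\Rightarrow\ \textstyle\sum_i s_{\mathrm{GWT},i} = 3.5 \\
S_{\mathrm{IIT}} = S_{\mathrm{DCT}} &= 0.33 \approx \tfrac{4}{12} &&\Rightarrow\ \textstyle\sum_i s_{T,i} = 2.0 \quad (T \in \{\mathrm{IIT}, \mathrm{DCT}\}) \\
S_{\mathrm{FEP}} &= 0.25 = \tfrac{3}{12} &&\Rightarrow\ \textstyle\sum_i s_{\mathrm{FEP},i} = 1.5
\end{aligned}
```

On this reading, GWT earns 3.5 of the 6 available points and FEP 1.5.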

Conclusion

Activation Cartography demonstrates that different internet content categories engage distinct brain circuits with statistically significant differences in predicted intensity (F(12, 2995) = 13.51, p < 10⁻²⁶, η² = 0.051). The dominant cortical gradient (sensory-language vs executive-motor) is stable across encoding modes and accounts for 96.9% of between-category variance.

Activation Cartography shows that different kinds of online content really do switch on different brain circuits, and the differences in intensity are statistically reliable (F(12, 2995) = 13.51, p < 10⁻²⁶, η² = 0.051), even though content type accounts for only about 5% of the variation. The strongest pattern (a tug-of-war between the brain’s sense-and-language side and its planning-and-doing side) holds up no matter how we run the model, and accounts for 96.9% of the differences between categories.

The encoding-mode dependence of category rankings (r = 0.09 between hash-mode and semantic-mode) is the study’s most important methodological finding: surface lexical features (hash-mode) and deep semantic representations (semantic-mode) produce systematically different activation predictions, suggesting that fMRI encoding models are sensitive to which level of linguistic representation is used as input.

The most important method lesson is that the two modes barely agree on which categories rank highest (r = 0.09). Reading words on the surface (hash-mode) and reading them for meaning (semantic-mode) give very different brain predictions. That is a warning to anyone using these AI brain models: the answer you get depends a lot on how you describe the input.

Robustness extensions confirm the core result. Six extension analyses collectively support a nuanced picture: (i) the discriminative effect is whole-brain, and the top vertex’s F is about 5.3× the six-region aggregate; (ii) category effects reproduce across independent data sources (mean ICC = 0.89); (iii) the activation spectrum is a smooth continuum; (iv) the semantic-mode ranking (AudioText > ImageVisual > Emotional) is stable across temporal integration windows (ρ ≥ 0.956); (v) that ranking is independent of surface text similarity (Mantel r = 0.055, p = 0.340); and (vi) extended multivariate analyses show category profiles are discriminable above chance, with a structured hierarchical organisation.

Six extra checks, same big picture. We ran six additional analyses to stress-test the results. The category signal shows up at 99.8% of the 20,484 brain points tested one by one, and the strongest single point is about 5× more discriminating than the six-region summary. It holds up when we swap the data sources. It holds up when we change the model’s time window. And it does not come from surface text similarity. A machine-learning classifier can tell which content category a snippet belongs to from its brain response alone, and the response patterns form a sensible family tree.

Future directions include higher-powered semantic replication (N ≥ 150 per category) to resolve the hash/semantic discrepancy, extension to multimodal stimuli (images, audio) using TRIBE v2’s full multimodal encoder, independent cross-encoder validation with BrainBERT or the Huth semantic atlas, and pre-registration of the category-ranking hypothesis for confirmatory testing.

Next steps: run a bigger semantic-mode pass (at least 150 snippets per category) to settle which mode is closer to the truth, push beyond text by feeding TRIBE v2 real images and audio, check the results against a completely different AI brain model to make sure it is not just a TRIBE quirk, and lock in the category rankings as a prediction up front so a future study can confirm or refute them cleanly.
