How Did Researchers Manage to Read Movie Clips From the Brain? | 80beats

What’s the news: In a study published last week, researchers showed they could reconstruct video clips by watching viewers’ brain activity. The video of the study’s results, below, is pretty amazing, showing the original clips and their reconstructions side by side. How does it work, and does it mean mind-reading is on its way in?

How to Read Movies From the Brain, in 4 Easy Steps:

1) Build the Translator. The researchers first had three people watch hours of movie trailers, tracking blood flow in their brains—which is linked to what the neurons are up to, as active neurons use more oxygen from the bloodstream—with an fMRI scanner. (All three subjects were part of the research team; over the course of the study, they had to be in the scanner for a looong time.) The team focused on brain activity in a portion of each person’s visual cortex, compiling information about how 4,000 different spots in the visual cortex responded to various simple features of a movie clip. “For each point in the brain we measured, we built a dictionary that told us what oriented lines and motions and textures in the original image actually caused brain activity,” says Jack Gallant, the UC Berkeley neuroscientist who led the study. “That dictionary allows us to translate between things that happen in the world and things that happen in each of the points of the brain we measure.”
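To make the “dictionary” idea concrete, here is a minimal sketch of what a voxel-wise encoding model could look like in code, assuming the trailers have already been reduced to simple features (oriented lines, motion, texture) and the fMRI signal has been extracted for each voxel. The array sizes, the names, and the ridge-regression fit are illustrative assumptions, not the study’s actual pipeline.

```python
# A minimal sketch of the "dictionary" (encoding model): a regularized linear
# map from simple visual features of each second of video to the fMRI response
# of each voxel. All names and sizes here are illustrative placeholders.
import numpy as np
from sklearn.linear_model import Ridge

n_seconds, n_features, n_voxels = 600, 200, 50   # toy sizes, not the study's

rng = np.random.default_rng(0)
features = rng.standard_normal((n_seconds, n_features))       # oriented lines, motion, texture per second
voxel_responses = rng.standard_normal((n_seconds, n_voxels))  # measured fMRI signal per voxel

# Each voxel gets its own row of weights: which features drive that point in the brain.
encoder = Ridge(alpha=1.0)
encoder.fit(features, voxel_responses)

predicted = encoder.predict(features)  # the model's guess at brain activity for these clips
```

A regularized linear fit per voxel is a common way to build models of this kind; the study’s real feature space and fitting procedure were considerably more elaborate.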

2) Test the Translator. The study participants watched yet more video clips, and the team double-checked that their dictionary—a statistics-based computer model—worked for the new clips, too.
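What that check might look like, sketched under simple assumptions: predict responses for the held-out clips and correlate prediction with measurement, voxel by voxel. The function name and the use of plain Pearson correlation are illustrative choices, not the paper’s exact criterion.

```python
# Sketch of validating the model on clips it never saw during fitting:
# correlate predicted and measured activity separately for each voxel.
import numpy as np

def validation_score(encoder, test_features, test_responses):
    """Per-voxel Pearson correlation between predicted and measured responses."""
    predicted = encoder.predict(test_features)
    pred_z = (predicted - predicted.mean(axis=0)) / predicted.std(axis=0)
    meas_z = (test_responses - test_responses.mean(axis=0)) / test_responses.std(axis=0)
    return (pred_z * meas_z).mean(axis=0)   # one correlation per voxel

# e.g. scores = validation_score(encoder, held_out_features, held_out_responses)
```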

3) Add More Words to the Dictionary. The researchers wanted a larger database of clip-to-brain-activity translations, so they collected 18 million seconds of video from randomly selected YouTube clips. They then ran the movies through the computer model, generating likely brain activation responses for each second of video.
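In code, this step could look roughly like the sketch below: push pre-extracted features for every second of the YouTube library through the already-fitted model and store the activity pattern each second would likely evoke. The batching, the function name, and `youtube_features` are assumptions for illustration.

```python
# Sketch of generating "likely brain activation" for a large video library:
# run pre-extracted features for every second of YouTube video through the
# fitted encoding model. `youtube_features` is a placeholder array.
import numpy as np

def build_prior_library(encoder, youtube_features, batch_size=10_000):
    """Predicted voxel responses for every second of library video."""
    predictions = []
    for start in range(0, len(youtube_features), batch_size):
        batch = youtube_features[start:start + batch_size]
        predictions.append(encoder.predict(batch))   # expected response per second
    return np.vstack(predictions)
```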

4) Translate! Initially, the “dictionary” was an encoding model, translating from a movie clip into brain activity. From there, it was a theoretically simple—though practically laborious—endeavor to make a decoding model, based on Bayesian probability, to translate brain activity into a clip. (Think turning an English-French dictionary into a French-English one; you have all the information you need, but there’s a lot of reshuffling to do.) Each subject then watched a new set of second-long video clips they’d never seen before. The computer model selected the 100 clips (from those 18 million seconds of YouTube) that would produce brain activity most similar to the second-long clip the subject had just seen. It then averaged those clips together, hence the blurry quality of the videos. (You can see a video showing all three reconstructions, one made from the brain activity of each subject, here.) If the team had been after clarity, rather than proof of concept, Gallant says, they could have made the images at least somewhat crisper by putting more programming muscle into it. They could have set it up so that if 90 of the 100 most similar clips had faces, for instance, it would match up the eyes, nose, and mouth of each face before averaging the videos, leading to a clearer picture.
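A simplified sketch of that final step, with plain correlation standing in for the study’s Bayesian posterior: score every library second by how well its predicted activity matches the measured activity, keep the 100 best, and average their frames. All names here are hypothetical.

```python
import numpy as np

def reconstruct(measured_response, library_responses, library_frames, top_k=100):
    """Average the frames of the top_k library clips most consistent with the measured activity."""
    # Correlate the measured voxel pattern with each library clip's predicted pattern.
    meas = (measured_response - measured_response.mean()) / measured_response.std()
    lib = (library_responses - library_responses.mean(axis=1, keepdims=True)) \
          / library_responses.std(axis=1, keepdims=True)
    similarity = lib @ meas / meas.size            # one score per library second
    best = np.argsort(similarity)[-top_k:]         # indices of the 100 best matches
    return library_frames[best].mean(axis=0)       # averaging is what makes the output blurry
```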

What’s the Context:

  • This isn’t the first time researchers have looked inside the brain to see what someone else is seeing. A number of scientists, including Gallant, have been working on “neural decoding” (i.e., mind-reading) techniques like this one for over a decade. They’re slowly getting better at decoding what we’ve seen, advancing from distinguishing between types of images (face vs. landscape, for instance) to reconstructing still images to reconstructing moving video clips.
  • Decoding what someone sees is different from decoding what they’re thinking. The researchers were just looking at low-level visual processing (what lines, textures, and movements people saw), not higher-level thought like what the clips reminded them of, whether they recognized the actors, or whether they wanted to see the movies they watched trailers for. Those are far more complicated questions to tease out, and can’t be tracked feature-by-feature as easily as visual processing.
  • fMRI has a built-in time lag; the level of oxygen in the blood doesn’t change until about 4 seconds after neuron activity, since blood flow is a slow process compared to neurons’ electrical firing. By building specific lag times into their model—not just what part of a clip an area responded to, but how long after the clip the response occurred—the researchers could track brain activity in something much closer to real time; one common way of doing this is sketched below.
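One common way of building such lags into an encoding model, offered here as an illustrative assumption rather than the study’s exact method, is to pair each second of video features with the brain activity measured several seconds later, fitting one set of weights per delay:

```python
# Sketch of accounting for the ~4-second hemodynamic lag: stack time-shifted
# copies of the features so the model learns how long after a clip each voxel
# responds. The delays listed are illustrative.
import numpy as np

def add_lags(features, delays=(3, 4, 5, 6)):
    """Concatenate copies of `features` shifted forward in time by each delay (seconds)."""
    lagged = []
    for d in delays:
        shifted = np.zeros_like(features)
        shifted[d:] = features[:-d]   # feature at time t helps predict the response at t + d
        lagged.append(shifted)
    return np.hstack(lagged)

# e.g. encoder.fit(add_lags(features), voxel_responses)
```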

The Future Holds: How Close Are We to Reading Images From Everyone’s Brain?

  • Such brain-decoding technologies may ultimately be helpful for communicating with people who can’t otherwise communicate, due to locked-in syndrome or a similar condition. “I think that’s all possible in the future,” Gallant says, “but who knows when the future’s going to be, right?” Such advances could easily be decades away because of the complex, very specific nature of these models.
  • The brain has between 200 and 500 functional areas in total, Gallant says, about 75 of which are related to vision—and to translate what’s happening in a new area, you’d need a new dictionary. It’s not just a matter of the time and effort involved in making new models, either; we need to understand the brain better first. Scientists know a lot more about how basic visual processing works than about higher-level functions like emotion or memory.

Reference: Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L. Gallant. “Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies.” Current Biology, 2011.

