A sea change in functional neuroimaging towards computation-rich open science: building and mining analysis-ready data

Scott Makeig Presenter
University of California San Diego
San Diego, CA 
United States
 
Sunday, Jun 23: 9:00 AM - 6:00 PM
Educational Course - Full Day (8 hours) 
COEX 
Room: Grand Ballroom 104 
The dominant paradigm in academic research has long been for new results to be produced by individual investigator laboratories operating on data they collect themselves. The typical cycle is for small experiments to be proposed, recorded, analyzed (typically, using a single measure), and published without regard for further use of the collected data. This cycle has two inefficiencies. One is statistical: marginal results obtained from small studies may be difficult to reproduce. Another is economic: discarding carefully collected data before its information content has been fully mined is inherently wasteful.
Currently, therefore, more research funding is being directed toward large studies whose information can be mined by many computationally oriented investigators. When (and only when) the data are made available in a form that is truly analysis-ready, this can be highly productive. However, current statistical methods, including machine learning, also demonstrate the power of richly varied data for learning to emulate, predict, and diagnose activity in complex systems such as the brain. Thus, there is also value in collecting and jointly mining smaller datasets, both existing and new, collected using differing paradigms under differing conditions, again provided those datasets have been made available in analysis-ready form.
Making human functional neuroimaging data analysis-ready poses a major difficulty. Human experience, cognition, and behavior are highly complex and time-varying, so the brain dynamics supporting them must be as well. This makes the problem of recording and describing what happened during human neuroimaging experiments both essential and challenging. Current practice, followed by both experimentalists and imaging equipment manufacturers, is to record only a quite sparse representation of what the participant(s) experienced during data recording, e.g., onset times of events whose types are recorded using ad hoc codes such as ‘Event-type 127.’ These codes typically differ from experiment to experiment, severely constraining full analysis of event-related brain dynamics both within and across datasets. The problem of annotating events using a common vocabulary and syntax is more acute for electromagnetic brain data, as its fine temporal grain enables study of dynamics supporting individual thoughts, actions, and reactions.
Remarkably, only one system for annotating events in time series data has been proposed and developed: the system of Hierarchical Event Descriptors (HED). ‘HED’ (or ‘H-E-D’) annotation is accepted in all the BIDS modality formatting standards, but is still little known or adopted. Developed as an open source community project on GitHub (see hedtags.org), HED and its growing tool infrastructure are now ready for widespread use, broader community participation, and further extension. I will describe its structure, illustrate its use, and offer exercises to familiarize course students with using HED.
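
To make the contrast between ad hoc event codes and a shared vocabulary concrete, here is a minimal Python sketch. It is not part of the HED tools and does not use or validate against the HED schema. The event codes (127, 128, 200), the onset times, the helper names `annotate` and `select`, and the particular tag strings chosen are all illustrative assumptions. Actual annotations should be written and checked with the resources at hedtags.org.

```python
# Hypothetical mapping from one experiment's ad hoc event codes to
# HED-style comma-separated tag strings. The codes and the tag choices
# are assumptions for illustration only and have not been validated
# against any HED schema version.
CODE_TO_HED = {
    127: "Sensory-event, Visual-presentation, Experimental-stimulus, Target",
    128: "Sensory-event, Visual-presentation, Experimental-stimulus, Non-target",
    200: "Agent-action, Participant-response, Press",
}


def annotate(events):
    """Attach a HED-style string to each (onset_seconds, code) pair.

    Codes with no known annotation are marked "n/a".
    """
    return [
        {"onset": onset, "code": code, "HED": CODE_TO_HED.get(code, "n/a")}
        for onset, code in events
    ]


def select(annotated, tag):
    """Return events whose annotation contains `tag` as a whole tag."""
    return [
        event
        for event in annotated
        if tag in [t.strip() for t in event["HED"].split(",")]
    ]


# Hypothetical recording: (onset in seconds, ad hoc event code).
events = [(1.25, 127), (2.40, 200), (3.10, 128), (4.05, 999)]
annotated = annotate(events)

# Selection is now by meaning, not by experiment-specific code.
responses = select(annotated, "Participant-response")
stimuli = select(annotated, "Sensory-event")
```

Once every dataset carries annotations drawn from the same vocabulary, a query such as `select(annotated, "Sensory-event")` can in principle be run unchanged across experiments whose raw codes differ, which is the kind of cross-dataset analysis the abstract argues for.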