September 12-13, 2023
Washington, DC

The pandemic demonstrated that there is strong public benefit derived from researchers having prompt access to a variety of data sources, such as data from public and government bodies, as well as private companies (in particular, tech companies). There is also significant interest in how to connect and link the different data sources. The Forum addressed the evolution of researcher access to data; best practices and lessons learned from fields that are at the forefront of data sharing (e.g., climate studies, astrophysics, biomedicine); and challenges related to pressing societal problems such as online information (and misinformation), modeling for pandemics, and using data in emergencies.


Keynote Address – Data for the Public Good: Advancing Researcher Access and Innovation in National Statistics

Sir Ian Diamond • UK’s National Statistician, Office for National Statistics

In this keynote, the UK’s National Statistician, Professor Sir Ian Diamond, explored the Office for National Statistics’ (ONS) commitment to data for the public good and the lessons learned from a distinguished career, including his time as Chief Executive of the Economic and Social Research Council (ESRC) and leadership on data for decision-making during the COVID-19 pandemic. The ONS is pioneering innovative approaches to generating robust statistics from various sources, while enabling researcher access to data through the Secure Research Service, Data Science Campus, and the Integrated Data Programme. The ONS is also contributing to international data access work through engagement with the United Nations and other international partners. The address concluded with reflections on the current and future challenges surrounding researcher access to data, emphasizing the importance of understanding user needs and collaborating with partners to achieve the goal of providing timely and reliable data for the public good.

 


Session 1: Volume and Heterogeneity: Addressing Usability Challenges Between Research Communities

Session Chair:
Feryal Ozel • Professor and Chair, School of Physics, Georgia Institute of Technology

Speakers:
Leanne Guy • Data Management Scientist, Vera Rubin Observatory/NOIRLab
Casey Greene • Founding Director, Center for Health AI, University of Colorado
Megan Cromwell • Assistant Chief Data Officer, NOAA/National Ocean Service
Pier Luigi Buttigieg • Principal Investigator and Senior Data Scientist, Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research

This session explored how new developments and approaches, from organizational techniques to federated systems and AI-driven tools, can help researchers overcome the access barriers caused by very large or heterogeneous datasets. The session drew upon real-world examples in the fields of astrophysics, environmental science, and molecular dynamics to characterize these challenges and to examine possible responses for managing data volume and curating data that lacks uniformity. Attendees left with an understanding of how large-scale data processing is helping to solve important societal challenges, how approaches to access (including new developments in AI) and standardization might be able to help, and how supercomputing is being used in the US and the UK.

 


Session 2: Sharing and Processing Health Data: Lessons from Large-Scale Health Data Initiatives

Session Chair:
Arturo Casadevall • Professor and Chair, Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health

Speakers:
Michael Worobey • Professor and Head, Dept. of Ecology and Evolutionary Biology, University of Arizona
Peter Stokes • Director of Platform Development, OpenSAFELY/University of Oxford
Christl Donnelly • Professor of Applied Statistics, University of Oxford

This session addressed health data, which includes information about people’s behaviors and location, their phenotype and genotype, medical history, and response to clinical trials. Whether gleaned from mobile apps or patient records, insights from health data are critical for advancing scientific research. Health data also played a critical role in understanding and responding to the COVID-19 pandemic. However, accessing and processing health data can entail legal, ethical, and technical challenges. Drawing on case studies from the U.S. and the U.K., this session explored approaches to health data collection, access, and sharing. In doing so, discussions addressed key issues around privacy and data governance. The session also explored how effectively the FAIR principles are applied in different jurisdictions, and how this can be improved upon for future health research, including in emergencies. Attendees left with an understanding of shared lessons learned about sensitive health and patient data: how health data access and use are managed in different contexts, how trusted research environments work, and how public trust can be fostered for sensitive health data sharing.

 


Session 3: The Nature Emergency: Data for Net Zero, Biodiversity and Climate Adaptation

Session Chair:
Gina Neff • Professor and Executive Director, Minderoo Centre for Technology & Democracy, University of Cambridge

Speakers:
Loic Lannelongue • Research Associate, Biomedical Data Science & Green Computing, University of Cambridge
Jeremy Freeman • Executive Director, Carbon Plan
Lydia Jennings • Presidential Postdoctoral Fellow, Arizona State University; Research Fellow, Duke University

This session explored the availability and accessibility of data, and the challenges of integrating such data, as essential features of research to measure, mitigate, and explore scenarios for the climate (and nature) emergencies, from severe weather events to biodiversity loss and air pollution. Covering environmental datasets, including earth observation data, carbon emissions (at business and individual levels), and energy use (buildings and homes), this session foregrounded the challenges researchers face in accessing data to understand and address the impacts of climate change. Attendees left with an understanding of the opportunities presented in various environmental and ecological datasets for designing and analyzing the efficacy of climate mitigations, as well as the challenges associated with national security, surveillance and privacy.

 


Session 4: Privately Held Data: Opportunities, Challenges, and Lessons for Researchers

Session Chair:
Gina Neff • Professor and Executive Director, Minderoo Centre for Technology & Democracy, University of Cambridge

Speakers:
Henry T. (Hank) Greely • Professor, School of Medicine and Director, Center for Law and the Biosciences, Stanford University
Cyndi Grossman • Senior Director, Biogen Digital Health
Uyi Stewart • Chief Data and Technology Officer, Data.org
Gavin Starks • Founder and CEO, Icebreaker One

This session explored data collected by private companies, which often contain useful insights that can help alleviate major societal challenges including climate change, healthcare, food security, and disinformation. Accessing this data, however, can be costly, controversial, and unreliable. Solving these challenges could help unlock vast amounts of data for researchers, providing novel insights and better guidance for policymakers. With lessons from social media platforms, mobile health applications, and retailers, this session highlighted best practice for accessing data held by private companies and considered solutions to the challenges of commercial sensitivity, data protection, and emergency preparedness. Attendees left with an understanding of the value to be gained from privately held data, topical data security challenges, and an overview of partnership enhancing technologies.

 


Session 5: The Role of Data Institutions in Data Access

Session Chair:
Sir Nigel Shadbolt • Principal and Professorial Research Fellow in Computer Science, University of Oxford

Speakers:
Hyon Kim • Program Director, Data.gov
Margaret Levenstein • Director, ICPSR, University of Michigan
Meredith Goins • Executive Director, World Data System
Sylvie Delacroix • Professor in Law and Ethics, Birmingham Law School

This session explored data institutions, which are organizations or arrangements for facilitating researcher access to data through data stewardship; archives and statistics agencies are some of the oldest examples. Data institutions operate in a variety of ways, from combining or linking data from multiple sources to creating open access data sets or maintaining standards. This session explored the roles of institutions such as data repositories, federated data systems, and data commons. Attendees left the session with awareness of how data institutions can support scientific research and how researchers can collaborate with these institutions for access to data.

 


Session 6: Openness and Data Availability in Academic Research: Tensions and Possibilities

Session Chair:
Frank Kelly • Emeritus Professor of the Mathematics of Systems, University of Cambridge

Speakers:
Chris Marcum • Senior Statistician and Senior Science Policy Analyst, U.S. Office of the Chief Statistician
Mila Rosenthal • Executive Director, International Science Reserve, New York Academy of Sciences
Johan Ugander • Associate Professor, Management Science & Engineering, Stanford University

This session drew from previous discussions to address cross-cutting themes related to data access for scientific research. Barriers to open data availability include tensions between open science and security/privacy, data subject intentions and repurposing of personal data, as well as cultural and practical disinclinations amongst academic researchers. However, the potential for data-driven research is interdisciplinary and cross-sector in nature: open research is increasingly encouraged by publishers, funders, universities, and governments. Considering the different definitions of open research across different jurisdictions, this session addressed the practical implications of open data practices and mandates (e.g., OSTP’s call for ‘free, immediate, and equitable access to federally funded research’). It also considered implications for useful data that is neither publicly funded nor research data, with critical attention to transparency in data provenance and collection methods. Attendees left with an understanding of how scientists can respond to open research requirements and contribute views on best practice for open research in academia, both for meeting and thinking beyond regulatory requirements.

 


Forum Conclusion

Event Type

  • International Forum