Examples of Big Data Initiatives and Funding Projects

Here we outline a series of examples of the efforts being made in the academic community to explore various forms of big data, better understand big data's potential applications, and create infrastructure that accommodates the use of massive data sets.

  • In 2012, the National Institutes of Health launched the Big Data to Knowledge initiative (BD2K), whose goal is to facilitate the use of big data in biomedical research. The following university programs have been designated Centers of Excellence for Big Data Computing by BD2K and received funding awards for FY14:

    • University of Pittsburgh - The Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data aims to make computational methods of big data analysis more available, efficient, and easy for biomedical researchers to use (Cooper grant abstract).

    • University of Wisconsin - Madison - The Center for Predictive Computational Phenotyping (CPCP) works to develop new methods and software for the use of computational phenotyping, with special attention to breast cancer and Alzheimer's screening.

    • Stanford - The Mobilize Center focuses on big data collected from devices such as mobile phones and wearable sensors that describe human motion. The center aims to make these data more manageable and usable for research that addresses limitations in physical mobility. Meanwhile, the Center for Expanded Data Annotation and Retrieval (CEDAR) aims to create a consistent, easy-to-use metadata framework that can be implemented across all disciplines.

    • University of Illinois at Urbana-Champaign - The goal of the Knowledge Engine for Genomics project (KnowEnG) is to create a data analysis system that will allow researchers to conduct genomic data analysis "in the context of existing knowledge," rather than in isolation.

    • University of California - Santa Cruz - The Center for Big Data in Translational Genomics (abstract) aims to make the sharing of genomic data easier and facilitate the use of these data in the context of human health. This involves reworking the infrastructure available to accommodate massive genomic datasets, and ensuring that use of these data is included in clinical practice.

    • Harvard Medical School - Harvard is developing a toolkit to implement the Patient-Centered Information Commons (grant abstract), which aligns many forms of individual biomedical data at the local, regional, and national levels. This will allow large-scale longitudinal research into disease risk and patient outcomes.

    • University of Memphis - The Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K) uses a collaborative approach to develop tools that facilitate the collection and use of digital mobility data.

    • University of California - Los Angeles - UCLA's Community Effort to Translate Protein Data to Knowledge: An Integrated Platform aims to advance data science tools for use in cardiovascular medicine.

    • University of Southern California - ENIGMA, or Enhancing Neuro Imaging Genetics through Meta Analysis, is a network of imaging genetics researchers. Network members intend to share promising data, attempt to replicate each other's findings, and facilitate training on new developments in the field. The Big Data for Discovery Science (BDDS) initiative focuses on the development of big data analysis tools and technologies.

  • U.S. Secretary of Commerce Penny Pritzker announced a new collaboration intended to unleash the power of data held by the National Oceanic and Atmospheric Administration (NOAA).

  • The Networking and Information Technology Research and Development (NITRD) Program houses the Big Data Senior Steering Group (BDSSG). This group helps coordinate big data projects across the federal government and facilitates the goals of the White House Big Data R&D Initiative.

  • Genome British Columbia is hosting a funding competition called Sharing Big Data for Health Care Innovation: Advancing the Objectives of the Global Alliance for Genomics and Health. This competition is intended to encourage more collaborative use of genomic and clinical data.

  • The University of Michigan has begun developing several different initiatives to encourage the use of big data.

    • Advanced Research Computing (ARC) facilitates data-intensive research and houses several departments specializing in various aspects of this field, including computational science, data science, technology services, and consulting services.

    • In March 2014, Michigan's School of Information was awarded a grant called Seeding New Data Science Collaborations by the Gordon and Betty Moore Foundation. This 18-month $440,000 project is intended to increase the university's capacity to accommodate data science initiatives, with particular emphasis on interdisciplinary collaboration.

    • The university's Third Century Initiative is funding a project called Studying Social Behavior with Big Data: An Undergraduate Toolkit, which introduces undergraduate students to important lab tools, skills, and research questions that can be used to analyze big data. This project is headed by Elizabeth Bruch, a professor of Sociology and Complex Systems, and Jonathan Atwell, a Ph.D. candidate in Sociology.

  • The Berkeley Institute for Data Science (BIDS) at the University of California - Berkeley encourages collaboration between experts in diverse fields, such as life sciences and applied mathematics, in data-intensive research.

  • Pennsylvania State University recently established the Center for Big Data Analytics and Discovery Informatics, directed by Professor Vasant Honavar of the College of Information Sciences and Technology. The center aims to facilitate interdisciplinary research and training in the computational applications of big data and discovery informatics.

  • The University of North Carolina - Chapel Hill hosts the Integrated Cancer Information and Surveillance System (ICISS). The ICISS supports the use of large datasets to identify risk factors, prevention, and treatment options for cancer.

  • New York University, UC-Berkeley, and the University of Washington have formed a data partnership funded by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. This $37.8 million, five-year initiative encourages interactions between researchers from different backgrounds, supports long-term data-focused career paths, and promotes the expansion of current analytical data tools and practices.

  • United Nations - The UN's Global Pulse initiative supports the collection of big data for use in sustainable development and humanitarian action. Global Pulse research combines traditional data with big data to track global change.

  • BigDataEurope - BigDataEurope is an international project aimed at creating infrastructure and tools to make big data more usable by European companies. This includes providing companies with tools to create multilingual products and services that are flexible and transferable.