Big Data

"Big data" is a term that has seen increasing use across many disciplines, under a wide variety of definitions. The phrase was added to the Oxford English Dictionary in 2013: "data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data."

Gil Press, a contributing writer for Forbes, outlines the origins of big data as well as a variety of popular definitions of the term. One such definition comes from academic and business analyst Tom Davenport: "the broad range of new and massive data types that have appeared over the last decade or so."

Many experts in data-driven disciplines seem to agree that, if properly handled, big data has the power to revolutionize the way we understand our increasingly digital world. Big data is commonly characterized along four dimensions: volume, variety, velocity, and veracity. IBM’s 2011 infographic shows how these “Four V’s” apply to various types of big data.

The National Institutes of Health launched the Big Data to Knowledge (BD2K) program in 2013 “… to support the research and development of innovative and transformative approaches and tools to maximize and accelerate the integration of big data and data science into biomedical research.” In 2016, the BD2K Program released the FAIR Guiding Principles for scientific data management and stewardship, which focus on making data sets “FAIR”: Findable, Accessible, Interoperable, and Reusable. For a summary of BD2K program initiatives and access to additional BD2K resources, please see the NIH BD2K Flyer.

"For me, the technological definitions (like 'too big to fit in an Excel spreadsheet' or 'too big to hold in memory') are important, but aren't really the main point. Big data for me is data at a scale and scope that changes in some fundamental way (not just at the margins) the range of solutions that can be considered when people and organizations face a complex problem. Different solutions, not just 'more, better.'" --Steven Weber, School of Information, UC Berkeley, datascience@berkeley

Past Big Data Workshops and Symposia

Below is a short list of past meetings concerning big data and innovations in data science. In addition, DSDR provides Examples of Big Data Initiatives and Funding Projects at various universities, companies, and organizations. These lists provide a brief introduction to the kinds of initiatives that are underway and should not be considered exhaustive.

  • The BD2K Guide to the Fundamentals of Data Science was a series of virtual lectures, held from 2016 to 2018, on the data science underlying modern biomedical research. All archived videos are available on YouTube.

  • The International Union for the Scientific Study of Population organized a Scientific Panel on Big Data and Population Processes that from 2016 to 2018 conducted “… a series of activities (e.g., research and training workshops, seminars, conference sessions or side meetings) intended to:

    • promote communication and exchange between the communities of demographers and data scientists;
    • favor discussion of research questions and methodologies at the intersection of social media research and population studies;
    • stimulate scholars to think about how formal demographic methods can be applied to big data research;
    • provide opportunities for training to students and young researchers;
    • increase the visibility of demography as a discipline, in the context of big data research and stimulate attention for population studies in scientific communities related to information science.”
  • The 2018 BD2K Behavioral and Social Sciences (BSS) and Big Data Workshop focused on encouraging discussion and collaboration between computational big data and informatics researchers.
  • The Networking and Information Technology Research and Development (NITRD) Program "provides a framework in which many Federal agencies come together to coordinate their networking and information technology (IT) research and development (R&D) efforts" (NITRD website). The 2015 NITRD Big Data Strategic Initiative Workshop brought together an interdisciplinary core of leaders to work on creating a Federal Big Data Research Agenda. More recently, the 2017 Big Data Workshop: Measuring the Impact of Digital Repositories focused on identifying “current metrics, tools and practices that are effective, and the issues that will require additional research.”
  • Georgetown University and the White House sponsored the Improving Government Performance in the Era of Big Data: Opportunities and Challenges for Federal Agencies workshop at Georgetown in June 2014. Speakers focused on the opportunities and challenges ahead for federal agencies in light of the increasing availability of massive data sets. Videos of the panel discussions and more information about the speakers are available.
  • Since 2014, the Michigan Institute for Data Science (MIDAS) at the University of Michigan has hosted an annual symposium focused on big data. The symposia bring together data science experts from around the world to discuss innovative methodologies and applications of big data. Videos of many talks are available on the respective MIDAS annual symposium pages.

References and Other Resources

Allen, Corey. 2015. "How Big Data Can Improve Healthcare." UBC News, January 8. Retrieved March 11, 2015.

"big data." The Oxford English Dictionary. 2015. Retrieved March 11, 2015.

Boyd, Danah, and Crawford, Kate. 2012. "Critical Questions for Big Data." Information, Communication and Society 15(5): 662-679.

Brooks, C. 2018. "In a Big Data World, Scholars Need New Guidelines for Research." Scientific American, May 4. Retrieved July 10, 2019.

"Community Cleverness Required." 2008. Nature 455(7209): 1.

Dutcher, Jenna. 2014. "What Is Big Data?" Berkeley School of Information. datascience@berkeley Blog. Retrieved March 11, 2015.

Executive Office of the President. 2014. Big Data: Seizing Opportunities, Preserving Values.

Graham-Rowe, Duncan. 2008. "Big Data: The Next Google." Nature 455(7209): 8-9.

Lynch, Clifford. 2008. "How Do Your Data Grow?" Nature 455(7209): 28-29.

Metcalf, J., Keller, E., and Boyd, Danah. 2016. "Perspectives on Big Data, Ethics, and Society." The Council for Big Data, Ethics, and Society.

Metcalf, J., and Crawford, K. 2016. "Where Are Human Subjects in Big Data Research? The Emerging Ethics Divide." Big Data & Society 3(1): 1-14.

Press, Gil. 2013. "A Very Short History of Big Data." Forbes Technology, May 9. Retrieved March 11, 2015.

Press, Gil. 2014. "12 Big Data Definitions: What's Yours?" Forbes Technology, September 3. Retrieved March 11, 2015.

Ruggles, Steven. 2014. "Big Microdata for Population Research." Demography 51(1): 287-297.

National Institute of Standards and Technology. 2018. NIST Big Data Interoperability Framework: Volume 1, Definitions. Version 2. (Publication No. 1500-1r1). National Institute of Standards and Technology.

National Institute of Standards and Technology. 2015. NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies. Final Version 1. (Publication No. 1500-2). National Institute of Standards and Technology.

National Institute of Standards and Technology. 2015. NIST Big Data Interoperability Framework: Volume 7, Standards Roadmap. Final Version 1. (Publication No. 1500-7). National Institute of Standards and Technology.

Waldrop, Mitch. 2008. "Wikiomics." Nature 455(7209): 22-25.

Zimmer, M. 2010. "'But the Data Is Already Public': On the Ethics of Research in Facebook." Ethics and Information Technology 12(4): 313-325.

Zimmer, M. 2018. "Addressing Conceptual Gaps in Big Data Research Ethics: An Application of Contextual Integrity." Social Media + Society 4(2).