When Big Data Misleads

As U Bremen Excellence Chair holder, Mario Small studied the risks of deceptive large datasets

Research

Never before has it been as easy to collect and analyze large amounts of data as in the age of big data. In particular, anonymized data from companies are a welcome source for research. But if the quantitative analysis of such company data is not combined with qualitative research methods, this can lead to serious misinterpretations, warns Professor Mario Small. Such misinterpretations can, for example, obscure social inequalities.

According to Small, even datasets that are widely used in research have significant problems. These are ignored by many social scientists, although they are highly likely to alter research findings. “The extent of these problems surprised me,” he says. As holder of a U Bremen Excellence Chair, Small spent six years researching the relationship between people’s real social networks and social inequality.

The holders of Excellence Chairs serve as bridges between the University of Bremen and leading scientific institutions worldwide. Small researched and taught at Harvard until 2021 and has been a professor at Columbia University in New York since 2022. His project in cooperation with SOCIUM, the Research Center on Inequality and Social Policy at the University of Bremen, also aimed to find out how the use of big data in social research can be made more reliable.

Outdated Data, False Categories

Small and his team carried out empirical case studies on racialized differences in access to and use of financial institutions in various neighborhoods across the United States. On site, they encountered conditions that differed significantly from the company data. Small gives examples: “Sometimes, large datasets are poorly updated. For example, a dataset that supposedly contains the locations of all banks in every neighborhood in the U.S. may wrongly note that banks that closed long ago are still open.” In other cases, establishments were miscategorized, for example when a payday lender, an entity that provides instant loans at high fees, was classified as some other kind of institution. “These small errors, when they are systematic in some way – for example, if they are more common for establishments in poor neighborhoods – can result in the wrong conclusions.” By uncovering this, Small and his team were able to show that access to financial services in poorer neighborhoods was worse than the company data suggested.

Researchers should adopt a more critical attitude toward data that were created outside the academic research world, Small advises. “The biggest message is this: when we are using ‘big data’ from a private company, we are not doing the same thing as when we use experimental or survey data that we as researchers have produced for the purpose of scientific discovery,” he says. While companies usually use their data to measure their own business success or better understand customer interests, social science is generally concerned with broader societal relationships. At times, a more differentiated view of terminology helps as well, since terms can have different meanings depending on context. Thus, it is questionable to test a theory of friendship using large-scale Facebook data on “friends.” “What Facebook calls a ‘friend’ is not actually what most theorists of networks refer to as a ‘friend’, yet many are tempted to ignore this problem when they have an enormous trove of Facebook data.”

“We Should Care”

Small strongly argues that this distinction must be taken seriously: “The company doesn’t care whether the data are good for science; we should care.” This care often means not only analyzing the data quantitatively, but also checking and contextualizing them through qualitative research. The same applies to data from government agencies or non-governmental organizations.

The renowned sociologist is holder of a U Bremen Excellence Chair. The results from his large research project will probably occupy him for a long time to come: “The personal impact has been extraordinary,” he says. “The connection has dramatically expanded my professional networks in Europe, expanded the range of works I am aware of and connected to, and resulted in a decided intervention in an important research agenda. I have long had an interest in social science methods, and in the connections between large-scale and qualitative data. But we now have a major edited volume, forthcoming with Oxford University Press, where many of the most established and new-coming important researchers in this field are contributors.” As a result of the new connections, the conferences, collaborations with postdoctoral and graduate students, and the work on several published papers, he says his work on these issues has become much deeper.

Ideal Research Environment at SOCIUM

The pluralism of methods at SOCIUM at the University of Bremen also contributed to this. It is a distinctive feature that sets the social science institute apart from many others, as Betina Hollstein emphasizes. The sociology professor co-leads SOCIUM’s methods center and hosted Mario Small. At SOCIUM, qualitative research methods such as interviews are combined with quantitative methods such as large representative surveys and computer-assisted analyses of external big data sources, creating an ideal research environment for Small’s topic. While he expanded his network in Europe, SOCIUM gained a great deal of additional visibility, for example at a congress of the German Sociological Association, where Small presented the “Bremen Project” in a keynote address that received considerable attention. Hollstein describes the collaboration with Small as “absolutely pleasant, highly professional, and impressively effective.”

The cooperation with him also helped to launch a new initiative for a Collaborative Research Center (CRC). “Inspired by the American Voices project, which is carried out by colleagues at Stanford and Princeton, we want to conduct the first representative study with qualitative in-depth interviews in Germany,” says Hollstein. “At its core, it is about understanding the deep roots and mechanisms of current social change and social cohesion.” If the initiative succeeds, it would be the fourth consecutive time that the University of Bremen’s high-profile area Social Change, Social Policy, and the State receives funding for a Collaborative Research Center from the German Research Foundation. At a kick-off workshop with leading scientists, Hollstein experienced a strong sense of a new beginning and enormous enthusiasm for the possibilities associated with this project. “These unique data can revolutionize the social sciences and open up entirely new analytical possibilities.”

Further Information

U Bremen Excellence Chair
