Chuck Bednar for redOrbit.com Your Universe Online
Behavioral scientists and other academic researchers are increasingly turning to social media to find subjects for their studies, but doing so could lead to erroneous results with serious implications, computer experts from Carnegie Mellon University (CMU) in Pittsburgh and McGill University in Montreal report in a newly published study.
According to the authors of the paper, which was published in the November 28 edition of the journal Science, social media appears attractive to researchers behind behavioral studies because it gives them a quick and inexpensive way to gather massive amounts of data about people's thoughts and feelings. Some of those datasets may be misleading, however, they explained.
In their paper, Carnegie Mellon's Juergen Pfeffer and McGill University's Derek Ruths note that thousands of research papers each year are based on information gathered through social media. However, they contend that scientists need to find ways to correct the inherent biases in information gathered from the likes of Facebook and Twitter, or at the very least acknowledge that there could be issues with such data.
"Not everything that can be labeled as 'Big Data' is automatically great," Pfeffer, an assistant research professor in CMU's Institute for Software Research, explained in a statement. He said that many researchers believe that a large enough dataset will overcome any potential biases or distortions inherent in that data, but the old adage of behavioral research still applies: "Know Your Data."
He and Ruths, an assistant professor of computer science at McGill, said that even though the problem is far from insignificant, social media is still difficult to resist as a source of data. "People want to say something about what's happening in the world and social media is a quick way to tap into that," Pfeffer said. For example, following 2013's Boston Marathon bombing, he said he collected 25 million tweets related to the topic in just two weeks' time.
The main problem, according to the researchers, is that study authors attempt to generalize their results to a broad population. However, social media sites often have significant population biases in that different social networks attract different types of users. For example, Pinterest's membership is primarily women aged 25 to 34 with average household incomes of $100,000, while Instagram appeals mostly to adults under the age of 29, African-Americans, Latinos, women and urban dwellers, Pfeffer and Ruths explained.
Other possible issues include the fact that publicly available data feeds may not necessarily provide an accurate representation of the platform's overall data; the design of a social media platform may affect how users behave, and what behavior can be measured (for example, the lack of a "dislike" button on Facebook makes it harder to detect negative responses to content); and large numbers of bots and spammers may masquerade as human users, and thus their input may mistakenly be incorporated into behavior-related measurements and predictions.
"Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are," McGill University's Chris Chipello explained. "For instance, efforts to infer political orientation of Twitter users achieve barely 65 percent accuracy for typical users even though studies (focusing on politically active users) have claimed 90 percent accuracy."
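To make the gap between those two figures concrete, here is a minimal sketch of the arithmetic. It uses the 90 percent and 65 percent accuracy figures quoted above; the share of politically active users in a sample is not given in the article, so the values tried below are purely hypothetical, and the function name `blended_accuracy` is ours.

```python
def blended_accuracy(active_share, acc_active=0.90, acc_typical=0.65):
    """Expected accuracy over a sample mixing two groups of users.

    active_share is the (hypothetical) fraction of politically active
    users in the sample; the two accuracy defaults are the figures
    quoted in the article.
    """
    if not 0.0 <= active_share <= 1.0:
        raise ValueError("active_share must be between 0 and 1")
    return active_share * acc_active + (1.0 - active_share) * acc_typical


# Hypothetical shares: a study of only active users, an even mix,
# and a sample where active users are a small minority.
for share in (1.0, 0.5, 0.1):
    print(f"active share {share:.0%}: expected accuracy {blended_accuracy(share):.1%}")
```

Under these assumptions, the headline accuracy only holds when the evaluation sample consists entirely of easy-to-classify users; the smaller their share, the closer the overall figure falls toward the rate for typical users.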
"The common thread in all these issues is the need for researchers to be more acutely aware of what they're actually analyzing when working with social media data," Ruths noted, comparing the issue to the telephone survey errors that led to the infamous "Dewey Defeats Truman" headline during the presidential election of 1948.