Posts on Weibo Quantify Air Pollution in China

Temple of Heaven in haze-covered Beijing on Feb. 24, 2014. STR/AFP/Getty Images

The social media comments of people in China’s megacities can give environmental scientists information about local pollution levels.

A new study shows that the frequency of key words like dust, cough, haze, mask, and blue sky can be used as a proxy measurement of the amount of airborne particulate matter in the country’s urban centers at any given time.

Rice University researchers culled the words from millions of posts to China’s Weibo, a popular microblogging platform. Rice computer scientists collected the data for a study on Chinese censorship of social media three years ago.

Rice researchers decided upon a set of bigrams (key terms in the form of two consecutive symbols) related to air quality and searched for them in a set of 112 million Weibo posts gathered between 2011 and 2013. The terms were collected from a database of 40 million Chinese bigrams and used to correlate pollution with air-quality reports from US embassies in four megacities. The 10 bigrams above are only part of the set they used. (Rice)

“The big takeaway is that people grouse about air quality, and as it gets worse, people complain more,” says study leader Dan Wallach, a professor of computer science and electrical and computer engineering, whose lab collected the publicly available posts.

“When it’s really bad, it flattens out,” he says. “They’re as complained-out as they’re going to be. And if it gets good enough, few people complain. But there’s a zone in the middle where people really grouse, and we can measure that.”

“A city the size of Beijing has air-quality meters, but not many,” Wallach says. “But if you have millions of people, you potentially have millions of meters. It’s a way of adding extra data.”

The researchers came up with a metric, the Air Discussion Index (ADI), based on the frequency with which pollution-related terms appeared in 112 million posts from 2011 to 2013 by residents of Beijing, Shanghai, Guangzhou, and Chengdu, where pollution is thought to be most troublesome in China.
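The article does not give the ADI formula, so the following is only a minimal sketch of the general idea: a frequency-based index computed as the share of a day's posts that mention at least one pollution-related term. The three-term list and the function name `discussion_index` are assumptions made for illustration; the published index may select, weight, or normalize terms differently.

```python
from collections import defaultdict

# Hypothetical term list for illustration only (haze, mask, cough).
# The study used its own, larger set of Chinese bigrams.
POLLUTION_TERMS = ("雾霾", "口罩", "咳嗽")


def discussion_index(posts, terms=POLLUTION_TERMS):
    """Return {day: fraction of that day's posts mentioning any term}.

    `posts` is an iterable of (day, text) pairs.
    """
    totals = defaultdict(int)  # posts seen per day
    hits = defaultdict(int)    # posts per day containing at least one term
    for day, text in posts:
        totals[day] += 1
        if any(term in text for term in terms):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}
```

Dividing by the day's total post count keeps the index comparable between busy and quiet days, which matters when cities differ widely in how many posts they generate.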

“We looked at what words correlated with the pollution-level data we had,” Wallach says. “Some words that came out were nonsense. But others, like cough or wheeze, clearly had something to do with the conditions. Others, like blue sky, inversely correlated with the weather or pollution.”
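One simple way to check which words track pollution, sketched here under the assumption that a plain Pearson correlation is used (the article does not say which statistic the team relied on), is to compare each term's daily frequency with the measured particulate level. The numbers below are made up for illustration and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented example values: particulate readings and per-day term frequencies.
pm25 = [20, 80, 150, 300]
cough_freq = [0.01, 0.03, 0.05, 0.06]
blue_sky_freq = [0.05, 0.03, 0.02, 0.01]

print(pearson(pm25, cough_freq))     # positive: more "cough" on dirtier days
print(pearson(pm25, blue_sky_freq))  # negative: less "blue sky" on dirtier days
```

A term whose coefficient sits near zero would be one of the "nonsense" words Wallach describes, while a strongly negative coefficient marks a term like blue sky.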

Coauthor Aynne Kokas adds: “There’s a lot of discussion about censorship in Chinese media, including in Dan Wallach’s work, but one of the things we like about this particular study is that it relies on data that are almost never censored, the most innocuous terms of all.”

“These terms are almost impossible to censor because of how common they are,” says Kokas, an assistant professor of media studies at the University of Virginia and an affiliate of Rice University’s Baker Institute for Public Policy. “As a result, we think this method is really effective not only in China but could also work in other contexts where there are heavily regulated social-media environments.”

The most accurate ADI readings were those for Beijing. When the researchers matched the index to hourly sensor readings from the US Embassy there, they found the technique estimated pollution levels with an accuracy of 88.2 percent. ADI readings for the other cities, where pollution is less severe and Weibo posts are less plentiful, were less accurate: 63 percent for Shanghai, 42 percent for Guangzhou, and 36 percent for Chengdu.

Particulate matter measuring less than 2.5 microns in diameter, about one-thirtieth the diameter of an average human hair, is known to permanently damage the lungs. The United States’ air-quality standard for particulate matter of this size is a concentration of no more than 35 micrograms (millionths of a gram) per cubic meter over any 24-hour period and an annual average of no more than 12 micrograms per cubic meter.
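To make those two limits concrete, here is a simplified sketch that compares an average of sensor readings against the figures quoted above. The function names are hypothetical, and the check is deliberately naive: it takes a plain mean, whereas real regulatory compliance rules are more involved.

```python
# Limits quoted in the article, in micrograms per cubic meter.
DAILY_LIMIT = 35.0   # 24-hour average
ANNUAL_LIMIT = 12.0  # annual average


def mean_exceeds(readings, limit):
    """True if the plain average of `readings` is above `limit`."""
    if not readings:
        raise ValueError("no readings supplied")
    return sum(readings) / len(readings) > limit


def exceeds_daily_limit(hourly_readings):
    """Check one 24-hour period of hourly readings against the daily limit."""
    return mean_exceeds(hourly_readings, DAILY_LIMIT)


def exceeds_annual_limit(daily_means):
    """Check a year's worth of daily averages against the annual limit."""
    return mean_exceeds(daily_means, ANNUAL_LIMIT)
```

For example, a day of hourly readings that all sit at 40 micrograms per cubic meter would exceed the 24-hour limit, while a day that averages exactly 35 would not.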

Study co-leader Dan Cohan says Chinese air pollution standards aren’t vastly different from those in the US, but the pollutant concentrations are. “Particulate matter levels in Beijing are often 10 times as high as we typically observe in US cities,” he says.

Wallach says he was surprised by the level of air-quality information that was found in the Weibo posts—data that he and colleagues had collected for a 2013 study on social media censorship.

“I was chatting with [study co-leader] Dan Cohan, and I said, ‘Hey, I’ve got all this data about China. Do you think we could measure something about pollution from all this data?’” Wallach recalls. “We all got together to see if the Weibo data told a story, and it turns out it did.”

Cohan says, “China is an ideal testbed, because the pollutant levels are so high and so variable that you can literally see the difference day to day. Still, I was surprised that social media posts could correlate so strongly with air-quality conditions.”

Wallach says it was interesting to note that the US Embassy measurements correlated well with the Chinese government’s own ground-level reporting on urban pollution. “Some people in China think their government might be lying to them about air quality, but based on what we found, that isn’t the case,” he says.

Coauthors of the paper include alumnus Zhu Tao, now at Google, and postdoctoral fellow Rui Zhang, now at the National Park Service. Cohan is an associate professor of civil and environmental engineering. The findings appear in PLOS ONE.

Source: Rice University