Librarians trawl through Twitter for social media researchers

Correction appended

While hashtags and tweets seem to disappear amid the endless torrent of Twitter conversations, the staff at Gelman Library is preserving them for history.

Laura Wrubel and Daniel Chudnov, who help run the library’s technology arm, developed a tool, now in its early stages, that gathers data from Twitter to simplify social media research for GW professors.

“The data is amazing. A lot of people look at it and think, ‘Why do we need a record of millions of people telling us what they had for lunch?’ But it’s a lot more than just that,” Chudnov, Gelman’s director of scholarly technology, said. “It’s a wealth of material that people will study for years and years and years.”

Chudnov has led the project since coming to GW in fall 2011 after working as the lead developer on the Library of Congress’ own Twitter archiving initiative – an effort that dwarfs GW’s project by creating an archive of hundreds of billions of public tweets. GW’s web-based software package stores tweets tied to the specific Twitter handles or keywords that researchers are studying, which they can then transfer to analysis tools such as Excel.

By preserving election tweets or hashtags like #BindersFullOfWomen, the librarians are able to catch them before they get lost amid a cacophony of conversations.

Before, Chudnov said, researchers and graduate students studying topics like the tweeting patterns of politicians or media organizations had to sort tweets by hand and enter them into Excel. Because Twitter doesn’t allow access to its historical data, researchers had to shell out thousands of dollars to private vendors to access archived tweets.

“When I heard that, I thought, ‘We can do better than that,’ ” Chudnov said. “One of the goals of librarians is to save you time. There’s no reason students and faculty members researching social media should have to do this work by hand.”

Albert May, an associate professor of media and public affairs, said he plans to use the software in his digital reporting class this semester and for his research on how Congress uses social media.

“The potential is significant,” May said. “The University has taken a significant step in creating a programming team within library services to assist researchers, who often do not have access to such technical support.”

The work is part of the library’s build-up as a digital research center – not just a paper book warehouse. The digitally focused staff Chudnov helps lead started to take shape only five years ago.

The project builds on the growing number of researchers studying social media: almost 5,000 theses and dissertations nationally have referenced Twitter in the last two years, according to the librarians’ research.

“It started out as a prototype, but we know this is where libraries need to go. It makes sense for libraries to support this kind of research,” Wrubel said.

Next on the project agenda is archiving tweets related to GW, a school recognized for the heavy tweeting habits of its students, professors and administrators.

Chudnov said his team is working with the University archivist to preserve tweets from accounts like @GWtweets, @GWCollegeDems and @gwhatchet for history.

“By starting to collect this now, we’ll be able to provide a service that few other university archives can,” Chudnov said. “They’re all fair game for University archiving.”

The staff is trying to grow the software by spreading the word to researchers studying social media, not only in journalism, but also in economics, psychology and linguistics, Wrubel said. The staff has not yet determined how many professors overall would use this tool, she added.

Everything the library releases through this initiative will be open source, meaning it won’t be closed off behind GW’s digital wall but will be available to everyone. The Twitter data collection software is available on the sharing website GitHub.

“You have to give back to the community, to academia and to the whole world, because when you do everything internally and you don’t release it, it’s a waste of money,” Karim Boughida, associate University librarian for digital initiatives and content management, said.

Cory Weinberg contributed to this report.

This article was updated Feb. 4, 2013 to reflect the following:
Due to an editing error, The Hatchet incorrectly reported that professor Albert May studies how professors use social media. He actually studies how Congress uses social media.
