Librarians trove through Twitter for social media researchers

by Lauren Grady

Correction appended

While hashtags and tweets seem to disappear amid the endless torrent of Twitter conversations, the staff at Gelman Library is preserving them for history.

Laura Wrubel and Daniel Chudnov, who help run the library’s technology arm, developed a tool, now in its early stages, that gathers data from Twitter to simplify social media research for GW professors.

“The data is amazing. A lot of people look at it and think, 'Why do we need a record of millions of people telling us what they had for lunch?' But it’s a lot more than just that,” Chudnov, Gelman’s director of scholarly technology, said. “It’s a wealth of material that people will study for years and years and years.”

Chudnov has led the project since coming to GW in fall 2011 after working as the lead developer on the Library of Congress’ own Twitter archiving initiative – an effort that dwarfs GW’s project by creating an archive of hundreds of billions of public tweets. GW’s web-based software package stores specific Twitter handles or keywords that researchers are studying, which they can then transfer to analysis tools such as Excel.

By preserving election tweets or hashtags like #BindersFullOfWomen, the librarians are able to catch them before they get lost amid a cacophony of conversations.

Before, Chudnov said, researchers and graduate students studying topics like the tweeting patterns of politicians or media organizations had to sort tweets by hand and enter them into Excel. Because Twitter doesn’t allow access to its historical data, researchers had to shell out thousands of dollars to private vendors to access archived tweets.

“When I heard that, I thought, 'We can do better than that,' ” Chudnov said. “One of the goals of librarians is to save you time. There’s no reason students and faculty members researching social media should have to do this work by hand.”

Albert May, an associate professor of media and public affairs, said he plans to use the software in his digital reporting class this semester and for his research on how Congress uses social media.

“The potential is significant,” May said. “The University has taken a significant step in creating a programming team within library services to assist researchers, who often do not have access to such technical support.”

The work is part of the library’s build-up as a digital research center – not just a paper book warehouse. The digitally focused staff Chudnov helps lead only started to take shape five years ago.

This project builds on the growing number of researchers studying social media: almost 5,000 theses and dissertations nationally have referenced Twitter in the last two years, according to the librarians’ research.

“It started out as a prototype, but we know this is where libraries need to go. It makes sense for libraries to support this kind of research,” Wrubel said.

Next on the project agenda is archiving tweets related to GW, a school recognized for the heavy tweeting habits of its students, professors and administrators.

Chudnov said his team is working with the University archivist to preserve tweets from accounts like @GWtweets, @GWCollegeDems and @gwhatchet for history.

“By starting to collect this now, we’ll be able to provide a service that few other university archives can,” Chudnov said. “They’re all fair game for University archiving.”

The staff is trying to grow the software by spreading the word to researchers studying social media, not only in journalism, but also in economics, psychology and linguistics, Wrubel said. The staff has not yet determined how many professors overall would use this tool, she added.

With this technological initiative, everything released by the library will be open-source, meaning it won’t be closed off behind GW’s digital wall, but rather be open to everyone. The Twitter data collection software is available on the sharing website GitHub.

“You have to give back to the community, to academia and to the whole world, because when you do everything internally and you don’t release it, it’s a waste of money,” Karim Boughida, associate University librarian for digital initiatives and content management, said.

Cory Weinberg contributed to this report.

This article was updated Feb. 4, 2013 to reflect the following:
Due to an editing error, The Hatchet incorrectly reported that professor Albert May studies how professors use social media. He actually studies how Congress uses social media.


5 Comments

  1. TsvetomirTodorov says:

    Well, search engines are a lot better tool for research, but social media is filled with people who share – they can guide you to appropriate sources, which makes social media more valuable for research. The problem is that it is stretching in too many directions – you will have to do some digging before you find something that is useful, while in search engines it is usually way easier.

    Tsvetomir

    social-media-training-courses.co.uk

  2. Ed Summers says:

    Thanks so much for this writeup. Making social media data easily (and persistently) available for researchers seems like a real growth area for libraries and archives, and GW is leading the charge. Making the software available on GitHub means that other like-minded institutions can do the same thing. It’s a real gift to the library/archives community, and to the research community at large. Keep up the good work!

  3. OrlyTaitz says:

    “But it’s a lot more than just that.” “The potential is significant.”

    But you never say why it’s more than just that or why it’s significant. “Research” and Twitter? Please.

  4. Hmmmm says:

    @Orly… granted it’s not explicit, but before wholeheartedly pooh-poohing it, consider why the researchers have to pay to have this research done.
