Thursday, November 5, 2015

GitHub's Big Data Adaptor: An Eclipse Plugin

[Download Pdf]

Type: Publication

Venue: Conference of the Centre for Advanced Studies on Collaborative Research (CASCON) 2015, November 2-4, Markham, ON, Canada

Authors: Ali Sajedi Badashian*, Vraj Shah**, Eleni Stroulia*
* Department of Computing Science, University of Alberta, Canada
** Indian Institute of Technology

Abstract
The data of GitHub, the most popular code-sharing platform, fits the characteristics of “big data” (Volume, Variety and Velocity). To facilitate studies on this huge volume of GitHub data, the GHTorrent website publishes a MySQL dump of (some) GitHub data quarterly. Unfortunately, developers using these published data dumps face challenges with respect to the time required to parse and ingest the data, the space required to store it, and the latency of their queries. To help address these challenges, we developed a data adaptor as an Eclipse plugin, which efficiently handles this dump. The plugin offers an interactive interface through which users can explore and select any field in any table. After extracting the data selected by the user, the parser exports it to easy-to-use spreadsheets. We hope that using this plugin will facilitate further studies on the GitHub data as a whole.
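
The abstract does not describe the plugin's internals, so the following is only a minimal sketch of the general idea (stream the dump, keep the selected fields of one table, write a spreadsheet-friendly CSV). It assumes the dump uses mysqldump-style single-line `INSERT INTO ... VALUES (...),(...);` statements. The names `split_rows` and `export_columns` are hypothetical, and columns are selected by position rather than by name to keep the example short.

```python
import csv
import io
import re

# Assumption: each INSERT statement sits on one line, as mysqldump writes it.
INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)` VALUES\s*(.*);\s*$")
# Backslash escapes commonly seen inside quoted strings in such dumps.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def split_rows(values_sql):
    """Yield each (...) tuple of a VALUES list as a list of field strings."""
    row, field = [], []
    in_row = in_str = False
    i, n = 0, len(values_sql)
    while i < n:
        ch = values_sql[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                nxt = values_sql[i + 1]
                field.append(ESCAPES.get(nxt, nxt))  # decode the escape
                i += 1
            elif ch == "'":
                if i + 1 < n and values_sql[i + 1] == "'":
                    field.append("'")  # doubled quote inside a string
                    i += 1
                else:
                    in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            yield row
            in_row = False
        elif in_row and not ch.isspace():
            field.append(ch)  # unquoted token such as a number or NULL
        i += 1


def export_columns(dump_lines, table, column_indexes, out):
    """Write the chosen columns of one table to `out` as CSV, line by line."""
    writer = csv.writer(out)
    for line in dump_lines:
        match = INSERT_RE.match(line)
        if match is None or match.group(1) != table:
            continue  # skip comments, DDL, and other tables
        for row in split_rows(match.group(2)):
            writer.writerow([row[i] for i in column_indexes])


if __name__ == "__main__":
    sample = [
        "-- sample dump fragment (made up for illustration)\n",
        "INSERT INTO `users` VALUES (1,'ali','Edmonton, AB'),(2,'x\\'y',NULL);\n",
    ]
    buffer = io.StringIO()
    export_columns(sample, "users", [0, 2], buffer)
    print(buffer.getvalue())
```

Because the input is consumed one line at a time, memory use in this sketch is bounded by the longest single `INSERT` line rather than by the size of the whole dump. Unquoted `NULL` values are passed through as the literal text `NULL`.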

Keywords:
GitHub, Mining software repositories, Eclipse Plugin, Software tools, Big data
