Software Mining Studies: Goals, Approaches, Artifacts, and Replicability

by Sven Amann, Stefanie Beyer, Katja Kevic, and Harald C. Gall


The mining of software archives has enabled new ways to increase productivity in software development. Examples of research topics include analyzing software quality, mining project evolution, investigating change patterns and evolution trends, mining models for development processes, developing methods for integrating mined data from various historical sources, and analyzing natural language artifacts in software repositories. Software repositories hold a variety of data, ranging from source control systems and issue tracking systems, through artifact repositories for requirements, design, and architectural documentation, to archived communication between project members. Practitioners and researchers have recognized the potential of mining these sources to support the maintenance of software, to improve its design or architecture, and to empirically validate development techniques or processes. We revisited software mining studies that were published in recent years in the top venues of software engineering, such as ICSE, ESEC/FSE, and MSR. In analyzing these software mining studies, we highlight different viewpoints: pursued goals, state-of-the-art approaches, mined artifacts, and study replicability. To analyze the mined artifacts, we performed a lexical analysis of research papers from more than a decade. Regarding replicability, we looked at existing work in the field on mining approaches, tools, and platforms. We address issues of replicability and reproducibility to shed light on the challenges facing large-scale mining studies that would enable stronger conclusion stability.



@inbook {ABKG15,
  title = {{Software Mining Studies: Goals, Approaches, Artifacts, and Replicability}},
  author = {Amann, Sven and Beyer, Stefanie and Kevic, Katja and Gall, Harald C.},
  booktitle = {{Software Engineering: International Summer Schools, LASER 2013-2014, Elba, Italy, Revised Tutorial Lectures}},
  pages = {121--158},
  year = {2015},
  doi = {10.1007/978-3-319-28406-4_5},
  url = {https://doi.org/10.1007/978-3-319-28406-4_5},
  publisher = {Springer International Publishing},
  editor = {Meyer, Bertrand and Nordio, Martin}
}