Abstract: In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but is also useful in many other areas that analyze source code.
Resources
BibTeX
@inproceedings{PANM16,
title = {{A Dataset of Simplified Syntax Trees for C\#}},
author = {Proksch, Sebastian and Amann, Sven and Nadi, Sarah and Mezini, Mira},
booktitle = {{Proceedings of the 13th International Conference on Mining Software Repositories}},
series = {MSR 2016},
year = {2016},
doi = {10.1145/2901739.2903507},
url = {http://dx.doi.org/10.1145/2901739.2903507},
}