Abstract:
In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but it is also useful in many other areas that analyze source code.
BibTeX
@inproceedings{PANM16,
  title = {{A Dataset of Simplified Syntax Trees for C\#}},
  author = {Proksch, Sebastian and Amann, Sven and Nadi, Sarah and Mezini, Mira},
  booktitle = {{Proceedings of the 13th International Conference on Mining Software Repositories}},
  series = {MSR 2016},
  year = {2016},
  doi = {10.1145/2901739.2903507},
  url = {http://dx.doi.org/10.1145/2901739.2903507},
}