A Dataset of Simplified Syntax Trees for C#

by Sebastian Proksch, Sven Amann, Sarah Nadi, and Mira Mezini


In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but it is also useful in many other areas that analyze source code.



@inproceedings {PANM16,
  title = {{A Dataset of Simplified Syntax Trees for C\#}},
  author = {Proksch, Sebastian and Amann, Sven and Nadi, Sarah and Mezini, Mira},
  booktitle = {{Proceedings of the 13th International Conference on Mining Software Repositories}},
  series = {MSR 2016},
  year = {2016},
  doi = {10.1145/2901739.2903507},
  url = {http://dx.doi.org/10.1145/2901739.2903507},