A Dataset of Simplified Syntax Trees for C#

by Sebastian Proksch, Sven Amann, Sarah Nadi, and Mira Mezini

Abstract: In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but it is also useful in many other areas that analyze source code.



