We love data science and cool data science projects. If you’re applying for our free data science fellowship and looking to propose a data science project, here are four project ideas.
GitHub is a great source of data on how engineers write code. A recent post found discrimination against pull requests submitted by women on GitHub, although perhaps that study could have been better. But there are lots of other ideas to pursue. We can easily train an n-gram classifier to tell whether a line of code is a comment or not, then use it to search for commented-out code. Are repos by academics more likely to contain commented-out code? Are they more likely to violate lint rules? Additionally, it would be interesting to analyze bug-fix commits to predict which lines of code are most likely to contain bugs.
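To make the commented-out-code idea concrete, here is a minimal sketch of one way it might be set up. It is not a reference implementation: it assumes Python source, uses a hand-rolled naive Bayes model over character trigrams, and trains on a handful of made-up lines. The names `char_ngrams`, `NgramNaiveBayes`, and `is_commented_out_code` are hypothetical, and a real project would need far more labeled data pulled from actual repos.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, with ^ and $ marking the line ends."""
    padded = "^" + text.strip() + "$"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = {}         # label -> Counter of n-grams
        self.gram_totals = Counter()  # label -> total n-grams seen
        self.line_counts = Counter()  # label -> number of training lines
        self.vocab = set()

    def fit(self, lines, labels):
        for text, label in zip(lines, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts.setdefault(label, Counter()).update(grams)
            self.gram_totals[label] += len(grams)
            self.line_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_lines = sum(self.line_counts.values())
        vocab_size = len(self.vocab) + 1  # +1 leaves room for unseen n-grams
        scores = {}
        for label, seen in self.gram_counts.items():
            score = math.log(self.line_counts[label] / total_lines)
            denom = self.gram_totals[label] + vocab_size
            for gram in grams:
                score += math.log((seen[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, hand-written training data (an assumption for illustration only).
CODE = [
    "x = compute(y) + 1",
    "for i in range(10):",
    "return result[0]",
    "if value is None:",
    "print(total)",
    "data = load(path)",
]
PROSE = [
    "compute the average of the values",
    "this function returns the total",
    "make sure the path exists first",
    "loop over all the items in the list",
    "TODO: handle the empty case",
    "note that the result may be None",
]

model = NgramNaiveBayes(n=3).fit(
    CODE + PROSE, ["code"] * len(CODE) + ["prose"] * len(PROSE)
)


def is_commented_out_code(line, model):
    """Flag a '#' comment whose body looks more like code than like prose."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return False
    body = stripped.lstrip("#").strip()
    return bool(body) and model.predict(body) == "code"


if __name__ == "__main__":
    for line in ["# total = compute(data)", "# check that the list is not empty"]:
        print(line, "->", is_commented_out_code(line, model))
```

Character n-grams are used here because punctuation such as `=`, `(`, and `[` tends to separate code from English without needing a tokenizer. Counting flagged lines per repo would then give a number to compare across groups of repos.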