Volunteer for research project?

Brad Roberts braddr at puremagic.com
Thu Feb 21 22:50:29 PST 2013


On 2/21/2013 10:00 PM, H. S. Teoh wrote:
> On Fri, Feb 22, 2013 at 06:51:53AM +0100, Maxim Fomin wrote:
>> On Thursday, 21 February 2013 at 07:03:08 UTC, Brad Roberts wrote:
>>> Would any of you be interested in helping out (read that as "doing")
>>> a research / data mining project for us?  I'd love to take all of the
>>> regressions this year (or for the last year, or whatever period of
>>> time can be reasonably accomplished) and track them back to which
>>> commit introduced each of them (already done for some of them).  From
>>> there, I'd like to see what sort of correlations can be found.  Is
>>> there a particular area of code that's responsible for them?  Is
>>> there a particular feature (spread across a lot of files, maybe)
>>> that's responsible?  Etc.
>>>
>>> Maybe it's all over the map.  Maybe it will highlight one or a few
>>> areas to take a harder look at.
>>>
>>> Anyone interested?
>>>
>>> Thanks,
>>> Brad
>>
>> It sounds interesting, but what are you expecting to find? And how
>> sure are you that you will find something? I would expect that, quite
>> often, code which fixes one aspect of a feature breaks the same
>> feature in another aspect, which is fairly obvious. Sometimes one
>> piece of code relies implicitly on the behavior of another, so when
>> you change the latter, the former stops working correctly. You give
>> the example of a feature spread across several files - how does
>> knowing this help in reducing regressions?
> 
> I would think he's referring to issues that are filed in the bugtracker.
> Obviously, we have no way of knowing if a code change broke something if
> nobody found any bug afterwards!
> 
> So I'm thinking it's probably a matter of going through the regression
> bugs in the bugtracker, making test cases to reproduce them, and then
> using git bisect to figure out which commit introduced the problem.
> 
> 
> T
> 

Pretty much that.  (Nearly) every bug comes with a test case already.  The part that will take work is taking that test
case and finding the exact commit that broke it.  By definition, a regression once worked and something changed that
broke it.  My hope is that one or more people can spend some time going through each regression report in bugzilla and
tracking down the exact commit for each.
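
Roughly, the per-bug legwork could be scripted.  Here's a sketch of a "git bisect run" step script; the build command,
compiler binary path, and test case name are placeholders, not prescriptions:

    #!/usr/bin/env python3
    # Sketch of a "git bisect run" step script.  Per bisect's convention:
    # exit 0 = commit is good, 1 = commit is bad, 125 = can't be tested.
    #
    # Usage, from inside a compiler checkout (revs/names are placeholders):
    #   git bisect start <bad-rev> <last-known-good-rev>
    #   git bisect run python3 bisect_step.py regression_NNNN.d
    import subprocess
    import sys

    SKIP = 125  # tells "git bisect run" to skip this commit

    def main() -> int:
        test_case = sys.argv[1]

        # Rebuild the compiler at the commit git has checked out for us.
        # If the build itself fails, ask bisect to skip this commit.
        build = subprocess.run(["make", "-f", "posix.mak", "-j4"])
        if build.returncode != 0:
            return SKIP

        # Many regressions are "this no longer compiles"; for those,
        # the compiler's exit status on the test case is the verdict.
        result = subprocess.run(["./dmd", "-c", test_case])
        return 0 if result.returncode == 0 else 1

    if __name__ == "__main__":
        sys.exit(main())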

What will be uncovered by the effort?  Who knows.  It's better not to try to anticipate or predict, since that can bias
the analysis.  The entire point of the exercise is to find out.  If there are one or more obvious or detectable clusters,
that gives us some interesting data.  It might well point out a part of the code that's particularly sensitive to
change.  Or that's very poorly covered by the test suite.  Or that's flawed in some other way.  Regardless, if there are
clusters, they're worth some study and pondering to figure out what can be done to make them NOT hotbeds of regressions.
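
Once each regression is pinned to a commit, looking for clusters can be fairly mechanical.  As a sketch (the commit
hashes below are made up; run it from inside the compiler checkout):

    #!/usr/bin/env python3
    # Sketch: given the commits found by bisection, count which source
    # files they touch.  Files touched by many regression-introducing
    # commits are candidate hot spots.  The hashes below are made up.
    import subprocess
    from collections import Counter

    offending_commits = ["0123abc", "456def7", "89abcd0"]

    hits = Counter()
    for sha in offending_commits:
        # List only the names of the files modified by this commit.
        out = subprocess.run(
            ["git", "show", "--name-only", "--pretty=format:", sha],
            capture_output=True, text=True, check=True)
        for path in out.stdout.splitlines():
            if path:
                hits[path] += 1

    # The most frequently implicated files float to the top.
    for path, count in hits.most_common(10):
        print("%3d  %s" % (count, path))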

It's a research project.  It might turn out to yield nothing useful.  That's certainly a risk.  I suspect it won't turn
out to be fruitless.

To seed the effort, here are all the regression bugs that have changed since the beginning of the year:

http://d.puremagic.com/issues/buglist.cgi?chfieldto=Now&query_format=advanced&chfieldfrom=2013-01-01&bug_severity=regression&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED
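
(For anyone scripting this: appending &ctype=csv to that query is a standard Bugzilla way to get the list as CSV.  I
haven't verified that this installation serves it, so treat this fetch as a sketch:)

    #!/usr/bin/env python3
    # Sketch: fetch the regression buglist as CSV.  Assumes this Bugzilla
    # honors the standard ctype=csv parameter; column names and order vary
    # by Bugzilla version, so rows are just printed as-is.
    import csv
    import io
    import urllib.request

    URL = ("http://d.puremagic.com/issues/buglist.cgi?chfieldto=Now"
           "&query_format=advanced&chfieldfrom=2013-01-01"
           "&bug_severity=regression&bug_status=UNCONFIRMED&bug_status=NEW"
           "&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED"
           "&bug_status=VERIFIED&bug_status=CLOSED&ctype=csv")

    with urllib.request.urlopen(URL) as resp:
        for row in csv.reader(io.TextIOWrapper(resp, encoding="utf-8")):
            print(row)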



