Generalized Linear Models and Stochastic Gradient Descent in D

Nicholas Wilson via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sun Jun 11 04:30:03 PDT 2017


On Sunday, 11 June 2017 at 10:21:03 UTC, data pulverizer wrote:
>> Speaking of not losing your audience: give a link to the NRA 
>> and/or a brief explanation of how it generalises to higher 
>> dimensions (graph or animation for the 2D case would be good, 
>> perhaps take something from wikipedia)
>
> NRA? Don't understand that acronym with reference to the 
> article.
>

Sorry, Newton-Raphson Algorithm.

> I shall mention the generalisation of the equations over 
> multiple observations.
>
> I agree that there is a danger of losing the audience and 
> perhaps some graphics would be nice.
>

I suspect that most of the people who would know what 
Newton-Raphson is took it in undergrad math and probably didn't 
cover the multidimensional case (I didn't).
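
For reference, and if I've understood how gGradient and 
gCurvature fit in: the scalar Newton-Raphson step

    x_{n+1} = x_n - f(x_n) / f'(x_n)

becomes, in the multidimensional case,

    beta_{n+1} = beta_n - H^{-1} * g

where g is the gradient of the log-likelihood and H is the 
Hessian (the curvature matrix), so each iteration solves a 
linear system instead of doing a scalar division. Something 
like that, plus a picture, would probably be enough.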
>
>> I don't think it is necessary to show the signatures of the 
>> BLAS and LAPACKE functions; a short description and link 
>> should suffice. Also, any reason you don't use GLAS?
>
> True
>
>> I would just have gParamCalcs as its own function (unless you 
>> are trying to show off that particular feature of D).
>
> I think it's easier to use a mixin since the same code is 
> required: mu and k are used in gLogLik, gGradient, and 
> gCurvature, and xB is also used in gLogLik. Showing off the 
> use of mixins is also a plus.

Then I would state why and give a brief explanation of what that 
mixin does.
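
Something along these lines would make it clear to readers who 
haven't seen string mixins before (a rough sketch only - I'm 
guessing at the variable names, and dot/linkInverse stand in 
for whatever the article actually uses):

enum gParamCalcs = q{
    // shared pre-calculations: the linear predictor and the mean;
    // assumes std.algorithm.map and std.array.array are imported
    // at the mixin site
    auto xB = x.map!(row => dot(row, pars)).array;
    auto mu = xB.map!(eta => linkInverse(eta)).array;
};

// gLogLik, gGradient and gCurvature can then each start with
//     mixin(gParamCalcs);
// so the shared code is written once but pasted into each
// function at compile time.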
>
>> omit the parentheses of .array() and .reduce()
>
> Yes
>
>> You use .array a lot: how much of that is necessary? I don't 
>> think it is in zip(k.repeat().take(n).array(), x, y, mu)
>
> Yes, I should remove the points where .array() is not necessary
>
>> `return(curv);` should be `return curve;`
>
> Thanks, that's my R bleeding into my D! So it should be:
>
> return curv;
>
>> Any reason you don't square the tolerance rather than sqrt the 
>> parsDiff?
>
> The calculation is the L2 norm, which ends in a sqrt; it is 
> later used for the stopping criterion, as in the equation.
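
Right, my point was just that sqrt is monotonic, so checking 
sqrt(sumOfSquares) < tol is the same as checking 
sumOfSquares < tol^^2, and the sqrt can be skipped inside the 
loop. A sketch (assuming variables named parsNew, parsOld and 
tol):

import std.algorithm : fold, map;
import std.range : zip;

auto sumOfSquares = zip(parsNew, parsOld)
                    .map!(a => (a[0] - a[1])^^2)
                    .fold!((a, b) => a + b)(0.0);
bool converged = sumOfSquares < tol^^2;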
>
>> for(int i = 0; i < nepochs; ++i) => foreach(i; iota(epochs))?
>
> hmm potato
>
>> zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); 
>> => dot(pars, x)?
>
> Fair point. When I started writing the article I considered 
> attempting to write the whole thing in D functional style - 
> with no external libraries. In the end I didn't want to write a 
> matrix inverse in functional style, so I rolled it back somewhat 
> and started adding C calls, which is more sensible.
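
For what it's worth, the two versions side by side (a sketch; 
cblas_ddot is the standard CBLAS routine, and you'd normally get 
the declaration from whichever cblas binding you're already 
linking against):

import std.algorithm : fold, map;
import std.range : zip;

// standard CBLAS dot product declaration
extern (C) double cblas_ddot(int n, const(double)* x, int incx,
                             const(double)* y, int incy);

double dotFunctional(double[] a, double[] b)
{
    return zip(a, b).map!(t => t[0] * t[1]).fold!((p, q) => p + q)(0.0);
}

double dotBlas(double[] a, double[] b)
{
    return cblas_ddot(cast(int) a.length, a.ptr, 1, b.ptr, 1);
}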
>
>> Theres a lot of code and text, some images and graphs would be 
>> nice, particularly in combination with a more real world 
>> example use case.
>
> I would agree that the article does need to be less austere - 
> however, the article is about the GLM algorithm rather than its 
> uses. I think the analyst should know whether they need a GLM 
> or not; there are many sources that explain applications of 
> GLM - I could perhaps reference some.

Fair enough, I'm just thinking about breadth of audience.

>
>> Factor out code like a[2].repeat().take(a[1].length) to a 
>> function, perhaps use some more BLAS routines for things like
>>
>> .map!( a =>
>>                         zip(a[0].repeat().take(a[1].length),
>>                             a[1],
>>                             a[2].repeat().take(a[1].length),
>>                             a[3].repeat().take(a[1].length))
>>                         .map!(a => -a[2]*(a[0]/a[3])*a[1])
>>                         .array())
>>                     .array();
>>
>> to make it more obvious what the calculation is doing.
>
> Yes
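
To be concrete about what I meant - something like this (a 
sketch; `extend` is just a name I made up for the helper):

import std.range : repeat, take;

// repeat a scalar so it can be zipped against a range of the same length
auto extend(T, R)(T value, R reference)
{
    return value.repeat.take(reference.length);
}

// the inner map then reads roughly as
//     zip(a[0].extend(a[1]), a[1], a[2].extend(a[1]), a[3].extend(a[1]))
//         .map!(a => -a[2] * (a[0] / a[3]) * a[1])
//         .array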
>
>> It might not be the point of the article, but it would be good 
>> to show some performance figures; I'm sure optimisation tips 
>> will be forthcoming.
>
> Since I am using whichever cblas implementation is installed, 
> I'm not sure that benchmarks would really mean much. Ilya 
> raised a good point about the amount of copying I am doing - 
> as I was writing it I thought so too. I address this below.
>
> Thanks again for taking time to review the article!
>

No problem, always happy to help.

> My main takeaway from writing this article is that it would be 
> quite straightforward to write a small GLM package in D - I'd 
> use quite a different approach, with struct/class GLM objects 
> to remove the copying issues and to give a consistent interface 
> to the user.
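
That sounds like the right direction. Roughly the shape I'd 
expect, purely as a sketch (none of these names come from your 
article):

// state lives in the object, so intermediate buffers can be
// allocated once and reused instead of copied every iteration
struct GLM(Family, Link)
{
    double[] coefficients;
    double tolerance = 1e-8;
    size_t maxIterations = 50;

    void fit(double[][] x, double[] y)
    {
        // Newton-Raphson iterations, updating coefficients in place
    }

    double[] predict(double[][] x)
    {
        // apply the inverse link to the linear predictor
        return null;
    }
}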
>
> An additional takeaway for me was that the use of array 
> operations like
>
> a[] = b[]*c[]
>
> or
>
> d[] -= e[] - f
>
> created odd effects in my calculations: the outputs were wrong 
> and for ages I didn't know why. I later ended up removing 
> those expressions from the code altogether, which remedied 
> the problem.

Hmm, did you report those? They _should_ just work.
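
e.g. something like this works as expected (a minimal check; the 
usual gotchas are that the destination slice has to be allocated 
to the right length beforehand and that the slices must not 
overlap):

void main()
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [4.0, 5.0, 6.0];
    auto a = new double[b.length]; // destination allocated up front
    a[] = b[] * c[];               // element-wise product: [4, 10, 18]

    double f = 1.0;
    double[] d = [10.0, 20.0, 30.0];
    double[] e = [1.0, 2.0, 3.0];
    d[] -= e[] - f;                // d becomes [10, 19, 28]
}

If you can reduce what you saw to something small it's probably 
worth a bugzilla report.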

