Generalized Linear Models and Stochastic Gradient Descent in D
Nicholas Wilson via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Sun Jun 11 04:30:03 PDT 2017
On Sunday, 11 June 2017 at 10:21:03 UTC, data pulverizer wrote:
>> Speaking of not losing your audience: give a link to the NRA
>> and/or a brief explanation of how it generalises to higher
>> dimensions (graph or animation for the 2D case would be good,
>> perhaps take something from wikipedia)
>
> NRA? Don't understand that acronym with reference to the
> article.
>
Sorry, the Newton-Raphson Algorithm.
> I shall mention the generalisation of the equations over
> multiple observations.
>
> I agree that there is a danger of losing the audience and
> perhaps some graphics would be nice.
>
I suspect that most of the people who would know what
Newton-Raphson is took it in undergrad math and probably didn't
cover the multidimensional case (I didn't).
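For context, the generalisation from the familiar scalar case is mechanical: the first derivative becomes the gradient and the second derivative becomes the Hessian, so each step solves a linear system instead of doing a division. Schematically (g and H here are generic names for the gradient and Hessian of the objective, not symbols from the article):

```latex
% Scalar Newton-Raphson:
%   x_{t+1} = x_t - f'(x_t) / f''(x_t)
% Multidimensional analogue, for a parameter vector \beta:
\beta_{t+1} = \beta_t - H(\beta_t)^{-1}\, g(\beta_t)
```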
>
>> I don't think it is necessary to show the signatures of the BLAS
>> and LAPACKE functions; a short description and link should
>> suffice. Also, any reason you don't use GLAS?
>
> True
>
>> I would just have gParamCalcs as its own function (unless you
>> are trying to show off that particular feature of D).
>
> I think it's easier to use a mixin, since the same code is
> required in several places: mu and k are used in gLogLik,
> gGradient, and gCurvature, and xB is also used in gLogLik.
> Showing off the use of mixins is also a plus.
Then I would state why and give a brief explanation of what that
mixin does.
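To illustrate what is being discussed, here is a minimal, hypothetical sketch of the pattern: the shared intermediate calculations live in one string constant and are mixed into each function that needs them. The names follow the article's (gParamCalcs, xB, mu, k, gLogLik, gGradient); the bodies are stand-ins, not the article's actual formulas.

```d
import std.math : exp;

// Hypothetical stand-in for the article's shared calculations:
enum gParamCalcs = q{
    double xB = 0.5;      // illustrative linear predictor
    double mu = exp(xB);  // illustrative mean
    double k  = 1.0;      // illustrative dispersion term
};

double gLogLik()
{
    mixin(gParamCalcs);   // xB, mu and k are now in scope
    return k*(xB - mu);
}

double gGradient()
{
    mixin(gParamCalcs);   // the same code reused without copy-pasting
    return k*(1.0 - mu);
}
```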
>
>> omit the parentheses of .array() and .reduce()
>
> Yes
>
>> You use .array a lot: how much of that is necessary? I don't
>> think it is in zip(k.repeat().take(n).array(), x, y, mu)
>
> Yes, I should remove the points where .array() is not necessary
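For instance, repeat().take() already yields a lazy range, and zip consumes lazy ranges happily, so the .array() in that call can simply be dropped. A self-contained sketch with made-up data:

```d
import std.range : repeat, take, zip;
import std.algorithm : map, sum;
import std.stdio : writeln;

void main()
{
    auto x = [1.0, 2.0, 3.0];
    auto y = [2.0, 4.0, 6.0];
    double k = 0.5;

    // No .array() needed: zip works directly on the lazy range.
    auto z = zip(k.repeat.take(x.length), x, y);
    writeln(z.map!(a => a[0]*a[1]*a[2]).sum);
}
```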
>
>> `return(curv);` should be `return curve;`
>
> Thanks, that's my R bleeding into my D! So it should be:
>
> return curv;
>
>> Any reason you don't square the tolerance rather than sqrt the
>> parsDiff?
>
> The calculation is the L2 norm, which ends in a sqrt and is
> later used for the stopping criterion as in the equation.
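That said, the two stopping tests are mathematically equivalent, and comparing the squared norm against tol^2 saves a sqrt per iteration. A small sketch:

```d
import std.algorithm : map, sum;

// Equivalent to sqrt(sum of squares) < tol, without the sqrt:
bool converged(double[] parsDiff, double tol)
{
    auto ss = parsDiff.map!(a => a*a).sum; // squared L2 norm
    return ss < tol*tol;
}
```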
>
>> for(int i = 0; i < nepochs; ++i) => foreach(i; iota(epochs))?
>
> hmm potato
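For the record, the three spellings are equivalent; the slice form is arguably the most idiomatic and needs no import:

```d
import std.range : iota;
import std.stdio : writeln;

void main()
{
    enum nepochs = 3;
    for (int i = 0; i < nepochs; ++i) writeln("for:   ", i);
    foreach (i; iota(nepochs))        writeln("iota:  ", i);
    foreach (i; 0 .. nepochs)         writeln("slice: ", i); // idiomatic
}
```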
>
>> zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b);
>> =>dot(pars,x)?
>
> Fair point. When I started writing the article I considered
> attempting to write the whole thing in D functional style -
> with no external libraries. In the end I didn't want to write a
> matrix inverse in functional style, so I rolled it back somewhat
> and started adding C calls, which is more sensible.
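For comparison, here are the two forms side by side: the functional reduction from the article, and a single BLAS call. This assumes linking against a CBLAS implementation; the extern(C) declaration below is written by hand for the sketch, not taken from the article:

```d
import std.algorithm : map, reduce;
import std.range : zip;

// Functional dot product, as in the article:
double dotFunctional(double[] pars, double[] x)
{
    return zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b);
}

// The same thing as one BLAS call (requires linking a CBLAS library):
extern (C) double cblas_ddot(int n, const(double)* x, int incx,
                             const(double)* y, int incy);

double dotBlas(double[] pars, double[] x)
{
    return cblas_ddot(cast(int) x.length, pars.ptr, 1, x.ptr, 1);
}
```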
>
>> There's a lot of code and text, some images and graphs would be
>> nice, particularly in combination with a more real world
>> example use case.
>
> I would agree that the article does need to be less austere -
> however the article is about the GLM algorithm rather than its
> uses. I think the analyst should know whether they need a GLM
> or not - there are many sources that explain applications of
> GLM - I could perhaps reference some.
Fair enough, I'm just thinking about breadth of audience.
>
>> Factor out code like a[2].repeat().take(a[1].length) to a
>> function, perhaps use some more BLAS routines for things like
>>
>> .map!( a =>
>> zip(a[0].repeat().take(a[1].length),
>> a[1],
>> a[2].repeat().take(a[1].length),
>> a[3].repeat().take(a[1].length))
>> .map!(a => -a[2]*(a[0]/a[3])*a[1])
>> .array())
>> .array();
>>
>> to make it more obvious what the calculation is doing.
>
> Yes
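As a concrete (hypothetical) example of the factoring being suggested, a small helper that names the scalar-to-slice "broadcast" pattern would let the repeated repeat().take() calls collapse:

```d
import std.range : repeat, take;

// Hypothetical helper: stretch a scalar lazily to the length of a
// companion slice, so that a[2].repeat().take(a[1].length) reads as
// a[2].broadcast(a[1]).
auto broadcast(T, R)(T value, R like)
{
    return value.repeat.take(like.length);
}
```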
>
>> It might not be the point of the article but it would be good
>> to show some performance figures, I'm sure optimisation tips
>> will be forthcoming.
>
> Since I am using whatever CBLAS implementation is installed I'm not
> sure that benchmarks would really mean much. Ilya raised a good
> point about the amount of copying I am doing - as I was writing
> it I thought so too. I address this below.
>
> Thanks again for taking time to review the article!
>
No problem, always happy to help.
> My main takeaway from writing this article is that it would be
> quite straightforward to write a small GLM package in D - I'd
> use quite a different approach, with struct/class GLM objects,
> to remove the copying issues and to give a consistent interface
> to the user.
>
> An additional takeaway for me was that I also found the use of
> array operations like
>
> a[] = b[]*c[]
>
> or
>
> d[] -= e[] - f
>
> created odd effects in my calculations: the outputs were wrong
> and for ages I didn't know why. I later ended up removing
> those expressions from the code altogether - which remedied
> the problem.
Hmm, did you report those? They _should_ just work.
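For what it's worth, one common source of wrong outputs with the slice operators is that the destination must already be allocated with a matching length (a[] = b[]*c[] copies element-wise, it does not allocate), and overlapping slices on either side are not allowed. A minimal correct usage:

```d
void main()
{
    auto b = [1.0, 2.0, 3.0];
    auto c = [4.0, 5.0, 6.0];

    // Destination must be allocated up front with a matching length:
    auto a = new double[](b.length);
    a[] = b[]*c[];          // element-wise multiply into a
    assert(a == [4.0, 10.0, 18.0]);
}
```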