Generalized Linear Models and Stochastic Gradient Descent in D
data pulverizer via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Sun Jun 11 03:21:03 PDT 2017
It is obvious that you took time and care to review the article.
Thank you very much!
On Sunday, 11 June 2017 at 00:40:23 UTC, Nicholas Wilson wrote:
>
> Maybe its the default rendering but the open math font is hard
> to read as the sub scripts get vertically compressed.
>
> My suggestions:
>
> Distinguish between the likelihood functions for gamma and
> normal rather than calling them both L(x). Maybe L subscript
> uppercase gamma and L subscript N?
>
Good idea!
> Links to wikipedia for the technical terms (e.g. dispersion,
> chi squared, curvature), again the vertical compression of the
> math font does not help here (subscripts of fractions) . It
> will expand your audience if they don't get lost in the
> introduction!
Yes, I should definitely add clarifying references. I should
probably also note that the curvature is the Hessian, though I
recently developed a dislike for naming mathematical constructs
after people or giving them odd names. I was seriously thinking
about calling Newton-Raphson something else, but that might be
taking it too far.
I'll end up writing the final in html so I can add a decent html
latex package to modify the size of the equations.
> Speaking of not losing your audience: give a link to the NRA
> and/or a brief explanation of how it generalises to higher
> dimensions (graph or animation for the 2D case would be good,
> perhaps take something from wikipedia)
NRA? Don't understand that acronym with reference to the article.
I shall mention the generalisation of the equations over multiple
observations.
I agree that there is a danger of losing the audience, and
perhaps some graphics would be nice.
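For reference, the generalised (multivariate) Newton-Raphson update I'll describe takes the following form, where beta is the parameter vector, nabla-ell the gradient of the log-likelihood, and H the Hessian (the curvature discussed above); the symbols here are my own shorthand rather than the article's:

```latex
\beta_{t+1} = \beta_t - H(\beta_t)^{-1}\, \nabla \ell(\beta_t)
```

In one dimension this reduces to the familiar x - f'(x)/f''(x) step for maximising ell.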
> I dont think it is necessary to show the signature of the BLAS
> and Lapacke function, just a short description and link should
> suffice. also any reason you don't use GLAS?
True
> I would just have gParamCalcs as its own function (unless you
> are trying to show off that particular feature of D).
I think it's easier to use a mixin here since the same code is
required in several places: mu and k are used in gLogLik,
gGradient and gCurvature, and xB is also used in gLogLik. Showing
off the use of mixins is also a plus.
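A minimal sketch of the pattern, assuming names like gParamCalcs, xB, mu and k from the article; the expressions inside the mixin are placeholders to show the mechanism rather than the article's actual formulas:

```d
import std.math : exp;
import std.numeric : dotProduct;
import std.stdio : writeln;

// Shared intermediate calculations, injected into each likelihood
// function body via string mixin. The bodies here are placeholders.
enum gParamCalcs = q{
    auto xB = dotProduct(pars, x); // linear predictor
    auto mu = exp(xB);             // mean via the log link
    auto k  = 1.0;                 // dispersion placeholder
};

double gLogLik(double[] pars, double[] x)
{
    mixin(gParamCalcs); // xB, mu and k are now in scope
    return xB - mu;     // placeholder log-likelihood term
}

void main()
{
    writeln(gLogLik([0.1, 0.2], [1.0, 2.0]));
}
```

The same `mixin(gParamCalcs);` line would then appear in gGradient and gCurvature, so the shared calculations are written once.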
> omit the parentheses of .array() and .reduce()
Yes
> You use .array a lot: how much of that is necessary? I dont
> think it is in zip(k.repeat().take(n).array(), x, y, mu)
Yes, I should remove the points where .array() is not necessary
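For instance, zip consumes lazy ranges directly, so the repeat().take() chain needs no materialisation; a small self-contained sketch:

```d
import std.range : zip, repeat, take;
import std.algorithm : map;
import std.stdio : writeln;

void main()
{
    double k = 2.0;
    auto x = [1.0, 2.0, 3.0];
    auto n = x.length;

    // No intermediate .array needed: repeat().take(n) is already a
    // lazy range, and zip accepts lazy ranges as inputs.
    auto scaled = zip(k.repeat.take(n), x)
                  .map!(a => a[0] * a[1]);
    writeln(scaled); // materialise only at the point of use, if at all
}
```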
> `return(curv);` should be `return curve;`
Thanks, that's my R bleeding into my D! It should be:
return curv;
> Any reason you don't square the tolerance rather than sqrt the
> parsDiff?
The calculation is the L2 norm, which ends in a sqrt; it is later
used for the stopping criterion, as in the equation.
> for(int i = 0; i < nepochs; ++i) => foreach(i; iota(epochs))?
hmm potato
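Either spelling works; a range foreach also avoids iota entirely. A quick side-by-side:

```d
void main()
{
    int nepochs = 5;

    // C-style loop, as in the article:
    for (int i = 0; i < nepochs; ++i) { /* one epoch */ }

    // Idiomatic D alternative over a number range:
    foreach (i; 0 .. nepochs) { /* one epoch */ }
}
```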
> zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b);
> =>dot(pars,x)?
Fair point. When I started writing the article I considered
attempting to write the whole thing in D functional style, with
no external libraries. In the end I didn't want to write a matrix
inverse in functional style, so I rolled it back somewhat and
started adding C calls, which is more sensible.
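std.numeric.dotProduct expresses the same reduction directly; a sketch with made-up values for pars and x:

```d
import std.numeric : dotProduct;
import std.range : zip;
import std.algorithm : map, reduce;
import std.stdio : writeln;

void main()
{
    auto pars = [0.5, 1.5, 2.0];
    auto x    = [1.0, 2.0, 3.0];

    // Functional-style dot product, as in the article ...
    auto xB1 = zip(pars, x).map!(a => a[0] * a[1])
                           .reduce!((a, b) => a + b);

    // ... versus the library routine.
    auto xB2 = dotProduct(pars, x);

    writeln(xB1, " ", xB2); // both print 9.5
}
```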
> Theres a lot of code and text, some images and graphs would be
> nice, particularly in combination with a more real world
> example use case.
I would agree that the article needs to be less austere. However,
the article is about the GLM algorithm rather than its uses. I
think the analyst should know whether they need a GLM or not -
there are many sources that explain applications of GLMs, and I
could perhaps reference some.
> Factor out code like a[2].repeat().take(a[1].length) to a
> function, perhaps use some more BLAS routines for things like
>
> .map!( a =>
> zip(a[0].repeat().take(a[1].length),
> a[1],
> a[2].repeat().take(a[1].length),
> a[3].repeat().take(a[1].length))
> .map!(a => -a[2]*(a[0]/a[3])*a[1])
> .array())
> .array();
>
> to make it more obvious what the calculation is doing.
Yes
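Something like a small helper does make the intent clearer. Here fillLike is a hypothetical name, and the scalars stand in for the article's coefficients:

```d
import std.range : repeat, take, zip;
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical helper: broadcast a scalar to the length of a
// reference array, replacing a.repeat().take(b.length) noise.
auto fillLike(T, R)(T value, R reference)
{
    return value.repeat.take(reference.length);
}

void main()
{
    double w = -2.0, phi = 4.0;
    auto x = [1.0, 2.0, 3.0];

    // The nested zip/map from the review, rewritten with the helper:
    auto result = zip(w.fillLike(x), x, phi.fillLike(x))
                  .map!(a => -a[0] * a[1] / a[2])
                  .array;
    writeln(result); // [0.5, 1, 1.5]
}
```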
> It might not be the point of the article but it would be good
> to show some performance figures, I'm sure optimisation tips
> will be forthcoming.
Since I am using whatever CBLAS implementation is installed, I'm
not sure that benchmarks would really mean much. Ilya raised a
good point about the amount of copying I am doing; as I was
writing it I thought so too. I address this below.
Thanks again for taking time to review the article!
My main takeaway from writing this article is that it would be
quite straightforward to write a small GLM package in D. I'd use
quite a different approach, with structs/classes for GLM objects,
to remove the copying issues and to give a consistent interface
to the user.
An additional takeaway for me was that the use of array
operations like
a[] = b[]*c[]
or
d[] -= e[] - f
created odd effects in my calculations: the outputs were wrong,
and for ages I didn't know why. I eventually removed those
expressions from the code altogether, which remedied the problem.
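For what it's worth, D's array operations do behave predictably when the destination slice is preallocated to the operand length and does not overlap the operands; a sketch of the correct usage, with made-up values:

```d
import std.stdio : writeln;

void main()
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [4.0, 5.0, 6.0];

    // The destination must already exist with the right length:
    // `a[] = b[] * c[]` writes elementwise and does not allocate.
    auto a = new double[](b.length);
    a[] = b[] * c[];
    writeln(a); // [4, 10, 18]

    // In-place update mixing a slice and a scalar:
    double f = 1.0;
    double[] d = [10.0, 20.0, 30.0];
    double[] e = [1.0, 2.0, 3.0];
    d[] -= e[] - f; // elementwise: d[i] -= (e[i] - f)
    writeln(d); // [10, 19, 28]
}
```

Overlapping source and destination slices in these expressions is an error in D, which can produce exactly the kind of silently wrong outputs described above.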