RFC: naming for FrontTransversal and Transversal ranges
Robert Jacques
sandford at jhu.edu
Sat May 2 11:28:20 PDT 2009
On Sat, 02 May 2009 10:58:08 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> Robert Jacques wrote:
>> I do scientific computing. Generally, I find it breaks down into two
>> parts: things under 4x4, for which value types are probably better, and
>> everything else, for which value types are to be avoided like the
>> plague. I'll often work with 100s of MB of data with algorithms that take
>> minutes to hours to complete. So an unexpected copy is both hard to
>> find (am I slow/crashing because of my algorithm, or because of a
>> typo?) and rather harmful, because it's big.
>
> I don't buy this. Undue copying is an issue that manifests itself
> locally, reproducibly, and debuggably. Contrast with long-distance
> coupling which is bound to be hard to debug. You change a matrix here, and
> all of a sudden a matrix miles away has been messed up. Also, efficiency
> can be fixed with COW, whereas there is nothing you can do to fix the
> coupling aside from relentless and patient user education.
>
> Walter gave me a good argument (little did he know he was making a point
> destroying his.) Consider the progress we made when replacing char[]
> with string. Why? Because with char[] long-distance dependencies crop up
> easy and fast. With string you know there's never going to be a
> long-distance dependency. Why? Because unlike char[], content
> immutability makes string as good as a value.
>
> I remember the nightmare. I'd define a little structure:
>
> struct Sentence
> {
>     uint id;
>     char[] data;
> }
>
> Above my desk I have a big red bulb along with an audible alarm. As soon
> as I add the member "data", the bulb and the alarm go off. Sentence is
> now an invalid struct - I need to add at least a constructor and a
> postblit. In the constructor I need to call .dup on the incoming data,
> and in the postblit I need to do something similar (or something more
> complicated if I want to be efficient). This is a clear example of code
> that is short and natural, yet does precisely the wrong thing. This is
> simply a ton of trouble, as experience with C++ has shown.
>
> I'm not even getting into calling functions that take a char[] and
> keeping fingers crossed ("I hope they won't mess with it") or .dup-ing
> prior to the call to eliminate any doubt (even though the function may
> anyway call .dup internally). string has marked huge progress towards
> people considering D seriously.
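For reference, the constructor-plus-postblit fix described above might look roughly like the sketch below. It assumes a current D2 compiler; the constructor signature and the main() driver are illustrative, not taken from the quoted message.

```d
import std.stdio;

struct Sentence
{
    uint id;
    char[] data;

    // Copy the incoming buffer so the caller's array is never aliased.
    this(uint id, const(char)[] text)
    {
        this.id = id;
        this.data = text.dup;
    }

    // Postblit: runs after the bitwise copy and gives the copy its own buffer.
    this(this)
    {
        data = data.dup;
    }
}

void main()
{
    char[] buf = "hello".dup;
    auto s1 = Sentence(1, buf);
    auto s2 = s1;        // postblit runs here
    s2.data[0] = 'J';    // touches only s2's buffer
    buf[0] = 'c';        // touches only the caller's buffer
    writeln(s1.data);    // hello
    writeln(s2.data);    // Jello
}
```

Without the two special members, s1, s2 and buf would all share one buffer, which is the long-distance coupling being complained about. The price is an allocation on every copy.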
Andrei, you're perfectly right about strings. They've been a godsend. But
strings are small, cheap to copy and tend to roam a lot. Scientific
computing (at least the kind I do) is very local, and uses large datasets
which are expensive to copy. I load data and set up the problem
parameters, then pass through a series of functions and write it out the
answer. When I first learned Matlab, my prof gave me a great rule of
thumb: if it's more than 10 lines long, you're probably doing it wrong. My
code bases tend to be small and focused, and use either my own code or
libraries I understand since they're logically pure (Ah, math functions (I
don't mess with rounding)).
And while finding bugs is relatively easy (go small, personal code bases)
tracking down a performance issue if you don't even know it exists is a
lot harder/slower.
>> But I've generally worked on making something else fast so more data
>> can be crunched, etc. Actual prototype work (for array/matrix based
>> stuff at least) is often done in Matlab, which I think uses COW
>> under-the-hood to provide value semantics. So I think anyone turning to
>> D to do scientific computing will know reference semantics, since
>> they'd already be familiar with them from C/C++, etc (Fortran?).
>> Although successfully attracting algorithm prototypes from
>> Matlab/Python/Mathematica/R/etc. is probably a bigger issue than just the
>> container types, growing the pie was why the Wii won the last console
>> wars.
>
> Fortran uses pass by reference, but sort of gets away with it by
> assuming and promoting no aliasing throughout. Any two named values in
> Fortran can be assumed to refer to distinct memory. Also unless I am
> wrong A = B in Fortran does the right thing (copies B's content into A).
> Please confirm/infirm.
>
> For all I know, Matlab does the closest to "the real thing". Also, C++
> numeric/scientific libraries invariably use value semantics in
> conjunction with expression templates meant to effect loop fusion. Why?
> Because value semantics is the right thing and C++ is able to express
> it. I should note, however, that Perl Data Language uses reference
> semantics (http://tinyurl.com/derlrh).
Actually, all the big, high performance libraries I know use BLAS/Linpack
(i.e. reference semantics). Though there are a good number of middle range
libraries that use expression templates.
Hmm... just went googling and found some neat work on mixing expression
templates with BLAS APIs, including using the GPU. Having your cake and
eating it too is nice.
Though personally I like the current array op syntax, which is reference
based and is just enough to make me aware of doing expensive operations,
without being annoying:
a[] = b[] + c[];
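To spell out what that line does, here is a small sketch assuming plain built-in double[] arrays and a current D2 compiler. The helper name addInto is made up for illustration. The destination has to be allocated up front with a matching length, and the array op fills it in place rather than creating a new array.

```d
import std.stdio;

// Hypothetical helper: element-wise sum written into caller-owned storage.
void addInto(double[] dst, const double[] x, const double[] y)
{
    dst[] = x[] + y[];   // array op: no temporary array, no reallocation
}

void main()
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [10.0, 20.0, 30.0];
    auto a = new double[](b.length);   // the one explicit allocation
    auto before = a.ptr;

    addInto(a, b, c);

    assert(a.ptr is before);           // same storage as before the op
    assert(a == [11.0, 22.0, 33.0]);
    writeln(a);
}
```

The allocation sits on its own line, so the expensive step stays visible at the call site.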
> There's also a definite stench when one realizes that
>
> a = b;
>
> on one side, and
>
> a = b * 1;
>
> or
>
> a = b + 0;
>
> on the other, do completely different things.
Again, I'd use the []= operator
a = b + 0; // Error
a[] = b + 0; // Okay
So I don't see this as an issue.
> So what we're looking at is: languages that had the option chose value
> semantics. Languages that didn't, well, they did what they could.
Well, that's one thing I love about D. We have a value assignment
operator: []= in addition to a reference semantics assignment operator =
(well, for reference types at least). I actually use it to do format
copy/conversions in my own array class.
> I started rather neutral in this discussion but the more time goes by,
> the more things tilt towards value semantics.
Well, for me, the more this goes on the more I get interested in trying
value semantics. But if I'm honest, I'd just go on using array slices and
forget about the array containers. And the lack of good, fast lock-free
value semantics containers (even the concept of shared data without
reference semantics) is a borderline deal breaker, depending on who you
are (It's bad PR regardless).
Also, in a value semantics world, refs are third class citizens, but in a
reference semantics world, value semantics get their own assignment
operator ( []= ), and by convention, their own property ( .dup ).
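A quick sketch of the three spellings side by side, assuming built-in int[] slices and a current D2 compiler. The helper sharesStorage is a made-up name used only to make the aliasing visible.

```d
import std.stdio;

// Hypothetical helper: true when both slices start at the same memory.
bool sharesStorage(const int[] x, const int[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] b = [1, 2, 3];

    // Reference assignment: a and b now view the same elements.
    int[] a = b;
    a[0] = 99;
    assert(sharesStorage(a, b) && b[0] == 99);

    // Value assignment: copy b's elements into storage v already owns.
    int[] v = new int[](b.length);
    v[] = b[];
    v[1] = -1;
    assert(!sharesStorage(v, b) && b[1] == 2);

    // .dup: allocate a fresh array and copy in one step.
    int[] d = b.dup;
    d[2] = 7;
    assert(!sharesStorage(d, b) && b[2] == 3);

    writeln(b);
}
```

Only the first form can change b behind your back. The other two each make the copy explicit at the call site.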