persistent byLine

Nick Treleaven ntrel-public at yahoo.co.uk
Tue Jul 23 09:13:10 PDT 2013


(resending from the forum, original didn't arrive for some reason)

On 23/07/2013 00:28, Jonathan M Davis wrote:
> On Monday, July 22, 2013 13:08:05 Nick Treleaven wrote:
>> I made a pull request to re-enable using byLine!(char, immutable char).
>> (Note this compiled in the current release, but didn't work properly
>> AFAICT. It did work by commit 97cec33^).
>>
>> https://github.com/D-Programming-Language/phobos/pull/1418
>>
>> Using that allows us to drop the map!(l => l.idup) part from the above
>> snippet. The new syntax isn't much better, but it can also be more
>> efficient (as it caches front). I have an idea how to improve the
>> syntax, but I'll omit it for this post.
>
> I agree with monarch in that we really shouldn't try and mess with byLine
> like this. It would just be cleaner to come up with a new function for
> this, though I confess that part of me thinks that it's better to just use
> map!(a => a.idup)(), because it avoids duplicating functionality. It is
> arguably a bit ugly though.

I think I'll close that PR then. I reiterate that the
readText.splitter approach is probably more efficient in most cases
than either byLine/map/idup or byLine!(char, immutable char), unless
e.g. a byLineDup were implemented so that it allocated more than one
line at a time.

>> I've since thought that if most or all lines in a file need to be
>> persistent, it may be more efficient to use
>> readText(filename).splitLines, because that doesn't need to allocate for
>> each line.
>>
>> There are two enhancements for that approach:
>> 1. readText should accept a File, not just a filename, so we can use stdin.
>
> I'm opposed to this. I don't think that std.file should be using
> std.stdio.File at all. What we really need is for std.io to be finished,
> which will revamp std.stdio and give us streams and the like. And std.file
> really is not designed around using stdin - and shouldn't be IMHO. It's for
> operating on actual files.

Yes, I meant adding std.stdio.File.readText. Would that be OK to
add now, or is std.io likely to be added relatively soon?
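Until something like File.readText exists, a free function can approximate it. This is only a sketch: readAllText is a hypothetical name, not a Phobos API, and the temporary file merely stands in for stdin. Thanks to UFCS it can still be called as stdin.readAllText(). Like std.file.readText, it checks that the data is valid UTF-8.

```d
import std.exception : assumeUnique;
import std.stdio : File, writeln;
import std.string : splitLines;
import std.utf : validate;

// Hypothetical free function, not a Phobos API: read everything remaining
// in `f` into one string. With UFCS it reads like a method: stdin.readAllText().
string readAllText(File f)
{
    char[] buf;
    // byChunk reuses its buffer, so `~=` copies each chunk into buf.
    foreach (ubyte[] chunk; f.byChunk(4096))
        buf ~= cast(char[]) chunk;
    validate(buf);            // throws UTFException on invalid UTF-8, as std.file.readText does
    return assumeUnique(buf); // nothing else refers to buf, so treating it as immutable is safe
}

void main()
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    // A temporary file stands in for stdin here; with piped input the
    // call would be stdin.readAllText().
    auto path = buildPath(tempDir, "readalltext_demo.txt");
    write(path, "one\r\ntwo\n");
    scope (exit) remove(path);

    foreach (line; File(path).readAllText().splitLines)
        writeln(line);
}
```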

>> 2. splitLines makes an array. It would be more flexible to have an input
>> range created from a function, e.g. lineSplitter.
>
> splitter will do the job just fine as long as you don't care about \r\n -

I currently use Windows ;-)

> though we should arguably come up with a solution that works with \r\n
> (assuming that something in std.range or std.algorithm doesn't already do
> it, and I'm just not thinking of it at the moment).

splitter does work:
"splitting\r\nlines\r\nworks\r\n!".splitter("\r\n").writeln();

I can live with that. However...

The nice thing about splitLines is that it doesn't care what kind 
of line endings are in the file, i.e. you don't need to tell it 
in advance. You don't have to pass it std.ascii.newline as a 
separator in order to get portability.
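A small sketch of the difference on text with mixed line endings (the sample string is made up): splitter only splits on the separator it is given, while splitLines recognizes \n, \r\n and \r (among other line separators) without being told.

```d
import std.algorithm : splitter;
import std.array : array;
import std.stdio : writeln;
import std.string : splitLines;

void main()
{
    // Mixed line endings, as can happen when a file is edited on
    // different systems.
    string text = "unix\nwindows\r\nold mac\rend";

    // splitter needs the separator up front; "\r\n" misses the other endings.
    writeln(text.splitter("\r\n").array);

    // splitLines recognizes all three endings without being told.
    writeln(text.splitLines);
}
```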

Portability should be the default. That's what I intend for
lineSplitter, which IMO would be better than the readln/byLine
approach of specifying the terminator. By default it would handle
all text files portably, even ones whose line endings differ from
the platform's native ones.

lineSplitter would be useful in other ways, e.g. for counting 
lines in a string.
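A rough sketch of what such a lazy splitter could look like. LineRange is a hypothetical name used only here, not a Phobos API. It returns slices of the original string, so it allocates nothing, and it treats \n, \r\n and a lone \r as line terminators (unlike splitLines, it ignores the rarer Unicode separators). Since it is a lazy input range, counting lines is just walkLength, with no array of lines built.

```d
import std.range.primitives : isInputRange, walkLength;
import std.stdio : writeln;

// Hypothetical lazy line splitter (a sketch, not a Phobos API).
// Yields slices of the input without allocating; "\n", "\r\n" and a
// lone "\r" all end a line, so no platform-specific separator is needed.
struct LineRange
{
    private string rest; // input not yet consumed
    private string line; // current front
    private bool done;

    this(string text)
    {
        rest = text;
        popFront(); // prime the first line
    }

    @property bool empty() { return done; }
    @property string front() { return line; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        foreach (i, char c; rest)
        {
            if (c == '\n' || c == '\r')
            {
                line = rest[0 .. i];
                // "\r\n" counts as a single terminator
                immutable next = (c == '\r' && i + 1 < rest.length && rest[i + 1] == '\n')
                    ? i + 2 : i + 1;
                rest = rest[next .. $];
                return;
            }
        }
        line = rest; // final line has no terminator
        rest = null;
    }
}

static assert(isInputRange!LineRange);

void main()
{
    foreach (line; LineRange("one\ntwo\r\nthree\rfour"))
        writeln(line);

    // Counting lines builds no array of lines.
    writeln(LineRange("a\r\nb\nc\n").walkLength); // 3
}
```

Because the input is an immutable string, each front is already safe to keep, which is why splitting the result of readText avoids per-line copies.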



More information about the Digitalmars-d mailing list