[Issue 9045] Feature request for std.ascii => function isNewline

d-bugmail at puremagic.com
Tue Nov 20 12:13:50 PST 2012


http://d.puremagic.com/issues/show_bug.cgi?id=9045



--- Comment #8 from Dmitry Olshansky <dmitry.olsh at gmail.com> 2012-11-20 12:13:48 PST ---
(In reply to comment #7)
> (In reply to comment #1)
> > See representation on various systems:
> > 
> > http://en.wikipedia.org/wiki/Newline
> > 
> > In particular:
> > On Unix, and Mac OS X: LF (1 char)
> > On Windows: CR+LF (2 chars)
> 
> (In reply to comment #5)
> > Technically speaking, if you don't know which type of line endings a file uses
> > 
> > [SNIP]
> 
> Isn't the "line ending" of a *file* totally irrelevant here, in the sense that it
> is nothing more than the system's *storage* format?
> 

There is no "system" line-ending encoding. It died and was buried in the same
tomb as FTP ASCII mode a long time ago. Files are transferred in so many
different ways that expecting someone to transcode line endings everywhere is
plainly impossible (you don't always know the target system). So at the end of
the day, reasonable programs just deal with the whole zoo of them.

> On my windows machine, the *strings* I manipulate don't have "\r\n" as a
> newline, they have '\n'. That's the entire reason there is a "rb" and "r"
> option when reading a file.

And I'd say the text-mode ("r") option is a woefully broken thing. Writing \n
in this mode in fact stores \r\n. You are far safer with binary mode; at least
it's WYSIWYG.

> If you *do* have an "\r\n" in your stream, then either:
> * You have an actual '\r' in your stream, which is then followed by a new
> line.

> Under these circumstances, and following the Unicode definition, I'd say:
> 
> return 0x0A <= c && c <= 0x0D;
> 
> is not only correct (for ASCII), but any attempt to parse more than one character
> for this info would be incorrect...

No, no, and no. It's a fact of life (or rather of the standard) that \r\n is a
single entity, and it can't be parsed other than by looking at two characters
(or rather code points).

> 
> PS: WTF is \u{D A}

It's two dchars: \r\n, that is, 0x0D followed by 0x0A. They are being cute and
using the flexible-width syntax rather than the old fixed-width ones: \uXXXX
and \UYYYYYYYY.



More information about the Digitalmars-d-bugs mailing list