New web newsreader - requesting participation

Sun Jan 30 22:48:35 PST 2011

"Adam Ruppe" <destructionator at gmail.com> wrote in message 
news:ii592i$c09$1 at digitalmars.com...
>
> c) It tries to convert news posts to HTML, so the paragraphs
> wrap to the browser, links work, quotes are put into the proper
> tags for indentation, and it tries to auto-detect D code and
> put it in a <pre> block - which my javascript can make inline
> editable and runnable. Example:
>
> http://arsdnet.net/d-web-site/nntp/get-message?
> newsgroup=digitalmars.D&messageId=%
> 3Cmailman.1085.1296409409.4748.digitalmars-d%40puremagic.com%3E
>
> With script disabled, you'll see the code in a different colored
> block. With script enabled, you'll see an Edit button there
> too.
>

That's really cool.

> d) It tries to convert HTML emails back to plain text. (Ironically,
> so it can turn it back to html...)

I love that on so many different levels :)

> h) Already read messages are tracked by your browser - if the link
> is visited, it shows up in a different color.
>

It's amazing how often people seem to forget that feature exists. That was 
introduced in what, Mosaic?  Sometimes I think I'm the only one in the world 
who ever uses the "a:visited" CSS. Not that I feel strongly about it, but 
hey.

> It doesn't always recognize code. This would be ok, but if it
> sees one line as code but doesn't include one of them, it would
> confuse the reader. Example:
>
> http://arsdnet.net/d-web-site/nntp/get-message?
> newsgroup=digitalmars.D&messageId=%3Cii4lbj%242bes%241%
> 40digitalmars.com%3E
>
> (Look for "auto str =")
>

Ha! I broke your algorithm!

Oh, speaking of fuzzy detection algorithms, it seems to think that the "//" 
comment tokens are URLs (very, very short URLs ;) ).
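
A rough sketch of one way around that, in Python purely for brevity: only linkify something when it has an explicit scheme followed by at least one non-space character. The pattern and the name find_urls are made up for illustration; I have no idea what the actual nntp.d code does.

```python
import re

# Hypothetical pattern (not taken from nntp.d): require a scheme and at
# least one character after "//", so a bare "//" comment token never matches.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+")

def find_urls(line):
    """Return the URL-looking substrings found in one line of a post."""
    return URL_RE.findall(line)
```

With this, a line like "int x = 1; // set x" yields no links, while a line containing a full http:// address still does.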

> The reason for this is it detects code lines by looking for
> semicolons and open braces. It will call something a generic
> <pre> if there's a lot of whitespace in it - figuring it is
> probably ascii art (if it thinks the whitespace has human
> significance, it tries to preserve it), but it still isn't
> a perfect detection function.
>
> I'm open to ideas. We want to detect code, but not flag
> regular English text.
>
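
For reference, the heuristic described in the quote might look roughly like this. It is a guess at the shape of it, written in Python for brevity: the "ends with a semicolon or open brace" test, the 0.4 whitespace threshold, and the name classify_line are all assumptions, not the real nntp.d logic.

```python
def classify_line(line, whitespace_threshold=0.4):
    """Return "code", "pre", or "text" for one line of a post.

    Assumed rules: a line ending in ';' or '{' is code; a line that is
    mostly whitespace is treated as preformatted (possibly ascii art).
    """
    stripped = line.rstrip()
    if not stripped:
        return "text"
    if stripped.endswith(";") or stripped.endswith("{"):
        return "code"
    whitespace = sum(1 for ch in stripped if ch.isspace())
    if whitespace / len(stripped) > whitespace_threshold:
        return "pre"
    return "text"
```

Since each line is judged on its own, a code line that happens not to end in ';' or '{' falls out of the block, which matches the failure described above.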

One very rough idea: Take each paragraph (i.e., each block of text that's 
separated by a blank line). Run it through a D lexer. If it has at most, 
say, 1 lexical error per line (on average), then assume it's intended as D 
code. If multiple consecutive paragraphs are flagged as D code, consider 
them all part of the same code-block.
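
A rough sketch of that idea, in Python purely for brevity. The tokenizer here is a crude stand-in, not a real D lexer (something like DDMD's would replace it), and the names count_lex_errors, looks_like_d and find_code_blocks are invented for illustration.

```python
import re

# Crude stand-in for a D lexer: each pattern matches one kind of token.
# This is an assumption for illustration; a real lexer is far stricter.
TOKEN_PATTERNS = [
    r"\s+",                              # whitespace
    r"//[^\n]*",                         # line comment
    r"/\*.*?\*/",                        # block comment
    r'"(?:\\.|[^"\\\n])*"',              # string literal
    r"'(?:\\.|[^'\\\n])'",               # character literal
    r"[A-Za-z_]\w*",                     # identifier or keyword
    r"\d[\w.]*",                         # numeric literal (very loose)
    r"[-+*/%=<>!&|^~?:;,.(){}\[\]@$]",   # operator or punctuation
]
TOKEN_RE = re.compile("|".join(TOKEN_PATTERNS), re.DOTALL)

def count_lex_errors(text):
    """Count characters where no token can start (e.g. a stray apostrophe)."""
    pos, errors = 0, 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match:
            pos = match.end()
        else:
            errors += 1
            pos += 1
    return errors

def looks_like_d(paragraph, max_errors_per_line=1.0):
    """Flag a paragraph as D code if it averages few lexical errors per line."""
    lines = [ln for ln in paragraph.splitlines() if ln.strip()]
    if not lines:
        return False
    return count_lex_errors(paragraph) / len(lines) <= max_errors_per_line

def find_code_blocks(post):
    """Split a post on blank lines and merge consecutive flagged paragraphs."""
    paragraphs = [p for p in re.split(r"\n\s*\n", post) if p.strip()]
    blocks, current = [], []
    for paragraph in paragraphs:
        if looks_like_d(paragraph):
            current.append(paragraph)
        elif current:
            blocks.append("\n\n".join(current))
            current = []
    if current:
        blocks.append("\n\n".join(current))
    return blocks
```

One caveat: plain English with no apostrophes or odd punctuation lexes cleanly as a run of identifiers, so on its own this would probably need a second signal to avoid flagging ordinary prose.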

After all, D's supposed to be fast to lex (and to parse for that matter), 
and you'd only need to do it once and cache the result. Maybe it could even 
be tied into some syntax highlighting. Maybe use DDMD (we could use more 
people on DDMD anyway - Koroskin doesn't seem to have had time for it 
lately...neither have I for that matter...).

Actually, what could also be interesting would be an "English parser". 
Obviously true full-fledged English semantic processing is out of reach ATM, 
but I wonder if something could be made that acts "good enough" as a mere 
English-*detector*. Or a general natural-language-detector. Could be an 
interesting project at the very least.
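
As a very rough sketch of a "good enough" detector (Python, purely for brevity): score a block of text by how many of its words are very common English words. The word list, the 0.25 threshold, and the function names are all made up for illustration; a serious detector would want a bigger list or a statistical model.

```python
import re

# Small hand-picked list of common English words (an assumption). Words that
# double as D keywords or typical variable names ("if", "for", "is", "in",
# "i", "a", ...) are left out on purpose so code scores low.
COMMON_WORDS = frozenset("""
    the an and or but of to on at are was were be it that you we they
    he she not as so have has would could
""".split())

def english_score(text):
    """Fraction of the words in `text` that are common English words."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in COMMON_WORDS)
    return hits / len(words)

def looks_like_english(text, threshold=0.25):
    """Treat text as English when enough of its words are common ones."""
    return english_score(text) >= threshold
```

Something like this could be used alongside a code check: a paragraph that lexes cleanly but also scores high here is probably prose rather than D.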

>
> I'm also open to graphical styling ideas. I put up a dark
> theme here because the white was hurting my eyes, but I change my
> mind on whether I like light or dark almost at random. (Depends on the room's
> lighting conditions I think). But I didn't do any more graphic
> setup other than the max-width.
>

I like to use dark themes for my own stuff for the same reasons. But then I 
always end up going with bright-ish themes for public stuff because I know 
I'm in the minority on that. (I'm not really trying to suggest one way or 
another, just commenting.)

>
> BTW, as a fun fact, this post is about 1/4th the size of the
> entire nntp.d code file!

Viva la D!