New web newsreader - requesting participation

Adam Ruppe destructionator at gmail.com
Mon Jan 31 06:12:07 PST 2011


Nick Sabalausky wrote:
> It's amazing how often people seem to forget [a:visited] exists.

Yeah, it boggles my mind - I personally find it incredibly useful.
But every design I get for clients invariably has visited colors
purposefully indistinguishable from regular links.

Another thing that breaks it for a lot of people is that URLs
randomly change ever so slightly, or don't change at all, which
throws a wrench in caching too.

I blame AJAX. (Cue someone saying "AJAX doesn't need to break
it!" Yeah, I know.)


Speaking of caching, that's something I want to make work here,
but there's one problem with it: checking for replies means the
page's contents might actually change.

I figure I'll set the cache expires date to coincide with the
next newnews check. New posts won't show up immediately anywhere,
but it'll be a little faster to navigate around in the meantime.
(I'm thinking about a 30 minute check on .D and .learn, and a one
hour check on .announce, since it's slower moving anyway.)
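
Something along these lines is what I have in mind; just a
sketch, not the actual newsreader code, and the function and
group-name string here are made up:

import core.time : Duration, minutes;

// Sketch: the cache lifetime just mirrors how often each group
// is polled for new posts.
Duration cacheLifetime(string group)
{
    // .announce moves slowly, so it can be cached longer.
    if (group == "digitalmars.D.announce")
        return 60.minutes;
    return 30.minutes; // .D and .learn
}

The Expires header would then be set to roughly the current time
plus cacheLifetime(group), so the cached page goes stale around
the same time as the next newnews check.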


> Oh, speaking of fuzzy detection algorithms, it seems to think
> that the "//"
> comment tokens are URLs (very, very short URLs ;) ).

Yeah, it looks like std.regex.url kinda sucks. It flagged that,
but it didn't match paths in website links. (Maybe I'm doing it
wrong?)
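
For what it's worth, a stricter pattern that demands an explicit
scheme would at least keep a bare "//" from matching. Just an
illustrative workaround on my end, not a fix for std.regex.url
itself:

import std.regex;

// Require http:// or https:// so a lone "//" comment token
// never matches.
auto urlRe = regex(`https?://[^\s<>"')]+`, "g");

// foreach (m; matchAll(text, urlRe)) { /* linkify m.hit */ }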


> One very rough idea: Take each paragraph (ie, each block of text
> that's separated by a full newline). Run it through a D lexer. If
> it has at most, say, 1 lexical error per line (on average), then
> assume it's intended as D code.

I don't think that will work because a lot of regular sentences
would register as a series of variable names. It'd probably
have to try at least a rudimentary parse.

(For comparison, consider a jumble of English words. Each piece is
a word, so no problem there, but without understanding what they
mean, you can't tell if it is a meaningful sentence or not.)


> Actually, what could also be interesting would be an "english
> parser". Obviously true full-fledged english semantic processing
> is out-of-reach ATM, but I wonder if something could be made that
> acts "good enough" as a mere english-*detector*. Or a general
> natural-language-detector.

I did put a very primitive thing like this in there: when it
isn't sure whether a block is code, it checks for ". ". My
reasoning is that while periods are common in both, in code a
period is usually followed directly by a method name, whereas in
English we usually put a space after it.

I sometimes write ".\n" in code, but ". " is pretty rare in my
own usage, outside comments.
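
Roughly, that check amounts to something like this (just a
sketch; the function name is made up):

import std.algorithm.searching : canFind;

// A period followed by a space suggests English prose; in code a
// period is usually followed directly by an identifier, as in
// "obj.method()".
bool smellsLikeEnglish(string paragraph)
{
    return paragraph.canFind(". ");
}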


Another thing I considered was checking the frequency of
capitalized words versus punctuation, or looking for balanced
brackets and things like that. Natural language uses a lot of
capital letters right after spaces; code is more likely to be
camelCased. There's some crossover ("McDonald's" could flag
either way), but looking for bizarre symbols like parens,
operators, etc. might disambiguate it.

However, "line[$-1] == ';'" and friends were so much simpler and
so far, seem to give good enough results, so I let it stay at that.
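
That check is roughly this (just a sketch, not the exact code;
treat the brace test as one example of the "friends"):

import std.string : strip;

// A trimmed line ending in ';' (or a brace) is treated as
// probable code.
bool looksLikeCodeLine(string line)
{
    auto s = line.strip();
    return s.length > 0 &&
        (s[$ - 1] == ';' || s[$ - 1] == '{' || s[$ - 1] == '}');
}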

