DLang Spec rewrite (?)

H. S. Teoh hsteoh at quickfur.ath.cx
Mon May 27 18:03:52 PDT 2013


On Mon, May 27, 2013 at 05:30:27PM -0700, Jonathan M Davis wrote:
[...]
> Well, it's more user-friendly to have macros for Unicode than having
> to figure out how to input the actual Unicode character in there
> (since it's not on the keyboard), and it's trivial to turn the macro
> into the actual character with the macro, so I'd think that it would
> be more user-friendly to just use the macros, especially if we're
> already using them. And if laTeX has to be ASCII (I don't know if it
> has to be or not), then that's all the more reason to not use Unicode
> directly. But regardless, if we're already using macros, why bother
> changing it? Just change what the macros convert to in the XHTML
> generation.
[...]

Plain vanilla LaTeX assumes ASCII input, and will do odd things if fed
8-bit data (let alone UTF-8). I think macros for HTML entities are the
way to go, given the current setup.
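Here is a minimal sketch of that setup, in Python for illustration only and assuming argument-less macros. The Ddoc source keeps $(NAME) macros, and each output backend supplies its own expansion table. The table contents and the expand() helper are hypothetical, not Ddoc's actual implementation. Real Ddoc macros can also take arguments, which this sketch ignores.

```python
import re

# Hypothetical per-backend macro tables: the source text keeps $(NAME)
# macros, and only the expansion differs between HTML and LaTeX output.
MACROS = {
    "html": {"WACUTE": "&#x1E83;", "EACUTE": "&eacute;"},  # numeric ref for U+1E83
    "latex": {"WACUTE": r"\'{w}", "EACUTE": r"\'{e}"},
}

# Matches argument-less macros such as $(EACUTE).
MACRO_RE = re.compile(r"\$\((\w+)\)")

def expand(text: str, backend: str) -> str:
    """Replace each $(NAME) with its expansion for the given backend.

    An unknown macro name raises KeyError rather than passing through silently.
    """
    table = MACROS[backend]
    return MACRO_RE.sub(lambda m: table[m.group(1)], text)

print(expand("Caf$(EACUTE)", "html"))   # Caf&eacute;
print(expand("Caf$(EACUTE)", "latex"))  # Caf\'{e}
```

Switching output formats then only means swapping the table, which is the "just change what the macros convert to" approach quoted above.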

However, there is no straightforward 1-to-1 mapping between &entity; and
macro; to support LaTeX properly, one needs to be aware of some of its
idiosyncrasies. For example, in Unicode, a character like ẃ can be
represented by w *followed* by a combining diacritic; in LaTeX, however,
the accent command must *precede* the letter it modifies (that is,
\'w). So such characters should be represented by a single macro, say
$(WACUTE), rather than w followed by a general $(ACUTE), which would be
impossible to translate to LaTeX correctly.
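To make the ordering problem concrete, here is a rough Python sketch. Note that it uses a different technique from Ddoc macro expansion: it is a post-processing pass over text that is already Unicode. It decomposes the text with NFD, then moves each combining mark in front of its base letter as a LaTeX accent command. It only works because it can look back at the previous character, and a macro expander that handles each $(ACUTE) in isolation cannot do that. The ACCENTS table is a small, hypothetical subset.

```python
import unicodedata

# Small, illustrative map of Unicode combining marks to LaTeX text-mode
# accent commands. Unmapped combining marks are passed through unchanged.
ACCENTS = {
    "\u0301": "'",   # combining acute      -> \'
    "\u0300": "`",   # combining grave      -> \`
    "\u0302": "^",   # combining circumflex -> \^
    "\u0303": "~",   # combining tilde      -> \~
    "\u0308": '"',   # combining diaeresis  -> \"
}

def to_latex_accents(text: str) -> str:
    """Rewrite base + combining-mark sequences as LaTeX accent commands.

    Unicode puts the combining mark AFTER the base letter; LaTeX wants the
    accent command BEFORE it, so each pair is reordered here.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENTS and out:
            base = out.pop()  # the letter (or already-accented group) before the mark
            out.append("\\" + ACCENTS[ch] + "{" + base + "}")
        else:
            out.append(ch)
    return "".join(out)

print(to_latex_accents("\u1e83"))   # precomposed ẃ  -> \'{w}
print(to_latex_accents("w\u0301"))  # w + combining acute -> \'{w}
```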

LaTeX also has some special sequences for different kinds of spacing:
after an abbreviation like "Mr.", the following space must be escaped,
i.e., "Mr.\ X"; otherwise LaTeX treats the "." as a sentence terminator
and follows it with an overly wide inter-sentence space. This may make
it a bit annoying to write in Ddoc, though, 'cos you'll need a macro of
some sort to indicate this non-terminating ".".
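If hand-marking every such "." proves too tedious, a converter could also apply a heuristic. The Python sketch below is a hypothetical alternative to the macro approach, not something Ddoc does. It escapes the space after a fixed list of abbreviations. Its obvious limitation is that a list cannot tell whether an abbreviation such as "etc." happens to end a sentence.

```python
import re

# Hypothetical list of abbreviations whose trailing "." does not end a sentence.
ABBREVIATIONS = ["Mr", "Mrs", "Dr", "e.g", "i.e"]

# Match an abbreviation at a word boundary, followed by ". ".
_ABBREV_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\. "
)

def escape_abbrev_spaces(text: str) -> str:
    """Turn 'Mr. X' into 'Mr.\\ X' so LaTeX uses a normal inter-word space."""
    return _ABBREV_RE.sub(lambda m: m.group(1) + ".\\ ", text)

print(escape_abbrev_spaces("Mr. X arrived."))  # Mr.\ X arrived.
```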

The correct way to represent quotation marks in LaTeX is `` and '' for
double quotes, and ` and ' for single quotes. Writing a plain " for both
opening and closing double quotes (or ' for an opening single quote) will
still compile, but the output will look ugly.
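Here is a minimal Python sketch of converting straight double quotes into LaTeX pairs. It uses a simple heuristic that is only an assumption: a quote at the start of the text, after whitespace, or after an opening bracket is treated as opening, and any other quote is treated as closing. Single quotes are deliberately left alone, because a ' is also an apostrophe and cannot be paired this naively.

```python
def latex_quotes(text: str) -> str:
    """Replace straight double quotes with LaTeX `` and '' pairs.

    Heuristic: a quote at the start of the text or after whitespace or an
    opening bracket opens a quotation; any other quote closes one.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == '"':
            prev = text[i - 1] if i > 0 else " "
            out.append("``" if prev.isspace() or prev in "([{" else "''")
        else:
            out.append(ch)
    return "".join(out)

print(latex_quotes('He said "hello" to me.'))  # He said ``hello'' to me.
```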

If there are math formulae involved, they need to be enclosed in $
signs, for example: "This sentence contains $2+2=4$ words." Inside math
formulae a slightly different syntax is used, but for the purposes of
Ddoc, I think that can probably be ignored for now.

A bunch of metacharacters need to be escaped; I can't recall the list
off the top of my head, but they include at the very least:

	~ # $ % ^ & { } _ \

The escape sequences required for these metacharacters are not all
obvious; for example, \\ is NOT an escaped backslash, it's a linebreak.
I forget what produces a literal backslash... And \^ is NOT a literal
caret; it's a circumflex accent on the next letter, and likewise \~ is a
tilde accent. Though IIRC \$ does represent a literal $. So some care is
required to make things work correctly. :)
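For reference, here is a Python sketch of a text-mode escaper for the ten characters listed above. It relies on the standard LaTeX2e commands \textbackslash, \textasciitilde and \textasciicircum for the three cases where a backslash prefix does not work (\\, \~, \^). The function name is hypothetical. The sketch maps each character independently, so the braces inside \textbackslash{} are never escaped a second time.

```python
# Text-mode LaTeX replacements for the ten metacharacters
# ~ # $ % ^ & { } _ \
# \\, \~ and \^ mean something else in LaTeX, so those three use named commands.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "{": r"\{",
    "}": r"\}",
    "_": r"\_",
}

def latex_escape(text: str) -> str:
    """Escape LaTeX metacharacters in plain (non-math) text, one char at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

print(latex_escape("50% of $x_1"))  # 50\% of \$x\_1
print(latex_escape(r"C:\tmp"))      # C:\textbackslash{}tmp
```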


T

-- 
It is impossible to make anything foolproof because fools are so ingenious. -- Sammy

