Ddoc to PDF
Alix Pexton
alix.DOT.pexton at gmail.DOT.com
Mon Oct 18 04:54:13 PDT 2010
On 17/10/2010 18:45, Walter Bright wrote:
> Apparently, it is fairly simple to convert plain text files to PDF.
>
> http://re-factor.blogspot.com/2010/10/text-to-pdf.html
>
> Which suggests to me it should be equally simple to create a Ddoc macro
> file to allow Ddoc to emit pdf files directly.
>
> Anyone want a nice weekend project to produce this?
I read the PDF spec once*, I can see in my mind what a PDF generated by
DDoc could look like, and I'm quite confident in saying that it is
nothing like as pretty or simple to produce as the current, most basic
HTML output.
The "Hello World" for PDF (found in appendix H of the spec) makes DNA
look simple and concise**.
When generating a PDF, one has to do all the layout, calculating when
to place line breaks and begin new pages. When generating HTML, all this
work is left to the web browser instead, which is why PDFs always look
the same, but web pages are rendered 11 different ways by 7 different
browsers.
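To make the layout burden concrete, here is a minimal sketch in Python of the two decisions a PDF generator has to make for itself: where lines break and where pages break. It assumes a fixed-width typeface, and the function name and the default page size of 80 columns by 60 rows are illustrative choices, not anything taken from DDoc or the PDF spec.

```python
import textwrap

def layout_pages(text, cols=80, rows=60):
    """Break text into lines of at most `cols` characters,
    then group the lines into pages of at most `rows` lines."""
    lines = []
    for paragraph in text.split("\n"):
        # textwrap.wrap returns [] for an empty string, so blank
        # lines are kept explicitly.
        lines.extend(textwrap.wrap(paragraph, width=cols) or [""])
    # Slice the flat list of lines into page-sized chunks.
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]

pages = layout_pages("word " * 100, cols=20, rows=5)
print(len(pages), "pages")
```

A web browser does the equivalent of this, and a great deal more, every time it renders an HTML page.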
PDFs do have a tree-like structure, but they are not laid out like an
HTML file. Instead, there is a stream of cross-referenced objects, each
with a unique reference number and a reference to its parent and a list
of its children. This means that paragraphs which span pages need to be
broken up into pieces contained within different objects.
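The object structure can be illustrated with a small Python sketch that writes a one-page PDF by hand. This is not the example from the spec's appendix, just a minimal file under some assumptions: a US Letter page, the built-in Helvetica font, and a single short line of Latin-1 text. The function name build_pdf is hypothetical. Note how the page object points to its parent with /Parent, the page tree lists its children in /Kids, and the cross-reference table at the end records the byte offset of every object.

```python
def build_pdf(text):
    """Return the bytes of a one-page PDF showing `text` in 12pt Helvetica."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)

with open("hello.pdf", "wb") as f:
    f.write(build_pdf("Hello World"))
```

Even this toy has to track byte offsets for every object, and it does no line breaking or pagination at all.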
Doing a layout for an unstructured stream of text in a fixed width
typeface (such as in the link you posted) is quite simple, but - as far
as I can fathom - is still beyond the current DDoc. Using variable width
typefaces, indentation, borders, emphasis, etc. to try and produce a PDF
with the same visual style as that which can be easily achieved using
the current HTML macros would be very difficult (though I'm not going to
go so far as saying it's impossible). I think something quite pleasing
could be generated with minimal post processing, but not by using DDoc
alone. After all, there is post processing for DDoc right now, every
time its HTML output is loaded into a browser.
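As a rough illustration of why variable-width typefaces make things harder, the Python sketch below breaks lines by measured width rather than by character count. The width table is made up for the example (a real generator would read the widths from the font's metrics), and text_width and break_lines are hypothetical names.

```python
# Hypothetical glyph widths in thousandths of the font size. Real values
# come from the font's metrics, not from this table.
WIDTHS = {"i": 250, "l": 250, " ": 280, "m": 830, "w": 720}
DEFAULT_WIDTH = 550  # assumed width for any character not listed above

def text_width(text, font_size):
    """Width of `text` in points under the assumed metrics."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text) * font_size / 1000

def break_lines(text, max_width, font_size=12):
    """Greedy line breaking: keep adding words to the current line
    until the next word would make it wider than `max_width`."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and text_width(candidate, font_size) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

for line in break_lines("mmmm iiii mmmm iiii", max_width=40):
    print(line)
```

Two lines with the same number of characters can have very different widths, so the simple column count that works for a fixed-width typeface no longer applies. Indentation, borders and emphasis each add further measurements on top of this.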
So, what enhancements do I think DDoc needs to be able to support the
generation of PDFs?
After a lot of thought, I have come to the conclusion that giving DDoc
the power required to calculate layout in a way that is general enough
to be used not only by PDF but by any other layout technology, and the
ability to work with a flattened tree, is a non-starter.
Alternatively, I can't help wondering if it would be possible to use D's
compile time abilities to perform the post processing necessary? Well, I
know it's powerful enough, but there are a few issues with letting code
from another source play in your sandbox when all one wanted to do was
read the instructions... But, if the DDoc macro file specified on the
command line could contain D code for post processing that is run by
the CTFE engine and passed the expanded DDoc, then the output could be
flattened, parsed to calculate line lengths, cross referenced, split
into pages and spat out as a PDF.
I still think it would be more than 1 weekend's work though***.
CAVEAT LECTOR!
I'm not an expert at PDFs or DDoc, so I'd be very happy to be proven
wrong, the wronger**** the better ^^
A...
* Not as crazy as reading it twice would be.
** I will admit that this is possibly a slight exaggeration.
*** I, however, code slower than the average bear.
**** I know that is not a real word, so don't complain ><