Ddoc to PDF

Alix Pexton alix.DOT.pexton at gmail.DOT.com
Mon Oct 18 04:54:13 PDT 2010


On 17/10/2010 18:45, Walter Bright wrote:
> Apparently, it is fairly simple to convert plain text files to PDF.
>
> http://re-factor.blogspot.com/2010/10/text-to-pdf.html
>
> Which suggests to me it should be equally simple to create a Ddoc macro
> file to allow Ddoc to emit pdf files directly.
>
> Anyone want a nice weekend project to produce this?

I read the PDF spec once*, and I can see in my mind what a PDF generated
by DDoc could look like. I'm quite confident in saying that it is nothing
like as pretty, or as simple to produce, as the current, most basic HTML
output.

The "Hello World" for PDF (found in appendix H of the spec) makes DNA
look simple and concise**.

When generating a PDF, one has to do all the layout oneself, calculating
where to place line breaks and when to begin new pages. When generating
HTML, all this work is left to the web browser instead, which is why PDFs
always look the same, but web pages are rendered 11 different ways by 7
different browsers.
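For a fixed-width typeface, at least, that layout step is easy to
sketch. The following is Python rather than D, and it is nothing DDoc
can do today. It assumes every character has the same width, wraps
greedily with the standard textwrap module, and then cuts the lines
into pages. `layout`, `columns` and `lines_per_page` are made-up names
for illustration:

```python
import textwrap


def layout(text: str, columns: int = 80, lines_per_page: int = 60) -> list[list[str]]:
    """Greedy line breaking for a monospaced font, then page breaking.

    Assumes every glyph has the same width, so a line's width is simply
    its character count.
    """
    lines: list[str] = []
    for paragraph in text.split("\n"):
        # textwrap.wrap returns [] for an empty string, so keep blank lines explicitly
        lines.extend(textwrap.wrap(paragraph, width=columns) or [""])
    # Slice the flat list of lines into page-sized chunks
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]
```

For example, `layout("word " * 50, columns=20, lines_per_page=3)` gives
five pages: four full ones of "word word word word" and a last page
holding "word word". With a proportional font, the character count would
have to be replaced by a sum of per-glyph widths, which is where things
start to get hard.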

PDFs do have a tree-like structure, but they are not laid out like an
HTML file. Instead, there is a stream of cross-referenced objects, each
with a unique object number; the nodes of the page tree also hold a
reference to their parent and a list of their children. This means that
a paragraph which spans pages has to be broken up into pieces contained
within different objects.
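To make that structure concrete, here is a rough Python sketch of a
minimal PDF writer. It is not the spec's appendix H example and is far
from a complete implementation. It assumes Latin-1 text (anything else
raises UnicodeEncodeError), the built-in Courier font and a US Letter
page, and `build_pdf` and `escape_pdf_string` are hypothetical names.
Each object gets a number, the /Pages node lists its /Kids, every /Page
points back at its /Parent, and the cross-reference table at the end
records the byte offset of every object:

```python
def escape_pdf_string(s: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(pages: list[list[str]], font_size: int = 10) -> bytes:
    """Emit a minimal PDF with one page per entry in `pages`, set in Courier."""
    objects: list[bytes] = []  # objects[i] is the body of object number i + 1

    def add(body: str) -> int:
        objects.append(body.encode("latin-1"))
        return len(objects)  # the object number just assigned

    pages_obj = add("")  # placeholder: the /Kids list is only known at the end
    catalog = add(f"<< /Type /Catalog /Pages {pages_obj} 0 R >>")
    font = add("<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>")
    leading = font_size + 2
    kids = []
    for lines in pages:
        # Content stream: set font and leading, start at (72, 720), one line per T*
        ops = [f"BT /F1 {font_size} Tf {leading} TL 72 720 Td"]
        ops += [f"({escape_pdf_string(line)}) Tj T*" for line in lines]
        ops.append("ET")
        content = "\n".join(ops)
        # With Latin-1, one character is one byte, so len() gives the stream /Length
        stream = add(f"<< /Length {len(content)} >>\nstream\n{content}\nendstream")
        # Each page points back at its parent /Pages node
        kids.append(add(
            f"<< /Type /Page /Parent {pages_obj} 0 R /MediaBox [0 0 612 792] "
            f"/Resources << /Font << /F1 {font} 0 R >> >> /Contents {stream} 0 R >>"
        ))
    # Now that the children exist, fill in the parent with its /Kids list
    kid_refs = " ".join(f"{k} 0 R" for k in kids)
    objects[pages_obj - 1] = (
        f"<< /Type /Pages /Kids [{kid_refs}] /Count {len(kids)} >>".encode("latin-1")
    )

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += f"{number} 0 obj\n".encode("latin-1") + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("latin-1")
    out += b"0000000000 65535 f \n"  # entry for the always-free object 0
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("latin-1")  # each entry is 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root {catalog} 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n").encode("latin-1")
    return bytes(out)
```

For example, `build_pdf([["Hello, World"]])` returns the bytes of a
one-page document. Even this toy has to count bytes to build the
cross-reference table, which is exactly the kind of bookkeeping a macro
expander is not set up to do.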

Doing the layout for an unstructured stream of text in a fixed-width
typeface (as in the link you posted) is quite simple but, as far as I
can fathom, still beyond the current DDoc. Using variable-width
typefaces, indentation, borders, emphasis, etc. to produce a PDF with
the same visual style that can easily be achieved with the current HTML
macros would be very difficult (though I'm not going to go so far as to
say it's impossible). I think something quite pleasing could be
generated with minimal post-processing, but not by DDoc alone. After
all, DDoc output is already post-processed right now, every time its
HTML is loaded into a browser.

So, what enhancements do I think DDoc needs to be able to support the 
generation of PDFs?

After a lot of thought, I have come to the conclusion that giving DDoc
both the power to calculate layout in a way general enough to serve not
only PDF but any other layout technology, and the ability to work with a
flattened tree, is a non-starter.

Alternatively, I can't help wondering whether it would be possible to
use D's compile-time abilities to perform the necessary post-processing.
Well, I know it's powerful enough, but there are a few issues with
letting code from another source play in your sandbox when all one
wanted to do was read the instructions... But if the DDoc macro file
specified on the command line could contain D code for post-processing,
run by the CTFE engine and passed the expanded DDoc, then that output
could be flattened, parsed to calculate line lengths, given all its
cross-references, split into pages and spat out as a PDF.
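The first of those steps, flattening, might look something like the
following. It is written in Python rather than as D code run by CTFE.
The `Node` type, its `style` strings and `flatten` are hypothetical
stand-ins for the expanded DDoc macros. The function walks the tree
depth-first and emits a flat list of text runs, each tagged with the
styles in effect, which is the kind of stream a line breaker and
paginator could then consume:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the expanded document tree (hypothetical stand-in for DDoc output)."""
    style: str                                   # e.g. "p", "b", "i"
    children: list[str | Node] = field(default_factory=list)


Run = tuple[tuple[str, ...], str]  # (styles in effect, text)


def flatten(node: Node, inherited: tuple[str, ...] = ()) -> list[Run]:
    """Depth-first walk turning the tree into a flat list of styled text runs."""
    styles = inherited + (node.style,)
    runs: list[Run] = []
    for child in node.children:
        if isinstance(child, str):
            runs.append((styles, child))         # text inherits every enclosing style
        else:
            runs.extend(flatten(child, styles))  # recurse into nested elements
    return runs
```

For example, flattening `Node("p", ["Call ", Node("b", ["foo"]), " twice."])`
yields `[(("p",), "Call "), (("p", "b"), "foo"), (("p",), " twice.")]`.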

I still think it would be more than one weekend's work, though***.

CAVEAT LECTOR!

I'm not an expert on PDFs or DDoc, so I'd be very happy to be proven
wrong; the wronger**** the better ^^

A...

*    Not as crazy as reading it twice would be.
**   I will admit that this is possibly a slight exaggeration.
***  I, however, code slower than the average bear.
**** I know that is not a real word, so don't complain ><


More information about the Digitalmars-d mailing list