DMD Source Archive - Why?
Steven Schveighoffer
schveiguy at gmail.com
Wed Apr 10 16:42:53 UTC 2024
On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
> On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
>> I will also bet that any difference in compile time will be
>> extremely insignificant. I don't bet against decades of
>> filesystem read optimizations. Saving e.g. microseconds on a
>> 1.5 second build isn't going to move the needle.
>
> In my timing of compiling hello world, a 1.412s build becomes
> 1.375s, about 37 milliseconds faster. Most of the savings appear
> to come from the fact that when the archive is first accessed,
> its table of contents is loaded into the path cache and file
> cache that you developed. After that, no stats are done on the
> filesystem.
Yes, the nice thing is knowing you will never have to ask the
filesystem for something you know doesn't exist. Pre-loading the
directory structure could achieve the same thing, but I doubt it
would be as efficient.
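The lookup itself then becomes trivial. Something like this (a
rough sketch with hypothetical names, not dmd's actual cache
code) is all it takes once the table of contents is in memory:

```d
// Hypothetical sketch: with the archive's table of contents
// preloaded into an in-memory set, a lookup rooted in the archive
// can answer "does not exist" without issuing a single stat call.
bool[string] archiveToc;  // filled once, when the archive is first opened

bool archiveHasFile(string path)
{
    // Pure in-memory check; the filesystem is never consulted for
    // paths the archive's catalog says are absent.
    return (path in archiveToc) !is null;
}
```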
>> The only benefit I might see in this is to *manage* the source
>> as one item.
>
> The convenience of being able to distribute a "header only"
> library as one file may be significant. I've always liked
> things that didn't need an installation program. An install
> should be "copy the file onto your system" and uninstall should
> be "delete the file" !
>
> Back in the days of CD software, my compiler was set up so no
> install was necessary, just put the CD in the drive and run it.
> You didn't even have to set the environment variables, as the
> compiler would look for its files relative to where the
> executable file was (argv[0]). You can see vestiges of that
> still in today's dmd.
>
> Of course, to get it to run faster you'd XCOPY it onto the hard
> drive. Though some users were flummoxed by the absence of
> INSTALL.EXE and I'd have to explain how to use XCOPY.
Consider that Java archives (`.jar` files) are distributed as a
single package instead of as individual `.class` files.
And Microsoft's compiler (and other C compilers) can produce
"pre-compiled headers" that eliminate some of the initial steps of
compilation.
I think there would be enthusiastic support for D archive files
that reduce some of the compilation steps, or provide extra
features (e.g. predetermined inference or matching compile-time
switches). Since you aren't going to edit these archive files
directly anyway, but generate them mechanically, why not do more
inside them?
> A tar file is serial, meaning one has to read the entire file
> to see what is in it (because it was designed for tape
> systems where data is simply appended).
You can index a tar file easily. Each member is preceded by a
512-byte header containing its metadata (including its size), so
you can build the catalog by seeking from header to header, as
sketched below.
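Here's a rough sketch in D of that seek-and-skip pass. It assumes
the classic ustar layout (512-byte header blocks, octal size field
at offset 124) and ignores typeflags and long-name extensions that
a real reader would handle:

```d
import std.algorithm : countUntil;
import std.stdio : File;

struct TarEntry
{
    string name;   // path stored in the header
    ulong offset;  // where the member's data begins in the archive
    ulong size;    // data size in bytes
}

// Build a catalog of a tar archive by hopping from header to header.
TarEntry[] catalogTar(string path)
{
    TarEntry[] entries;
    auto f = File(path, "rb");
    ubyte[512] header;
    ulong pos = 0;
    for (;;)
    {
        f.seek(pos);
        if (f.rawRead(header[]).length < 512 || header[0] == 0)
            break;  // short read or zero block: end of archive
        // name: offset 0, up to 100 bytes, NUL-terminated if shorter
        auto raw = cast(char[]) header[0 .. 100];
        auto end = raw.countUntil('\0');
        auto name = (end < 0 ? raw : raw[0 .. end]).idup;
        // size: offset 124, 12 bytes of octal ASCII
        ulong size = 0;
        foreach (c; header[124 .. 136])
        {
            if (c < '0' || c > '7') break;
            size = size * 8 + (c - '0');
        }
        entries ~= TarEntry(name, pos + 512, size);
        // skip the data, rounded up to the next 512-byte boundary
        pos += 512 + (size + 511) / 512 * 512;
    }
    return entries;
}
```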
Note also that we can add an index to a tar file in a way that
stays backwards compatible with existing tools. Remember, we are
generating these archives *from a tool that we control*.
Prepending an index "file" is trivial.
> The tar file doesn't have a table of contents, the filename is
> limited to 100 characters, and the path is limited to 155
> characters.
I'm not too worried about such things. I've never run into
filename length problems with tar. But also, most modern tar
formats do not have these limitations:
https://www.gnu.org/software/tar/manual/html_section/Formats.html
>
> Sar files have a table of contents at the beginning, and
> unlimited filespec sizes.
>
> P.S. the code that actually reads the .sar file is about 20
> lines! (Excluding checking for corrupt files, and the header
> structure definition.) The archive reader and writer can be
> encapsulated in a separate module, so anyone can replace it
> with a different format.
I would suggest we replace it with a modern tar format for
maximum compatibility with existing tools. We have already seen
the drawbacks of using the abandoned `sdl` format for dub
packages; we should not repeat that mistake.
-Steve