How to continue work on std.zip
berni44
dlang at d-ecke.de
Mon Oct 21 13:35:37 UTC 2019
On Monday, 21 October 2019 at 08:44:30 UTC, Jonathan M Davis
wrote:
First of all, thanks for pointing me to the ISO standard. I was
not aware of this standard (the german wikipedia does not mention
it and I must have overlooked it, when I skimmed through the
english one). I agree, that we should stick to that standard
which answers several of my questions.
Second, my goal is to improve Phobos. I'm pretty much aware, that
this means to avoid breaking code at (almost) all costs. In case
of std.zip the only "breaking" change will be to get rid of some
constants (digiSignLength, diskNumber) that should have been
private from the very beginning or are only usefull with
multidisk support. And this is done with a full deprecation cycle.
>> * Add structs that directly map the data in the zip records.
>> As Atila pointed
>> out, this should be done with serialisation instead of unions
>> (what I planed
>> and what he convinced me not to do, because this results in
>> endianess problems).
>> See below (*) for some suggestions, on how this might be done.
>
> What would be the point of having something so low-level? The
> ZipMembers are already put in an AA in the ZipArchive.
Sorry, I think this was a missunderstanding. I don't want to
change anything of the AA nor the members in that AA. For me the
question is, how to get the data from this void[]-array into the
AA. The current approach uses lot's of lines like "_diskNumber =
getUshort(i + 4);" with magic constants (here 4), which is error
prone. I'd prefere to get rid of these in favour of some structs
to avoid adding bugs when fixing the existing issues.
>> * Reversing the order of the four items in this file would
>> bring the more
>> important ones to the beginning. This would make for better
>> documentation.
>
> That would be terrible for diffs and really wouldn't help much
> with the documentation. There are already links to the
> top-level symbols at the top of the documentation, and the
> example at the top shows how the various types are related.
> [...]
I disagree here. First of all, documentation is made for humans
and this should be more important than diffs. (But maybe there is
some aspect of diffs, that I'm not aware of. For me, this would
be a larger change once, and after that change, the diffs would
again be pretty small.)
From my experience, the links at the top are not usefull for
people looking the first time into the docs of some module. At
that time, one often does not know, which item is the one they
are looking for.
For example. When I first looked at the documentation of std.zip,
I was looking for a way to extract an existing archive. I
expected that to be done by a function called unzip, but found no
such function. Skimming the example at the top I thought, this
was about creating zip archives (The comment in the first line
tells that this is about reading zip files; I don't know exactly,
why I overlooked this. Maybe, because it's small letters in light
gray; maybe, because it's english, maybe I did not expect it
there; I can't tell). Therefore I scrolled down. I found
ZipException which is obviously of no use. After that I found
CompressionMethod which is of little use either. Next there is
ArchiveMember, which is more interesting, but yet does not show a
way on how to get to the content of a zip file.
Finally, if you do not overlook it (which is quite possible
between all those members of ArchiveMember and ZipArchive),
you'll find ZipArchive. But again you don't know how to use it.
The constructor I was looking for is actually the second to last
item, very well hidden.
I went into detail here, because I think this is important. I had
this experience over and over again with the documentation of
Phobos. Comparing this to the documentation of PHP [1], where I
never had any problems with, I conclude that there is space for
improvement. ;-)
But it's getting a little bit off topic. Maybe Phobos
documentation should be discussed in a separate threat.
[1] https://www.php.net/manual/de/book.zip.php
>> * Using unittests for examples (maybe replacing the two
>> examples at the top of the module).
>
> The example at the top of the module is useful and really
> shouldn't be removed.
No, I don't want to remove them. Just moving them inside of a
unittest to make sure, that documentation and implementation does
not diverge too much.
>> * Rewriting some of the documentation for better ddoc
>> conformation and clearer descriptions.
>
> I have no clue what you mean by better ddoc conformation.
Probably also a missunderstanding. What I mean is: As far as I
know, all parameters of all functions should be listed; the
return value should be mentioned and a "see also" section is
recommended and stuff like this. Most of the functions do not
have that information. For example: The documentation of
"extractVersion" is "Read Only" which is not really useful, isn't
it?
>> * Support for ranges. For unzipping, a RandomAccessRange could
>> be used, for
>> zipping an OutputRange could be provided. Also, the
>> ArchiveMembers could be
>> accessed via an other RandomAccessRange. These ranges would
>> (in some cases)
>> allow for lazy evaluation, which might result in a speed gain,
>> when not all
>> elements of an archive need to be read (or kept in memory).
>
> Given that ZipArchive already sticks everything in an AA, I'm
> not sure what you'd gain by trying to add a random access range
> as well. And random-access would be incompatible with lazy
> evaluation.
This is probably only about the last range I mentioned above,
isn't it? That's the one I myself are most unsure if it's going
to be usefull. One could use AA.byValue etc instead and that's
probably sufficient.
I added support for ranges in the list, because it was explicitly
mentioned in the vision 2016H2 [2]. I think, there are use cases,
where laziness can be applied. Think of an archive with lots of
files, where only some of them are going to be extracted. The
underlying file-access-range can use fseek and the like to only
load the part of the archive, that's gona be needed. zip's
structure allowes to fill in the AA (almost) completely by
scanning the directory at the end of the file. I write "almost",
because the real data content is still missing. But that can be
retrieved, when it's going to be used.
[2] https://wiki.dlang.org/Vision/2016H2
> In order to actually have lazy evaluation, you'd have to have a
> completely different design, because ZipArchive creates the AA
> up front - and changing that would cause problems for existing
> code.
I think, it can be done with the current (public) design. I would
add a new constructor that takes a range instead of void[]. All
else would happen underneath.
>> * Files larger than 4GB? I see no reason, why we should block
>> them. All it
>> would probably need is to move from uint to ulong.
>
>>From what little I understand, that would involve being zip64
>>rather than normal zip,
Indeed, but zip64 is allready implemented. There is just still
the limit on the filesize.
More information about the Digitalmars-d
mailing list