DVCS vs. Subversion brittleness (was Re: Moving to D)

Ulrik Mikaelsson ulrik.mikaelsson at gmail.com
Sun Feb 6 06:17:46 PST 2011


2011/2/4 Bruno Medeiros <brunodomedeiros+spam at com.gmail>:
>
> Well, like I said, my concern about size is not so much disk space, but the
> time to make local copies of the repository, or cloning it from the internet
> (and the associated transfer times), both of which are not neglectable yet.
> My project at work could easily have gone to 1Gb of repo size if in the last
> year or so it has been stored on a DVCS! :S
>
> I hope this gets addressed at some point. But I fear that the main
> developers of both Git and Mercurial may be too "biased" to experience
> projects which are typically somewhat small in size, in terms of bytes
> (projects that consist almost entirely of source code).
> For example, in UI applications it would be common to store binary data
> (images, sounds, etc.) in the source control. The other case is what I
> mentioned before, wanting to store dependencies together with the project
> (in my case including the javadoc and source code of the dependencies - and
> there's very good reasons to want to do that).

I think the storage/bandwidth requirements of DVCSs are very often
exaggerated, especially for text, but also somewhat for blobs.
 * For text content, archive compression reduces files to, perhaps,
1/5 of their original size?
   - That means that unless you completely rewrite a file 5 times
during the course of a project, simple per-revision compression of the
file will turn out smaller than the single uncompressed base file
that Subversion transfers and stores.
   - The delta compression applied ensures that small changes do not
count as a "rewrite".
 * For blobs, archive compression may not do as much, and they
certainly pose a larger challenge for storing history, but:
   - AFAIU, at least git delta-compresses even binaries, so even
changes in them might be slightly reduced (dunno about the others)
   - I think more and more graphics today are written in SVG?
   - I believe, for most projects, audio files are usually not changed
very often once they have entered a project? Usually existing samples
are simply copied in?
 * For both binaries and text, and for most projects, the latest
revision is usually the largest. (Projects usually grow over time,
they don't consistently shrink.) I.e. older revisions are, compared to
the current one, much much smaller, which keeps the size of the old
history small relative to the size of the current revision.
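
The break-even point in the first bullet can be sketched in a few
lines of Python. This is only a rough model under my own assumptions:
the 1/5 figure is the guess from above, the function names and the
sample text are made up for illustration, and delta compression is
ignored entirely (so real DVCS storage should do better than this).

```python
import zlib


def compressed_fraction(data: bytes) -> float:
    """Compressed size divided by raw size for a single revision (zlib)."""
    return len(zlib.compress(data)) / len(data)


def history_vs_plain_copy(fraction: float, full_rewrites: int) -> float:
    """Size of `full_rewrites` separately compressed revisions,
    relative to one uncompressed copy of the same file."""
    return fraction * full_rewrites


# Synthetic, source-like sample text (hypothetical; real code compresses
# differently, so measure on your own files).
sample = b"".join(
    b"int value_%d = compute(%d);\n" % (i, i * 7) for i in range(2000)
)
print("measured fraction for the sample:", compressed_fraction(sample))

# With the assumed 1/5 fraction, five complete rewrites cost about as
# much as one uncompressed copy; fewer rewrites cost less.
for rewrites in (1, 3, 5, 10):
    print(rewrites, history_vs_plain_copy(0.2, rewrites))
```

Replacing 0.2 with a fraction measured on your own files gives a less
hand-wavy break-even number.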

Finally, as a test, I tried checking out the latest version of druntime
from SVN and comparing it to git (AFAICT, history was preserved in the
git migration). The results were about what I expected. Checking out
trunk from SVN, and the whole history from git:
  SVN: 7.06 seconds, 5.3 MB on disk
  Git: 2.88 seconds, 3.5 MB on disk
  Improvement Git/SVN: time reduced by 59%, space reduced by 34%.
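
For anyone repeating the experiment on another repository, the
percentages are plain relative reductions. A minimal sketch (the
helper name is made up; the inputs are the figures measured above):

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (1 - after / before) * 100


# Elapsed time in seconds and size on disk in MB, SVN first, then Git.
time_reduction = reduction_percent(7.06, 2.88)
space_reduction = reduction_percent(5.3, 3.5)

print(f"time reduced by {time_reduction:.0f}%")
print(f"space reduced by {space_reduction:.0f}%")
```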

I did not measure bandwidth, but my guess is that it is somewhere
between the disk and time reductions. Also, if someone has an example
of a recently converted repository including some blobs, it would make
an interesting experiment to repeat.

Regards
/ Ulrik

-----

ulrik at ulrik ~/p/test> time svn co http://svn.dsource.org/projects/druntime/trunk druntime_svn
...
0.26user 0.21system 0:07.06elapsed 6%CPU (0avgtext+0avgdata 47808maxresident)k
544inputs+11736outputs (3major+3275minor)pagefaults 0swaps
ulrik at ulrik ~/p/test> du -sh druntime_svn
5,3M    druntime_svn

ulrik at ulrik ~/p/test> time git clone git://github.com/D-Programming-Language/druntime.git druntime_git
...
0.26user 0.06system 0:02.88elapsed 11%CPU (0avgtext+0avgdata 14320maxresident)k
3704inputs+7168outputs (18major+1822minor)pagefaults 0swaps
ulrik at ulrik ~/p/test> du -sh druntime_git/
3,5M    druntime_git/

