DVCS vs. Subversion brittleness (was Re: Moving to D)

Bruno Medeiros brunodomedeiros+spam at com.gmail
Wed Feb 9 05:21:36 PST 2011


On 06/02/2011 14:17, Ulrik Mikaelsson wrote:
> 2011/2/4 Bruno Medeiros<brunodomedeiros+spam at com.gmail>:
>>
>> Well, like I said, my concern about size is not so much disk space, but the
>> time to make local copies of the repository, or to clone it from the internet
>> (and the associated transfer times), both of which are not negligible yet.
>> My project at work could easily have grown to 1 GB of repo size if in the last
>> year or so it had been stored in a DVCS! :S
>>
>> I hope this gets addressed at some point. But I fear that the main
>> developers of both Git and Mercurial may be too "biased" to experience
>> projects which are typically somewhat small in size, in terms of bytes
>> (projects that consist almost entirely of source code).
>> For example, in UI applications it would be common to store binary data
>> (images, sounds, etc.) in the source control. The other case is what I
>> mentioned before, wanting to store dependencies together with the project
>> (in my case including the javadoc and source code of the dependencies - and
>> there's very good reasons to want to do that).
>
> I think the storage/bandwidth requirements of DVCSs are very often
> exaggerated, especially for text, but also somewhat for blobs.
>   * For text-content, the compression of archives reduces them to,
> perhaps, 1/5 of their original size?
>     - That means, that unless you completely rewrite a file 5 times
> during the course of a project, simple per-revision-compression of the
> file will turn out smaller, than the single uncompressed base-file
> that subversion transfers and stores.
>     - The delta-compression applied ensures small changes do not
> count as a "rewrite".
>   * For blobs, the archive-compression may not do as much, and they
> certainly pose a larger challenge for storing history, but:
>     - AFAIU, at least git delta-compresses even binaries so even
> changes in them might be slightly reduced (dunno about the others)
>     - I think more and more graphics today are authored as SVG?
>     - I believe, for most projects, audio files are usually not changed
> very often once they have entered the project? Usually existing samples are
> simply copied in?
>   * For both binaries and text, and for most projects, the latest
> revision is usually the largest. (Projects usually grow over time,
> they don't consistently shrink) I.E. older revisions are, compared to
> current, much much smaller, making the size of old history smaller
> compared to the size of current history.
>
> Finally, as a test, I tried checking out the latest version of druntime
> from SVN and comparing it to git (AFAICT, history was preserved in the
> git-migration); the results were about what I expected. Checking out
> trunk from SVN, and the whole history from git:
>    SVN: 7.06 seconds, 5.3 MB on disk
>    Git: 2.88 seconds, 3.5 MB on disk
>    Improvement Git/SVN: time reduced by 59%, space reduced by 34%.
>
> I did not measure bandwidth, but my guess is it is somewhere between
> the disk- and time- reductions. Also, if someone has an example of a
> recently converted repository including some blobs it would make an
> interesting experiment to repeat.
>
> Regards
> / Ulrik
>
> -----
>
> ulrik at ulrik ~/p/test>  time svn co http://svn.dsource.org/projects/druntime/trunk druntime_svn
> ...
> 0.26user 0.21system 0:07.06elapsed 6%CPU (0avgtext+0avgdata 47808maxresident)k
> 544inputs+11736outputs (3major+3275minor)pagefaults 0swaps
> ulrik at ulrik ~/p/test>  du -sh druntime_svn
> 5,3M    druntime_svn
>
> ulrik at ulrik ~/p/test>  time git clone git://github.com/D-Programming-Language/druntime.git druntime_git
> ...
> 0.26user 0.06system 0:02.88elapsed 11%CPU (0avgtext+0avgdata 14320maxresident)k
> 3704inputs+7168outputs (18major+1822minor)pagefaults 0swaps
> ulrik at ulrik ~/p/test>  du -sh druntime_git/
> 3,5M    druntime_git/


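For reference, the percentages in the quoted measurements can be re-derived from the raw numbers. This is only a small arithmetic check; the variable and function names are made up for illustration, and the figures are copied from the timings quoted above.

```python
# Figures copied from the quoted measurements (time in seconds, size in MB).
svn_seconds, svn_mb = 7.06, 5.3
git_seconds, git_mb = 2.88, 3.5


def reduction_percent(before, after):
    """Relative reduction going from `before` to `after`, in percent."""
    return (before - after) / before * 100


time_reduction = reduction_percent(svn_seconds, git_seconds)
space_reduction = reduction_percent(svn_mb, git_mb)
print(f"time reduced by {time_reduction:.0f}%, space reduced by {space_reduction:.0f}%")
```

Note that this compares a checkout of trunk only (SVN) against a clone of the full history (Git), as stated in the quoted message.
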
Yes, Brad had posted some statistics on the size of the Git repositories
for dmd, druntime, and phobos, and yes, they are pretty small.
Projects which contain practically only source code, and little to no
binary data, are unlikely to grow much, so repo size is unlikely ever to
be a problem. But that might not be the case for other projects (also
considering that binary data such as .zip, .jpg, .mp3, or .ogg is usually
already well compressed, so VCS compression won't help much).
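The point about already-compressed binary data can be checked with a rough sketch. It assumes zlib as the compressor (Git compresses its objects with zlib; other systems may differ) and uses random bytes as a stand-in for a .jpg or .mp3, since well-compressed data looks close to random. The sample data and names are invented for illustration.

```python
import os
import zlib

# Stand-in for source code: highly repetitive text (an assumption, real
# source files are less regular than this).
text_like = b"int main() { return 0; } // typical source line\n" * 2000

# Stand-in for an already-compressed file (.jpg, .mp3, .zip): random bytes
# are a rough proxy, not a real media file.
blob_like = os.urandom(len(text_like))


def compressed_ratio(data):
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(data, 9)) / len(data)


text_ratio = compressed_ratio(text_like)
blob_ratio = compressed_ratio(blob_like)
print(f"text-like: {text_ratio:.3f}, blob-like: {blob_ratio:.3f}")
```

The text-like input shrinks to a small fraction of its size, while the blob-like input stays at roughly its original size.
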

It's unlikely you will see converted repositories with a lot of changing
blob data. DVCSs, at least in the way they currently work, simply
kill this workflow/organization pattern.
I very much suspect this issue will become more important as time goes
on - a lot of people are still new to DVCS and don't yet realize
the full implications of that architecture with regard to repo size.
Any file you commit will add to the repository size *FOREVER*. I'm
pretty sure we haven't heard the last word in the VCS battle, in that in
a few years' time people will *again* be talking about and switching to
another VCS :( . Mark these words. (The only way this is not going to
happen is if Git or Mercurial is able to address this issue in a
satisfactory way, which I'm not sure is possible or easy.)
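To make the "forever" part concrete, here is a back-of-the-envelope model, not a measurement. It assumes a full clone carries every revision ever committed while a centralized checkout carries only the latest one. All sizes, revision counts, and the `stored_fraction` parameter are hypothetical.

```python
def checkout_size(revision_sizes):
    """Centralized checkout: only the latest revision is on the client."""
    return revision_sizes[-1]


def full_history_size(revision_sizes, stored_fraction=1.0):
    """Full clone: every revision is kept.

    stored_fraction models how much of each revision survives compression
    and delta encoding (1.0 means nothing is saved).
    """
    return sum(size * stored_fraction for size in revision_sizes)


# Hypothetical asset: a 20 MB already-compressed file replaced 50 times.
blob_revisions = [20] * 50
# Hypothetical source tree: 20 MB of text over 50 revisions, assuming only
# about 2% of each revision needs to be stored after delta + compression.
text_revisions = [20] * 50

print("blob checkout (MB):", checkout_size(blob_revisions))
print("blob full history (MB):", full_history_size(blob_revisions))
print("text full history (MB):", round(full_history_size(text_revisions, 0.02)))
```

Under these assumptions the text history costs about as much as one checkout, while the blob history costs fifty times as much, and every client pays that on clone.
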


-- 
Bruno Medeiros - Software Engineer

