dpaste and the wayback machine

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Tue Feb 9 10:23:20 PST 2016


On 2/8/16 11:44 AM, Wyatt wrote:
> On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu
> wrote:
>> Dpaste currently does not expire pastes by default. I was thinking
>> it would be nice if it saved them in the Wayback Machine such that
>> they are archived redundantly.
>>
>> I'm not sure what's the way to do it - probably linking the
>> newly-generated paste URLs from a page that the Wayback Machine
>> already knows of.
>>
>> I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when
>> the WM does not see a link that is searched for, it offers the
>> option to archive it) obtaining
>> https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
>>
>> Thoughts?
>>
> You want it in Wayback?  Sounds like you need some WARC [0]. Since
> anyone can upload to IA (using a nice S3-like API, even [1]), this
> should be pretty uncomplicated.  If you can get a list of all the
> paste URLs, you can use wget [2] to build the WARC fairly trivially.
> [3]  Then I'd suggest getting a dlang account and making an item [4]
> out of it. Just make sure it's set to mediatype:web and it should get
> ingested by Wayback.
>
> After that?  Generate a WARC when a paste is made and use the dlang
> S3 keys to add it to the previous item (or maybe just do it daily or
> weekly so as to not stress the derive queue too much). I'm pretty
> sure that's all that's needed.

That's intense. I think a simple page (or a chain of linked pages)
containing links to all pastes would suffice. For example, consider
having dpaste.dzfl.pl contain a link to dpaste.dzfl.pl/today.html. That
page would contain e.g. the links generated today and a "More" button
linked to dpaste.dzfl.pl/2016-02-08.html (which would be yesterday).
That in turn would contain links to yesterday's pastes and a link to
the day before, etc.

My understanding is this is enough to have the Wayback Machine archive
all pastes.
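
To make the chained-pages idea concrete, here is a minimal sketch in Python of how one day's index page might be generated. Everything in it is an assumption for illustration: the helper names page_name and render_day_page are hypothetical, the markup is arbitrary, and nothing here reflects how dpaste is actually implemented. Whether a crawler follows such a chain far enough to reach every paste is also not something the sketch can guarantee.

```python
from datetime import date, timedelta
from html import escape


def page_name(day: date) -> str:
    """File name for one day's index page, e.g. 2016-02-08.html (hypothetical scheme)."""
    return f"{day.isoformat()}.html"


def render_day_page(day: date, paste_urls: list[str]) -> str:
    """Return HTML listing one day's pastes plus a "More" link to the day before."""
    previous = page_name(day - timedelta(days=1))
    # One list item per paste URL; URLs are escaped for safe use in HTML.
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in paste_urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>Pastes for {day.isoformat()}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        f'  <a href="/{previous}">More</a>\n'
        "</body></html>\n"
    )


if __name__ == "__main__":
    # Example data: the paste URL is the one mentioned earlier in this thread.
    today = date(2016, 2, 9)
    print(render_day_page(today, ["http://dpaste.dzfl.pl/2012caf872ec"]))
```

In this sketch the output for the current day would be served as today.html and, once the day is over, written out under its dated name so that the next day's "More" link resolves to it.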

> I'm pretty sure that's Andrei's thought, too. It's a pastebin; people
> use it to make web links to pasted things. If it were to disappear, a
> lot of links would break very permanently because Heritrix has no way
> to index and crawl the site.

Yah.


Andrei



