dpaste and the wayback machine

Wyatt via Digitalmars-d digitalmars-d at puremagic.com
Mon Feb 8 08:44:45 PST 2016


On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu 
wrote:
> Dpaste currently does not expire pastes by default. I was 
> thinking it would be nice if it saved them in the Wayback 
> Machine such that they are archived redundantly.
>
> I'm not sure what's the way to do it - probably linking the 
> newly-generated paste URLs from a page that the Wayback Machine 
> already knows of.
>
> I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec 
> (when the WM does not see a link that is searched for, it offers 
> the option to archive it) obtaining 
> https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
>
>
> Thoughts?
>
You want it in Wayback?  Sounds like you need some WARC [0].  
Since anyone can upload to IA (using a nice S3-like API, even 
[1]), this should be pretty uncomplicated.  If you can get a list 
of all the paste URLs, you can use wget [2] to build the WARC 
fairly trivially. [3]  Then I'd suggest getting a dlang account 
and making an item [4] out of it.  Just make sure it's set to 
mediatype:web and it should get ingested by Wayback.
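
A minimal sketch of the wget step, assuming the paste URLs have already been written one per line to a local file.  The file name "pastes.txt", the WARC base name "dpaste", and the helper function are all hypothetical; only the two wget options (--input-file from [2] and --warc-file from [3]) come from wget itself.

```python
import subprocess


def wget_warc_command(url_list_path, warc_name):
    """Build the wget argv that fetches every URL in url_list_path
    and records the traffic into a WARC named after warc_name."""
    return [
        "wget",
        f"--input-file={url_list_path}",  # read the URLs to fetch from this file
        f"--warc-file={warc_name}",       # base name of the WARC output file
    ]


def build_warc(url_list_path, warc_name):
    """Run wget; raises CalledProcessError if wget exits non-zero."""
    subprocess.run(wget_warc_command(url_list_path, warc_name), check=True)


# Example call (not executed here):
#   build_warc("pastes.txt", "dpaste")
```

By default wget should compress the result, so the output would be expected as dpaste.warc.gz next to the downloaded files.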

After that?  Generate a WARC when a paste is made and use the 
dlang S3 keys to add it to the previous item (or maybe just do it 
daily or weekly so as to not stress the derive queue too much).  
I'm pretty sure that's all that's needed.
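
The upload could look roughly like the sketch below.  It assumes the endpoint and the "LOW access:secret" authorization scheme described in [1]; the item name "dlang-dpaste", the file name, and the key values are placeholders, and the function is hypothetical.  The request is only built here, not sent.

```python
import urllib.request

# Endpoint as described in [1]; treat the exact host as an assumption.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(item, filename, data, access_key, secret_key):
    """Build a PUT request that adds one file to an archive.org item."""
    headers = {
        # IA's S3-like API uses its own "LOW" authorization scheme.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Create the item on first upload if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Item metadata: mediatype:web, as suggested above.
        "x-archive-meta-mediatype": "web",
    }
    url = f"{IA_S3_ENDPOINT}/{item}/{filename}"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Example call (not executed here):
#   with open("dpaste.warc.gz", "rb") as f:
#       req = build_upload_request("dlang-dpaste", "dpaste.warc.gz",
#                                  f.read(), "ACCESS", "SECRET")
#   urllib.request.urlopen(req)
```

Batching the uploads daily or weekly would just mean calling this once per accumulated WARC rather than once per paste.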

-Wyatt

[0] http://fileformats.archiveteam.org/wiki/WARC
[1] https://archive.org/help/abouts3.txt
[2] wget option: -i, --input-file=FILE (download URLs found in local or external FILE)
[3] http://www.archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[4] https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/
