code.dlang.org downtime

Mon Dec 16 11:34:28 UTC 2019

On 16.12.2019 at 12:23, WebFreak001 wrote:
> On Monday, 16 December 2019 at 11:04:38 UTC, Sönke Ludwig wrote:
>> As you may have already noticed, the main registry server, 
>> code.dlang.org, became unreachable yesterday. This was caused by an 
>> old VPS of mine being terminated. The registry had already moved to a 
>> different server years ago, but, without my realizing it, the DNS 
>> entry still pointed to the old one, where a "temporary" HTTP proxy 
>> had been set up to forward requests to the new server.
>>
>> By now the DNS entry has been corrected, an up-to-date TLS certificate 
>> is in place, and the registry is running stably. There are still 
>> reports of people being unable to access code.dlang.org, which is 
>> apparently caused by intermediate DNS servers still returning the old 
>> IP address; this should resolve itself within the next few hours. A 
>> temporary workaround is to specify --registry=http://31.15.67.41/ on 
>> the dub command line.
>>
>> Unfortunately, both fallback servers have been down for a while now, 
>> so this resulted in a total blackout. I plan to move the main 
>> registry to a powerful dedicated server in January, which will fix 
>> all the memory-related issues that sometimes show up, and then keep 
>> the current VPS as a relatively reliable fallback server. Together, 
>> the two should guarantee virtually 100% uptime, although more 
>> fallback servers are of course highly desirable.
>>
>> In addition, I plan to separate the repository polling process 
>> from the web and REST frontend, as the former appears to be the main 
>> cause of failures (the two known issues are a GC memory leak of some 
>> kind and a crash, possibly codegen-related, when it is compiled with 
>> DMD; both need further investigation).
> 
> yay, thanks for fixing this so soon.
> 
> In my experience, having a background task in vibe.d that fetches 
> continuously has nearly always been a bad idea memory-wise. These 
> days I use cronjobs that run periodically instead and let the OS 
> free all the memory, which works a lot better. This also scales 
> better, because all workers just read from and write to the database 
> server, and their number can be increased or decreased at any point.
> 
> Have you also considered serving the package zip downloads from a 
> separate server? It could be load-balanced using nginx as well.

The zips are currently just redirects to GitHub/Bitbucket, but 
ultimately we should really cache them, if only to keep old versions of 
packages available in case they disappear from the original repository 
for whatever reason.
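A minimal sketch of such a cache, written in Python purely for illustration (the registry itself is a D/vibe.d application): archives are stored on disk keyed by package name and version, and the upstream download is only attempted on a cache miss, so a version stays available once it has been fetched. `ArchiveCache`, its directory layout and the injected `fetch` callable are all hypothetical and not part of the registry code.

```python
from pathlib import Path
from typing import Callable


class ArchiveCache:
    """Hypothetical on-disk cache for package zips, keyed by name and version."""

    def __init__(self, root: Path, fetch: Callable[[str, str], bytes]):
        self.root = Path(root)
        # Assumed upstream download function, e.g. one that follows the
        # GitHub/Bitbucket archive URL for the given package version.
        self.fetch = fetch

    def path_for(self, name: str, version: str) -> Path:
        return self.root / name / f"{version}.zip"

    def get(self, name: str, version: str) -> bytes:
        path = self.path_for(name, version)
        if path.exists():
            # Cache hit: served even if the upstream repository is gone.
            return path.read_bytes()
        # Cache miss: download once, then keep the archive permanently.
        data = self.fetch(name, version)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.parent / (path.name + ".tmp")
        tmp.write_bytes(data)
        # Rename into place so concurrent readers never see a partial file.
        tmp.replace(path)
        return data
```

Because `fetch` is only called on a miss, deleting or rewriting a tag upstream no longer affects versions that have already been served once.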

BTW, it looks like most of the CI failures that are usually attributed 
to the registry are in fact caused by GitHub, probably in combination 
with the borderline-short timeout that is currently configured for 
dub/curl. I imagine the timeout is a frequent problem, particularly in 
countries such as China, where network latency adds a few hundred 
milliseconds on top of the server response time.
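As a rough illustration of why latency weighs on a short timeout: a fresh HTTPS connection needs several network round trips before the response arrives, so the round-trip time is paid more than once. The sketch below assumes four round trips (TCP handshake, a full TLS 1.2 handshake, and the HTTP request itself) and ignores DNS lookups, redirects and transfer time. The numbers are made up, not measurements, and no particular dub timeout value is implied.

```python
def estimated_fetch_ms(rtt_ms: float, server_ms: float, round_trips: int = 4) -> float:
    """Rough time until a response arrives on a fresh HTTPS connection.

    Assumed default of 4 round trips: 1 for the TCP handshake, 2 for a
    full TLS 1.2 handshake and 1 for the HTTP request/response. DNS
    lookups, redirects and transfer time are ignored.
    """
    return round_trips * rtt_ms + server_ms


def fits_timeout(rtt_ms: float, server_ms: float, timeout_ms: float,
                 round_trips: int = 4) -> bool:
    """True if the estimated fetch time stays within the timeout."""
    return estimated_fetch_ms(rtt_ms, server_ms, round_trips) <= timeout_ms


# Hypothetical numbers: the same 500 ms server response time seen from a
# low-latency network and from one with a ~300 ms round-trip time.
print(estimated_fetch_ms(rtt_ms=20, server_ms=500))   # 580
print(estimated_fetch_ms(rtt_ms=300, server_ms=500))  # 1700
```

With a hypothetical 1000 ms timeout, the first request fits and the second does not, even though the server is equally fast in both cases.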