GitHub could be acquired by Microsoft

Sat Jun 9 07:06:23 UTC 2018

On 06/08/2018 06:02 PM, Brad Roberts wrote:
> 
> Essentially (if not actually) everything on github is available through 
> their api's.  No need for scraping or other heroics to gather it.

That does make things a little bit simpler, but web scraping really 
isn't all that much more complicated.

Whether it's a web API or web scraping, you still have to submit an 
HTTP request, parse the results according to the format the server has 
chosen to spit out, and possibly follow up with additional HTTP 
requests. The main differences are just these: web scraping can 
occasionally get thwarted by changes in the webapp's presentation layer, 
whereas a web API can occasionally get thwarted by business rules 
changing what is/isn't accessible via the API (this has been known to 
happen).
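
To make the "same basic steps" point concrete, here's a rough sketch of 
just the parsing step for both approaches, using only Python's standard 
library. The response shapes are made up for illustration: the JSON 
"title" field and the "issue-title" CSS class are assumptions of mine, 
not GitHub's actual API or markup.

```python
import json
from html.parser import HTMLParser


def title_from_api(body: str) -> str:
    """Parse a (hypothetical) JSON API response and return its title."""
    return json.loads(body)["title"]


class _TitleParser(HTMLParser):
    """Grab the text of the first <span class="issue-title"> element.

    The tag and class name are assumptions; a real scraper would target
    whatever markup the site actually serves.
    """

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "issue-title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False


def title_from_html(body: str) -> str:
    """Parse a (hypothetical) HTML page and return the same title."""
    parser = _TitleParser()
    parser.feed(body)
    if parser.title is None:
        # This is the scraping failure mode: the presentation layer changed.
        raise ValueError("title element not found; page layout may have changed")
    return parser.title
```

Either function takes the body of an HTTP response and hands back the 
same piece of data; only the parser differs.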

I.e., scraping needs to deal with UI changes, but unlike an API, it 
cannot be selectively hindered/disabled (unless the primary website 
itself is hindered/disabled, too).

Thus, a robust tool will support both the published web API and web 
scraping, and take its answers from whichever one works.
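
One possible shape for that "whichever one works" logic, as a minimal 
sketch: try each source in order and return the first answer that comes 
back. The names here (first_working, AllSourcesFailed) are hypothetical, 
and each source is assumed to be a zero-argument callable that either 
returns the data or raises.

```python
from typing import Callable, Sequence


class AllSourcesFailed(Exception):
    """Raised when every source (API, scraper, ...) has failed."""


def first_working(sources: Sequence[Callable[[], str]]) -> str:
    """Return the result of the first source that does not raise."""
    errors = []
    for source in sources:
        try:
            return source()
        except Exception as exc:
            # Remember why this source failed, then fall through to the next.
            name = getattr(source, "__name__", repr(source))
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))
```

A caller would pass something like [fetch_via_api, fetch_via_scraping], 
so the scraper only runs when the API path has been shut off or has 
otherwise broken, and vice versa if the order is flipped.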