Web crawler/scraping

Carlos Cabral cmpscabral at gmail.com
Wed Feb 17 18:06:55 UTC 2021


On Wednesday, 17 February 2021 at 13:13:00 UTC, Adam D. Ruppe 
wrote:
> On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral 
> wrote:
>> I'm trying to collect some json data from a website/admin 
>> panel automatically, which is behind a login form.
>
> Does the website need javascript?
>
> If not, my dom.d may be able to help. It can download some 
> HTML, parse it, fill in forms, then my http2.d submits it (I 
> never implemented Form.submit in dom.d but it is pretty easy to 
> make with other functions that are implemented, heck maybe I'll 
> implement it now if it sounds like it might work).
>
> Or if it is all json you might be able to just craft some 
> requests with my lib or even phobos' std.net.curl that submits 
> the login request, saves a cookie, then fetches some json stuff.
>
> I literally just rolled out of bed but in an hour or two I can 
> come back and make some example code for you if this sounds 
> plausible.

...and it's working :)
thank you Adam and Ferhat

Leaving this here in case anyone needs it:

```
import std.stdio;
import std.string;
import std.net.curl;
import core.thread;

void main()
{
     int waitTime = 5;
     auto domain = "https://example.com";
     auto cookiesFile = "cookies.txt";
     auto http = HTTP();

     http.handle.set(CurlOption.use_ssl, 1);
     // NOTE: 0 disables TLS certificate verification; only for hosts you trust
     http.handle.set(CurlOption.ssl_verifypeer, 0);
     http.handle.set(CurlOption.cookiefile, cookiesFile);
     http.handle.set(CurlOption.cookiejar , cookiesFile);
     http.setUserAgent("...");
     http.onReceive = (ubyte[] data) {
         // (...) handle the received chunk here
         return data.length; // tell curl the whole chunk was consumed
     };

     http.method = HTTP.Method.get;
     http.url = domain ~ "/login";
     http.perform();

     Thread.sleep(waitTime.seconds);

     auto data = "username=user&password=pass";
     http.method = HTTP.Method.post;
     http.url = domain ~ "/login";
     http.setPostData(data, "application/x-www-form-urlencoded");
     http.perform();

     Thread.sleep(waitTime.seconds);

     http.method = HTTP.Method.get;
     http.url = domain ~ "/fetchjson";
     http.perform();
}
```
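
The snippet above leaves the body of `onReceive` elided. As a minimal sketch of one way to fill in that step, assuming the endpoint returns a single JSON document, the received chunks can be accumulated and handed to Phobos' `std.json` once the transfer finishes. The helper names `parseBody` and `fetchJson` are hypothetical and not part of the original post.

```d
import std.array : appender;
import std.json : JSONValue, parseJSON;
import std.net.curl : HTTP;

/// Hypothetical helper: interprets raw response bytes as UTF-8 JSON text.
JSONValue parseBody(const(ubyte)[] raw)
{
    return parseJSON(cast(const(char)[]) raw);
}

/// Hypothetical helper: performs the request already configured on `http`
/// (url, method, cookies) and returns the response body parsed as JSON.
JSONValue fetchJson(ref HTTP http)
{
    auto buffer = appender!(ubyte[])();

    // curl may deliver the body in several chunks, so collect them all.
    http.onReceive = (ubyte[] chunk) {
        buffer.put(chunk);
        return chunk.length; // report the whole chunk as consumed
    };

    http.perform();
    return parseBody(buffer.data);
}
```

With the `http` object from the post, the last step could then become `http.url = domain ~ "/fetchjson"; auto json = fetchJson(http);`. Note that `parseJSON` throws a `JSONException` if the server answers with something other than JSON, for example an HTML login page after a failed login.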


More information about the Digitalmars-d-learn mailing list