Web crawler/scraping
Carlos Cabral
cmpscabral at gmail.com
Wed Feb 17 18:06:55 UTC 2021
On Wednesday, 17 February 2021 at 13:13:00 UTC, Adam D. Ruppe
wrote:
> On Wednesday, 17 February 2021 at 12:12:56 UTC, Carlos Cabral
> wrote:
>> I'm trying to collect some json data from a website/admin
>> panel automatically, which is behind a login form.
>
> Does the website need javascript?
>
> If not, my dom.d may be able to help. It can download some
> HTML, parse it, fill in forms, then my http2.d submits it (I
> never implemented Form.submit in dom.d but it is pretty easy to
> make with other functions that are implemented, heck maybe I'll
> implement it now if it sounds like it might work).
>
> Or if it is all json you might be able to just craft some
> requests with my lib or even phobos' std.net.curl that submits
> the login request, saves a cookie, then fetches some json stuff.
>
> I literally just rolled out of bed but in an hour or two I can
> come back and make some example code for you if this sounds
> plausible.
...and it's working :)
Thank you, Adam and Ferhat.
Leaving this here in case anyone needs it:
```
import std.stdio;
import std.string;
import std.net.curl;
import core.thread;

void main()
{
    int waitTime = 5; // seconds to pause between requests
    auto domain = "https://example.com";
    auto cookiesFile = "cookies.txt";

    auto http = HTTP();
    http.handle.set(CurlOption.use_ssl, 1);
    // NOTE: 0 turns off certificate verification; use 1 outside of testing
    http.handle.set(CurlOption.ssl_verifypeer, 0);
    // read cookies from, and save them back to, the same file
    http.handle.set(CurlOption.cookiefile, cookiesFile);
    http.handle.set(CurlOption.cookiejar, cookiesFile);
    http.setUserAgent("...");
    // the callback has to return the number of bytes it handled
    http.onReceive = (ubyte[] chunk) { /* (...) */ return chunk.length; };

    // 1. load the login page so the session cookie gets stored
    http.method = HTTP.Method.get;
    http.url = domain ~ "/login";
    http.perform();
    Thread.sleep(waitTime.seconds);

    // 2. submit the login form
    auto data = "username=user&password=pass";
    http.method = HTTP.Method.post;
    http.url = domain ~ "/login";
    http.setPostData(data, "application/x-www-form-urlencoded");
    http.perform();
    Thread.sleep(waitTime.seconds);

    // 3. fetch the json with the logged-in session
    http.method = HTTP.Method.get;
    http.url = domain ~ "/fetchjson";
    http.perform();
}
```
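The body of the onReceive callback is elided above. Below is a rough sketch of one way to fill it in, assuming the response is JSON and may arrive in several chunks. `ResponseBuffer`, `receive` and `toJson` are names made up for this example, and the sample payload is invented; none of it comes from the code above.

```
import std.json : JSONValue, parseJSON;
import std.stdio : writeln;

/// Hypothetical helper: collects response chunks until the transfer is done.
struct ResponseBuffer
{
    ubyte[] bytes;

    /// Same shape as an onReceive callback: take a chunk and
    /// return how many bytes were handled.
    size_t receive(ubyte[] chunk)
    {
        bytes ~= chunk; // appending copies the chunk's contents
        return chunk.length;
    }

    /// Parses everything collected so far as JSON text.
    JSONValue toJson() const
    {
        return parseJSON(cast(const(char)[]) bytes);
    }
}

void main()
{
    ResponseBuffer buffer;

    // With a real handle this would presumably be hooked up as
    //     http.onReceive = &buffer.receive;
    // Here two chunks are fed in by hand to mimic a body split mid-value.
    buffer.receive(cast(ubyte[]) `{"items": [1, 2`.dup);
    buffer.receive(cast(ubyte[]) `, 3], "ok": true}`.dup);

    auto json = buffer.toJson();
    writeln(json["items"].array.length);
    writeln(json["ok"].boolean);
}
```

Parsing only after perform() returns avoids handing parseJSON a half-received document. Since the same HTTP handle is reused for all three requests, the buffer would need to be cleared before the /fetchjson request so the login page's HTML does not end up in front of the JSON.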