Reading web pages

Bystroushaak bystrousak at kitakitsune.org
Fri Jan 20 09:08:29 PST 2012


If you want to know what type of file you just downloaded, look at the
Content-Type entry in .getResponseHeaders():

   std.file.write("logo3w.png",
       cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
   writeln(cl.getResponseHeaders()["Content-Type"]);

In this case it prints: image/png
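
If you want to act on that header, for example to pick a file name, here is a minimal sketch. It assumes the header value is a plain MIME type, optionally followed by parameters such as "; charset=...". The helper name extensionFor and the small table of types are my own for illustration; they are not part of dhttpclient.

```d
import std.algorithm.searching : findSplit;
import std.stdio : writeln;
import std.string : strip;
import std.uni : toLower;

/// Hypothetical helper (not part of dhttpclient): map a Content-Type
/// header value to a file extension.
string extensionFor(string contentType)
{
    // Drop parameters such as "; charset=utf-8", then trim and lowercase.
    string mime = contentType.findSplit(";")[0].strip.toLower;

    switch (mime)
    {
        case "image/png":       return ".png";
        case "image/jpeg":      return ".jpg";
        case "application/pdf": return ".pdf";
        case "text/html":       return ".html";
        default:                return ".bin"; // unknown type, keep it generic
    }
}

void main()
{
    writeln(extensionFor("image/png"));                // .png
    writeln(extensionFor("text/html; charset=UTF-8")); // .html
}
```

With the client from the snippet above, the argument would be cl.getResponseHeaders()["Content-Type"].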

Here is a full example:
https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d

On 20.1.2012 18:00, Bystroushaak wrote:
> It is not limited; you just have to cast the output to ubyte[]:
>
> std.file.write("logo3w.png",
>     cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
>
> On 20.1.2012 17:53, Xan xan wrote:
>> Thank you very much, Bystroushaak.
>> I see you limit httpclient to xml/html documents. Is there a
>> possibility of downloading any file (and not only html or xml)? Just
>> like:
>>
>> HTTPClient navegador = new HTTPClient();
>> auto file = navegador.download("http://www.google.com/myfile.pdf");
>>
>> ?
>>
>> Thanks a lot,
>>
>>
>>
>> 2012/1/20 Bystroushaak<bystrousak at kitakitsune.org>:
>>> The first version was buggy. I've updated the code on GitHub, so if you
>>> want to try it, pull the new version (git pull). I've also added a new
>>> example in examples/user_agent_change.d
>>>
>>>
>>> On 20.1.2012 16:08, Bystroushaak wrote:
>>>>
>>>> There are two ways:
>>>>
>>>> Change the module's global variable:
>>>>
>>>> dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own
>>>>
>>>> This will change headers for all clients.
>>>>
>>>> ---
>>>>
>>>> Change instance headers:
>>>>
>>>> // FFHeaders contains more headers than just User-Agent, so copy
>>>> // all of them first and then override what you need.
>>>> string[string] my_headers = dhttpclient.FFHeaders;
>>>> my_headers["User-Agent"] = "My own spider!";
>>>>
>>>> HTTPClient navegador = new HTTPClient();
>>>> navegador.setClientHeaders(my_headers);
>>>>
>>>> ---
>>>>
>>>> Headers are defined as:
>>>>
>>>> public enum string[string] FFHeaders = [
>>>>     "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
>>>>     "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
>>>>     "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
>>>>     "Accept-Charset" : "utf-8",
>>>>     "Keep-Alive" : "300",
>>>>     "Connection" : "keep-alive"
>>>> ];
>>>>
>>>> /// Headers from firefox 3.6.13 on Linux
>>>> public enum string[string] LFFHeaders = [
>>>>     "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
>>>>     "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
>>>>     "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
>>>>     "Accept-Charset" : "utf-8",
>>>>     "Keep-Alive" : "300",
>>>>     "Connection" : "keep-alive"
>>>> ];
>>>>
>>>> Accept, Accept-Charset, Keep-Alive and Connection are important; if
>>>> you redefine them, the module may stop working with some servers.
>>>>
>>>> On 20.1.2012 15:56, Xan xan wrote:
>>>>>
>>>>> On the other hand, I see dhttpclient identifies as
>>>>> "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
>>>>> Gecko/20100401 Firefox/3.6.13"
>>>>>
>>>>> How can I change that?

