Reading web pages

Xan xan xancorreu at gmail.com
Fri Jan 20 09:12:50 PST 2012


Thanks, but why does that matter, given that I downloaded the file as a
collection of bytes? Whether a file is a PDF, a PNG, or anything else makes
no difference if I download it as bytes, does it?

Thanks,


2012/1/20 Bystroushaak <bystrousak at kitakitsune.org>:
> If you want to know what type of file you just downloaded, look at
> .getResponseHeaders():
>
>
>  std.file.write("logo3w.png",
>      cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
>  writeln(cl.getResponseHeaders()["Content-Type"]);
>
> In this case it prints: image/png
>
> Here is a full example:
> https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d
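
A minimal sketch of one way to use the Content-Type value once you have it, for instance to pick a file extension for the saved bytes. It uses only Phobos; `mediaType` and `extensionFor` are hypothetical helper names, not part of dhttpclient, and the mapping is deliberately tiny.

```d
import std.algorithm.searching : findSplit;
import std.string : strip;
import std.uni : toLower;

/// Returns the bare media type, e.g. "text/html" for "Text/HTML; charset=utf-8".
string mediaType(string contentType)
{
    // Keep only the part before the first ';' (parameters such as charset follow it).
    return contentType.findSplit(";")[0].strip.toLower;
}

/// Hypothetical helper: maps a Content-Type value to a file extension.
string extensionFor(string contentType)
{
    switch (mediaType(contentType))
    {
        case "image/png":       return ".png";
        case "application/pdf": return ".pdf";
        case "text/html":       return ".html";
        default:                return ".bin"; // unknown type: keep the raw bytes anyway
    }
}

void main()
{
    assert(extensionFor("image/png") == ".png");
    assert(extensionFor("Text/HTML; charset=utf-8") == ".html");
    assert(extensionFor("application/x-unknown") == ".bin");
}
```

The bytes written to disk are the same either way; the header only tells you how to interpret or name them.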
>
>
> On 20.1.2012 18:00, Bystroushaak wrote:
>>
>> It is not limited; you just have to cast the output to ubyte[]:
>>
>> std.file.write("logo3w.png",
>>     cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
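
A small Phobos-only sketch of the same idea without the network part, assuming the downloaded body arrives as a `string`. `saveBody` and the file name are hypothetical; `std.string.representation` is used here as an alternative to the explicit cast, and the eight-byte PNG signature stands in for a real download.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.string : representation;

/// Saves `content` byte for byte at `path`, whatever kind of file it is.
void saveBody(string path, string content)
{
    // representation() views the string as immutable(ubyte)[] without copying.
    write(path, content.representation);
}

void main()
{
    // Stand-in for a downloaded body: the eight-byte PNG signature.
    immutable ubyte[] signature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
    string content = cast(string) signature;

    auto path = buildPath(tempDir(), "dhttpclient_sketch.bin");
    saveBody(path, content);
    scope (exit) remove(path);

    // The file holds exactly the bytes that were passed in.
    assert(cast(ubyte[]) read(path) == signature);
}
```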
>>
>> On 20.1.2012 17:53, Xan xan wrote:
>>>
>>> Thank you very much, Bystroushaak.
>>> I see you limit HTTPClient to XML/HTML documents. Is it possible to
>>> download any kind of file (and not only HTML or XML)? Something
>>> like:
>>>
>>> HTTPClient navegador = new HTTPClient();
>>> auto file = navegador.download("http://www.google.com/myfile.pdf");
>>>
>>> ?
>>>
>>> Thanks a lot,
>>>
>>>
>>>
>>> 2012/1/20 Bystroushaak<bystrousak at kitakitsune.org>:
>>>>
>>>> The first version was buggy. I've updated the code on GitHub, so if you
>>>> want to try it, pull the new version (git pull). I've also added a new
>>>> example in examples/user_agent_change.d
>>>>
>>>>
>>>> On 20.1.2012 16:08, Bystroushaak wrote:
>>>>>
>>>>>
>>>>> There are two ways:
>>>>>
>>>>> Change the module's global variable:
>>>>>
>>>>> dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own
>>>>>
>>>>> This will change headers for all clients.
>>>>>
>>>>> ---
>>>>>
>>>>> Change instance headers:
>>>>>
>>>>> // FFHeaders holds more headers than just User-Agent, so copy it first
>>>>> string[string] my_headers = dhttpclient.FFHeaders;
>>>>> my_headers["User-Agent"] = "My own spider!";
>>>>>
>>>>> HTTPClient navegador = new HTTPClient();
>>>>> navegador.setClientHeaders(my_headers);
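
The copy-then-override step can be sketched without the library. `BaseHeaders` is a hypothetical, shortened stand-in for dhttpclient.FFHeaders, and `withUserAgent` is an illustrative helper, not part of the module.

```d
/// Hypothetical, shortened stand-in for dhttpclient.FFHeaders.
enum string[string] BaseHeaders = [
    "User-Agent"     : "Mozilla/5.0",
    "Accept-Charset" : "utf-8",
    "Connection"     : "keep-alive"
];

/// Returns a fresh header table with only the User-Agent replaced.
string[string] withUserAgent(string userAgent)
{
    // An enum associative array is built anew at each use,
    // so this assignment yields an independent table.
    string[string] headers = BaseHeaders;
    headers["User-Agent"] = userAgent;
    return headers;
}

void main()
{
    auto mine = withUserAgent("My own spider!");
    assert(mine["User-Agent"] == "My own spider!");
    assert(mine["Connection"] == "keep-alive");        // other headers are kept
    assert(BaseHeaders["User-Agent"] == "Mozilla/5.0"); // defaults are untouched
}
```

The resulting table is the kind of value that would then be passed to setClientHeaders.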
>>>>>
>>>>> ---
>>>>>
>>>>> Headers are defined as:
>>>>>
>>>>> public enum string[string] FFHeaders = [
>>>>>     "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
>>>>>     "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
>>>>>     "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
>>>>>     "Accept-Charset" : "utf-8",
>>>>>     "Keep-Alive" : "300",
>>>>>     "Connection" : "keep-alive"
>>>>> ];
>>>>>
>>>>> /// Headers from firefox 3.6.13 on Linux
>>>>> public enum string[string] LFFHeaders = [
>>>>>     "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
>>>>>     "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
>>>>>     "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
>>>>>     "Accept-Charset" : "utf-8",
>>>>>     "Keep-Alive" : "300",
>>>>>     "Connection" : "keep-alive"
>>>>> ];
>>>>>
>>>>> Accept, Accept-Charset, Keep-Alive and Connection are important, and if
>>>>> you redefine them, the module may stop working with some servers.
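
One way to guard against dropping those headers by accident is to check a custom table before handing it to the client. This is only a sketch: `requiredHeaders` and `missingHeaders` are hypothetical names, and the list simply repeats the four headers named above.

```d
/// The four header names singled out above as important.
immutable string[] requiredHeaders =
    ["Accept", "Accept-Charset", "Keep-Alive", "Connection"];

/// Returns the required header names that are absent from `headers`.
string[] missingHeaders(string[string] headers)
{
    string[] missing;
    foreach (name; requiredHeaders)
    {
        if (name !in headers)
            missing ~= name;
    }
    return missing;
}

void main()
{
    // A table written from scratch instead of copied from the defaults.
    string[string] custom = [
        "User-Agent"     : "My own spider!",
        "Accept-Charset" : "utf-8"
    ];
    assert(missingHeaders(custom) == ["Accept", "Keep-Alive", "Connection"]);
}
```

Copying one of the predefined tables and changing only User-Agent avoids the problem entirely, since nothing is removed.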
>>>>>
>>>>> On 20.1.2012 15:56, Xan xan wrote:
>>>>>>
>>>>>>
>>>>>> On the other hand, I see dhttpclient identifies as
>>>>>> "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
>>>>>> Gecko/20100401 Firefox/3.6.13"
>>>>>>
>>>>>> How can I change that?

