On 23 November 2012 05:03, xenon325 <span dir="ltr"><<a href="mailto:1@a.net" target="_blank">1@a.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Monday, 19 November 2012 at 15:48:23 UTC, Manu wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This wouldn't strictly retain half precision, though; it would be slightly higher precision, since the intermediates would be full precision (which is surely preferable?).<br>

</blockquote>

<br></div>

I would think it's actually not preferable.<br>

Imagine you develop and tune all the code on x86 and everything is fine. Then you run it on ARM and suddenly all the computations are inaccurate.<br>

</blockquote></div><br></div><div class="gmail_extra">I think work would always be done in float space; hardware support for half precision applies to fast loads and stores to and from full-width float registers. You would lose precision too fast if work were done directly in half space.</div>
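<div class="gmail_extra">A minimal sketch of why work is done in float space, emulating the rounding in Python with the standard <code>struct</code> module's <code>e</code> (IEEE 754 binary16) and <code>f</code> (binary32) format characters. The helper names <code>to_half</code>, <code>to_single</code>, <code>sum_in_half</code> and <code>sum_in_float</code> are hypothetical, for illustration only. It sums 4096 copies of half(0.1) twice: once rounding to half after every add (working "in half space"), and once in a single-precision accumulator that is rounded to half only on the final store.</div>

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (emulates a half store)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (emulates a float register)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_in_half(values):
    """Hypothetical 'work in half space': round back to half after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + v)
    return acc

def sum_in_float(values):
    """Halves loaded into a single-precision accumulator; rounded to half once on store."""
    acc = 0.0
    for v in values:
        acc = to_single(acc + v)
    return to_half(acc)

tenth = to_half(0.1)       # 0.0999755859375, the half value nearest 0.1
data = [tenth] * 4096      # exact sum is 409.5

print(sum_in_half(data))   # 256.0: stalls once the half spacing (0.25) exceeds 2 * tenth
print(sum_in_float(data))  # 409.5
```

<div class="gmail_extra">Under these assumptions the half-space accumulator stops growing at 256, where adding 0.1 rounds straight back to the old value, while the float-space version only rounds once and stores the exact result.</div>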

<div class="gmail_extra">Most CPUs also apply this principle to integer work. It's typical to load a byte or short into a 32- or 64-bit register and sign-extend or zero-extend it; all integer work is then done at the full register width, and the store truncates the top bits again. Many older FPUs work this way too.</div>
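<div class="gmail_extra">The same widen, compute, narrow pattern for integers can be sketched in Python, where an unbounded <code>int</code> stands in for the wide 32/64-bit register. <code>load_s8</code>, <code>load_u8</code>, <code>store_8</code>, <code>average_wide</code> and <code>average_narrow</code> are hypothetical helpers for illustration. The narrow variant truncates the intermediate sum to 8 bits before continuing, which shows what would go wrong if the work were done in byte space.</div>

```python
def load_s8(byte: int) -> int:
    """Load an 8-bit pattern (0..255) and sign-extend it into a wide register."""
    return byte - 0x100 if byte & 0x80 else byte

def load_u8(byte: int) -> int:
    """Load an 8-bit pattern and zero-extend it into a wide register."""
    return byte & 0xFF

def store_8(value: int) -> int:
    """Store to a byte: keep only the low 8 bits, discarding the top bits."""
    return value & 0xFF

def average_wide(a: int, b: int) -> int:
    # Sign-extend both operands, add and halve at full width, truncate once on store.
    return store_8((load_s8(a) + load_s8(b)) // 2)

def average_narrow(a: int, b: int) -> int:
    # Truncate the intermediate sum to 8 bits first, as if working in byte space.
    wrapped_sum = load_s8(store_8(load_s8(a) + load_s8(b)))
    return store_8(wrapped_sum // 2)

print(load_s8(0xFF), load_u8(0xFF))       # -1 255
print(load_s8(average_wide(100, 100)))    # 100
print(load_s8(average_narrow(100, 100)))  # -28: 100 + 100 wrapped to -56 in 8 bits
```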