Can you shrink it further?

Stefan Koch via Digitalmars-d digitalmars-d at puremagic.com
Thu Oct 13 21:32:26 PDT 2016


On Friday, 14 October 2016 at 04:21:28 UTC, Stefan Koch wrote:
> On Wednesday, 12 October 2016 at 17:59:51 UTC, Andrei 
> Alexandrescu wrote:
>> On 10/12/2016 01:05 PM, safety0ff wrote:
>>> On Wednesday, 12 October 2016 at 16:48:36 UTC, safety0ff 
>>> wrote:
>>>> [Snip]
>>>
>>> Didn't see the LUT implementation, nvm!
>>
>> Yah, that's pretty clever. Better yet, I suspect we can reuse 
>> the look-up table for front() as well. -- Andrei
>
> The first results from stoke are in.
> It turns out stoke likes to produce garbage :(
> Its smallest result so far has around 100 instructions.
> However it might get better if I give it a few more hours to 
> explore.

Also I doubt that it is correct :(

testb $0x8, 0x200aa9(%rip)
movl $0x6, %eax
prefetchnta 0x200a9d(%rip)
je .L_400650
mulb -0x4(%rsp)
movb $0xfa, -0x5(%rsp)
vmovd (%rax), %xmm6
pmovzxbd -0x5(%rsp), %xmm11
psrad $0xf9, %xmm6
movl $0xef, %esp
pextrd $0xfe, %xmm6, (%rax)
.L_4005b0:
vrsqrtps 0x200a69(%rip), %ymm13
vzeroall
incl %edi
cmpb %ah, %dl
cmpq %rdi, %rdi
jbe .L_400640
ja .L_4005f0
pcmpeqq -0x4(%rsp), %xmm10
sbbb %ah, 0x200a4d(%rip)
jmpq .L_400643
.L_4005f0:
ja .L_40060c
jmpq .L_400643
.L_40060c:
ja .L_400628
minsd 0x200a3c(%rip), %xmm10
jmpq .L_400643
.L_400628:
vmovsldup %ymm3, %ymm3
vrcpps %ymm12, %ymm7
vrsqrtps -0x4(%rsp), %xmm0
fldl2t
vmovmskpd %xmm8, %r10
vrcpps %xmm6, %xmm13
rcrw $0xf7, %ax
jbe .L_400643
sbbq $0x40, %rax
xorb $0xfe, 0x200a0d(%rip)
adcw $0xf0, %r10w
.L_400640:
vmaskmovpd %xmm4, %xmm10, 0x2009ff(%rip)
pabsb %xmm12, %xmm15
.L_400643:
jne .L_4005b0
.L_400650:
retq

I am not quite sure what this does.
But I am certain it has nothing to do with UTF-8 decoding :)

Oh btw using an end pointer instead of a length reduces the table 
version to 12 instructions.
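
For anyone following along, here is a rough sketch in C of why the end-pointer form can come out shorter. This is not the actual table code from the thread: the 32-entry skip_table, the LenSlice/EndSlice structs and both function names are made up for illustration, and invalid lead bytes are simply skipped one byte at a time. The point is only that the length form has to update two fields per pop, while the end-pointer form updates one. How many instructions either compiles to depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical skip table, indexed by the top 5 bits of the lead byte.
   0x00-0x7F -> 1, 0x80-0xBF (stray continuation) -> 1,
   0xC0-0xDF -> 2, 0xE0-0xEF -> 3, 0xF0-0xF7 -> 4, 0xF8-0xFF (invalid) -> 1. */
static const uint8_t skip_table[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 2,
    3, 3,
    4,
    1
};

/* Slice as pointer + length (assumed layout, for illustration only). */
typedef struct { const unsigned char *ptr; size_t len; } LenSlice;

/* Slice as pointer + end pointer (assumed layout, for illustration only). */
typedef struct { const unsigned char *ptr; const unsigned char *end; } EndSlice;

/* Length form: both ptr and len must be written back on every pop. */
static void pop_front_len(LenSlice *s)
{
    assert(s->len > 0);
    size_t n = skip_table[s->ptr[0] >> 3];
    if (n > s->len)
        n = s->len;              /* clamp on a truncated sequence */
    s->ptr += n;
    s->len -= n;
}

/* End-pointer form: only ptr is written back; end never changes. */
static void pop_front_end(EndSlice *s)
{
    assert(s->ptr < s->end);
    size_t n = skip_table[s->ptr[0] >> 3];
    size_t left = (size_t)(s->end - s->ptr);
    if (n > left)
        n = left;                /* clamp on a truncated sequence */
    s->ptr += n;
}
```

Both functions advance by the same number of bytes on the same input; the difference is only in how much state gets stored afterwards.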

