D on lm32-CPU: string argument on stack instead of register
Chad Joan
chadjoan at gmail.com
Fri Jul 31 15:13:29 UTC 2020
On Friday, 31 July 2020 at 10:22:20 UTC, Michael Reese wrote:
> Hi all,
>
> at work we put embedded lm32 soft-core CPUs in FPGAs and write
> the firmware in C.
> At home I enjoy writing small projects in D from time to time,
> but I don't consider myself a D expert.
>
> Now, I'm trying to run some toy examples in D on the lm32 cpu.
> I'm using a recent gcc-elf-lm32. I succeeded in compiling and
> running some code and it works fine.
>
> But I noticed, when calling a function with a string argument,
> the string is not stored in registers, but on the stack.
> Consider a simple function (below) that writes bytes to a
> peripheral (that forwards the data to the host computer via
> USB). I've two versions, an ideomatic D one, and another
> version where pointer and length are two distinct function
> parameters.
> I also show the generated assembly code. The string version is
> 4 instructions longer, just because of the stack manipulation.
> In addition, it is also slower because it need to access the
> ram, and it needs more stack space.
>
> My question: Is there a way I can tell the D compiler to use
> registers instead of stack for string arguments, or any other
> trick to reduce code size while maintaining an ideomatic D
> codestyle?
>
> Best regards
> Michael
>
>
> // ideomatic D version
> void write_to_host(in string msg) {
> // a fixed address to get bytes to the host via usb
> char *usb_slave = cast(char*)BaseAdr.ft232_slave;
> foreach(ch; msg) {
> *usb_slave = ch;
> }
> }
> // resulting assembly code (compiled with -Os) 12 instructions
> _D10firmware_d13write_to_hostFxAyaZv:
> addi sp, sp, -8
> addi r3, r0, 4096
> sw (sp+4), r1
> sw (sp+8), r2
> add r1, r2, r1
> .L3:
> be r2,r1,.L1
> lbu r4, (r2+0)
> addi r2, r2, 1
> sb (r3+0), r4
> bi .L3
> .L1:
> addi sp, sp, 8
> b ra
>
> // C-like version
> void write_to_hostC(const char *msg, int len) {
> char *ptr = cast(char*)msg;
> char *usb_slave = cast(char*)BaseAdr.ft232_slave;
> while (len--) {
> *usb_slave = *ptr++;
> }
> }
> // resulting assembly code (compiled with -Os) 8 instructions
> _D10firmware_d14write_to_hostCFxPaiZv:
> add r2, r1, r2
> addi r3, r0, 4096
> .L7:
> be r1,r2,.L5
> lbu r4, (r1+0)
> addi r1, r1, 1
> sb (r3+0), r4
> bi .L7
> .L5:
> b ra
Hi Michael!
Last time I checked, D doesn't have any specific type attributes
or special ways to force variables to enregister. But I could be
poorly informed. Maybe there are GDC-specific hints or something.
I hope that if anyone else knows better, they will toss in an
answer.
THAT SAID, I think there are things to try and I hope we can get
you what you want.
If you're willing to entertain more experimentation, here are my
thoughts:
---------------------------------------
(1) Try writing "in string" as "in const(char)[]" instead:
// idiomatic D version
void write_to_host(in const(char)[] msg) {
    // a fixed address to get bytes to the host via usb
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    foreach(ch; msg) {
        *usb_slave = ch;
    }
}
Explanation:
The "string" type is an alias for "immutable(char)[]".
In D, "immutable" is a stronger guarantee than "const". The
"const" modifier, like in C, tells the compiler that this
function shall not modify the data referenced by this
pointer/array/whatever. The "immutable" modifier is a bit
different, as it says that NO ONE will modify the data referenced
by this pointer/array/whatever, including other functions that
may or may not be concurrently executing alongside the one you're
in. So "const" constrains the callee, while "immutable"
constrains both the callee AND the caller. This makes it more
useful for some multithreaded code, because if you can accept the
potential inefficiency of needing to do more copying of data (if
you can't modify, usually you must copy instead), then you can
have more deterministic behavior and sometimes even much better
total efficiency by way of parallelization. This might not be a
guarantee you care about though, at which point you can just toss
it out completely and see if the compiler generates better code
now that it sees the same type qualifier as in the other example.
I'd actually be surprised if using "immutable" causes /less/
efficient code in this case, because it should be even /safer/ to
use the argument as-is. But it IS a difference between the two
examples, and one that might not be benefiting your cause (though
that's totally up to you).
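
Here is a minimal sketch of the const/immutable difference, meant
to run on a host machine rather than on the lm32. The helper name
countBytes is made up for illustration; it only stands in for
write_to_host so that the result can be checked.

```d
// Hypothetical stand-in for write_to_host: counts bytes instead of
// writing them to a peripheral.
size_t countBytes(const(char)[] msg)
{
    size_t n = 0;
    foreach (ch; msg)
        ++n;
    return n;
}

void main()
{
    // "string" is an alias for immutable(char)[].
    static assert(is(string == immutable(char)[]));

    string literal = "abc";     // immutable elements
    char[] buffer = ['x', 'y']; // mutable elements

    // Both convert implicitly to const(char)[].
    assert(countBytes(literal) == 3);
    assert(countBytes(buffer) == 2);

    // A mutable buffer does not convert implicitly to string.
    static assert(!__traits(compiles, { string s = buffer; }));

    // Neither a const nor an immutable slice allows writes to its
    // elements.
    const(char)[] view = buffer;
    static assert(!__traits(compiles, { view[0] = 'z'; }));
    static assert(!__traits(compiles, { literal[0] = 'z'; }));

    // The owner of the mutable buffer can still change it, and the
    // const view observes the change. Immutable data rules this out.
    buffer[0] = 'q';
    assert(view[0] == 'q');
}
```

The practical upshot is that a const(char)[] parameter accepts
both string literals and mutable buffers, while a string
parameter only accepts immutable data.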
---------------------------------------
(2) Try keeping the string argument, but make the function more
closely identical in semantics:
// idiomatic D version
void write_to_host(string msg) {
    // a fixed address to get bytes to the host via usb
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    while(msg.length > 0) {
        *usb_slave = msg[0];
        msg = msg[1 .. $];
    }
}
Explanation:
First of all, I wouldn't expect you to keep this, especially if
you need utf-8 autodecoding behavior (more on that later). But it
might be revealing if this leads to different assembly output.
The idea behind this one is to see if the regression is actually
caused by the foreach construct, rather than the parameter type.
I did have to change the parameter slightly by removing the "in"
qualifier. It shouldn't make much difference though, because the
'string' type's pointer and length are copied from the caller, so
any modifications to "msg" (that don't affect "msg"'s array
elements) will be contained within the function and will not be
observable anywhere else. In other words, the "in" qualifier is
largely redundant with "string"'s immutability guarantees plus
function argument copying semantics.
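
A small host-side sketch of that copying behavior, assuming
nothing beyond the language itself. The helper name drain is
hypothetical; it shrinks its own copy of the slice the same way
the loop above does.

```d
// Hypothetical helper: consumes its own copy of the slice and
// reports how many elements it stepped over.
size_t drain(string msg)
{
    size_t n = 0;
    while (msg.length > 0)
    {
        msg = msg[1 .. $]; // rebinds the local pointer/length pair only
        ++n;
    }
    return n;
}

void main()
{
    string greeting = "hello";
    assert(drain(greeting) == 5);

    // The caller's slice is untouched: only the callee's copy of the
    // pointer and length was changed.
    assert(greeting.length == 5);
    assert(greeting == "hello");

    // With "in", the parameter itself is const, so the same rebinding
    // is rejected at compile time.
    static assert(!__traits(compiles,
        (in string m) { m = m[1 .. $]; }));
}
```

That last check is the reason the "in" had to go in this variant.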
---------------------------------------
(3) Try a different type of while-loop in the D-style version:
// idiomatic D version
void write_to_host(in string msg) {
    // a fixed address to get bytes to the host via usb
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    size_t i = 0;
    while(i < msg.length) {
        *usb_slave = msg[i++];
    }
}
Explanation:
This is a variant of #2. It does ask for an extra size_t
variable, so I don't have high hopes. But the compiler might
optimize that out and make it look like the C-style version.
Again, I don't expect you to use this version if it discards one
of D's features that you hope to use, but it might at least help
you identify where your expenses are coming from.
---------------------------------------
(4) Try having these examples use "const ubyte* msg" and
"immutable(ubyte)[] msg" instead of "const char* msg" and "string
msg".
// idiomatic D version
void write_to_host(in immutable(ubyte)[] msg) {
    // a fixed address to get bytes to the host via usb
    ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
    foreach(ch; msg) {
        *usb_slave = ch;
    }
}
// C-like version
void write_to_hostC(const ubyte *msg, int len) {
    ubyte *ptr = cast(ubyte*)msg;
    ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
    while (len--) {
        *usb_slave = *ptr++;
    }
}
Explanation:
The "string" type is an alias for "immutable(char)[]", which
seems like it would be very similar to "immutable(ubyte)[]", but
the 'char' element type communicates a requirement that the
'ubyte' element type does not: utf-8 awareness. And that can have
a cost.
In D, char[] arrays are defined as containing utf-8 text. This is
rather different from C, where the 'char' type is more like D's
'byte' or 'ubyte' types and just happens to also be used to store
text data in any encoding the author feels like. That matters
wherever D decodes text for you. As far as I know, a plain
"foreach(ch; msg)" infers "ch" as 'char' and walks the utf-8 code
units one at a time, with no decoding. Decoding only kicks in if
you declare the loop variable as 'dchar' ("foreach(dchar ch;
msg)"), or if you go through Phobos range functions, which treat
string (or immutable(char)[]) as a range of whole unicode
codepoints. If you are only dealing with ASCII text (or any
8-bit-or-less encoding that isn't utf-8), then you may just want
to use the 'byte' or 'ubyte' types instead. No decoding is ever
done on types like byte[] or ubyte[], so the loop can only
"behave" (from an implementor perspective) like the while-loop in
your second example.

You probably won't see a lot of text-processing through byte[] or
ubyte[] in normal D code, but that's because most programmers
will want their programs to be able to process utf-8 text, while
in the embedded programming space you might not have to worry
about utf-8 at all.

Consistent with that, I didn't see any decoding of utf-8 in the
assembly you posted. I could be wrong though; I am not
experienced in lm32 assembly. If the loop were decoding, I'd
expect to see some sort of conditional call or, at the very
least, some kind of test of the highest bit of every char (to
detect multi-byte utf-8 sequences). So I don't think your code is
larger or less efficient due to utf-8 decoding.

Still, I'm curious to see if changing up the types causes the
compiler to choose different codepaths for its codegen, even for
inane reasons. Maybe the 'char' element type trips some rule that
makes it think it needs to keep a copy of that string around.
Compilers are mysterious beasts sometimes. *shrug*
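
To make the decoding point concrete, here is a minimal host-side
sketch of how I understand foreach to treat the loop variable's
type: with the type inferred, the loop visits utf-8 code units,
and only an explicit 'dchar' asks for decoding. The function
names are made up for illustration, and the decoding variant
relies on the D runtime, which a bare-metal lm32 build may not
have.

```d
// Loop variable type left to inference: one iteration per code unit.
size_t countInferred(string s)
{
    size_t n = 0;
    foreach (ch; s)
    {
        // The inferred element is a single byte wide, not a dchar.
        static assert(typeof(ch).sizeof == 1);
        ++n;
    }
    return n;
}

// Loop variable declared as dchar: one iteration per code point.
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (dchar ch; s)
        ++n;
    return n;
}

void main()
{
    string ascii = "abc";
    // 'h' followed by U+00E9, which takes two utf-8 code units.
    string accented = "h\u00E9";

    // For pure ASCII the two loops agree.
    assert(countInferred(ascii) == 3);
    assert(countDecoded(ascii) == 3);

    // For non-ASCII text they differ.
    assert(accented.length == 3);
    assert(countInferred(accented) == 3);
    assert(countDecoded(accented) == 2);
}
```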
---------------------------------------
(5) And for maximum curiosity, what happens if you write the
C-like version this way instead?
// C-like version
// msg parameter change: "const char *msg" -> "const(char)* msg"
void write_to_hostC(const(char)* msg, int len) {
    // cast() statement removed.
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    while (len--) {
        *usb_slave = *msg++;
    }
}
Explanation:
I realize the difference is subtle, but "const char *msg" says
that both the pointed-to chars can't be modified and also that
the /pointer itself/ cannot be modified. In the other case, with
"const(char)* msg", the constraint is looser but still very
useful: the pointed-to chars can't be modified, but the pointer
can be modified. Because the pointer (but not the referred data)
is a copy of the caller's pointer, any modifications to the
pointer (increments and such) are only visible within the scope
of this function.
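
The difference between the two spellings can be checked at
compile time. This is a host-side sketch only; sumBytes is a
hypothetical stand-in for write_to_hostC that adds the bytes up
instead of writing them to the peripheral.

```d
// Hypothetical stand-in for write_to_hostC: walks the pointer
// directly, which is allowed because only the chars are const.
int sumBytes(const(char)* msg, int len)
{
    int total = 0;
    while (len--)
        total += *msg++;
    return total;
}

void main()
{
    // "const char*" qualifies the whole type, pointer included.
    static assert(is(const char* == const(char*)));
    static assert(!is(const char* == const(char)*));

    // A fully const pointer cannot be incremented...
    static assert(!__traits(compiles, (const char* p) { p++; }));
    // ...but a mutable pointer to const chars can.
    static assert(__traits(compiles, (const(char)* p) { p++; }));
    // Either way, the pointed-to chars stay read-only.
    static assert(!__traits(compiles, (const(char)* p) { *p = 'x'; }));

    string s = "AB";
    assert(sumBytes(s.ptr, cast(int) s.length) == 'A' + 'B');
}
```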
The C-like version already produces the tighter code, but if
making this change causes it to regress to generating assembly
similar to the D-like version, then it might suggest that the
additional assignment statement is actually helpful somehow.
It'd be unintuitive, but you never know.
---------------------------------------
(6) OK sorry, one more. Because #5 made me think: what if we
extended the D-idiomatic-version's immutability guarantee to the
whole array value and not just the array elements?
// idiomatic D version
void write_to_host(immutable(char[]) msg) {
    // a fixed address to get bytes to the host via usb
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    foreach(ch; msg) {
        *usb_slave = ch;
    }
}
And to make it even more like the C-style version without being
C-style, it might also be worth stacking it with the
immutable->const change:
// idiomatic D version
void write_to_host(const(char[]) msg) {
    // a fixed address to get bytes to the host via usb
    char *usb_slave = cast(char*)BaseAdr.ft232_slave;
    foreach(ch; msg) {
        *usb_slave = ch;
    }
}
After all, if the original C-style version isn't allowed to
change its argument's pointer, then we could try making the
D-idiomatic version behave that way too, and see if this minor
alteration makes the difference.
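
For reference, a host-side sketch of what qualifying the whole
array changes, as I understand it. The function names countWhole
and countConstWhole are made up for illustration and only count
elements.

```d
// Whole-array immutable: elements AND the pointer/length pair are
// fixed inside the function.
size_t countWhole(immutable(char[]) msg)
{
    size_t n = 0;
    foreach (ch; msg) // foreach still works on a fully qualified array
        ++n;
    return n;
}

// Whole-array const: same restriction on the callee, but callers may
// pass mutable data too.
size_t countConstWhole(const(char[]) msg)
{
    size_t n = 0;
    foreach (ch; msg)
        ++n;
    return n;
}

void main()
{
    // immutable(char[]) is a different type from string.
    static assert(!is(immutable(char[]) == string));

    // The slice itself can be rebound for string, but not for the
    // fully qualified array type.
    static assert(__traits(compiles,
        (immutable(char)[] m) { m = m[1 .. $]; }));
    static assert(!__traits(compiles,
        (immutable(char[]) m) { m = m[1 .. $]; }));

    string s = "abc";
    char[] buf = ['x', 'y'];
    assert(countWhole(s) == 3);
    assert(countConstWhole(s) == 3);
    assert(countConstWhole(buf) == 2);

    // Mutable data does not convert to the immutable parameter.
    static assert(!__traits(compiles, countWhole(buf)));
}
```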
---------------------------------------
Just to be safe, I also want to point out the difference between
"char *ptr" and "char* ptr": in a single-variable declaration,
there is none, but if there is more than one, the pointer binds
more strongly to the type than to the variable in D.
Consider a declaration like:
char* str0, str1;
In C, this would make str0 a pointer, and str1 a char.
In D, this means that both str0 and str1 are pointers.
Thus, in D, it is more conventional to write the * character next
to the type than it is to write it next to the
variable/identifier. This reinforces the notion that pointer-ness
is (syntactically) part of the type, rather than part of the
variables.
There's a similar example in this article:
https://dlang.org/blog/2018/10/17/interfacing-d-with-c-arrays-part-1/
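
The declaration rule is easy to confirm with a compile-time
check. A minimal sketch; secondIsPointer is a hypothetical helper
that exists only so the result can be tested.

```d
// Hypothetical helper: reports whether the second variable in a
// "char* a, b;" declaration is a pointer in D.
bool secondIsPointer()
{
    char* str0, str1;
    return is(typeof(str1) == char*);
}

void main()
{
    char* str0, str1;

    // In D the pointer belongs to the type, so both are char*.
    static assert(is(typeof(str0) == char*));
    static assert(is(typeof(str1) == char*));

    // Both can therefore hold addresses.
    char a = 'a', b = 'b';
    str0 = &a;
    str1 = &b;
    assert(*str0 == 'a' && *str1 == 'b');
    assert(secondIsPointer());
}
```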
If you already knew that, don't mind me. I realize that a lot of
C code gets copied into D without changing this thing, and unless
there are multiple variables in the same declaration, it really
doesn't matter.
Good luck with your lm32/FPGA coding. That sounds like cool stuff!