How to map machine instructions in memory and execute them? (Aka, how to create a loader)

rempas rempas at tutanota.com
Mon Jun 6 15:13:45 UTC 2022


I looked for example code but wasn't able to find anything 
except for an answer on Stack Overflow. I found a lot of theory 
but no practical code that works. What I want to do is allocate 
memory (with an executable mapping), write the machine 
instructions into it, allocate another memory block for the data 
and finally execute the block of memory that contains the code. 
So, something like what the OS loader does when reading an 
executable. I have come up with the following code:

```d
import core.stdc.stdio;
import core.stdc.string;
import core.stdc.stdlib;
import core.sys.linux.sys.mman;

extern (C) void main() {
   char* data = cast(char*)mmap(null, cast(ulong)15,
      PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
   memset(data, 0x0, 15); // Default value

   *data = 'H';
   data[1] = 'e';
   data[2] = 'l';
   data[3] = 'l';
   data[4] = 'o';
   data[5] = ' ';

   data[6] = 'w';
   data[7] = 'o';
   data[8] = 'r';
   data[9] = 'l';
   data[10] = 'd';
   data[11] = '!';

   void* code = mmap(null, cast(ulong)500,
      PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
   memset(code, 0xc3, 500); // Default value

   /* Call the "write" and "exit" system calls*/
   // mov rax, 0x04
   *cast(char*)code = 0x48;
   *cast(char*)(code + 1) = 0xC7;
   *cast(char*)(code + 2) = 0xC0;
   *cast(char*)(code + 3) = 0x04;
   *cast(char*)(code + 4) = 0x00;
   *cast(char*)(code + 5) = 0x00;
   *cast(char*)(code + 6) = 0x00;

   // mov rbx, 0x01
   *cast(char*)(code + 7)  = 0x48;
   *cast(char*)(code + 8)  = 0xC7;
   *cast(char*)(code + 9)  = 0xC3;
   *cast(char*)(code + 10) = 0x01;
   *cast(char*)(code + 11) = 0x00;
   *cast(char*)(code + 12) = 0x00;
   *cast(char*)(code + 13) = 0x00;

   // mov rdx, <wordLen>
   *cast(char*)(code + 14) = 0x48;
   *cast(char*)(code + 15) = 0xC7;
   *cast(char*)(code + 16) = 0xC2;
   *cast(char*)(code + 17) = 12;
   *cast(char*)(code + 18) = 0x00;
   *cast(char*)(code + 19) = 0x00;
   *cast(char*)(code + 20) = 0x00;

   // mov rcx, <location where data are allocated>
   *cast(char*)(code + 21) = 0x48;
   *cast(char*)(code + 22) = 0xC7;
   *cast(char*)(code + 23) = 0xC1;
   *cast(long*)(code + 24) = cast(long)data;
   *cast(char*)(code + 32) = 0x00;

   // int 0x80
   *cast(char*)(code + 33) = 0xcd;
   *cast(char*)(code + 34) = 0x80;

   /* Execute the code */
   (cast(void* function())&code)();
}
```
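
For comparison, here is a minimal sketch of the allocate, copy and execute steps on their own. It assumes x86-64 Linux and a kernel that permits anonymous mappings that are both writable and executable. The helper name `runMachineCode` and the six-byte routine used to try it out (`mov eax, 42; ret`) are illustrative and not part of the code above. Note that the sketch casts the pointer value returned by `mmap` to a function pointer, rather than taking the address of the variable that holds it.

```d
import core.stdc.string : memcpy;
import core.sys.linux.sys.mman;

// Hypothetical helper: copies `machineCode` into a fresh executable mapping
// and calls it as a C function taking no arguments and returning int.
// Returns -1 if the mapping could not be created.
int runMachineCode(const(ubyte)[] machineCode)
{
    void* mem = mmap(null, machineCode.length,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    // Place the instructions at the start of the mapping.
    memcpy(mem, machineCode.ptr, machineCode.length);

    // Cast the pointer value itself (not the address of `mem`).
    alias Fn = extern (C) int function();
    auto fn = cast(Fn) mem;
    int result = fn();

    munmap(mem, machineCode.length);
    return result;
}
```

On such a system, calling `runMachineCode` with the bytes `B8 2A 00 00 00 C3` should return 42, since the routine loads 42 into `eax` and returns.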

I'm 100% sure that the instructions work, as I have tested them 
with another example that creates an ELF executable file, and it 
executed correctly. So unless I copy-pasted them wrong, the 
instructions are not the problem. The only thing that may be 
wrong is how I'm getting the location of the "data" "segment". 
As I see it, this uses 8 bytes for the memory address (I'm on a 
64-bit machine) and takes the memory address that the "data" 
variable holds, so I would expect it to work...
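
On the question of the 8-byte address: in the x86-64 encoding, the `48 C7 /0` form of `mov` used above takes a 32-bit immediate that is sign-extended to 64 bits, so only 4 bytes after `48 C7 C1` belong to the instruction. The form that carries a full 64-bit immediate is `REX.W + B8+rd`, which for `rcx` is `48 B9` followed by 8 bytes. A small sketch of that encoding follows; the helper name `encodeMovRcxImm64` is hypothetical and only meant as an illustration.

```d
// Hypothetical helper: encodes `mov rcx, imm64` as 10 bytes:
// 48 B9 followed by the immediate in little-endian order.
ubyte[10] encodeMovRcxImm64(ulong value)
{
    ubyte[10] bytes;
    bytes[0] = 0x48; // REX.W prefix: 64-bit operand size
    bytes[1] = 0xB9; // opcode B8 + register number 1 (rcx)
    foreach (i; 0 .. 8)
        bytes[2 + i] = cast(ubyte)(value >> (8 * i)); // low byte first
    return bytes;
}
```

The resulting bytes could be copied into the code block in place of the `48 C7 C1` sequence, with the following instructions shifted accordingly. Separately, `int 0x80` in a 64-bit process goes through the 32-bit system call interface, which as far as I understand only looks at the low 32 bits of each register, so a 64-bit pointer returned by `mmap` may still be rejected even when it is encoded correctly.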

Any ideas?


More information about the Digitalmars-d-learn mailing list