Can't set BP in middle of line
wjoe
fake at example.com
Fri Apr 12 11:51:23 UTC 2019
On Monday, 26 November 2018 at 10:42:02 UTC, Michelle Long wrote:
> On Tuesday, 20 November 2018 at 15:11:56 UTC, Stefan Koch wrote:
>> On Sunday, 4 November 2018 at 21:17:32 UTC, Michelle Long
>> wrote:
>>> On Sunday, 4 November 2018 at 08:27:34 UTC, Rainer Schuetze
>>> wrote:
>>>> [...]
>>>
>>>
>>> Is it then possible to simply split a line internally to
>>> handle it?
>>>
>>> [...]
>>
>> Debug information is mapped to machine-code on a source line
>> by source line basis.
>>
>> Therefore it's indeed nontrivial to do this.
>
>
> Why would that be non-trivial?
>
> semantically there is nothing different between
>
> statement; statement;
>
In theory, nothing stops a debugger from setting a breakpoint
between those two statements.
It is non-trivial in practice because that structure no longer
exists once the statements are compiled to machine code. Machine
code has neither whitespace nor statements separated by a
delimiter.
Consider this:
A)
10| int x = 0; int y = 0; // this is line number 10 in the source file
11| writeln(x, y);
B)
10| int x = 0, y = 0; // this is line number 10 in the source file
11| writeln(x, y);
C)
10| int x = 0; // this is line number 10 in the source file
11| int y = 0;
12| writeln(x, y);
Each of these would be translated to something like this (the
details are highly dependent on the architecture):
0x00230| xor accumulator, accumulator
0x00231| move to-address-of-var-x, value-of-accumulator
0x00233| move to-address-of-var-y, value-of-accumulator
0x00235| ...
^
| this is the address of the machine code in memory
All three variants would compile to the same 3 machine
instructions.
(This is not quite how it actually works; it is just meant to
show the concept. Depending on the architecture, it might not be
possible to write to an address directly, so the address would
first have to be loaded into an address register and the value
would then be moved via that register. There are also the stack
and variable alignment to consider, and machine instructions
might need to be aligned and padded with NOPs, etc.)
So what happens when you set a breakpoint on line 10 in your
debugger?
The debugger will rewrite your code in RAM and overwrite the
first byte of the instruction at address 0x00230 with an INT3,
like so:
0x00230| INT3
0x00231| move to-address-of-var-x, value-of-accumulator
0x00233| move to-address-of-var-y, value-of-accumulator
0x00235| ...
INT3 is a one-byte instruction (specific to the x86
architecture) intended to be used by debuggers to interrupt the
flow of a running program.
The important part here is that it is a one-byte instruction,
which means it fits over the first byte of any machine
instruction, however short.
Once the CPU executes the INT3 instruction, the execution stops
and you regain control of your program in the debugger.
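To make the patching step a bit more concrete, here is a minimal
sketch in Python of how a debugger could plant and remove such a
breakpoint. It is only a toy model: the "memory" is a bytearray,
the program bytes are placeholders rather than real opcodes, and
the names set_breakpoint and clear_breakpoint are made up for this
example. The only real value is 0xCC, the x86 opcode for INT3.

```python
# Toy model of planting and removing a software breakpoint.
# The program bytes are placeholders, NOT real opcodes; only 0xCC
# (the x86 INT3 opcode) is a real value.
INT3 = 0xCC

BASE = 0x00230  # address of the first byte, as in the listing above
# A 1-byte "xor" followed by two 2-byte "move"s.
memory = bytearray([0x01, 0x02, 0x03, 0x04, 0x05])
saved = {}  # address -> original byte that INT3 replaced


def set_breakpoint(addr):
    """Remember the original byte at addr and overwrite it with INT3."""
    offset = addr - BASE
    saved[addr] = memory[offset]
    memory[offset] = INT3


def clear_breakpoint(addr):
    """Put the original byte back so the instruction can run normally."""
    memory[addr - BASE] = saved.pop(addr)


set_breakpoint(0x00230)
patched = bytes(memory)    # first byte is INT3, the rest is untouched
clear_breakpoint(0x00230)
restored = bytes(memory)   # identical to the original program again
```

Because only one byte is replaced, the instructions that follow
keep their addresses, and restoring that single byte is enough to
get the original program back.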
Now it depends on how you proceed.
I) You continue execution, e.g. by pressing F5 -> The debugger
restores the original opcode (xor accumulator, accumulator) to
address 0x00230, sets the instruction pointer back to that
address, and transfers control back to the CPU, which continues
by executing the instructions at 0x00230, 0x00231, 0x00233, etc.
II) You single step, e.g. by pressing F8.
Cases A) and B) behave in the same way: the debugger would write
the INT3 instruction to 0x00235, restore 0x00230 and resume. The
program then halts again when execution reaches 0x00235.
Case C) behaves similarly, but the debugger would write INT3 to
0x00233, then on the next step to 0x00235, etc.
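The stepping logic can be sketched roughly like this in Python.
The tables are hypothetical line tables for the listings above
(they are not the output of any real compiler), and
next_step_address is a name invented for this example. The idea is
just that "step" means "run until the first address that belongs
to a different line".

```python
# Hypothetical line tables: (address, source line) pairs sorted by
# address. 0x00235 stands for whatever code comes after the moves.
TABLE_AB = [(0x00230, 10), (0x00235, 11)]                 # cases A, B
TABLE_C = [(0x00230, 10), (0x00233, 11), (0x00235, 12)]   # case C


def next_step_address(table, current_addr):
    """Return the first address whose line differs from the current one."""
    current_line = None
    for addr, line in table:
        if addr <= current_addr:
            current_line = line   # still at or before the current spot
        elif line != current_line:
            return addr           # first instruction of another line
    return None  # no further line in this table


step_ab = next_step_address(TABLE_AB, 0x00230)
step_c = next_step_address(TABLE_C, 0x00230)
```

With these tables step_ab is 0x00235 and step_c is 0x00233, which
matches the behaviour described for the three cases.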
So, because INT3 is a one-byte instruction, debuggers could
theoretically even set a breakpoint in the middle of a single
statement, e.g. after pushing the function parameters on the
stack, but before making the actual call.
0x00235| push x
0x00237| push y
0x00239| call writeln | <- by overwriting the call with INT3
But if all this is possible in theory, why can't you break on your
'statement2;'?
The answer is granularity. Line number granularity, to be more
specific.
For the debugger to know where 'statement1;' is, the compiler
must generate debug information, which is a map from the address
of a machine instruction to a line in - oh wait - what actually
is a line?
Technically there are no lines in your source code either. It's
simply a stream of characters and lines are a convention, a
formatting hint, for your editor.
This convention even differs between OSs.
On POSIX systems a line ends with a line feed (ASCII code 10).
On Windows it ends with a carriage return (ASCII code 13)
followed immediately by a line feed.
You can see the difference when you open a text file produced in
a POSIX environment in Windows Notepad: everything shows up on a
single line. (Although, I believe, recent Windows 10 versions of
Notepad understand both conventions.)
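A small Python sketch of the two conventions, just to illustrate
the point (the sample text is made up). A tool that only knows one
convention sees either stray carriage returns or one giant line.

```python
posix_text = "int x = 0;\nint y = 0;\n"        # LF line endings
windows_text = "int x = 0;\r\nint y = 0;\r\n"  # CR/LF line endings

# Splitting Windows text on LF only leaves a stray CR on each line.
lf_only = windows_text.split("\n")

# Splitting POSIX text on CR/LF only finds no line break at all.
crlf_only = posix_text.split("\r\n")

# str.splitlines() understands both conventions.
posix_lines = posix_text.splitlines()
windows_lines = windows_text.splitlines()
```

Here crlf_only has a single element, which is roughly what an
editor that only understands CR/LF would show for a POSIX file.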
So, back to debug info.
The compiler maps the address of an instruction to the offset of
the corresponding line terminator (LF or CR/LF) in your source
code file. (It might be a little more complicated than that, with
indirections and so on, so someone more knowledgeable in that
domain might want to weigh in.)
And since the line is the granularity of the debug info, you
can't get better resolution for your breakpoints in your debugger.
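As a rough illustration of that limit, here is a Python sketch of
line-granular debug info for case A. LINE_TABLE and
breakpoint_address are hypothetical names, and a real debug format
is more involved, but the problem is visible: all three
instructions only say "line 10", so nothing in the table points at
the second statement.

```python
# Hypothetical line-granular debug info for case A:
# address of each instruction -> source line it came from.
LINE_TABLE = {
    0x00230: 10,  # xor   (part of "int x = 0;")
    0x00231: 10,  # move  (part of "int x = 0;")
    0x00233: 10,  # move  (this is "int y = 0;", but the table can't say so)
    0x00235: 11,  # start of writeln(x, y)
}


def breakpoint_address(table, line):
    """Lowest address attributed to the line, or None if there is none."""
    addrs = [addr for addr, ln in table.items() if ln == line]
    return min(addrs) if addrs else None


bp_line_10 = breakpoint_address(LINE_TABLE, 10)
bp_line_11 = breakpoint_address(LINE_TABLE, 11)
```

A breakpoint on line 10 always resolves to 0x00230. The debugger
has no way to ask for 0x00233, because the table never records
that this address starts a different statement.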
Now if you wanted to set a breakpoint at any statement in a line,
or inside for(...;...;...) loops, etc. you would need a) a
compiler that can produce this kind of debug info, and b) a
debugger which can take advantage of it.
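Purely as a sketch of what such finer-grained debug info could
look like, here is the same idea in Python with a column added to
each entry. The table layout and the function name are assumptions
made up for this example, not a description of what any particular
compiler or debugger does.

```python
# Hypothetical debug info that records (address, line, column).
# Case A: in "int x = 0; int y = 0;" the second statement starts
# at column 12 (counting from 1).
TABLE = [
    (0x00230, 10, 1),
    (0x00231, 10, 1),
    (0x00233, 10, 12),
    (0x00235, 11, 1),
]


def breakpoint_address(table, line, column):
    """First address of the statement that covers (line, column)."""
    starts = [c for _, ln, c in table if ln == line and c <= column]
    if not starts:
        return None
    stmt_col = max(starts)  # closest statement start at or before column
    return min(a for a, ln, c in table if ln == line and c == stmt_col)


first_stmt = breakpoint_address(TABLE, 10, 1)
second_stmt = breakpoint_address(TABLE, 10, 12)
```

With a table like this, first_stmt is 0x00230 and second_stmt is
0x00233, so the debugger could stop between the two statements.
Both sides have to agree on the format for that to work.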
So technically you would need 2 different pieces of software to
work together, which are probably made by completely different
groups of people who would have to reach consensus - and that's
why it's non-trivial.
More information about the Digitalmars-d-debugger
mailing list