interpolation proposals and safety

Sat Aug 31 21:35:39 UTC 2024

On Friday, 30 August 2024 at 15:09:39 UTC, H. S. Teoh wrote:
> The program logic has to be structured in such a way that *all* 
> input data is properly escaped,

What does that mean? After such input escaping, can Bobby Tables' 
first name still be stored in the database?

> or *all* output data is properly encoded.  The latter is much 
> harder; recoding input data is recommended.

By whom? And "recode" (re-encode) into which code?

The web is full of content recommending the opposite:

- https://benhoyt.com/writings/dont-sanitize-do-escape/

"Every so often developers talk about “sanitizing user input” to 
prevent cross-site scripting attacks. This is well-intentioned, 
but leads to a false sense of security, and sometimes mangles 
perfectly good input."

- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html

- https://security.stackexchange.com/questions/32394/when-to-escape-user-input (2013)

"In general, you want to keep strings as strings, and delegate 
any encoding or escaping to specialized functions which do that 
well. For instance, for SQL, you use prepared statements. With 
HTML from a PHP context, you would use htmlspecialchars()."

- https://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/

"The right way to handle issues with untrusted data is:

     Filter on input, escape on output

This means that you validate or limit data that comes in 
(filter), but only transform (escape or encode) it at the point 
you are sending it as output to another system that requires the 
encoding. It has been standard best practice since just about 
forever [citation required].

[...]

First of all, escape-on-input is just wrong – you've taken some 
input and applied some transformation that is totally irrelevant 
to that data. If, taking our example, you have some data 
collected by HTTP POST or GET parameters, applying HTML escaping 
to it is a layering violation – it mixes an output formatting 
concern into input handling. Layering violations make your code 
much harder to understand and maintain, because you have to take 
into account other layers instead of letting each component and 
layer do its own job.

Doing things ‘right’ is very important, even if doing them 
‘wrong’ seems to work and you are tempted to be dismissive of 
‘theoretical’ concerns about purity etc. When you have to 
maintain code, you will be very glad if things are in the right 
place, and not full of hacks and surprises."
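
To make "filter on input, escape on output" concrete, here is a minimal sketch in Python (chosen only because its standard library covers both cases; nothing here is specific to D or to the interpolation proposals). The table name, the 200-character limit, and the helpers `validate_name` and `render_greeting` are my own assumptions for illustration. The name is stored unchanged through a bound parameter, and HTML escaping happens only at the point where HTML is produced.

```python
import html
import sqlite3

# Illustrative input: the well-known "Bobby Tables" string.
name = "Robert'); DROP TABLE Students;--"


def validate_name(raw):
    """Filter on input: accept or reject, but do not transform the data.

    The 200-character limit is an arbitrary assumption for this sketch.
    """
    if not raw or len(raw) > 200:
        raise ValueError("name must be 1 to 200 characters")
    return raw


def render_greeting(value):
    """Escape on output: HTML-encode only when building HTML."""
    return "<p>Hello, " + html.escape(value) + "</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

# Output to the SQL layer: a bound parameter instead of string
# concatenation, so the driver treats the value as data, not as SQL.
conn.execute("INSERT INTO Students (name) VALUES (?)", (validate_name(name),))

# The stored value is byte-for-byte what the user typed.
stored = conn.execute("SELECT name FROM Students").fetchone()[0]
row_count = conn.execute("SELECT count(*) FROM Students").fetchone()[0]

# Output to the HTML layer: escaping is applied here and nowhere earlier.
page = render_greeting(stored)
print(page)
```

In this sketch the database holds the original string, so the same row can later be sent to a different output (JSON, CSV, a log file) with whatever encoding that output needs, which is the layering argument made in the last quote.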