Typical security issues in C++: why the GC isn't your enemy

H. S. Teoh hsteoh at qfbox.info
Mon Dec 5 19:57:39 UTC 2022


In the past, I've posted about my impressions of coding issues
encountered in a large C codebase (approx 2M LOC), as found by Coverity.
My impression was that bugs related to memory management and raw
pointers predominated.  However, I didn't have actual data to back up my
memory.  So I decided to do a slightly more evidence-based, if still
informal, analysis of the following list of CVE issues in Chromium, a
commonly-used browser, from the Debian security tracker page:

	https://security-tracker.debian.org/tracker/source-package/chromium

There are 1239 issues listed on this page, with several
frequently-recurring keywords / key phrases that one could argue
represent the most commonly encountered issues in a typical large C++
codebase. Here are some of the keywords of interest with their counts:

Use after free:				423	(34%)
Insufficient policy enforcement:	159	(13%)
Inappropriate implementation:		149	(12%)
Buffer overflow:			98	(8%)
Data validation:			91	(7%)
Out of bounds access/read/write/etc:	71	(6%)
Type confusion (JS):			48	(4%)
Incorrect security UI:			33	(3%)
Integer overflow:			22	(2%)
Object lifecycle/lifetime:		15	(1%)
Uninitialized data/use:			14	(1%)
Information leak:			14	(1%)
Side-channel:				7	(<1%)
Handling of confusable characters:	7	(<1%)
Data race:				7	(<1%)
Double free:				3	(<1%)
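
The post doesn't say how these tallies were produced.  As a rough sketch
of one way to do it, assuming the CVE descriptions have already been
pulled from the tracker page into a list of strings (the function names
count_matching and percent_of are hypothetical, and matching a single
phrase per category is a simplification), a case-insensitive phrase
count and the percentage calculation could look like this:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-case a copy of s so that matching ignores case.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Number of descriptions that contain `phrase` at least once.
// Hypothetical helper: assumes one description string per CVE entry.
std::size_t count_matching(const std::vector<std::string>& descriptions,
                           const std::string& phrase) {
    const std::string needle = to_lower(phrase);
    std::size_t hits = 0;
    for (const std::string& d : descriptions) {
        if (to_lower(d).find(needle) != std::string::npos) ++hits;
    }
    return hits;
}

// Share of `total` as a percentage, rounded to the nearest whole number.
int percent_of(std::size_t count, std::size_t total) {
    if (total == 0) return 0;
    return static_cast<int>((100.0 * count) / total + 0.5);
}
```

With the counts from the table, percent_of(423, 1239) gives 34,
percent_of(159 + 149, 1239) gives 25, and percent_of(98 + 71, 1239)
gives 14.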


The most interesting point here is that the largest category of bugs is
use-after-free bugs, constituting 34% of the reported issues.  (Arguably
we should include "object lifecycle/lifetime" in this category, but I
think those refer to bugs in the JS implementation. In any case, it
doesn't change the conclusion.)  This is strong evidence that memory
management is a major source of bugs, and a strong argument for GC use
in application code.
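
To make the bug class concrete, here is a minimal sketch (the Session
type and the function names are made up for illustration).  It
deliberately avoids dereferencing freed memory, since that is undefined
behaviour.  Instead it uses std::weak_ptr to show the moment at which a
non-owning reference would have become dangling, and std::shared_ptr to
show how keeping the object alive for as long as anyone refers to it
sidesteps the problem.  Note that std::shared_ptr is reference counting
rather than a tracing GC, so this only approximates what a GC provides:

```cpp
#include <memory>
#include <string>

// Hypothetical object whose lifetime is being tracked.
struct Session {
    std::string user;
};

// A non-owning observer outlives the only owner. With a raw pointer,
// touching the object after this point would be a use-after-free;
// weak_ptr lets us detect that the object is gone instead.
bool observer_outlives_owner() {
    auto owner = std::make_shared<Session>(Session{"alice"});
    std::weak_ptr<Session> observer = owner;
    owner.reset();              // last owner gone: Session is destroyed
    return observer.expired();  // true: the referent no longer exists
}

// With shared ownership the object stays alive until the last reference
// is dropped, which is the guarantee a GC gives for every reference.
std::string shared_owner_keeps_alive() {
    auto owner = std::make_shared<Session>(Session{"alice"});
    std::shared_ptr<Session> second = owner;
    owner.reset();        // Session survives: `second` still owns it
    return second->user;  // safe access
}
```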

The next largest categories are insufficient policy enforcement and
inappropriate implementation (it's unclear what exactly the latter
means; at a glance it looks like various issues with JS and various
browser features).  I contend that these two categories could be lumped
together as application / business logic bugs.  Tellingly, even combined
these add up to only about 25%, still overshadowed by the 34% for
use-after-free bugs.

D's bounds checks are often touted as a major feature to prevent issues
with buffer overflow and out-of-bounds accesses.  Interestingly, "buffer
overflow" and "out of bounds..." add up only to about 14% of the total
issues.  Nothing to sneeze at, but nonetheless not as big an issue as
use-after-free bugs.
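
For comparison on the C++ side: operator[] on std::vector performs no
bounds check and an out-of-range index is undefined behaviour, whereas
at() throws std::out_of_range.  A small sketch of a checked read built
on at() (try_read is a hypothetical helper name), similar in spirit to
the checks D inserts for array indexing:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reads v[index] into `out` and returns true, or returns false and
// leaves `out` untouched if index is out of range.
bool try_read(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);  // at() checks the index, operator[] does not
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```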

Integer overflow is also sometimes brought up as something important;
but at least according to the above categorization it only accounts for
2% of issues.  So not as big a deal as some may have made it sound.
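
One common form of this bug is a size computation that wraps around
before an allocation.  A hedged sketch of a pre-checked multiplication
(checked_mul is a hypothetical helper, not something taken from
Chromium):

```cpp
#include <cstddef>
#include <limits>

// Computes count * elem_size into `out` and returns true, or returns
// false (leaving `out` untouched) if the product would not fit in size_t.
bool checked_mul(std::size_t count, std::size_t elem_size, std::size_t& out) {
    if (elem_size != 0 &&
        count > std::numeric_limits<std::size_t>::max() / elem_size) {
        return false;  // would wrap around
    }
    out = count * elem_size;
    return true;
}
```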

Similarly, D's initialized-by-default variables are often touted as a
big thing, but overall issues with uninitialized variables only
constitute about 1% of the total issues.
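
In C++ terms, the difference is between default-initialization (a local
declared as "int x;" has an indeterminate value) and
value-initialization ("int x{};" gives zero).  A small sketch, using a
hypothetical PacketHeader struct, of opting into zeroed members, which
is close to what D does by default for integer types:

```cpp
#include <cstdint>

// Hypothetical plain struct with no constructors or default member
// initializers, so "PacketHeader h;" would leave its fields
// indeterminate when h is a local variable.
struct PacketHeader {
    std::uint32_t length;
    std::uint16_t flags;
    bool encrypted;
};

PacketHeader make_header() {
    PacketHeader h{};  // value-initialization: every member is zeroed
    return h;
}
```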

I included also a few categories with small counts that are nonetheless
interesting: side-channel attacks, which in recent years have been
making noise in security circles, seem not as common as one might think
(<1% of total issues).  Also interesting is the "confusable character"
category: apparently this is increasingly being recognized as an issue
in today's climate of spoofers and online swindling. But it's still only
a rather minor category of issues.  Data races are also only a small
category, at least as far as Chromium is concerned.

Most interestingly, "double free" accounts for only 3 of the total
issues, less than 1%, compared with "use after free", which constitutes
the largest category.  This seems to suggest that it's not memory
management in general that's necessarily problematic, but keeping
track of the *lifetime* of allocated memory.  One could say that this is
proof that lifetime is a complex problem. But again it's a strong
argument that the GC brings a major benefit: it relieves the programmer
from having to worry about lifetime issues.  You can instantly be freed
from 34% of security issues, if the above numbers are anything to go by.
:-P


T

-- 
If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell

