Google C++ style guide

Sat Oct 3 19:08:44 PDT 2009

bearophile wrote:
> I have found this page linked from Reddit (click "Toggle all summaries" at the top to read the full page):
> http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
> 
> At Google C++ isn't the most used language, so it may be better to use a C++ style guide from a firm that uses C++ more than Google. On the other hand Google has hired many good programmers, and probably some of them have strong C++ experience, so if you are interested in C++/D this style guide deserves to be read.
> 
> This guide is mostly (as often happens with C++) a list of forbidden features, I think usually to reduce the total bug count of the programs. Some of these limits make me a little nervous, so I'd like to remove or relax them, but I am ignorant regarding C++, while the people who wrote this document are experts, so their judgement has weight.
> 
> They forbid several features that are present in D too. Does that mean D has to drop such features (or make them less "natural", so the syntax discourages their use)?
> 
> Here are a few things from that document that I find interesting. Some of them could be added to the D style guide, or they may even suggest changes to the language itself.

I think these are more programming guidelines than language design 
rules. That's like most academic teachers saying "goto" is evil and 
should never be used, yet new languages like D still support it.

> -------------------
> 
>> Function Parameter Ordering: When defining a function, parameter order is: inputs, then outputs.<
> 
> D may even enforce this, allowing "out" only after "in" arguments.

That can be good for readability in most cases, but I also like to order
parameters in logical order instead of storage class order. Enforcing
parameter order would also break lots of existing code.

>> Static and Global Variables: Static or global variables of class type are forbidden: they cause hard-to-find bugs due to indeterminate order of construction and destruction. [...] The order in which class constructors, destructors, and initializers for static variables are called is only partially specified in C++ and can even change from build to build, which can cause bugs that are difficult to find. [...] As a result we only allow static variables to contain POD data.<
> 
> I think D avoids such problem.

Indeed, static ctors/dtors are very useful but I like to keep their 
number down to a minimum and perform lazy initialization instead.
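
A rough C++ sketch of the lazy-initialization idea, since that's the
language the guideline is about. The registry and its names are made up
for illustration, and it assumes C++11 or later:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plugin registry. The vector is built the first time
// PluginNames() is called, not at program start-up, so it cannot be
// used before it is constructed, whatever order other globals run in.
std::vector<std::string>& PluginNames() {
  // Never deleted on purpose: that sidesteps destruction-order problems too.
  static std::vector<std::string>* names = new std::vector<std::string>();
  return *names;
}

void RegisterPlugin(const std::string& name) {
  assert(!name.empty());
  PluginNames().push_back(name);
}
```

The only static left is a plain pointer, which is POD data and so
stays within what the guide allows.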

> -------------------
> 
>> Declaration Order: Use the specified order of declarations within a class: public: before private:, methods before data members (variables), etc.<
> 
> D may even enforce such order (Pascal does something similar).

Again, I wouldn't want to enforce such an order. Sometimes I declare a
private helper method right next to the set of public methods using it
so I don't have to scroll down 400 lines to view the two.

> -------------------
> 
>> Reference Arguments: All parameters passed by reference must be labeled const.<
> 
>> In fact it is a very strong convention in Google code that input arguments are values or const references while output arguments are pointers. Input parameters may be const pointers, but we never allow non-const reference parameters.<
> 
> I think C solves part of such problem forcing the programmer to add "ref" before the variable name in the calling place too. D may do the same.

I don't recall C having a "ref" keyword :)

I agree with that guideline; that's also how I write my parameters,
although I take it a step further in D with in/const/immutable:

'in' for variables that are not modified and don't escape the method's 
scope.
'const' for variables that are not modified but escape the method's 
scope, maybe with a copy because the data may be mutable somewhere else.
'immutable' for variables that are not modified but escape the method's 
scope, never copied because they're expected to never change for their 
entire lifetime.
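
On the C++ side, the Google convention quoted above could look roughly
like this. SplitKeyValue is a hypothetical helper I made up for the
example, not something from the guide:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: splits "key=value" into its two halves.
// The input comes first, as a const reference; the outputs come last,
// as pointers, so the call site shows '&' on everything that may change.
bool SplitKeyValue(const std::string& line, std::string* key,
                   std::string* value) {
  assert(key != nullptr && value != nullptr);
  const std::string::size_type pos = line.find('=');
  if (pos == std::string::npos) return false;  // outputs left untouched
  *key = line.substr(0, pos);
  *value = line.substr(pos + 1);
  return true;
}
```

A call then reads SplitKeyValue(line, &key, &value), which makes the
outputs visible without looking up the declaration.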

> -------------------
> 
> Function Overloading: Use overloaded functions (including constructors) only in cases where input can be specified in different types that contain the same information.
> 
>> Cons: One reason to minimize function overloading is that overloading can make it hard to tell which function is being called at a particular call site. Another one is that most people are confused by the semantics of inheritance if a deriving class overrides only some of the variants of a function.<
> 
>> Decision: If you want to overload a function, consider qualifying the name with some information about the arguments, e.g., AppendString(), AppendInt() rather than just Append().<
> 
> 
> This is a strong limitation. One of the things that makes C++ more handy than C. I accept it for normal code, but I refuse it for "library code". Library code is designed to be more flexible and reusable, making syntax simpler, etc.
> So I want D to keep overloaded functions.

I partly agree, function overloading is very nice if you need generic 
code. But I also agree with the guideline in that you should keep your 
overloads short and to the point.

For example, on my output stream interface I allow writes from direct
data or from an input stream; those have different names instead of
an overload because there's nothing generic here.

Anyways, considering how easy it is to write method templates in D,
overloading for different primitive types is almost unneeded.
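
Here is roughly what I mean, sketched in C++ rather than D. OutputStream
and its method names are invented for the example; my real interface
differs:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical output stream that just accumulates into a string.
class OutputStream {
 public:
  // Two unrelated sources, so two names instead of one overloaded Write().
  void WriteData(const std::string& data) { buffer_ += data; }

  void WriteFrom(std::istream* in) {
    assert(in != nullptr);
    std::ostringstream copy;
    copy << in->rdbuf();  // drain whatever is left in the input stream
    buffer_ += copy.str();
  }

  // One template instead of an overload per primitive type.
  template <typename T>
  void WriteValue(const T& value) {
    std::ostringstream text;
    text << value;
    buffer_ += text.str();
  }

  const std::string& buffer() const { return buffer_; }

 private:
  std::string buffer_;
};
```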

> -------------------
> 
>> Default Arguments: We do not allow default function parameters.<
> 
>> Cons: People often figure out how to use an API by looking at existing code that uses it. Default parameters are more difficult to maintain because copy-and-paste from previous code may not reveal all the parameters. Copy-and-pasting of code segments can cause major problems when the default arguments are not appropriate for the new code.<
> 
>> Decision: We require all arguments to be explicitly specified, to force programmers to consider the API and the values they are passing for each argument rather than silently accepting defaults they may not be aware of.<
> 
> 
> This too is a strong limitation. I understand that it may make life a little more complex, but they are handy. So I think their usage has to be limited, but I don't like to totally forbid them.
> "Forcing the programmers to consider the API" has some negative side-effects too that they seem to ignore. So I want D to keep its default function parameters feature.

I completely agree here. JavaScript, for example, has no default
parameters and it's annoying as hell. Looking at existing code is really
handy for learning how a function is used when the documentation is too
vague, but the documentation is still the best source to learn about
the parameters.
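
To be fair to the guide, the copy-and-paste problem it describes is
real. A small C++ sketch with a hypothetical ParseInt (my own name,
assuming C++11 for std::stoi):

```cpp
#include <cassert>
#include <string>

// Hypothetical parser with a defaulted base.
int ParseInt(const std::string& text, int base = 10) {
  assert(base >= 2 && base <= 36);
  return std::stoi(text, nullptr, base);
}
```

A call like ParseInt("10") gives no hint that a base exists, so someone
pasting it into code that reads binary input gets 10 where
ParseInt("10", 2) would have given 2. The declaration or the docs still
tell you that, which is why I'd keep the feature.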

> -------------------
> 
>> Variable-Length Arrays and alloca(): We do not allow variable-length arrays or alloca().<
> 
>> Cons: Variable-length arrays and alloca [...] allocate a data-dependent amount of stack space that can trigger difficult-to-find memory overwriting bugs: "It ran fine on my machine, but dies mysteriously in production".<
> 
>> Decision:  Use a safe allocator instead, such as scoped_ptr/scoped_array.<
> 
> After reading this page:
> http://www.boost.org/doc/libs/1_40_0/libs/smart_ptr/scoped_array.htm
> I think they are just a pointer that points to heap-allocated memory, plus it gets deallocated when the scope ends.
> 
> In 99.5% of the cases a heap allocation is good enough in D (especially if the GC gets better). But once in a while speed is more important, so for very small arrays I'd like to have variable-length arrays in D (allocating large arrays on the stack is always bad in production code).

I barely use alloca at all, since you don't always know if the array is 
going to be 50 bytes or 20k bytes. If you know the array's size or at 
least the max size it can get then you can just use a fixed-size array 
which will get allocated on the stack.
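
A minimal C++ sketch of the fixed-size alternative. The 64-element
bound and the Median function are assumptions made up for the example:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

const std::size_t kMaxSamples = 64;  // assumed upper bound

// Median of `count` samples (the upper middle one for even counts).
// The scratch copy lives in a fixed-size array on the stack, so there
// is no alloca() and no heap allocation.
int Median(const int* samples, std::size_t count) {
  assert(samples != nullptr);
  assert(count > 0 && count <= kMaxSamples);
  std::array<int, kMaxSamples> scratch;
  std::copy(samples, samples + count, scratch.begin());
  std::sort(scratch.begin(), scratch.begin() + count);
  return scratch[count / 2];
}
```

The stack cost is the same on every call, so a large input fails the
assertion instead of quietly eating the stack.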

> -------------------
> 
>> Run-Time Type Information (RTTI): We do not use Run Time Type Information (RTTI).<
> 
>> If you find yourself in need of writing code that behaves differently based on the class of an object, consider one of the alternatives to querying the type. Virtual methods are the preferred way of executing different code paths depending on a specific subclass type. This puts the work within the object itself. If the work belongs outside the object and instead in some processing code, consider a double-dispatch solution, such as the Visitor design pattern. This allows a facility outside the object itself to determine the type of class using the built-in type system. If you think you truly cannot use those ideas, you may use RTTI. But think twice about it. :-) Then think twice again. Do not hand-implement an RTTI-like workaround. The arguments against RTTI apply just as much to workarounds like class hierarchies with type tags. <
> 
> I think this is acceptable in most situations. On the other hand I'd like D to have better-implemented reflection (within the bounds of what a static compiler can do, even if future D implementations may run on a VM, like a future alternative LDC), which can be useful in unit testing.
> 
> I am not sure about this, I don't use RTTI a lot in D code.

Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
disable the generation of all ClassInfo and TypeInfo instances, along 
with a version identifier, maybe "version = RTTI_Disabled;" to let code 
handle it.

I use RTTI a lot for simple debugging like printing the name of a class 
or type in generic code or meta programming, but not at all in 
production code. Most of the time I can rely on .stringof and a message 
pragma to do the same.
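
In C++ the name-printing case can also be handled the way the guide
suggests, with a virtual method instead of typeid. A sketch with
made-up classes, assuming C++11:

```cpp
#include <cassert>
#include <string>

class Shape {
 public:
  virtual ~Shape() {}
  virtual std::string Name() const = 0;
  virtual int Corners() const = 0;
};

class Triangle : public Shape {
 public:
  std::string Name() const override { return "Triangle"; }
  int Corners() const override { return 3; }
};

class Square : public Shape {
 public:
  std::string Name() const override { return "Square"; }
  int Corners() const override { return 4; }
};

// No typeid or dynamic_cast here: each class answers for itself.
std::string Describe(const Shape* shape) {
  assert(shape != nullptr);
  return shape->Name() + " with " + std::to_string(shape->Corners()) +
         " corners";
}
```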

> -------------------
> 
>> Casting: Use C++ casts like static_cast<>(). Do not use other cast formats like int y = (int)x; or int y = int(x);.<
> 
>> Pros: The problem with C casts is the ambiguity of the operation; sometimes you are doing a conversion (e.g., (int)3.5) and sometimes you are doing a cast (e.g., (int)"hello"); C++ casts avoid this. Additionally C++ casts are more visible when searching for them.<
> 
>> Do not use C-style casts. Instead, use these C++-style casts.
> * Use static_cast as the equivalent of a C-style cast that does value conversion, or when you need to explicitly up-cast a pointer from a class to its superclass.
> * Use const_cast to remove the const qualifier (see const).
> * Use reinterpret_cast to do unsafe conversions of pointer types to and from integer and other pointer types. Use this only if you know what you are doing and you understand the aliasing issues.
> * Do not use dynamic_cast except in test code. If you need to know type information at runtime in this way outside of a unittest, you probably have a design flaw.<
> 
> I agree with them that mixing all different kinds of cast as in D is bad. In D I'd like to know what I'm doing in a more precise way. This is something that can be improved in D.

I also agree with you here: static/dynamic/reinterpret casts aren't that
hard to understand in C++ and really say what the programmer wants to
do, as well as letting the compiler warn you when it's not a possible cast.

It's neat to have a single cast keyword that does it all, but it's
even better to know what's happening and to control it. Maybe the cast
syntax can be extended like this:

cast(Object, static)(new Foo);

and likewise with dynamic and reinterpret identifiers, which wouldn't be
keywords anywhere else in the language (just like the identifiers that
__traits and pragma take).
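
For reference, this is how the three kinds read in C++ today. The types
and function names are invented for the example, and it assumes C++11:

```cpp
#include <cassert>
#include <cstdint>

struct Base {
  virtual ~Base() {}
};
struct Derived : Base {};

// static_cast: a value conversion the compiler can check.
int Truncate(double x) { return static_cast<int>(x); }

// dynamic_cast: checked at run time; yields null if base is not a Derived.
const Derived* AsDerived(const Base* base) {
  return dynamic_cast<const Derived*>(base);
}

// reinterpret_cast: reinterprets the pointer as an integer, no checks at all.
std::uintptr_t AddressOf(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

void CheckCasts() {
  Derived d;
  Base b;
  assert(Truncate(3.9) == 3);
  assert(AsDerived(&d) == &d);
  assert(AsDerived(&b) == nullptr);
  // Pointer -> integer -> pointer gives back the original pointer.
  assert(reinterpret_cast<const void*>(AddressOf(&d)) == &d);
}
```

Each spelling says which conversion is intended, and each is easy to
search for.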

> -------------------
> 
> Integer Types:
> 
>> You should not use the unsigned integer types such as uint32_t, unless the quantity you are representing is really a bit pattern rather than a number, or unless you need defined twos-complement overflow. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this.<
> 
> I'm for the removal of size_t from everywhere it's not strictly necessary (so for example from array lengths) to avoid bugs.

I don't think this guideline was about the size of integrals but rather 
their sign bit.
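
A small C++ sketch of the sign problem the guideline is talking about.
Both function names are made up:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: silently wraps around when used > total.
std::uint32_t RemainingUnsigned(std::uint32_t total, std::uint32_t used) {
  return total - used;
}

// Signed, with the "never negative" rule stated as an assertion on the
// inputs, as the guide recommends.
std::int64_t RemainingSigned(std::int64_t total, std::int64_t used) {
  assert(total >= 0 && used >= 0);
  return total - used;
}
```

RemainingUnsigned(3, 5) comes back as 4294967294 rather than anything
that looks like an error, while the signed version returns -2, which
the caller can test for.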

> See also the recent thread about signed-unsigned issues:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=17800
> 
> Integer overflow tests will help too.

Yeah, I would like overflow tests in D too, although I don't like how you
can't control which tests are used and which aren't; they're either all
enabled or all disabled.
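
Per-call control is possible if the test is written by hand. A minimal
C++ sketch, where CheckedAdd is a hypothetical helper that you call
only at the sites you care about:

```cpp
#include <cassert>
#include <limits>

// Returns false instead of overflowing; *sum is written only on success.
bool CheckedAdd(int a, int b, int* sum) {
  assert(sum != nullptr);
  const int kMax = std::numeric_limits<int>::max();
  const int kMin = std::numeric_limits<int>::min();
  // Test before adding, because signed overflow itself is undefined in C++.
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) return false;
  *sum = a + b;
  return true;
}
```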

> -------------------
> 
> Boost:
> 
>> Cons: Some Boost libraries encourage coding practices which can hamper readability, such as metaprogramming and other advanced template techniques, and an excessively "functional" style of programming.<
> 
> Advanced use of templates makes the code harder to understand. But sometimes functional style makes code shorter, more readable, safer multiprocessing-wise, sometimes even parallelizable, etc.

Boost is the best thing to happen to C++! I agree it can get very hard 
to maintain readability in C++, but D does not have that problem. 
Templates in D are very elegant and much more powerful than C++'s at the 
same time.

It really depends on what you're coding. For example, I use very few
templates in a GUI, but I use templates on nearly every function that
handles strings. I also use templates a lot as class/method traits to
lower the runtime overhead.
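
The string case looks something like this in C++ (the D version is
shorter). CountChar is a made-up example, not code from my projects:

```cpp
#include <cassert>
#include <cstddef>

// Counts occurrences of `wanted` in a zero-terminated string of any
// character type (char, wchar_t, ...). The compiler generates one
// specialized copy per type, so there is no run-time dispatch.
template <typename Char>
std::size_t CountChar(const Char* text, Char wanted) {
  assert(text != nullptr);
  std::size_t count = 0;
  for (; *text != Char(); ++text) {
    if (*text == wanted) ++count;
  }
  return count;
}
```

The same source serves CountChar("a b c", ' ') and
CountChar(L"a b c", L' ') without an overload per string type.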

> -------------------
> 
> Type Names: often I don't like the C++ practice of using a single uppercase letter for a template type, like T. Better to give a meaningful name to types, when possible.

I think T fits generic template parameters the same way i fits for loops :)

> -------------------
> 
>> Class Data Members: Data members (also called instance variables or member variables) are lowercase with optional underscores like regular variable names, but always end with a trailing underscore.<
> 
> D may even enforce some simple syntax for class members, like that underscore or something else. No other variable is allowed to share the same syntax (so this syntax is used iff it's a class member). It makes conversions from other languages a little more work, but I think it will pay off.

I don't think it should be enforced by the language. It's a great
guideline, but the programmer should be free to pick their own flavor
(e.g. m_var, mVar, _var, var_, etc.).

> -------------------
> 
>> Regular Functions: Functions should start with a capital letter and have a capital letter for each new word. No underscores:<
> 
> That's ugly.

That's how I write my method names! Maybe I wrote too much code around
the Win32 API; the Mozilla code also uses these method names.

I like it that way because I can easily differentiate variableNames from
MethodNames from CONSTANT_NAMES :)

> -------------------
> 
>> Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<
> 
> 4 spaces are more readable :-)

Tabs are better since the editor can be set to whatever number of spaces 
you wish for them :) I use 4 myself.

> -------------------
> 
> Loops and Conditionals:
> 
> for ( ; i < 5 ; ++i) {  // For loops always have a space after the
>   ...                   // semicolon, and may have a space before the
>                         // semicolon.
> 
> That space before the ; is quite important. But I don't think there's a need for a warning if it's absent.

Why would there be a warning?

> -------------------
> 
> Bye,
> bearophile