Friday, February 8, 2008

Microsoft VS2005 C++ non-compliance issues (Part III)

3. Compiler extensions

This section describes a number of VS2005 features that can safely be called compiler extensions but that cannot be disabled with the /Za switch. I never intended to turn this into a complete list of the extensions implemented in that compiler; what is listed here is simply what I came across while testing things that didn't work properly in VC6.

3.1. Default initialization is a bit overdone, but no value-initialization yet

The known VC6 problem with default initialization, where the '()' initializer was ignored more often than it should have been, is now gone. In the following code sample a '()' initializer correctly causes zero-initialization of a POD object in VS2005:

struct POD { int i; };
...
POD* pod = new POD();
// '*pod' object is zero-initialized
assert(pod->i == 0);

Proper support for the '()' initializer also extends to array initialization and to constructor initializer lists:

struct S {
  int a[10];
  int* p;
  POD pod;

  S() : a(), pod() {
    // Both 'a' and 'pod' subobjects are zero-initialized
    assert(a[5] == 0 && pod.i == 0);
    p = new int[20]();
    // The 'new'ed array is zero-initialized
    assert(p[10] == 0);
  }
};

It is interesting to note that an attempt to use '()' on an array in a constructor initializer list triggers warning C4351 ("new behavior..."), while no other newly supported context with the '()' initializer seems to produce this warning. Perhaps this behavior is new only relative to VS2003, and it merely appears illogical to me because I am comparing VS2005 with VC6.

Further research shows that, in trying to provide proper support for the '()' initializer, the VS2005 developers actually overdid it. The C++98 standard requires zero-initialization for POD types, while VS2005 also zero-initializes certain non-POD types, as can be seen from the following example:

struct NonPOD {
  int i;
  int NonPOD::*p;
private:
  int j;
};
...
NonPOD* nonpod = new NonPOD();
// '*nonpod' object is zero-initialized
assert(nonpod->i == 0);

This is, of course, not a violation of the C++98 standard, since zero value is not worse than any other indeterminate value. It's just that portable C++98 code is not supposed to rely on this behavior. Additionally, the post-TC1 specification of C++ introduces the concept of value-initialization, which actually happens to require zero-initialization of the above 'NonPOD' structure in response to '()' initializer. Make no mistake though, VS2005 does not implement value-initialization, as can be illustrated by the following code

struct NoUserConstructor {
  int i;
  std::string s;
};
...
NoUserConstructor* nuc = new NoUserConstructor();
// In VS2005 'nuc->i' is holding an indeterminate value
// here, while value-initialization in post-TC1 C++
// would set it to zero

It appears that in VS2005 the decision to zero-initialize an object of a given type in response to the '()' initializer is based on the triviality of that type's constructor. Types without a constructor or with a trivial constructor are zero-initialized; types with a non-trivial constructor are not.
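For code that has to behave identically under VS2005 and under a conforming compiler, one safe option is not to rely on '()' at all for types with non-trivial constructors. A minimal sketch, assuming nothing beyond C++98: the hypothetical type 'Initialized' below (a variant of 'NoUserConstructor') zeroes its scalar member in its own constructor, so the result no longer depends on how, or whether, the compiler implements value-initialization.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of 'NoUserConstructor': the user-provided
// constructor sets 'i' explicitly, so both 'new Initialized' and
// 'new Initialized()' leave 'i' equal to zero on any compiler.
struct Initialized {
    int i;
    std::string s;
    Initialized() : i(0) {}
};

// Allocates an object without '()' and returns the value of its 'i'.
int fresh_value() {
    Initialized* p = new Initialized;
    int result = p->i;
    delete p;
    return result;
}
```

The cost is that the type now has a user-provided constructor, which is exactly what makes its behavior independent of the '()' rules discussed above.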

3.2. Polymorphic delete[]

This extension earned a place in this blog because it was associated with a genuine bug in VC6. The bug is no longer there in VS2005, thus turning it into a plain extension.

In more detail, the issue can be illustrated by the following example

struct A { virtual ~A() {} };
struct B : A {};
...
A* p = new B;
delete p; // OK

A* pa = new B[20];
delete[] pa; // Undefined behavior in standard C++, but defined
             // behavior in VS2005

In standard C++, polymorphic deletion can only be applied to standalone dynamic objects. "Standalone" in this case refers to non-array objects, i.e. objects created with 'new' and destroyed with 'delete' (as opposed to the 'new[]'/'delete[]' pair). The 'new[]'/'delete[]' pair doesn't support any kind of polymorphism with regard to deletion. This means that the static type of the pointer passed to 'delete[]' must be exactly the same as that of the pointer returned by 'new[]'; otherwise, the behavior is undefined. MS compilers since VC6 (maybe even earlier) extend the language by defining this behavior. They support polymorphic deletion of arrays by introducing a special kind of implicit virtual destructor, which they internally call the 'vector deleting destructor'. However, in VC6 this extended functionality was implemented with a bug, which also caused a crash in perfectly standard code. In the context of the previous example, the following code crashes in VC6 even though it is supposed to work from the standard C++ point of view:

A* pz = new A[0];
delete[] pz; // crash here in VC6

This crash no longer happens in VS2005.
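Portable code has two standard alternatives to polymorphic 'delete[]': keep the pointer returned by 'new[]' in its exact static type, or store individually allocated objects behind base-class pointers. A minimal C++98 sketch of both, based on 'A' and 'B' from the example above; the destructor counter 'b_destroyed' and the helper names are hypothetical additions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int b_destroyed = 0;  // counts ~B() calls (illustration only)

struct A { virtual ~A() {} };
struct B : A { ~B() { ++b_destroyed; } };

// Alternative 1: an array of B deleted through a 'B*', never an 'A*'.
void array_of_b() {
    B* pb = new B[20];
    delete[] pb;  // static type matches 'new B[20]': well-defined
}

// Deletes every element individually; polymorphic 'delete' on a
// standalone object is well-defined because ~A() is virtual.
void destroy_all(std::vector<A*>& v) {
    for (std::size_t k = 0; k < v.size(); ++k)
        delete v[k];
    v.clear();
}

// Alternative 2: separately allocated objects behind base pointers.
// (Not exception-safe as written: a throwing push_back would leak.)
void vector_of_base_pointers() {
    std::vector<A*> v;
    for (int k = 0; k < 20; ++k)
        v.push_back(new B);
    destroy_all(v);
}
```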

3.3. Explicit specialization of member templates

This extension carries over from VC6 without any apparent changes:

template<typename T> struct S {
  template<typename U> struct R {};
  template<> struct R<int> {};
  // This explicit specialization is ill-formed in
  // standard C++, but supported in VS2005
};
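A common portable workaround is to replace the explicit specialization with a partial specialization, which C++98 does allow at class scope, by giving the member template an extra defaulted parameter. A minimal sketch; the 'Dummy' parameter and the 'value' members are hypothetical additions for illustration:

```cpp
#include <cassert>

template<typename T> struct S {
    // Primary member template, with an extra defaulted parameter
    template<typename U, typename Dummy = void> struct R {
        static const int value = 0;
    };
    // Partial specialization for U = int; unlike the explicit
    // specialization 'template<> struct R<int>', this is valid
    // at class scope in standard C++
    template<typename Dummy> struct R<int, Dummy> {
        static const int value = 1;
    };
};
```

Users still write 'S<T>::R<int>', since the extra parameter is defaulted.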

Thursday, February 7, 2008

Microsoft VS2005 C++ non-compliance issues (Part II)

2. Issues that respond to /Za

Header files supplied with VS2005 seem to tolerate the /Za setting (although, as I said before, I haven't checked all of them). For this reason, the issues listed in this section are not as critical as the previous ones. Most people will probably see them as compiler extensions, which can easily be disabled if necessary. Yet, if for some reason you can't use /Za, these are something to be aware of.

Since the /Za switch is supposed to control language extensions, one might ask about the difference between the extensions listed in this section (part II) and those listed in the next one (part III). Well, by definition, a true "language extension" takes place when the compiler takes specific steps to define the behavior of a program whose behavior would otherwise be undefined by the language specification (this includes ill-formed code). In other words, extensions allow the compiler to treat an invalid C++ program as a valid one, interpreting it in some implementation-defined fashion. The key point, however, is that the compiler is never allowed to change the behavior of a program that is valid C++ to begin with. Unfortunately, in a number of cases the VS2005 compiler does change the behavior of a valid C++ program, to the point where it no longer agrees with the language specification. Even though these issues can be controlled with the /Za switch, they still aren't "language extensions". This part is specifically intended for issues of this kind, while part III is reserved for true language extensions.

The text below describes the compiler behavior with language extensions enabled (i.e. /Za is not used).

2.1. Implicit conversion of function pointers to 'void*'

VS2005 assumes that function pointers are implicitly convertible to 'void*' type. For example, in the following code sample

void foo();
...
std::cout << &foo << std::endl;

overload resolution will select the version of '<<' operator that outputs 'const void*' values. A compliant compiler shall select the version for 'bool' values in this case.

Additionally, during overload resolution VS2005 believes that implicit conversion of a function pointer to 'void*' is a better alternative than matching that function pointer to ellipsis parameter specification, as illustrated by the following example

void foo();
void bar(void*);
void bar(...);
...
bar(foo); // 'bar(void*)' is called here, while 'bar(...)' is
          // supposed to be called by a compliant compiler

This behavior might lead to incorrect results in some known template meta-programming tricks and techniques.

Once again, VS2005 seems to behave correctly in this respect when used with /Za switch.

2.2. Multiple user-defined conversions are allowed in copy-initialization

This issue is best illustrated by the following code sample

struct A { A(int); };
struct B { B(const A&); };
...
// Direct-initialization
B bd(0); // OK
// Copy-initialization
B bc = 0; // ill-formed, but OK in VS2005

The initialization of 'bc' is a copy-initialization, and the types on the left-hand side ('B') and the right-hand side ('int') are not the same. In this case the initialization must convert 'int' to 'B' by selecting one of 'B's converting constructors, and then copy the result of the conversion into 'bc' using 'B's copy constructor. This process is not allowed to perform any additional intermediate user-defined conversions, such as converting 'int' to 'A' and then 'A' to 'B'. Yet that's exactly what VS2005 does.

One side effect of this behavior is that the following initialization compiles fine in VS2005

std::auto_ptr<int> p = new int;

while virtually everyone who has ever worked with 'std::auto_ptr' knows for a fact that this code is supposed to be ill-formed. Prohibiting this initialization is the very reason the pointer-to-auto_ptr conversion constructor is declared 'explicit'. The standard library that comes with VS2005 also declares this constructor 'explicit', but its effect is immediately defeated by the issue in question: VS2005 happily works around the restriction by performing a double user-defined conversion. First it converts the pointer to 'auto_ptr_ref', and then it converts the resulting 'auto_ptr_ref' to 'auto_ptr'. (Why 'auto_ptr_ref' is so readily available to user code is another question.)

The first thing that comes to mind is that VS2005 simply handles copy-initialization in the same manner as direct-initialization.

One interesting detail about this extended behavior is that it applies to the implicit copy-initialization in a function return, but not to function argument passing. Function arguments are also initialized by copy-initialization, and that particular copy-initialization works correctly, meaning that it is restricted to just one user-defined conversion. For example, in the context of the first code sample, the following code will not compile in VS2005:

B foo(B) {
  return 0; // ill-formed, but OK in VS2005
}
...
foo(0); // ill-formed, rejected by VS2005

Most likely the argument passing context was singled out and left unextended in order to avoid breaking some known overloading-based template meta-programming techniques, which otherwise would become useless in VS2005. Under these circumstances it is difficult to say whether this behavior of VS2005 is a relatively harmless language extension or a serious code-breaking non-compliance issue. For now I'll leave it in this part of the report.
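Code that must compile under both VS2005 and a conforming compiler can sidestep the question by spelling out the intermediate conversion, so that each copy-initialization needs only one user-defined conversion. A minimal sketch based on the 'A'/'B' example above; the 'value' members and the functions 'make_b' and 'take_b' are hypothetical additions:

```cpp
#include <cassert>

struct A {
    int value;
    A(int v) : value(v) {}
};

struct B {
    int value;
    B(const A& a) : value(a.value) {}
};

// 'return 0;' would need int -> A -> B; constructing the 'A' explicitly
// leaves a single user-defined conversion (A -> B) for the return value
B make_b() {
    return A(0);
}

// Argument passing is copy-initialization too
int take_b(B b) { return b.value; }
```

In the same spirit, the 'auto_ptr' initialization above should be written in direct-initialization form, 'std::auto_ptr<int> p(new int);'.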

2.3. Non-constant reference can be bound to a temporary

This is an old and well-known issue with VC6, which could be fixed with the /Za switch in VC6, just as it can be fixed with that switch in VS2005.

struct S {};
...
S& r = S(); // ill-formed, but OK in VC6 and VS2005

However, there are a number of interesting changes in VS2005 that generally make this issue less harmful than it was in VC6.

In VC6 the compiler used to bind non-constant references to temporary objects carelessly, without giving it a second thought. For example, given a choice between a constant and a non-constant reference during overload resolution, VC6 blindly selected the latter:

struct S {};
void foo(S&);
void foo(const S&);
...
foo(S()); // VC6 resolves it to the 'foo(S&)' version

VS2005 appears to follow a different logic. During overload resolution it still considers functions with non-constant reference parameters to be viable for temporary arguments. However, the conversion sequence necessary to perform such a binding is given the lowest rank. This generally means that VS2005 tries to perform overload resolution as close to the standard requirements as possible, and only if the standard resolution fails does it consider the non-standard binding as a last resort. In accordance with this approach, in the previous code sample VS2005 correctly selects the 'foo(const S&)' version of the overloaded function. In the next sample VS2005 also exhibits standard-compliant behavior:

struct S { operator int() const; };
void foo(S&);
void foo(int);
...
foo(S()); // VS2005 resolves it to the 'foo(int)' version
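Both resolutions can be checked with a small sketch in which each overload reports which one was chosen; the return codes and the second function name 'bar' are hypothetical additions. In standard C++ a non-const lvalue reference cannot bind to the temporary 'S()', so the 'const S&' overload wins in the first case and the 'int' overload in the second, while an lvalue argument still prefers 'S&':

```cpp
#include <cassert>

struct S {
    operator int() const { return 0; }
};

// First example: const vs non-const reference
int foo(S&)       { return 1; }
int foo(const S&) { return 2; }

// Second example: non-const reference vs conversion to 'int'
int bar(S&)  { return 1; }
int bar(int) { return 3; }
```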

Microsoft VS2005 C++ non-compliance issues (Part I)

I'm currently doing some research on the standard conformance of the C++ compiler and standard library supplied with MS VS2005 SP1 (compiler version 14.00.50727.762). The main reason for creating this blog entry is that so far I have been unable to find a Web page that provides a good list of known non-compliance issues for this compiler (I'd appreciate a link, if anyone can provide one). I'd also like to note that for me this is part of the process of transitioning from MS VC6 to VS2005, and for this reason I'll begin by testing VS2005 against the "usual suspects": the most simplistic and obvious deviations from the C++ specification present in VC6 SP5. Maybe later I'll update this blog entry (or create an additional one) with more complex issues, possibly specific to VS2005 only.

By default I will evaluate the compiler's behavior from the point of view of the C++98 standard, trying to keep in mind the known issues within that document. I don't expect VS2005 to observe the changes introduced in TC1, but of course following the new specification is always more than welcome.

Just for starters I'd like to say that I was impressed by the number of compiler bugs fixed in VS2005. (This is, once again, compared to VC6. To those who got to use VS2003 most of this might be old news.) Almost everything I tried checked out just fine right away, and some things that looked wrong initially could be fixed by the [infamous] /Za switch. The latter appears to be much more useful than it used to be (as opposed to being completely useless in VC6), since the compiler now appears to be able to compile its own system and standard library headers with /Za, although I can't say that I have tested this thoroughly.

OK, here comes the list of what is still wrong.

1. Issues not fixed by /Za

1.1. String literals are thrown as 'char*' values

This issue is inherited from VC6 unchanged. The following code will catch the exception as a 'char*' in VS2005, while a compliant compiler shall not do so:

try {
  throw "Hello";
}
catch (char*) {
  // VS2005 catches it here...
}
catch (const char*) {
  // ... while it is supposed to be caught here
}

The problem here is not with 'catch', since it can be easily demonstrated that 'catch' in VS2005 can reliably distinguish between const and non-const pointer types. The problem is with 'throw', which manages to lose the const-qualification of the result of array-to-pointer conversion applied to string literal.

It is interesting to note that VS2005 does realize that the type of a string literal is 'array of const char', which is a welcome change from VC6. VC6 firmly believed that string literals have 'array of char' type, which led to incorrect behavior in many other contexts in addition to the one being considered. I checked a few of these contexts in VS2005 and they worked fine. For example, overload resolution now works correctly:

void foo(const char*);
void foo(char*);
...
foo("Hello"); // Calls 'foo(const char*)' as it is supposed to, while
              // VC6 would incorrectly call 'foo(char*)'

Also 'typeid' now behaves properly

assert(typeid("Hello") == typeid(const char[6]));
// Holds in VS2005 and fails in VC6
assert(typeid("Hello") == typeid(char[6]));
// Holds in VC6 and fails in VS2005

// VS2005 exhibits the correct behavior

Meanwhile, 'throw' is still broken. Setting /Za option doesn't make it work the way it is supposed to. It is hard to say how this issue managed to survive in view of the fact that VS2005 now sees the type of string literal correctly. Was it preserved intentionally for backward compatibility? How come /Za has no effect on it then?
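Until this is fixed, a handler for thrown string literals can be written so that it works either way. A minimal sketch (the function names and return codes are hypothetical): catch only 'const char*'. A conforming compiler throws the literal as 'const char*', and a 'char*' exception, as VS2005 produces, should also match that handler through a qualification conversion, although I have not verified the latter on VS2005 itself. The second function shows which handler a conforming compiler picks when both are present.

```cpp
#include <cassert>
#include <string>

// Portable form: a single 'const char*' handler.
std::string catch_literal() {
    try {
        throw "Hello";
    } catch (const char* msg) {
        return msg;
    }
    return std::string();
}

// Returns 1 if the 'char*' handler wins, 2 if the 'const char*' one does.
int which_handler() {
    try {
        throw "Hello";
    } catch (char*) {
        return 1;
    } catch (const char*) {
        return 2;
    }
    return 0;
}
```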

1.2. Exception specifications are not checked at compile time

I know that exception specifications in VS2005 are "parsed and ignored", with the exception of the empty one, 'throw()', which does have some beneficial effect on the generated code. Moreover, I'm not a big fan of the run-time functionality of exception specifications, and as far as I know I'm not alone. However, along with the run-time effects of exception specifications, MS compilers have so far also managed to ignore the compile-time ones. More precisely, VS2005 (just like VC6) fails to enforce the requirements imposed on the exception specification of a virtual overrider. For example, the following code is ill-formed, but compiles without any diagnostic messages in VS2005:

struct A {
  virtual void foo() throw();
};

struct B : A {
  void foo(); // ill-formed, not caught by VS2005
};

In C++ the exception specification of an overriding virtual function must be at least as restrictive as the exception specification of the function it overrides in the base class; i.e. in the above example 'B::foo()' is required to be declared 'throw()'. The problem is that what appears to be correct code in VS2005 might fail to compile on any other platform. For some people this is not a big deal, but it happens to be one for me.

One can argue that since exception specifications are mostly useless, they should not be used at all, and then the problem in question will never arise. There are two things that can be said in response to this argument. Firstly, as noted above, the empty specification 'throw()' is actually useful. Secondly, the standard library does use exception specifications, which can lead to unexpected errors even if the users themselves avoid them in their code. For example, this innocent-looking code is ill-formed, and VS2005 doesn't detect the problem:

class my_exception : public std::exception {
  std::string s;
};

The culprit is the implicitly-declared virtual destructor of 'my_exception' class. Since the destructor of the only data member 's' has unrestricted exception specification, the implicitly declared destructor of 'my_exception' also has unrestricted exception specification. At the same time virtual destructor of the base class 'std::exception' is specified as 'throw()'. Now it is obvious that the destructor of the derived class attempts to extend the exception specification of the virtual destructor it overrides, and the code is ill-formed.
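The fix is to declare the destructor explicitly, with a specification at least as restrictive as that of 'std::exception::~exception()'. A minimal sketch; the constructor and the 'what()' override are hypothetical additions. It assumes a C++98 through C++17 compiler: the empty 'throw()' specification was deprecated in C++11 and removed in C++20, and from C++11 on the implicitly declared destructor here is already non-throwing anyway.

```cpp
#include <cassert>
#include <exception>
#include <string>

class my_exception : public std::exception {
public:
    explicit my_exception(const std::string& msg) : s(msg) {}
    // Explicit destructor: 'throw()' is as restrictive as the
    // specification of the virtual destructor it overrides
    ~my_exception() throw() {}
    const char* what() const throw() { return s.c_str(); }
private:
    std::string s;
};
```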

1.3. Two-phase name lookup is still not implemented

Name lookup for non-dependent names used in template definitions is still delayed till the moment (and point) of actual instantiation. The following perfectly valid code sample will not compile in VS2005

int foo(void*);

template<typename T> struct S {
  S() { int i = foo(0); }
  // A standard-compliant compiler is supposed to
  // resolve the 'foo(0)' call here (i.e. early) and
  // bind it to 'foo(void*)'
};

void foo(int);

int main() {
  S<int> s;
  // VS2005 will resolve the 'foo(0)' call here (i.e.
  // late, during instantiation of 'S::S()') and
  // bind it to 'foo(int)', reporting an error in the
  // initialization of 'i'
}

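A runnable variant of the same test, with hypothetical return values added so the chosen overload can be observed. Under two-phase lookup the non-dependent call 'foo(0)' inside the template is bound at the point of definition, so 'call<int>()' returns 1 even though 'foo(int)', a better match for '0', is declared later. VS2005's late lookup, as described above, would be expected to pick 'foo(int)' instead.

```cpp
#include <cassert>

int foo(void*) { return 1; }

template<typename T>
int call() {
    return foo(0);  // non-dependent: looked up at the point of definition
}

int foo(int) { return 2; }  // declared after the template definition
```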
1.4. Explicit template argument specification is ignored in qualified template names used as default arguments

In order to reproduce this problem one needs to mix several "ingredients": a qualified name of a function template should be used as a default argument in another function template declaration. Under these conditions explicitly specified arguments of the former template are ignored. The following code sample illustrates the problem

namespace N {
  template <typename T> T foo();
}

template <typename U> void bar(int i = N::foo<int>());

int main() {
  bar<int>();
  // VS2005 fails to compile the call, complaining about
  // not being able to deduce the template argument for
  // 'N::foo'
}

In this case VS2005 complains about not being able to deduce the template argument for 'N::foo' call, even though the argument in question is specified explicitly. It is fairly easy to demonstrate that the problem is caused by the fact that the explicitly specified template argument is simply ignored. The compiler makes an attempt to deduce the argument and fails, since it is non-deducible in the above code sample. If we modify the code to make it deducible, the compiler will "prefer" the deduced argument, once again ignoring the explicitly specified one

namespace N {
  template <typename T> T foo(T t);
}

template <typename U> void bar(int i = N::foo<int>(0.0));

int main() {
  bar<int>();
  // VS2005 specializes the 'N::foo' in the default
  // argument as 'N::foo<double>'
}

The problem doesn't seem to be tied to namespaces in any way, since it reproduces just as well with a function template declared as a class member (as opposed to namespace member)

struct N {
  template <typename T> static T foo();
};
...

With a namespace, a using-declaration can be used to eliminate the need for a qualified name, which makes the problem go away, as shown in the code sample below:

namespace N {
  template <typename T> T foo();
}

using N::foo;

template <typename U> void bar(int i = foo<int>());

int main() {
  bar<int>(); // Compiles correctly
}

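Another possible workaround, which I have not verified against VS2005 and which is given here only as a sketch, is to move the template-id out of the default argument into an ordinary helper function, so that the default argument no longer contains a qualified template name with explicit arguments at all. The helper 'default_i', the body of 'N::foo', and the return values are hypothetical:

```cpp
#include <cassert>

namespace N {
    template <typename T> T foo() { return T(42); }
}

// Non-template helper: 'N::foo<int>' now appears in an ordinary
// function body rather than in a default argument
inline int default_i() { return N::foo<int>(); }

template <typename U>
int bar(int i = default_i()) { return i; }
```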
1.5. Name lookup refuses to search for data member names from within a data member declaration

When defining a class, the declaration of each member can refer to the previously declared members of the same class. For example, in the following code snippet

struct S {
  enum { size = 100 };
  int a[size];
};

the declaration of data member 'S::a' refers to the previously declared enumeration constant 'S::size'. The generic lookup mechanism for member names is, of course, implemented in VS2005. However, it appears to be artificially restricted to look up only some kinds of class members, refusing to look up others, such as non-static data members. Apparently the compiler authors believed that there is no valid context in which the compiler would have to look up a non-static data member from within a member declaration. In reality, such contexts do exist: a reference to a non-static data member name can be used as a template argument in a member declaration, as in the following code sample:

template <class S, int S::*> struct T {};

struct S {
  int x;
  T<S, &S::x> t;
  // VS2005 fails to compile the declaration, refusing
  // to look up the non-static data member name
};

The compiler responds with the error messages 'error C2327: 'S::x' : is not a type name, static, or enumerator' and 'error C2065: 'x' : undeclared identifier', while in fact the code is perfectly valid.

It is interesting to note that in an exactly analogous context the lookup of a member function name works perfectly fine, even though a member function name is not "a type name, static, or enumerator" either:

template <class S, void (S::*)()> struct T {};

struct S {
  void f();
  T<S, &S::f> t; // compiles fine
};

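Such declarations are more than a curiosity: a pointer-to-data-member template argument lets a member type reach a sibling data member at compile time. A minimal sketch that a conforming compiler accepts; the template parameter is renamed 'C' for clarity, and the 'get' function is a hypothetical addition to the 'T' template from the example:

```cpp
#include <cassert>

// 'T' from the example, extended with an accessor that reads the data
// member designated by the pointer-to-member template argument
template <class C, int C::*M> struct T {
    static int get(const C& c) { return c.*M; }
};

struct S {
    int x;
    T<S, &S::x> t;  // requires looking up the data member 'x' inside 'S'
};
```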
1.6. Const-qualified array types work incorrectly as template arguments

In C++, when an array of constants (e.g. 'const int[10]') is matched against a const-qualified template type, the compiler has to interpret the constness of the array elements as the constness of the entire array, and properly match it against the const-qualifier on the template parameter. For example, when an argument of type 'const int[10]' is passed to a function parameter of type 'const T&', the type 'T' shall stand for 'int[10]', not for 'const int[10]'. Unfortunately, VS2005 gets it wrong and deduces 'T' in this case as 'const int[10]'. The following code demonstrates the incorrect behavior:

template <class T> void foo(const T& c) {
  std::cout << (typeid(T) == typeid(int[10])) << std::endl;
  std::cout << (typeid(T) == typeid(const int[10])) << std::endl;
}

int main() {
  const int a[10] = {};
  foo(a);
}

The above program outputs 0 and 1, while the correct behavior is to output 1 and 0.

The consequences of this problem are apparent in template argument deduction. For example, VS2005 fails to compile the following code

template <class T> void foo(T&, const T&);

int main() {
  int l[10];
  const int r[10] = {};
  foo(l, r);
  // VS2005 fails to compile the call, complaining about
  // ambiguous template parameter 'T'
}

because the argument deduction for the first and the second arguments produces different values of 'T' ('int[10]' and 'const int[10]'), while in reality both should produce the same 'T' (just 'int[10]') and the code should compile.

Another context, where the same problem arises is partial specialization, as in the following code sample

template <class T> struct S {};
template <class T> struct S<const T> { int i; };

int main() {
  S<const int[10]> s;
  s.i = 0;
  // VS2005 reports an error, insisting that there's
  // no 's.i'
}

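The expected results for all three contexts can be checked with a small type-equality trait. A minimal C++98 sketch; 'same_type', 'deduces_plain_array', 'deduce_pair', and the 'type' typedef in the specialization are hypothetical additions. On a conforming compiler, deduction and partial specialization both treat the element constness as the constness of the array and yield 'int[10]' for 'T':

```cpp
#include <cassert>

// Minimal type-equality trait (C++11 code would use std::is_same)
template <class A, class B> struct same_type { static const bool value = false; };
template <class A> struct same_type<A, A> { static const bool value = true; };

// Reports whether 'T' was deduced as plain 'int[10]'
template <class T> bool deduces_plain_array(const T&) {
    return same_type<T, int[10]>::value;
}

// The two-argument case: both arguments must agree on 'T'
template <class T> int deduce_pair(T&, const T&) {
    return static_cast<int>(sizeof(T) / sizeof(int));
}

template <class T> struct S {};
template <class T> struct S<const T> {
    typedef T type;  // what remains after removing the const-qualification
    int i;
};
```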
1.7. Name lookup problem with 'using' directive and out-of-namespace definitions

When a function declared as a member of a namespace is defined outside of its namespace, any 'using' directives present in the declaring namespace "spill out" into the defining namespace, causing potential ambiguities during name lookup. In the following code sample

typedef int T;

namespace A { typedef int T; }

namespace B {
  using namespace A;
  void foo();
}

void B::foo() {}

T i;
// VS2005 reports an error, complaining about ambiguous
// name 'T', with '::T' and 'A::T' being two possible
// candidates

function 'B::foo' is defined in the global namespace, which causes the names from namespace 'A' (nominated by the 'using' directive in 'B') to pollute the global namespace and cause name conflicts. If we put the declaration of 'i' before the definition of 'B::foo', the problem disappears. Also, as one would expect, removing the 'using namespace A' directive from namespace 'B' fixes the problem.

1.8. Zero-initialization of static objects is implemented naively

It is well known that the language specification requires all objects with static storage duration to be zero-initialized before any other initialization takes place. Zero-initialization, as always, means initialization with logical zeros. Since for almost all data types on our "everyday" platforms the logical zero coincides with the physical zero, most of the time the implementation can get away with filling the corresponding memory region with an all-zero bit pattern. However, one often-overlooked category of types usually has different representations for its logical and physical zeros: pointer-to-data-member types. Since internally they are usually implemented as mere offsets, the physical zero value is actually useful (a member can be located at offset 0), and some other physical value must be reserved to represent the null-pointer value of the type. The all-ones bit pattern (0xFFFF...) is usually chosen for that purpose, and this is the case in the VS2005 implementation as well. However, the compiler happens to forget that objects of such types have to be initialized with that all-ones pattern at program startup. In the following simple program

struct S { int i; };

int S::*p;

int main() {
  assert(p == NULL);
  p = &S::i;
  assert(p != NULL);
}

the first assertion will fail, even though it is not supposed to. Adding an explicit zero initializer to the declaration of 'p' fixes the issue.
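A runnable version of the test, extended with the explicit-initializer workaround ('p_implicit', 'p_explicit', and 'both_start_null' are hypothetical names). On a conforming compiler both pointers compare equal to the null pointer-to-member value, whatever its bit pattern happens to be; under VS2005, according to the observation above, only the explicitly initialized one would.

```cpp
#include <cassert>

struct S { int i; };

int S::*p_implicit;      // relies on static zero-initialization
int S::*p_explicit = 0;  // workaround: explicit null initializer

// Returns true if both statics start out as null pointers-to-member
bool both_start_null() {
    return p_implicit == 0 && p_explicit == 0;
}
```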