Shafik Yaghmour

Compiler Engineer at Intel. This is a personal blog. The opinions stated here are my own, not those of my company.

C++ initialization, arrays and lambdas oh my!

20 Sep 2022 » C++

I recently ran C++ weekly quiz #198 with the following code:

int main() {
    int arr[5]{1,2};

    return arr[ [](){return 4;}() ]; // What does main return?
}

Folks will likely be surprised that this does not compile (see it live). The standard says that two consecutive [ tokens shall only be used to introduce an attribute; see [dcl.attr.grammar]p7:

Two consecutive left square bracket tokens shall appear only when introducing an attribute-specifier or within the balanced-token-seq of an attribute-argument-clause.
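For contrast, here is a minimal sketch of what two consecutive [ are reserved for: introducing an attribute-specifier. The function names are made up for illustration, and the two attributes used assume a C++17 compiler.

```cpp
#include <cassert>

// [[ ... ]] introduces an attribute-specifier on a function declaration.
// answer() is a hypothetical function used only for this illustration.
[[nodiscard]] int answer() { return 42; }

int use_attributes() {
    // [[ ... ]] on a variable declaration; scratch is intentionally unused.
    [[maybe_unused]] int scratch = 0;

    int value = answer();
    assert(value == 42);
    return value;
}
```

In both places the compiler can commit to "this is an attribute" as soon as it sees the second [, which is exactly what the rule guarantees.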

Originally, when attributes were introduced into C++, this restriction did not exist, but it was later realized that [[ could be ambiguous in some cases, such as a lambda introduced inside an array subscript. See defect report 968:

The [[ … ]] notation for attributes was thought to be completely unambiguous. However, it turns out that two [ characters can be adjacent and not be an attribute-introducer: the first could be the beginning of an array bound or subscript operator and the second could be the beginning of a lambda-introducer. This needs to be explored and addressed.

The reason this matters is that allowing this would complicate the compiler: it would require unlimited lookahead in order to disambiguate. The few cases where it matters are not compelling enough to justify that additional complexity. For example:

arr[[someSetOfCharacter](){return 4;}()]
  // ^ unlimited lookahead needed to decide if this is a capture or an attribute-list

Ultimately, in order to disambiguate you can add parentheses:

return arr[ ( [](){return 4;}() ) ];
         // ^                   ^
         // Added parentheses make this valid.
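To answer the quiz question for the fixed version, here is a small sketch that wraps the expression in helper functions (the names `quiz_value` and `quiz_value_named_index` are hypothetical, not part of the quiz). Because `arr` is initialized with only two initializers, the remaining elements are zero-initialized, so index 4 should yield 0.

```cpp
#include <cassert>

// Hypothetical helper wrapping the parenthesized form of the quiz expression.
int quiz_value() {
    int arr[5]{1, 2};  // elements 2..4 are zero-initialized

    // The parentheses separate the subscript's [ from the lambda-introducer's [.
    return arr[ ([]() { return 4; }()) ];
}

// Alternative: evaluate the lambda first, so no two [ tokens are ever adjacent.
int quiz_value_named_index() {
    int arr[5]{1, 2};
    const int idx = []() { return 4; }();
    assert(idx == 4);
    return arr[idx];
}
```

Naming the index is arguably the more readable fix in real code; the parenthesized form is the smallest change to the original quiz.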

This is in some ways similar to maximal munch type problems, which I have written about previously. The most infamous pre-C++11 case of maximal munch was that of closing template parameter lists:

std::vector<std::vector<int>> v;
                         //^^ Prior to C++11 a space was required between the two >

This was seen as a significant gotcha, so an exception was added to fix it.
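A short sketch of both sides of that exception, using made-up helper names (`make_grid`, `shifted_size`): since C++11 the >> closes two template argument lists, and as a consequence a genuine right shift inside a template argument list has to be parenthesized.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Since C++11, the >> here is treated as two closing angle brackets.
std::vector<std::vector<int>> make_grid() {
    std::vector<std::vector<int>> grid(2, std::vector<int>(3, 0));
    assert(grid.size() == 2 && grid[0].size() == 3);
    return grid;
}

// The flip side: a real right shift inside a template argument list must be
// parenthesized, otherwise the first > would close the argument list.
std::size_t shifted_size() {
    std::array<int, (16 >> 2)> a{};  // 16 >> 2 == 4
    return a.size();
}
```

Note the parallel with the lambda case: once again parentheses are the tool that tells the parser which reading is meant.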

Unlike the classical maximal munch cases, however, [[ is not a token. Cases such as >>, /*, and *= are each lexed as a single unit (>> and *= are operators or punctuators, and /* opens a comment). Also, unlike classical maximal munch problems, adding whitespace between the two [ does not disambiguate.
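To make the contrast concrete, here is a minimal sketch of a classical maximal munch case where whitespace does change the meaning. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// The lexer takes the longest token first, so a+++b is lexed as a ++ + b.
int munch_no_space(int a, int b) {
    return a+++b;    // (a++) + b
}

// Moving the whitespace changes the tokens: a + ++b.
int munch_with_space(int a, int b) {
    return a + ++b;  // a + (++b)
}

void check_munch() {
    assert(munch_no_space(1, 2) == 3);    // 1 + 2, a is incremented afterwards
    assert(munch_with_space(1, 2) == 4);  // 1 + 3
}
```

With [[ there is no such escape hatch: the rule is written in terms of two consecutive [ tokens, so `arr[ [](){return 4;}() ]` is rejected no matter how much whitespace separates them.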