1. 程式人生 > 實用技巧 >使用現代C++如何避免bugs(上)

使用現代C++如何避免bugs(上)

使用現代C++如何避免bugs(上)

How to avoid bugs using modern C++

C++的主要問題之一是擁有大量的構造,它們的行為是未定義的,或者程式設計師只是意想不到的。在各種專案中使用靜態分析器時,我們經常會遇到這些問題。但是,眾所周知,最好的方法是在編譯階段檢測錯誤。讓我們看看現代C++中的哪些技術不僅幫助編寫簡單明瞭的程式碼,而且使它更安全、更可靠。

什麼是現代C++?

在C++ 11釋出後,現代C++這個術語變得非常流行。這是什麼意思?首先,現代C++是一組模式和習語,旨在消除好的舊的“C類”,很多C++程式設計師都習慣了,特別是如果他們開始在C++程式設計11看起來更簡潔易懂,這是非常重要的。

當人們談論現代C++時,人們通常會想到什麼?並行性、編譯時計算、RAII、lambdas、ranges、concepts、modules和標準庫的其他同等重要的元件(例如,用於檔案系統的API)。這些都是非常酷的現代化,我們期待著在下一套標準中看到它們。不過,我想提醒大家注意新標準允許編寫更安全的程式碼的方式。當開發靜態分析器時,我們看到了大量的不同的錯誤,有時我們會忍不住想:“但是在現代C++中,這是可以避免的”。因此,我建議我們檢查PVS Studio在各種開源專案中發現的幾個錯誤。另外,我們還將看看如何修復它們。 Automatic type inference

In C++, the keywordsautoanddecltypewere added. Of course, you already know how they work.

在C++中,增加了關鍵字auto和decltype。當然,你已經知道它們是怎麼工作的了。

1
2
3
std::map<int, int> m;
auto it = m.find(42);
//C++98: std::map<int, int>::iterator it = m.find(42); 

It's very convenient to shorten long types, without losing the readability of the code. However, these keywords become quite expansive, together with templates: there is no need to specify the type of the returning value withauto

anddecltype.

在不損失程式碼可讀性的情況下,縮短長型別非常方便。但是,這些關鍵字和模板擴充套件性很強:不需要使用auto和decltype指定返回值的型別。

But let's go back to our topic. Here is an example of a64-bit error:

1
2
3
string str = .....;
unsigned n = str.find("ABC");
if (n != string::npos)

In a 64-bit application, the value ofstring::nposis greater than the maximum value ofUINT_MAX, which can be represented by a variable ofunsignedtype. It could seem that this is a case whereautocan save us from this kind of problem: the type of thenvariable isn't important to us, the main thing is that it can accommodate all possible valuesof string::find. And indeed, if we rewrite this example withauto, the error is gone:

在64位應用程式中,string::npos的值大於UINT_MAX的最大值,該值可以由無符號型別的變量表示。這似乎是一個auto可以將我們從這類問題中拯救出來的例子:n變數的型別對我們來說並不重要,主要的是它可以容納string::find的所有可能值。實際上,如果我們用auto重寫這個示例,錯誤就消失了:

1
2
3
string str = .....;
auto n = str.find("ABC");
if (n != string::npos)

But not everything is as simple. Usingautois not a panacea, and there are many pitfalls related to its use. For example, you can write the code like this:

但並不是所有事情都那麼簡單。使用auto並不是萬能的,它的使用也存在許多陷阱。例如,您可以編寫如下程式碼:

1
2
auto n = 1024 * 1024 * 1024 * 5;
char* buf = new char[n];

Autowon't save us from the integer overflow and there will be less memory allocated for the buffer than 5GiB.

Autoalso isn't of any great help when it comes to a very common error: an incorrectly written loop. Let's look at an example:

Auto不會將我們從整數溢位中拯救出來,而且為緩衝區分配的記憶體將少於5GiB。

當遇到一個非常常見的錯誤:一個寫得不正確的迴圈時,Auto也沒有任何幫助。讓我們看一個例子:

1
2
3
std::vector<int> bigVector;
for (unsigned i = 0; i < bigVector.size(); ++i)
{ ... }

For large size arrays, this loop becomes an infinity loop. It's no surprise that there are such errors in the code: they reveal themselves in very rare cases, for which there were no tests.

Can we rewrite this fragment withauto?

對於大型陣列,此迴圈變為無限迴圈。程式碼中存在這樣的錯誤並不奇怪:它們在非常罕見的情況下會暴露自己,因為沒有測試。

我們能用auto重寫這個片段嗎?

1
2
3
std::vector<int> bigVector;
for (auto i = 0; i < bigVector.size(); ++i)
{ ... }

No. Not only is the error is still here. It has become even worse.

With simple typesautobehaves very badly. Yes, in the simplest cases (auto x = y) it works, but as soon as there are additional constructions, the behavior can become more unpredictable. What's worse, the error will be more difficult to notice, because the types of variables aren't that obvious at first glance. Fortunately it is not a problem for static analyzers: they don't get tired, and don't lose attention. But for us, as simple mortals it's better to specify the types explicitly. We can also get rid of the narrowing casting using other methods, but we'll speak about that later.

不,不僅錯誤還在這裡。情況變得更糟了。

對於簡單型別,auto的行為非常糟糕。是的,在最簡單的情況下(auto x=y),它可以工作,但是一旦有了額外的構造,行為就會變得更加不可預測。更糟糕的是,這個錯誤更難被注意到,因為變數的型別乍一看就不那麼明顯。幸運的是,對於靜態分析器來說這不是一個問題:它們不會感到疲倦,也不會失去注意力。但對我們來說,簡單來說,最好顯式地指定型別。我們也可以使用其他方法來消除縮小範圍的情況,但稍後我們將討論這個問題。

Dangerous count of

One of the "dangerous" types in C++ is an array. Often when passing it to the function, programmers forget that it is passed as a pointer, and try to calculate the number of elements withsizeof.

危險計數

C++中的一種“危險”型別是陣列。通常在將它傳遞給函式時,程式設計師會忘記它是作為指標傳遞的,並嘗試使用sizeof計算元素的數量。

1
2
3
4
5
6
7
8
9
#define RTL_NUMBER_OF_V1(A) (sizeof(A)/sizeof((A)[0]))
#define _ARRAYSIZE(A) RTL_NUMBER_OF_V1(A)
int GetAllNeighbors( const CCoreDispInfo *pDisp,
 int iNeighbors[512] ) {
 ....
 if ( nNeighbors < _ARRAYSIZE( iNeighbors ) ) 
iNeighbors[nNeighbors++] = pCorner->m_Neighbors[i];
 .... 
} 

Note: This code is taken from the Source Engine SDK.

PVS-Studio warning:V511The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (iNeighbors)' expression. Vrad_dll disp_vrad.cpp 60

Such confusion can arise because of specifying the size of an array in the argument: this number means nothing to the compiler, and is just a hint to the programmer.

The trouble is that this code gets compiled, and the programmer is unaware that something is not right. The obvious solution would be to use metaprogramming:

注意:此程式碼取自源引擎SDK。

VS Studio警告:V511 size of()運算子返回“sizeof(iNeighbors)”表示式中指標的大小,而不是陣列的大小。虛擬現實動態連結庫Vrad_dll disp_vrad.cpp 60個

這種混淆可能是由於在引數中指定陣列的大小而引起的:這個數字對編譯器來說沒有任何意義,對程式設計師來說只是一個警示。

問題是這段程式碼被編譯了,而程式設計師卻不知道有些事情不對。顯而易見的解決方案是使用超程式設計:

1
2
3
4
template < class T, size_t N ><br>constexpr size_t countof( const T (&array)[N] ) {
 return N;
}
countof(iNeighbors); //compile-time error 

If we pass to this function, not an array, we get a compilation error. In C ++17 you can usestd::size.

In C++11, the functionstd::extentwasadded, but it isn't suitable ascountof,because it returns0for inappropriate types.

如果我們傳遞給這個函式,而不是一個數組,就會得到一個編譯錯誤。在C++17中,可以使用std::size。

在C++ 11中,函式std::extent被新增,但是它不匹配作為countof,因為它返回不匹配型別的0。

std::extent<decltype(iNeighbors)>();//=> 0

You can make an error not only withcountof,but withsizeofas well.

1
2
3
4
5
6
7
8
VisitedLinkMaster::TableBuilder::TableBuilder(
 VisitedLinkMaster* master,
 const uint8 salt[LINK_SALT_LENGTH])
 : master_(master),
 success_(true) {
 fingerprints_.reserve(4096);
 memcpy(salt_, salt, sizeof(salt));
}

Note: This code is taken from Chromium.

PVS-Studio warnings:

  • V511The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression. browser visitedlink_master.cc 968
  • V512A call of the 'memcpy' function will lead to underflow of the buffer 'salt_'. browser visitedlink_master.cc 968

As you can see, the standard C++ arrays have a lot of problems. This is why you should usestd::array: in the modern C++ its API is similar tostd::vectorand other containers, and it's harder to make an error when using it.

注:此程式碼取自Chromium。

PVS-Studio警告:

•V511 size of()運算子返回“sizeof(salt)”表示式中指標的大小,而不是陣列的大小。瀏覽器visitedlink_master.cc 968。

•V512呼叫“memcpy”函式將導致緩衝區“salt_”下溢。瀏覽器visitedlink_master.cc 968。正如你所看到的,標準的C++陣列有很多問題。這就是為什麼你應該使用STD::SART:在現代C++中,它的API類似於STD::vector和其他容器,使用它時出錯更難。

1
2
3
4
void Foo(std::array<uint8, 16> array)
{
 array.size(); //=> 16
}

How to make a mistake in a simple for

One more source of errors is a simpleforloop. You may think, "Where can you make a mistake there? Is it something connected with the complex exit condition or saving on the lines of code?" No, programmers make error in the simplest loops. Let's take a look at the fragments from the projects:

1
2
3
4
5
6
7
8
9
10
11
const int SerialWindow::kBaudrates[] = { 50, 75, 110, .... };
SerialWindow::SerialWindow() : ....
{
 ....
 for(int i = sizeof(kBaudrates) / sizeof(char*); --i >= 0;)
 {
 message->AddInt32("baudrate", kBaudrateConstants[i]); 
....
 }
}

Note: This code is taken from Haiku Operation System.

PVS-Studio warning:V706Suspicious division: sizeof (kBaudrates) / sizeof (char *). Size of every element in 'kBaudrates' array does not equal to divisor. SerialWindow.cpp 162

We have examined such errors in detail in the previous chapter: the array size wasn't evaluated correctly again. We can easily fix it by usingstd::size:

1
2
3
4
5
6
7
8
9
10
const int SerialWindow::kBaudrates[] = { 50, 75, 110, .... };
 
SerialWindow::SerialWindow() : ....
{
 ....
 for(int i = std::size(kBaudrates); --i >= 0;) {
 message->AddInt32("baudrate", kBaudrateConstants[i]); 
....
 }
}

But there is a better way. Let's take a look at one more fragment.

1
2
3
4
5
6
7
8
9
10
11
inline void CXmlReader::CXmlInputStream::UnsafePutCharsBack(
 const TCHAR* pChars, size_t nNumChars)
{
 if (nNumChars > 0)
 {
 for (size_t nCharPos = nNumChars - 1;
 nCharPos >= 0;
 --nCharPos)
 UnsafePutCharBack(pChars[nCharPos]);
 }
}

Note: This code is taken from Shareaza.

PVS-Studio warning:V547Expression 'nCharPos >= 0' is always true. Unsigned type value is always >= 0. BugTrap xmlreader.h 946

It's a typical error when writing a reverse loop: the programmer forgot that the iterator of an unsigned type and the check always returntrue. You might think, "How come? Only novices and students make such mistakes. We, professionals, don't." Unfortunately, this is not completely true. Of course, everyone understands that(unsigned >= 0)-true. Where do such errors come from? They often occur as a result of refactoring. Imagine this situation: the project migrates from the 32-bit platform to 64-bit. Previously,int/unsignedwas used for indexing and a decision was made to replace them withsize_t/ptrdiff_t. But in one fragment they accidentally used an unsigned type instead of a signed one.

What shall we do to avoid this situation in your code? Some people advise the use of signed types, as in C# or Qt. Perhaps, it could be a way out, but if we want to work with large amounts of data, then there is no way to avoidsize_t.Is there any more secure way to iterate through array in C++? Of course there is. Let's start with the simplest one: non-member functions. There are standard functions to work with collections, arrays andinitializer_list;their principle should be familiar to you.

1
2
3
4
5
6
char buf[4] = { 'a', 'b', 'c', 'd' };
for (auto it = rbegin(buf);
 it != rend(buf);
 ++it) {
 std::cout << *it;
}

Great, now we do not need to remember the difference between a direct and reverse cycle. There is also no need to think about whether we use a simple array or an array - the loop will work in any case. Using iterators is a great way to avoid headaches, but even that is not always good enough. It is best to use therange-based for loop:

1
2
3
4
char buf[4] = { 'a', 'b', 'c', 'd' };
for (auto it : buf) {
 std::cout << it;
}

Of course, there are some flaws in therange-based for:it doesn't allow flexible management of the loop, and if there is more complex work with indexes required, thenforwon't be of much help to us. But such situations should be examined separately. We have quite a simple situation: we have to move along the items in the reverse order. However, at this stage, there are already difficulties. There are no additional classes in the standard library forrange-based for. Let's see how it could be implemented:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
template <typename T>
struct reversed_wrapper {
 const T& _v;
 
reversed_wrapper (const T& v) : _v(v) {}
 auto begin() -> decltype(rbegin(_v))
 {
 return rbegin(_v);
 }
 auto end() -> decltype(rend(_v))
 {
 return rend(_v);
 }
};
 
template <typename T>
reversed_wrapper<T> reversed(const T& v)
{
 return reversed_wrapper<T>(v);
}

In C++14 you can simplify the code by removing thedecltype. You can see howautohelps you write template functions -reversed_wrapperwill work both with an array, andstd::vector.

Now we can rewrite the fragment as follows:

1
2
3
4
char buf[4] = { 'a', 'b', 'c', 'd' };
for (auto it : reversed(buf)) {
 std::cout << it;
}

What's great about this code? Firstly, it is very easy to read. We immediately see that the array of the elements is in the reverse order. Secondly, it's harder to make an error. And thirdly, it works with any type. This is much better than what it was.

You can useboost::adaptors::reverse(arr)inboost.

But let's go back to the original example. There, the array is passed by a pair pointer-size. It is obvious that our idea withreversedwill not work for it. What shall we do? Use classes likespan/array_view. In C++17 we havestring_view, and I suggest using that:

1
2
3
4
5
void Foo(std::string_view s);
std::string str = "abc";
Foo(std::string_view("abc", 3));
Foo("abc");
Foo(str);

string_viewdoes not own the string, in fact it's a wrapper around theconst char*and the length. That's why in the code example, the string is passed by value, not by the reference. A key feature of thestring_viewis compatibility with strings in various string presentations:const char*,std::stringand non-null terminatedconst char*.

As a result, the function takes the following form:

1
2
3
4
5
6
inline void CXmlReader::CXmlInputStream::UnsafePutCharsBack(
 std::wstring_view chars)
{
 for (wchar_t ch : reversed(chars))
 UnsafePutCharBack(ch);
}

Passing to the function, it's important to remember that the constructorstring_view(const char*)is implicit, that's why we can write like this:

Foo(pChars);

Not this way:

Foo(wstring_view(pChars, nNumChars));

A string that thestring_viewpoints to, does not need to be null- terminated, the very namestring_view::datagives us a hint about this, and it is necessary to keep that in mind when using it. When passing its value to a function fromcstdlib,which is waiting for a C string, you can get undefined behavior. You can easily miss it, if in most cases that you are testing, there isstd::stringor null-terminated strings used.

Enum

Let's leave C++ for a second and think about good old C. How is security there? After all, there are no problems with implicit constructor calls and operators, or type conversion, and there are no problems with various types of the strings. In practice, errors often occur in the simplest constructions: the most complicated ones are thoroughly reviewed and debugged, because they cause some doubts. At the same time programmers forget to check simple constructions. Here is an example of a dangerous structure, which came to us from C:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
enum iscsi_param {
 ....
 ISCSI_PARAM_CONN_PORT,
 ISCSI_PARAM_CONN_ADDRESS,
 ....
};
 
enum iscsi_host_param {
 ....
 ISCSI_HOST_PARAM_IPADDRESS,
 ....
};
int iscsi_conn_get_addr_param(....,
 enum iscsi_param param, ....)
{
 ....
 switch (param) {
 case ISCSI_PARAM_CONN_ADDRESS:
 case ISCSI_HOST_PARAM_IPADDRESS:
 ....
 }
 
return len;
}

An example of the Linux kernel. PVS-Studio warning:V556The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. libiscsi.c 3501

Pay attention to the values in theswitch-case: one of the named constants is taken from a different enumeration. In the original, of course, there is much more code and more possible values and the error isn't so obvious. The reason for that is lax typing of enum - they may be implicitly casting to int, and this leaves a lot of room for errors.

In C++11 you can, and should, useenum class:such a trick won't work there, and the error will show up at the compilation stage. As a result, the following code does not compile, which is exactly what we need:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
enum class ISCSI_PARAM {
 ....
 CONN_PORT,
 CONN_ADDRESS,
 ....
};
 
enum class ISCSI_HOST {
 ....
 PARAM_IPADDRESS,
 ....
};
int iscsi_conn_get_addr_param(....,
 ISCSI_PARAM param, ....)
{
 ....
 switch (param) {
 case ISCSI_PARAM::CONN_ADDRESS:
 case ISCSI_HOST::PARAM_IPADDRESS:
 ....
 }
 
return len;
}

The following fragment is not quite connected with the enum, but has similar symptoms:

1
2
3
4
5
6
7
void adns__querysend_tcp(....) {
 ...
 if (!(errno == EAGAIN || EWOULDBLOCK || 
errno == EINTR || errno == ENOSPC ||
 errno == ENOBUFS || errno == ENOMEM)) {
 ...
}

Note: This code is taken from ReactOS.

Yes, the values oferrnoare declared as macros, which is bad practice in C++ (in C as well), but even if the programmer usedenum, it wouldn't make life easier. The lost comparison will not reveal itself in case ofenum(and especially in case of a macro). At the same timeenum classwould not allow this, as there will were no implicit casting to bool.

Initialization in the constructor

But back to the native C++ problems. One of them reveals when there is a need to initialize the object in the same way in several constructors. A simple situation: there is a class, two constructors, one of them calls another. It all looks pretty logical: the common code is put into a separate method - nobody likes to duplicate the code. What's the pitfall?

1
2
3
4
5
6
7
8
9
Guess::Guess() {
 language_str = DEFAULT_LANGUAGE;
 country_str = DEFAULT_COUNTRY;
 encoding_str = DEFAULT_ENCODING;
}
Guess::Guess(const char * guess_str) {
 Guess();
 ....
}

Note: This code is taken from LibreOffice.

PVS-Studio warning:V603The object was created but it is not being used. If you wish to call constructor, 'this->Guess::Guess(....)' should be used. guess.cxx 56

The pitfall is in the syntax of the constructor call. Quite often it gets forgotten, and the programmer creates one more class instance, which then gets immediately destroyed. That is, the initialization of the original instance isn't happening. Of course, there are 1001 ways to fix this. For example, we can explicitly call the constructor viathis,or put everything into a separate function:

1
2
3
4
5
6
7
8
9
10
11
Guess::Guess(const char * guess_str)
{
 this->Guess();
 ....
}
 
Guess::Guess(const char * guess_str)
{
 Init();
 ....
}

By the way, an explicit repeated call of the constructor, for example, viathisis a dangerous game, and we need to understand what's going on. The variant with the Init() is much better and clearer. For those who want to understand the details of these "pitfalls" better, I suggest looking at chapter 19, "How to properly call one constructor from another", from thisbook.

But it is best to use the delegation of the constructors here. So we can explicitly call one constructor from another in the following way:

1
2
3
4
Guess::Guess(const char * guess_str) : Guess()
{
 ....
}

Such constructors have several limitations. First: delegated constructors take full responsibility for the initialization of an object. That is, it won't be possible to initialize another class field with it in the initialization list:

1
2
3
4
5
6
Guess::Guess(const char * guess_str)
 : Guess(), 
m_member(42)
{
 ....
}

And of course, we have to make sure that the delegation doesn't create a loop, as it will be impossible to exit it. Unfortunately, this code gets compiled:

1
2
3
4
5
6
7
8
9
10
11
Guess::Guess(const char * guess_str)
 : Guess(std::string(guess_str))
{
 ....
}
Guess::Guess(std::string guess_str)
 : Guess(guess_str.c_str())
{
 ....
}