When multiple threads operate on a global variable, you must think about synchronization; otherwise the data can become inconsistent, and the program may even core dump.
A while ago I ran into a crash caused by multiple threads operating on a global vector. The scenario was this: some global configuration was stored in a vector, an update thread refreshed it periodically, and worker threads read the configuration from it. This pattern is extremely common.
The code took no lock, so a coredump was probabilistic; in practice, after the service had run for a while, it always crashed. The cause is clear: when the update thread calls clear() and then push_back(), the iterators the worker threads are using into the vector are invalidated, and the resulting invalid memory access crashes the process.
In this post I want to show, with examples and code, that std::vector in the C++ STL is not thread-safe, so be careful when you use it. For simplicity, I will not reproduce the original scenario; instead I will just use push_back as the example.
Let's look at a program:
#include <pthread.h>
#include <unistd.h>
#include <iostream>
#include <vector>

#define N 2

using namespace std;

vector<int> g_v;
pthread_mutex_t mutex;

void* fun(void *p)
{
    for(int i = 0; i < 100000; i++)
    {
        //pthread_mutex_lock(&mutex);
        g_v.push_back(i);
        //pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main()
{
    pthread_t threads[N];
    pthread_mutex_init(&mutex, NULL);
    for(int i = 0; i < N; i++)
    {
        pthread_create(&threads[i], NULL, fun, NULL);
    }
    for(int i = 0; i < N; i++)
    {
        pthread_join(threads[i], NULL);
    }
    cout << "ok" << endl;
    return 0;
}

Compile with: g++ test.cpp -lpthread -g
Run it three times:
taoge:~> ./a.out
ok
taoge:~> ./a.out
Segmentation fault (core dumped)
taoge:~> ./a.out
ok

As you can see, the program core dumps with some probability. Let's debug the core file:

taoge:~> gdb a.out core.9775
GNU gdb 6.6
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0  0x08048cc0 in __gnu_cxx::new_allocator<int>::construct (this=0x804a200, __p=0xb6cc2000, __val=@0xb7ce2464)
    at /usr/include/c++/4.1.2/ext/new_allocator.h:104
104       { ::new(__p) _Tp(__val); }
(gdb) bt
#0  0x08048cc0 in __gnu_cxx::new_allocator<int>::construct (this=0x804a200, __p=0xb6cc2000, __val=@0xb7ce2464)
    at /usr/include/c++/4.1.2/ext/new_allocator.h:104
#1  0x08049846 in std::vector<int, std::allocator<int> >::push_back (this=0x804a200, __x=@0xb7ce2464)
    at /usr/include/c++/4.1.2/bits/stl_vector.h:606
#2  0x08048bde in fun (p=0x0) at test.cpp:16
#3  0xb7f471eb in start_thread () from /lib/libpthread.so.0
#4  0xb7da97fe in clone () from /lib/libc.so.6
(gdb) f 2
#2  0x08048bde in fun (p=0x0) at test.cpp:16
16          g_v.push_back(i);
(gdb) i locals
i = 63854
(gdb) i args
p = (void *) 0x0
(gdb) f 1
#1  0x08049846 in std::vector<int, std::allocator<int> >::push_back (this=0x804a200, __x=@0xb7ce2464)
    at /usr/include/c++/4.1.2/bits/stl_vector.h:606
606         this->_M_impl.construct(this->_M_impl._M_finish, __x);
(gdb) i locals
No locals.
(gdb) i args
this = (std::vector<int,std::allocator<int> > * const) 0x804a200
__x = (const int &) @0xb7ce2464: 63854
(gdb) p this
$1 = (std::vector<int,std::allocator<int> > * const) 0x804a200
(gdb) p *this
$2 = {<std::_Vector_base<int,std::allocator<int> >> = {
    _M_impl = {<std::allocator<int>> = {<__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>},
    _M_start = 0xb6c81008, _M_finish = 0xb6cc2000, _M_end_of_storage = 0xb6cc1008}}, <No data fields>}
(gdb)

Focus on frame 1, and in particular on _M_start, _M_finish, and _M_end_of_storage. If you are familiar with vector's underlying dynamic allocation, you can guess what they mean: _M_start points to the beginning of the vector's storage, _M_finish points one past the last element, and _M_end_of_storage points to the end of the pre-allocated capacity. Notice that in this core, _M_finish (0xb6cc2000) has already gone past _M_end_of_storage (0xb6cc1008). Now look at the source of vector's push_back:

void
push_back(const value_type& __x)
{
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
    {
        _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
        ++this->_M_impl._M_finish;
    }
    else
#if __cplusplus >= 201103L
        _M_emplace_back_aux(__x);
#else
        _M_insert_aux(end(), __x);
#endif
}

In a single-threaded program, _M_finish keeps chasing _M_end_of_storage as elements are pushed; when capacity runs out, the storage is grown and _M_end_of_storage moves forward, so _M_finish never crosses it. In a multithreaded program, however, when _M_finish is exactly one slot below _M_end_of_storage, two threads can both pass the check this->_M_impl._M_finish != this->_M_impl._M_end_of_storage and then both execute ++this->_M_impl._M_finish. Now _M_finish has crossed _M_end_of_storage, exactly as in our experiment, and the out-of-bounds write causes the coredump. Whether the crossing actually happens on a given run is probabilistic, but it is undefined behavior either way, and we must avoid it. What to do? Take a lock: uncomment the mutex calls in the program above, and repeated runs no longer core dump.
One more question: the conclusion above was that _M_finish crossing _M_end_of_storage caused the coredump. What if we arrange for _M_end_of_storage to never be crossed? In theory there should then be no core dump. Let's try:
#include <pthread.h>
#include <unistd.h>
#include <iostream>
#include <vector>

#define N 2

using namespace std;

vector<int> g_v;
pthread_mutex_t mutex;

void* fun(void *p)
{
    for(int i = 0; i < 100000; i++)
    {
        //pthread_mutex_lock(&mutex);
        g_v.push_back(i);
        //pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main()
{
    g_v.reserve(999999); // pay attention
    pthread_t threads[N];
    pthread_mutex_init(&mutex, NULL);
    for(int i = 0; i < N; i++)
    {
        pthread_create(&threads[i], NULL, fun, NULL);
    }
    for(int i = 0; i < N; i++)
    {
        pthread_join(threads[i], NULL);
    }
    cout << "ok" << endl;
    return 0;
}

Compiling and running this many times, I saw no coredump. Even so, there is no guarantee the result is logically correct: multiple threads are still executing the non-atomic push_back, so the data race remains and elements can be lost or overwritten.
Finally, back to the problem I ran into: for periodically updated configuration, taking a lock is the straightforward fix. What if you want to avoid the lock? One option is to keep two vectors and rotate them: the updater writes into the vector that is not being read, readers read the current vector, and then the roles are swapped. I have seen this pattern used in many places.
There are many, many pitfalls like this one; the pits are right there, no more and no fewer. As Item 12 of Effective STL puts it: never make unrealistic assumptions about the thread safety of STL containers!
A note on double-buffered queues: a double-buffered queue is not completely lock-free; it only reduces lock acquisition and release as much as possible.