Memory ordering

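Memory ordering is the order in which a CPU accesses main memory. The order can be fixed by the compiler at compile time or by the CPU at run time. It reflects the reordering of memory operations and out-of-order execution, both of which exist to make full use of the bus bandwidth of different types of memory.

Most modern processors execute instructions out of order, so memory barriers are needed to keep multithreaded programs correctly synchronized.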

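Compile-time memory ordering

Compile-time memory barriers

These barriers stop the compiler from reordering instructions at compile time, but they have no effect on reordering done by the CPU at run time.

  • GNU inline assembly statement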
asm volatile("" ::: "memory");

or

__asm__ __volatile__ ("" ::: "memory");

prevents the GCC compiler from reordering read/write instructions across it.[1]

  • C11/C++11
atomic_signal_fence(memory_order_acq_rel);

prevents the compiler from reordering read/write instructions across it.[2]

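  • Intel C++ Compiler
__memory_barrier()

is an intrinsic that creates a barrier across which the compiler will not schedule any data access instruction.[3][4]

  • Microsoft Visual C++
_ReadWriteBarrier()[5]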
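
As a minimal sketch of where a compile-time barrier is typically placed, the C++11 fence listed above can sit between two writes whose order the compiler must not change. The names `payload`, `ready`, `publish` and `read_if_ready` are hypothetical, and the example assumes the reader is a signal handler (or other code) running on the same thread, because this fence emits no CPU instruction.

```cpp
#include <atomic>

// Hypothetical shared state, for illustration only.
int payload = 0;
std::atomic<int> ready{0};

// Writes the payload first, then raises the flag. The signal fence keeps the
// compiler from moving the payload write below the flag write; it emits no
// hardware fence instruction.
void publish(int value) {
    payload = value;
    std::atomic_signal_fence(std::memory_order_acq_rel);
    ready.store(1, std::memory_order_relaxed);
}

// What a signal handler on the same thread could do: check the flag, then
// read the payload. Returns -1 if nothing has been published yet.
int read_if_ready() {
    if (ready.load(std::memory_order_relaxed) == 0) return -1;
    std::atomic_signal_fence(std::memory_order_acq_rel);
    return payload;
}
```

Because nothing here constrains the CPU, a second thread on another core would need one of the run-time barriers instead.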

Run-time memory ordering

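  • happens-before: within a thread, operations take effect in the order given by the program's code
  • synchronizes-with: a relation between different threads acting on the same atomic variable. A store() must come before the load() that observes it; that is, for an atomic variable x, a write to x in one thread followed by a read of x in another thread that sees the written value forms a synchronizing pair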
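
The two relations can be sketched with a release store and an acquire load on one atomic flag. This is an illustrative example, not taken from the source: `payload`, `flag`, `producer`, `consumer` and `run_message_passing` are hypothetical names, and release/acquire is only one of the orderings that establishes the synchronizing pair.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state, for illustration only.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> flag{false};    // plays the role of the atomic variable x

void producer() {
    payload = 42;                                  // ordered before the store by program order
    flag.store(true, std::memory_order_release);   // the store() on x
}

int consumer() {
    // Spin until the load() observes the value written by the store().
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The store synchronizes with the load above, so the write to payload
    // happens before this read.
    return payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    flag.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```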

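Symmetric multiprocessing (SMP) systems

Symmetric multiprocessing (SMP) systems support several memory consistency models.

  • Sequential consistency: atomic operations within one thread keep their happens-before (program) order, while the interleaving of operations from different threads is arbitrary; all threads observe the same single interleaving
  • Relaxed consistency (some types of reordering are allowed): when an operation only needs to be atomic and requires no other synchronization guarantee, relaxed ordering is sufficient. A counter updated by several threads is a typical use case
  • Weak consistency: reads and writes may be reordered arbitrarily, constrained only by explicit memory barriers.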
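
The counter use case for relaxed ordering might look like the following sketch. The function name `count_events` and its parameters are hypothetical; the point is only that each increment must be atomic, while no ordering with respect to other memory is required.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts events from several threads. Each fetch_add is atomic, so no
// increment is lost, but relaxed ordering imposes no ordering on any
// surrounding memory accesses.
long count_events(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each worker's completion, so the final
    // read below sees every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```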
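Memory ordering in some architectures[6][7]
Type: Alpha, ARMv7, MIPS, LoongISA, PA-RISC, POWER, SPARC RMO, SPARC PSO, SPARC TSO, x86, x86 oostore, AMD64, IA-64, z/Architecture
Loads reordered after loads YY (MIPS: not specified by the architecture itself; determined by the microarchitecture/chip implementation) YYYYYY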
Loads reordered after stores YY YYYYYY
Stores reordered after stores YY YYYYYYY
Stores reordered after loads YY YYYYYYYYYYY
Atomic reordered with loads YY YYY
Atomic reordered with stores YY YYYY
Dependent loads reordered Y
Incoherent instruction cache pipeline YY YYYYYYY
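
The "Stores reordered after loads" row describes the classic store-buffering pattern: each thread stores to one variable and then loads the other. The sketch below is an assumption-laden illustration (the names `both_saw_zero` and `count_forbidden_outcomes` are hypothetical) that places a sequentially consistent fence between the store and the load; with those fences in both threads, the C++ memory model forbids both loads from returning 0.

```cpp
#include <atomic>
#include <thread>

// Runs one round of the store-buffering test and reports whether both
// threads read 0, the outcome that store-load reordering would produce.
bool both_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        // Full fence: the store above may not be reordered after the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_forbidden_outcomes(int rounds) {
    int n = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_saw_zero()) ++n;
    }
    return n;
}
```

Removing the two fences makes the 0/0 outcome permitted, although whether it is actually observed depends on the hardware and on timing.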

Some older x86 processors have weaker memory ordering.[8]

SPARC memory ordering modes:

  • SPARC TSO = total store order (default)
  • SPARC RMO = relaxed-memory order (not supported on recent CPUs)
  • SPARC PSO = partial store order (not supported on recent CPUs)

Hardware memory barriers

  • x86, x86-64
lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void)[9]
mfence (asm), void _mm_mfence(void)[10]
  • PowerPC
sync (asm)
  • MIPS
sync (asm)
  • Itanium
mf (asm)
  • POWER
dcs (asm)
  • ARMv7
dmb (asm)
dsb (asm)
isb (asm)

Compiler support for hardware memory barriers

  • GCC,[12] version 4.4.0 and later,[13] has __sync_synchronize.
  • C11/C++11 provide atomic_thread_fence(), which issues a fence with a single call.
  • Microsoft Visual C++[14] has MemoryBarrier().
  • Sun Studio Compiler Suite[15] has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.
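
Of the facilities listed above, atomic_thread_fence() is the portable one, so a short sketch of it may help. The function `fence_handoff` and its local names are hypothetical; the example hands a value from one thread to another using only relaxed atomic accesses plus a release fence and an acquire fence, and leaves it to the compiler to pick whatever barrier instruction, if any, the target CPU needs.

```cpp
#include <atomic>
#include <thread>

// Passes `value` from a writer thread to a reader thread and returns what
// the reader observed.
int fence_handoff(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread writer([&] {
        payload = value;
        // Release fence: the write to payload cannot be reordered after the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload cannot be reordered before the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```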

See also

  • Memory model (programming)

References

  1. [Retrieved 2018-12-06]. (Archived from the original on 2011-07-24).
  2. [Retrieved 2018-12-06]. (Archived from the original on 2020-08-10).
  3. [Retrieved 2018-12-06]. (Archived from the original on 2011-07-24).
  4. Intel(R) C++ Compiler Intrinsics Reference (archived copy).
    Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data.
  5. Visual C++ Language Reference, _ReadWriteBarrier (archived copy).
  6. (PDF). [Retrieved 2018-12-06]. (Archived from the original (PDF) on 2020-10-31).
  7. Memory Barriers: a Hardware View for Software Hackers (archived copy), Figure 5 on Page 16.
  8. Table 1. Summary of Memory Ordering (archived copy), from "Memory Ordering in Modern Microprocessors, Part I".
  9. [Retrieved 2018-12-06]. (Archived from the original on 2019-06-13).
  10. [Retrieved 2018-12-06]. (Archived from the original on 2019-09-05).
  11. [Retrieved 2020-12-20]. (Archived from the original on 2020-06-19).
  12. [Retrieved 2018-12-06]. (Archived from the original on 2017-11-08).
  13. [Retrieved 2018-12-06]. (Archived from the original on 2020-10-31).
  14. [Retrieved 2018-12-06]. (Archived from the original on 2017-04-04).
  15. Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fence (archived copy).

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.