A write buffer is a type of data buffer that holds data being written from the cache to main memory or to the next cache in the memory hierarchy, which improves performance and reduces latency. It is used in certain CPU cache architectures, such as Intel's x86 and AMD64.[1] In multi-core systems, write buffers break sequential consistency, because a core can see its own buffered write before that write becomes visible to other cores. Some software disciplines, like C11's data-race-freedom,[2] are sufficient to regain a sequentially consistent view of memory.
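The loss of sequential consistency can be illustrated with the classic "store buffering" litmus test. The following is a minimal sketch, not a model of any particular processor: it assumes two cores, each with a FIFO store buffer that drains to shared memory at arbitrary times, and it enumerates every possible interleaving. The function name `outcomes` and the flag `use_buffers` are hypothetical names chosen for this illustration.

```python
def outcomes(use_buffers):
    """Return every (r0, r1) result reachable in the store-buffering test.

    Core 0 runs: x = 1; r0 = y
    Core 1 runs: y = 1; r1 = x
    """
    programs = {
        0: [("store", "x", 1), ("load", "y")],
        1: [("store", "y", 1), ("load", "x")],
    }
    results = set()

    def explore(mem, bufs, pcs, regs):
        progressed = False
        for core in (0, 1):
            # Choice 1: the core executes its next instruction.
            if pcs[core] < len(programs[core]):
                progressed = True
                op = programs[core][pcs[core]]
                new_mem = dict(mem)
                new_bufs = [list(b) for b in bufs]
                new_regs = list(regs)
                if op[0] == "store":
                    if use_buffers:
                        # The store waits in the core's private buffer.
                        new_bufs[core].append((op[1], op[2]))
                    else:
                        # Without a buffer the store reaches memory at once.
                        new_mem[op[1]] = op[2]
                else:
                    # A load sees the newest matching entry in the core's
                    # own buffer, otherwise the value in shared memory.
                    value = mem[op[1]]
                    for addr, val in bufs[core]:
                        if addr == op[1]:
                            value = val
                    new_regs[core] = value
                new_pcs = list(pcs)
                new_pcs[core] += 1
                explore(new_mem, new_bufs, new_pcs, new_regs)
            # Choice 2: the oldest buffered store drains to memory.
            if bufs[core]:
                progressed = True
                new_mem = dict(mem)
                new_bufs = [list(b) for b in bufs]
                addr, val = new_bufs[core].pop(0)
                new_mem[addr] = val
                explore(new_mem, new_bufs, list(pcs), list(regs))
        if not progressed:
            # Both programs finished and both buffers are empty.
            results.add(tuple(regs))

    explore({"x": 0, "y": 0}, [[], []], [0, 0], [None, None])
    return results


without_buffers = outcomes(use_buffers=False)
with_buffers = outcomes(use_buffers=True)
print(sorted(without_buffers))
print(sorted(with_buffers))
```

In this model the result `(0, 0)`, where both loads miss the other core's store, appears only when `use_buffers` is true. No interleaving of the four instructions alone can produce it, which is what makes the buffered behaviour non-sequentially-consistent.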
A variation of write-through caching is called buffered write-through.
Use of a write buffer in this manner frees the cache to service read requests while the write is taking place. It is especially useful for very slow main memory, because subsequent reads can proceed without waiting for the long main memory latency. Subsequent reads may also be served directly from the write buffer. When the write buffer is full (i.e. all buffer entries are occupied), subsequent writes must still wait until slots are freed. To further mitigate this stall, an optimization called write buffer merge may be implemented. Write buffer merge combines writes that have consecutive destination addresses into one buffer entry. Otherwise, they would occupy separate entries, which increases the chance of a pipeline stall.
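The behaviour described above (a bounded number of entries, reads served from the buffer, and merging of consecutive addresses) can be sketched in a few lines. This is an illustrative model under stated assumptions, not a description of real hardware: memory is word-addressed, each entry covers one aligned block of `words_per_entry` words, and writes merge when they fall into a block that is already buffered. The class `WriteBuffer` and all of its parameter names are hypothetical.

```python
class WriteBuffer:
    """Toy write buffer with optional merging (illustrative assumptions)."""

    def __init__(self, num_entries=4, words_per_entry=4, merge=True):
        self.num_entries = num_entries
        self.words_per_entry = words_per_entry
        self.merge = merge
        self.entries = []  # FIFO of (block_base, {offset: value})

    def _base(self, addr):
        # Start address of the aligned block that contains addr.
        return addr - addr % self.words_per_entry

    def write(self, addr, value):
        """Buffer a write. Returns False if the writer would have to stall."""
        base = self._base(addr)
        if self.merge:
            for entry_base, words in self.entries:
                if entry_base == base:
                    words[addr - base] = value  # merge into existing entry
                    return True
        if len(self.entries) == self.num_entries:
            return False  # buffer full: the write must wait for a free slot
        self.entries.append((base, {addr - base: value}))
        return True

    def read(self, addr):
        """Return the newest buffered value for addr, or None if absent."""
        base = self._base(addr)
        for entry_base, words in reversed(self.entries):
            if entry_base == base and (addr - base) in words:
                return words[addr - base]
        return None

    def drain_one(self, memory):
        """Write the oldest entry to memory, freeing one slot."""
        if not self.entries:
            return False
        base, words = self.entries.pop(0)
        for offset, value in words.items():
            memory[base + offset] = value
        return True


# Four writes to consecutive addresses, with only two buffer entries.
merged = WriteBuffer(num_entries=2, merge=True)
unmerged = WriteBuffer(num_entries=2, merge=False)
merged_ok = [merged.write(a, a * 10) for a in range(4)]
unmerged_ok = [unmerged.write(a, a * 10) for a in range(4)]
print(merged_ok)    # [True, True, True, True]
print(unmerged_ok)  # [True, True, False, False]
```

With merging enabled, the four consecutive writes share a single entry and none of them stalls. Without merging, each write takes its own entry, so the third and fourth find the two-entry buffer full.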
A victim buffer is a type of write buffer that stores dirty evicted lines in write-back caches[note 1] until they are written back to main memory. Like a simple write buffer, it reduces pipeline stalls because the cache does not have to wait for dirty lines to be written back. In addition, a victim buffer may serve as temporary backup storage when subsequent cache accesses exhibit locality and request recently evicted lines that are still in the victim buffer.
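The two roles of a victim buffer, deferring write-backs and recovering recently evicted lines, can be sketched as follows. This is a simplified model under assumptions that are not in the text above: lines are identified by a tag, the buffer is a small FIFO, and a line that is recovered returns to the cache and leaves the buffer. The class `VictimBuffer` and its method names are hypothetical.

```python
from collections import OrderedDict


class VictimBuffer:
    """Toy victim buffer holding dirty evicted lines (illustrative only)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> data, oldest first

    def insert(self, tag, data, memory):
        """Accept a dirty evicted line; write back the oldest line if full."""
        if tag in self.lines:
            del self.lines[tag]  # a newer copy replaces the older one
        if len(self.lines) == self.capacity:
            old_tag, old_data = self.lines.popitem(last=False)
            memory[old_tag] = old_data
        self.lines[tag] = data

    def lookup(self, tag):
        """On a cache miss, return the line if it is still buffered.

        The line is removed from the buffer because it returns to the cache.
        """
        return self.lines.pop(tag, None)

    def drain(self, memory):
        """Write every remaining line back to memory, oldest first."""
        while self.lines:
            tag, data = self.lines.popitem(last=False)
            memory[tag] = data


memory = {}
vb = VictimBuffer(capacity=2)
vb.insert(0x10, "A", memory)    # eviction completes without a write-back
vb.insert(0x20, "B", memory)
recovered = vb.lookup(0x10)     # line requested again while still buffered
vb.insert(0x30, "C", memory)
vb.insert(0x40, "D", memory)    # buffer full: the oldest line is written back
print(recovered, memory)
```

In this run, line `0x10` is recovered from the buffer without a main memory access, and only line `0x20` has been written back by the end.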
The store buffer was invented by IBM during Project ACS between 1964 and 1968,[3] but it was first implemented in commercial products in the 1990s.
Notes
- [note 1] Write-through caches do not need to write back evicted cache lines, because the data is already written to main memory whenever the cache is written.
References
- [1] Owens, Scott, Susmit Sarkar, and Peter Sewell. "A better x86 memory model: x86-TSO." Theorem Proving in Higher Order Logics. Springer Berlin Heidelberg, 2009. 391-407.
- [2] Oberhauser, Jonas. "A Simpler Reduction Theorem for x86-TSO." Verified Software: Theories, Tools, and Experiments. Springer International Publishing, 2015. 142-164.
- [3] Cocke, John (2007). "The search for performance in scientific processors". ACM Turing Award Lectures. p. 1987. doi:10.1145/1283920.1283945. ISBN 978-1-4503-1049-9.