The Utility of Fast Active Messages on Many-Core Chips
R. Curtis Harting, Vishal Parikh and William Dally, Stanford
This poster discusses the implementation and benefits of user-level active messaging on many-core processors. In these threaded, cache-coherent processors, we have found that data movement and thread communication are major contributors to energy inefficiency and performance degradation. Without active messaging, communication between cores generates many coherence messages on the network as cache lines migrate from one writing core to another. Sending active messages to the data instead eliminates this spurious data movement and the energy it consumes.
We detail three common programming idioms that active messaging makes faster and more efficient: reductions, object contention, and data walks. Reductions, for example, are implemented via a hierarchy of destination objects, with values propagating up the tree and back down. At each intermediate level of the tree, an arbitrary reduction computation can be performed before the next message is sent. The poster also describes an API and implementation of active messages that allow fast yet flexible execution. Finally, we evaluate four benchmarks written to use active messaging and demonstrate better performance, lower energy usage, and better scalability than a pthreads implementation.
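The reduction idiom described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the poster's API or implementation: the names `ReductionNode`, `on_contribute`, `on_result`, and `reduce_with_messages` are hypothetical, the tree is fixed at two levels, and a FIFO queue stands in for the on-chip network. Each message names a handler and a destination object, and the handler runs where the object lives instead of pulling the object's cache line to the sender.

```python
from collections import deque


class ReductionNode:
    """Hypothetical destination object at one level of the reduction tree."""

    def __init__(self, combine, expected, parent=None):
        self.combine = combine    # arbitrary reduction function
        self.expected = expected  # contributions to wait for
        self.parent = parent
        self.children = []
        self.received = 0
        self.partial = None
        self.result = None
        if parent is not None:
            parent.children.append(self)


def on_contribute(node, value, send):
    """Handler: fold one value into the node, then forward up when complete."""
    if node.received == 0:
        node.partial = value
    else:
        node.partial = node.combine(node.partial, value)
    node.received += 1
    if node.received == node.expected:
        if node.parent is not None:
            send(on_contribute, node.parent, node.partial)
        else:
            # The root has the final value; start propagating it back down.
            send(on_result, node, node.partial)


def on_result(node, value, send):
    """Handler: record the final value and pass it to the children."""
    node.result = value
    for child in node.children:
        send(on_result, child, value)


def reduce_with_messages(values, combine, group_size=2):
    """Run a two-level tree reduction over a simulated message queue."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    root = ReductionNode(combine, expected=len(groups))
    mids = [ReductionNode(combine, expected=len(g), parent=root) for g in groups]

    queue = deque()  # stand-in for the network (assumption)

    def send(handler, node, payload):
        queue.append((handler, node, payload))

    for mid, group in zip(mids, groups):
        for value in group:
            send(on_contribute, mid, value)

    delivered = 0
    while queue:
        handler, node, payload = queue.popleft()
        handler(node, payload, send)
        delivered += 1
    return root.result, [m.result for m in mids], delivered
```

For example, `reduce_with_messages([1, 2, 3, 4, 5], lambda a, b: a + b)` returns the sum at the root, the same value at every intermediate node after the downward pass, and the number of messages delivered. In this sketch each contribution costs one message, whereas the abstract notes that a shared-memory update can trigger many coherence messages as the cache line moves between writers.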


