SeaStar Interconnect: Balanced Bandwidth for Scalable Performance
From AcaWiki
Citation: Ron Brightwell, Kevin T. Pedretti, Keith D. Underwood, T. Hudson (2006/07/05) SeaStar Interconnect: Balanced Bandwidth for Scalable Performance. IEEE Micro
DOI (original publisher): 10.1109/MM.2006.65
Semantic Scholar (metadata): 10.1109/MM.2006.65
Sci-Hub (fulltext): 10.1109/MM.2006.65
Internet Archive Scholar (search for fulltext): SeaStar Interconnect: Balanced Bandwidth for Scalable Performance
Download: https://ieeexplore.ieee.org/document/1650179, https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.9132&rep=rep1&type=pdf
Tagged: Computer Science, computer architecture, computer networks
Summary
Problem
- Design a {balanced, scalable, reliable} interconnect for the Red Storm supercomputer.
- Balanced: A good bytes-to-flops ratio at each node, i.e. network bandwidth that keeps pace with the node's compute rate.
- Scalable: Scales up to the full size of the Red Storm supercomputer.
- Reliable: Transmits correct data.
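The "balanced" goal can be made concrete with a small calculation. The sketch below is only illustrative: the function name and both numbers are hypothetical and are not taken from the paper.

```python
def bytes_per_flop(bandwidth_bytes_per_s, peak_flops_per_s):
    """Return the bytes-to-flops ratio for one node.

    A higher ratio means the network can deliver more data per
    floating-point operation the node is able to perform.
    """
    return bandwidth_bytes_per_s / peak_flops_per_s


# Hypothetical node: 2 GB/s of network bandwidth, 4 GFLOP/s peak compute.
ratio = bytes_per_flop(2e9, 4e9)
print(f"{ratio} bytes per flop")
```

A design that raises peak compute without raising bandwidth lowers this ratio, which is the imbalance the interconnect is meant to avoid.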
Solution
- SeaStar
- Architecture
- Operating system integration
- Host-based mode: requests are served only from the host's kernel. Messages can be trusted without verification, but the kernel has to be involved in every transmission.
- NIC-based mode: requests are served directly from applications. Messages can't be trusted, but they can be processed without involving the kernel.
- Trusted portions of messages have to be kept in kernel memory.
- Flow protocol
- Needs to hide transmission failures.
- Assumes the links themselves are always available, so it only needs to handle failures stemming from "overprovisioning".
- To detect those failures, routers have to NACK packets they can't respond to. This creates yet more traffic on an already congested network.
- Data and control use different channels. This prevents deadlock: control traffic can still get through even when data is stuck.
- In the future, it should use a back-off scheme.
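The NACK-and-retry behavior, together with the suggested back-off, can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `Router` class, the `send_with_backoff` function, the tick-based timing, and the choice of exponential back-off with jitter are not described in the summary, which only says that some back-off scheme should be used.

```python
import random
from collections import deque


class Router:
    """Hypothetical hop with a bounded buffer that NACKs when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def deliver(self, packet):
        """Accept the packet if there is room, otherwise reject it."""
        if len(self.buffer) >= self.capacity:
            return "NACK"
        self.buffer.append(packet)
        return "ACK"

    def drain(self, ticks):
        """Simulate the router forwarding one packet per tick."""
        for _ in range(min(ticks, len(self.buffer))):
            self.buffer.popleft()


def send_with_backoff(router, packet, max_attempts=8, base_delay=1, rng=None):
    """Resend until ACKed; return (attempts, ticks_waited).

    The sender hides the failure from its caller by retrying. After the
    n-th NACK it waits a random number of ticks in
    [1, base_delay * 2**(n - 1)], so repeated NACKs spread retries out
    instead of adding traffic to a congested network.
    """
    rng = rng or random.Random(0)
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if router.deliver(packet) == "ACK":
            return attempt, waited
        delay = rng.randint(1, base_delay * 2 ** (attempt - 1))
        waited += delay
        router.drain(delay)  # the router makes progress while we wait
    raise RuntimeError("packet not delivered after max_attempts")


router = Router(capacity=2)
router.deliver("a")
router.deliver("b")  # buffer is now full
print(send_with_backoff(router, "c"))
```

Without the wait between attempts, every NACK would trigger an immediate resend, which is the extra traffic the protocol notes warn about.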
- Programming interfaces: