ASIC Clouds: Specializing the Datacenter
Citation: Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, Michael Bedford Taylor (2016/07/18) ASIC Clouds: Specializing the Datacenter. Annual International Symposium on Computer Architecture, ISCA
DOI (original publisher): 10.1109/ISCA.2016.25
Semantic Scholar (metadata): 10.1109/ISCA.2016.25
Sci-Hub (fulltext): 10.1109/ISCA.2016.25
Internet Archive Scholar (search for fulltext): ASIC Clouds: Specializing the Datacenter
Download: https://ieeexplore.ieee.org/abstract/document/7551392
Tagged: Computer Science, Computer Architecture
Summary
Single-node ASIC accelerators have been shown to outperform GPUs for certain workloads, which motivates scaling them up into ASIC clouds. The authors propose a general methodology for turning an accelerator into an ASIC cloud while optimizing $ per op/s and W per op/s.
Theoretical and Practical Relevance
ASIC clouds are a promising technology for certain workloads, such as video transcoding, cryptocurrency mining, and neural network inference. I expect to see more of these in the future.
Problem
- Two trends in computing in the past decade:
- split between mobile and cloud
- dark silicon (see Dark silicon and the end of multicore scaling)
- ASIC clouds can satisfy demand given these two trends.
- How to design optimal ASIC clouds?
- Determine ASIC parameters, server parameters, datacenter layout parameters
- Minimize total cost of ownership (TCO) per op/s, weighed against the non-recurring expense (NRE) of building the ASIC; a rough sketch of these figures of merit follows this list
- Applications: Cryptocurrency mining, video transcoding, neural network inference service
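To make the objective concrete, here is a minimal sketch of the two figures of merit, $ per op/s and W per op/s. The cost breakdown (a single capex number, an energy price, an amortization period) is my own simplification for illustration; the paper's full TCO model also covers PSUs, cooling, and datacenter-level overheads.

```python
# Minimal sketch of the two figures of merit: W per op/s and $ per op/s.
# All default values are illustrative placeholders, not numbers from the paper.

def watts_per_ops(power_w: float, ops_per_s: float) -> float:
    """Energy efficiency: watts consumed per op/s of delivered throughput."""
    return power_w / ops_per_s

def dollars_per_ops(capex_usd: float, power_w: float, ops_per_s: float,
                    lifetime_years: float = 1.5, usd_per_kwh: float = 0.07) -> float:
    """TCO per op/s: amortized hardware cost plus lifetime electricity cost,
    divided by throughput. The real model also accounts for PSUs, PCBs, fans,
    heat sinks, and datacenter overheads, all folded into capex_usd here."""
    hours = lifetime_years * 365 * 24
    energy_usd = (power_w / 1000.0) * hours * usd_per_kwh
    return (capex_usd + energy_usd) / ops_per_s

# Hypothetical accelerator: $500 of hardware, 100 W, 1e12 op/s.
print(watts_per_ops(100, 1e12))          # 1e-10 W per op/s
print(dollars_per_ops(500, 100, 1e12))   # ~5.9e-10 $ per op/s over 1.5 years
```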
Solution
- Model (a data-structure sketch of this hierarchy appears after this list):
- Replicated Compute Accelerators (RCAs)
- ASIC core:
- multiple RCAs
- DRAM controllers
- On-ASIC network
- Control plane, which sends work to the RCAs over the on-ASIC network.
- Interface to server: eDRAM, HyperTransport
- ASIC server: 1U-tall, 19-inch-wide rack-mounted blades built around printed circuit boards (PCBs)
- Multiple ASIC cores
- Power-supply unit (PSU)
- General-purpose control processor: FPGA, microcontroller, possibly CPU
- On-PCB network: could be a 4-pin SPI interface, high-bandwidth HyperTransport, RapidIO, or QPI links
- Interface to rack: could be PCI-e (e.g. Convey HC1 and HC2), commodity 1/10/40 GigE interfaces, or high-speed point-to-point 10-20 Gbps serial links (e.g. MS Catapult SL3).
- Heat sinks: Flip-chip designs have heat sinks on each chip, and wire-bonded QFNs have heat sinks on the PCB backside
- ASIC server rack:
- 42 ASIC servers
- Fans
- ASIC machine room:
- Many ASIC server racks
- Optimization:
- Start with the RTL design of an RCA.
- Use Synopsys IC Compiler and PrimeTime to determine RCA specs (power density, performance density, critical paths at nominal voltage)
- Use ANSYS Icepak (CFD) to determine the heat sink dimensions, fin {thickness, number, material}, and fan air volume that maximize power dissipation.
- Use the power density and CFD results to optimize how many RCAs per lane and how many ASICs per lane (and thus how many RCAs per ASIC).
- Repeat, optimizing for $ per op/s and W per op/s (a toy version of this sweep follows below).
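The component hierarchy above (RCA → ASIC → server lanes → rack) maps naturally onto a nested data structure. The sketch below is my own illustration of that nesting; the class names, fields, and overhead terms are assumptions rather than the paper's code, and it only aggregates power and throughput up the hierarchy.

```python
from dataclasses import dataclass
from typing import List

# Illustrative model of the hierarchy described above:
# RCA -> ASIC -> server (lanes of ASICs) -> rack.
# Class names, fields, and overhead terms are placeholders, not the paper's.

@dataclass
class RCA:
    ops_per_s: float   # throughput of one replicated compute accelerator
    power_w: float     # power drawn by one RCA at nominal voltage

@dataclass
class ASIC:
    rcas: List[RCA]
    overhead_w: float = 0.0   # control plane, on-ASIC network, DRAM controllers

    @property
    def power_w(self) -> float:
        return sum(r.power_w for r in self.rcas) + self.overhead_w

    @property
    def ops_per_s(self) -> float:
        return sum(r.ops_per_s for r in self.rcas)

@dataclass
class Server:
    lanes: List[List[ASIC]]   # each lane: a row of ASICs sharing a duct and fan
    psu_overhead_w: float = 0.0

    @property
    def power_w(self) -> float:
        asics = [a for lane in self.lanes for a in lane]
        return sum(a.power_w for a in asics) + self.psu_overhead_w

    @property
    def ops_per_s(self) -> float:
        return sum(a.ops_per_s for lane in self.lanes for a in lane)

@dataclass
class Rack:
    servers: List[Server]     # typically 42 1U servers per rack

    @property
    def ops_per_s(self) -> float:
        return sum(s.ops_per_s for s in self.servers)

# Example: a server with 2 lanes of 4 ASICs, each ASIC holding 8 RCAs.
server = Server(lanes=[[ASIC([RCA(1e9, 0.5)] * 8, overhead_w=2.0)
                        for _ in range(4)] for _ in range(2)])
print(server.ops_per_s, server.power_w)   # 6.4e10 op/s, 48.0 W
```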
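A toy version of the design-space sweep itself follows. The thermal budget, power/performance densities, and cost terms are hypothetical stand-ins for what the Synopsys/PrimeTime and Icepak CFD runs would provide; the real flow re-runs those tools each iteration instead of using closed-form constants, and the yield exponent is purely illustrative.

```python
# Toy version of the design-space sweep: choose RCAs per ASIC and ASICs per
# lane under a per-lane heat budget, then keep the Pareto frontier over
# ($ per op/s, W per op/s). Every constant is an illustrative placeholder.

RCA_OPS = 1e9            # op/s per RCA (would come from RTL synthesis)
RCA_POWER = 0.5          # W per RCA at nominal voltage (from PrimeTime)
ASIC_OVERHEAD_W = 2.0    # control plane, on-ASIC network, I/O
LANE_HEAT_BUDGET = 80.0  # W a lane's duct, heat sinks, and fan can remove (from CFD)
LANE_FIXED_COST = 20.0   # $ per lane for fan, duct, DC/DC converters
ASIC_BASE_COST = 5.0     # $ per packaged die regardless of size
COST_PER_RCA = 0.2       # $ of silicon per RCA; the 1.3 exponent below mimics
                         # yield loss on large dies

def lane_designs(max_rcas=64, max_asics=16):
    """Enumerate feasible (RCAs per ASIC, ASICs per lane) lane designs."""
    for rcas in range(1, max_rcas + 1):
        for asics in range(1, max_asics + 1):
            power = asics * (rcas * RCA_POWER + ASIC_OVERHEAD_W)
            if power > LANE_HEAT_BUDGET:
                continue  # the CFD model says this lane cannot be cooled
            ops = rcas * asics * RCA_OPS
            cost = LANE_FIXED_COST + asics * (ASIC_BASE_COST
                                              + COST_PER_RCA * rcas ** 1.3)
            yield (rcas, asics, cost / ops, power / ops)

def pareto(designs):
    """Drop any design beaten by another on both $ per op/s and W per op/s."""
    def dominated(d):
        return any(o[2] <= d[2] and o[3] <= d[3] and (o[2] < d[2] or o[3] < d[3])
                   for o in designs)
    return [d for d in designs if not dominated(d)]

for rcas, asics, dollars, watts in pareto(list(lane_designs())):
    print(f"{rcas} RCAs/ASIC x {asics} ASICs/lane: "
          f"{dollars:.2e} $/(op/s), {watts:.2e} W/(op/s)")
```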
Evaluation
- Apply the methodology to the real-world use cases above: cryptocurrency mining, video transcoding, and neural network inference.
- Consult with "industry veterans."
Conclusion
- When to go ASIC cloud?
- The authors propose the "two-for-two rule": if the cost per year (i.e. the TCO) of running the computation on an existing cloud exceeds the NRE by 2X, and you can get at least a 2X TCO per op/s improvement, then going ASIC Cloud is likely to save money. (A worked check of this rule follows the list below.)
- Older process nodes, such as 40nm, can still be valuable, since they are cheaper to implement than 28nm and can therefore win on $ per op/s.
- ASIC clouds are promising for doing more computation, faster, with less power.
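The two-for-two rule reduces to a quick arithmetic check. A minimal sketch with made-up numbers (the $20M cloud bill, $8M NRE, and 5x improvement are hypothetical):

```python
# Worked check of the two-for-two rule. All inputs are hypothetical.

def two_for_two(cloud_tco_per_year: float, asic_nre: float,
                cloud_tco_per_ops: float, asic_tco_per_ops: float) -> bool:
    """Go ASIC cloud only if (a) the yearly cloud TCO is at least 2x the NRE
    of building the ASIC, and (b) the ASIC improves TCO per op/s by >= 2x."""
    covers_nre = cloud_tco_per_year >= 2 * asic_nre
    improves_tco = cloud_tco_per_ops >= 2 * asic_tco_per_ops
    return covers_nre and improves_tco

# $20M/year spent on a general-purpose cloud, $8M NRE for the ASIC,
# and a 5x TCO-per-op/s improvement from specialization.
print(two_for_two(cloud_tco_per_year=20e6, asic_nre=8e6,
                  cloud_tco_per_ops=1.0, asic_tco_per_ops=0.2))  # True
```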