ASIC Clouds: Specializing the Datacenter
Citation: Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, Michael Bedford Taylor (2016/07/18) ASIC Clouds: Specializing the Datacenter. Annual International Symposium on Computer Architecture, ISCA
DOI (original publisher): 10.1109/ISCA.2016.25
Semantic Scholar (metadata): 10.1109/ISCA.2016.25
Sci-Hub (fulltext): 10.1109/ISCA.2016.25
Internet Archive Scholar (search for fulltext): ASIC Clouds: Specializing the Datacenter
Download: https://ieeexplore.ieee.org/abstract/document/7551392
Tagged: Computer Science, Computer Architecture
Summary
Single-node ASIC accelerators have been shown to outperform GPUs for certain workloads, which motivates scaling them up into ASIC clouds. The authors propose a general methodology for turning an accelerator into an ASIC cloud while optimizing $ per op/s and W per op/s.
Theoretical and Practical Relevance
ASIC clouds are a promising technology for certain workloads, such as video transcoding, cryptocurrency mining, and neural network inference. I expect to see more of these in the future.
Problem
- Two trends in computing in the past decade:
- split between mobile and cloud
- dark silicon (see Dark silicon and the end of multicore scaling)
- ASIC clouds can satisfy demand given these two trends.
- How to design optimal ASIC clouds?
- Determine ASIC parameters, server parameters, datacenter layout parameters
- Minimize total cost of ownership (TCO) per op/s, weighed against the non-recurring expense (NRE) of building the ASIC; a rough sketch of these figures of merit follows this list
- Applications: Cryptocurrency mining, video transcoding, neural network inference service
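To make the objective concrete, here is a minimal sketch of the two figures of merit, $ per op/s and W per op/s. The cost breakdown (a single capex number, an energy price, an amortization period) is my own simplification for illustration; the paper's full TCO model also covers PSUs, cooling, and datacenter-level overheads.

```python
# Minimal sketch of the two figures of merit: W per op/s and $ per op/s.
# All default values are illustrative placeholders, not numbers from the paper.

def watts_per_ops(power_w: float, ops_per_s: float) -> float:
    """Energy efficiency: watts consumed per op/s of delivered throughput."""
    return power_w / ops_per_s

def dollars_per_ops(capex_usd: float, power_w: float, ops_per_s: float,
                    lifetime_years: float = 1.5, usd_per_kwh: float = 0.07) -> float:
    """TCO per op/s: amortized hardware cost plus lifetime electricity cost,
    divided by throughput. The real model also accounts for PSUs, PCBs, fans,
    heat sinks, and datacenter overheads, all folded into capex_usd here."""
    hours = lifetime_years * 365 * 24
    energy_usd = (power_w / 1000.0) * hours * usd_per_kwh
    return (capex_usd + energy_usd) / ops_per_s

# Hypothetical accelerator: $500 of hardware, 100 W, 1e12 op/s.
print(watts_per_ops(100, 1e12))          # 1e-10 W per op/s
print(dollars_per_ops(500, 100, 1e12))   # ~5.9e-10 $ per op/s over 1.5 years
```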
Solution
- Model (a data-structure sketch of this hierarchy appears after this list):
- Replicated Compute Accelerators (RCAs)
- ASIC core:
- multiple RCAs
- DRAM controllers
- On-ASIC network
- Control plane, which sends work to the RCAs over the on-ASIC network.
- Interface to server: eDRAM, HyperTransport
- ASIC server: 1U-tall, 19-inch-wide rack-mounted blades built around printed circuit boards (PCBs)
- Multiple ASIC cores
- Power-supply unit (PSU)
- General-purpose control processor: FPGA, microcontroller, possibly CPU
- On-PCB network: could be a 4-pin SPI interface, high-bandwidth HyperTransport, RapidIO, or QPI links
- Interface to rack: could be PCI-e (e.g. Convey HC1 and HC2), commodity 1/10/40 GigE interfaces, or high-speed point-to-point 10-20 Gbps serial links (e.g. MS Catapult SL3).
- Heat sinks: Flip-chip designs have heat sinks on each chip, and wire-bonded QFNs have heat sinks on the PCB backside
- ASIC server rack:
- 42 ASIC servers
- Fans
- ASIC machine room:
- Many ASIC server racks
- Optimization:
- Start with the RTL design of an RCA.
- Use Synopsys IC Compiler and PrimeTime to determine RCA specs (power density, performance density, critical paths at nominal voltage)
- Use ANSYS Icepak (CFD) to determine the heat sink dimensions, fin {thickness, number, material}, and fan air volume that maximize power dissipation.
- Use the power density and CFD results to optimize how many RCAs per lane and how many ASICs per lane (and thus how many RCAs per ASIC).
- Repeat, optimizing for $ per op/s and W per op/s (a toy version of this sweep follows below).
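The component hierarchy above (RCA → ASIC → server lanes → rack) maps naturally onto a nested data structure. The sketch below is my own illustration of that nesting; the class names, fields, and overhead terms are assumptions rather than the paper's code, and it only aggregates power and throughput up the hierarchy.

```python
from dataclasses import dataclass
from typing import List

# Illustrative model of the hierarchy described above:
# RCA -> ASIC -> server (lanes of ASICs) -> rack.
# Class names, fields, and overhead terms are placeholders, not the paper's.

@dataclass
class RCA:
    ops_per_s: float   # throughput of one replicated compute accelerator
    power_w: float     # power drawn by one RCA at nominal voltage

@dataclass
class ASIC:
    rcas: List[RCA]
    overhead_w: float = 0.0   # control plane, on-ASIC network, DRAM controllers

    @property
    def power_w(self) -> float:
        return sum(r.power_w for r in self.rcas) + self.overhead_w

    @property
    def ops_per_s(self) -> float:
        return sum(r.ops_per_s for r in self.rcas)

@dataclass
class Server:
    lanes: List[List[ASIC]]   # each lane: a row of ASICs sharing a duct and fan
    psu_overhead_w: float = 0.0

    @property
    def power_w(self) -> float:
        asics = [a for lane in self.lanes for a in lane]
        return sum(a.power_w for a in asics) + self.psu_overhead_w

    @property
    def ops_per_s(self) -> float:
        return sum(a.ops_per_s for lane in self.lanes for a in lane)

@dataclass
class Rack:
    servers: List[Server]     # typically 42 1U servers per rack

    @property
    def ops_per_s(self) -> float:
        return sum(s.ops_per_s for s in self.servers)

# Example: a server with 2 lanes of 4 ASICs, each ASIC holding 8 RCAs.
server = Server(lanes=[[ASIC([RCA(1e9, 0.5)] * 8, overhead_w=2.0)
                        for _ in range(4)] for _ in range(2)])
print(server.ops_per_s, server.power_w)   # 6.4e10 op/s, 48.0 W
```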
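A toy version of the design-space sweep itself follows. The thermal budget, power/performance densities, and cost terms are hypothetical stand-ins for what the Synopsys/PrimeTime and Icepak CFD runs would provide; the real flow re-runs those tools each iteration instead of using closed-form constants, and the yield exponent is purely illustrative.

```python
# Toy version of the design-space sweep: choose RCAs per ASIC and ASICs per
# lane under a per-lane heat budget, then keep the Pareto frontier over
# ($ per op/s, W per op/s). Every constant is an illustrative placeholder.

RCA_OPS = 1e9            # op/s per RCA (would come from RTL synthesis)
RCA_POWER = 0.5          # W per RCA at nominal voltage (from PrimeTime)
ASIC_OVERHEAD_W = 2.0    # control plane, on-ASIC network, I/O
LANE_HEAT_BUDGET = 80.0  # W a lane's duct, heat sinks, and fan can remove (from CFD)
LANE_FIXED_COST = 20.0   # $ per lane for fan, duct, DC/DC converters
ASIC_BASE_COST = 5.0     # $ per packaged die regardless of size
COST_PER_RCA = 0.2       # $ of silicon per RCA; the 1.3 exponent below mimics
                         # yield loss on large dies

def lane_designs(max_rcas=64, max_asics=16):
    """Enumerate feasible (RCAs per ASIC, ASICs per lane) lane designs."""
    for rcas in range(1, max_rcas + 1):
        for asics in range(1, max_asics + 1):
            power = asics * (rcas * RCA_POWER + ASIC_OVERHEAD_W)
            if power > LANE_HEAT_BUDGET:
                continue  # the CFD model says this lane cannot be cooled
            ops = rcas * asics * RCA_OPS
            cost = LANE_FIXED_COST + asics * (ASIC_BASE_COST
                                              + COST_PER_RCA * rcas ** 1.3)
            yield (rcas, asics, cost / ops, power / ops)

def pareto(designs):
    """Drop any design beaten by another on both $ per op/s and W per op/s."""
    def dominated(d):
        return any(o[2] <= d[2] and o[3] <= d[3] and (o[2] < d[2] or o[3] < d[3])
                   for o in designs)
    return [d for d in designs if not dominated(d)]

for rcas, asics, dollars, watts in pareto(list(lane_designs())):
    print(f"{rcas} RCAs/ASIC x {asics} ASICs/lane: "
          f"{dollars:.2e} $/(op/s), {watts:.2e} W/(op/s)")
```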
Evaluation
- Apply the methodology to the real-world use cases above: cryptocurrency mining, video transcoding, and neural network inference.
- Consult with "industry veterans."
Conclusion
- When to go ASIC cloud?
- The authors propose the "two-for-two rule": if the cost per year (i.e. the TCO) of running the computation on an existing cloud exceeds the NRE by 2X, and you can get at least a 2X TCO per op/s improvement, then going ASIC Cloud is likely to save money. (A worked check of this rule follows the list below.)
- Older process nodes, such as 40nm, can still be valuable, since they are cheaper to implement than 28nm and can therefore win on $ per op/s.
- ASIC clouds are promising for doing more computation, faster, with less power.
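The two-for-two rule reduces to a quick arithmetic check. A minimal sketch with made-up numbers (the $20M cloud bill, $8M NRE, and 5x improvement are hypothetical):

```python
# Worked check of the two-for-two rule. All inputs are hypothetical.

def two_for_two(cloud_tco_per_year: float, asic_nre: float,
                cloud_tco_per_ops: float, asic_tco_per_ops: float) -> bool:
    """Go ASIC cloud only if (a) the yearly cloud TCO is at least 2x the NRE
    of building the ASIC, and (b) the ASIC improves TCO per op/s by >= 2x."""
    covers_nre = cloud_tco_per_year >= 2 * asic_nre
    improves_tco = cloud_tco_per_ops >= 2 * asic_tco_per_ops
    return covers_nre and improves_tco

# $20M/year spent on a general-purpose cloud, $8M NRE for the ASIC,
# and a 5x TCO-per-op/s improvement from specialization.
print(two_for_two(cloud_tco_per_year=20e6, asic_nre=8e6,
                  cloud_tco_per_ops=1.0, asic_tco_per_ops=0.2))  # True
```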