ASIC Clouds: Specializing the Datacenter


Citation: Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, Michael Bedford Taylor (2016/07/18) ASIC Clouds: Specializing the Datacenter. Annual International Symposium on Computer Architecture, ISCA
DOI (original publisher): 10.1109/ISCA.2016.25
Tagged: Computer Science, Computer Architecture


Accelerators have been shown to outperform GPUs on a single node. This motivates the creation of ASIC clouds. The authors propose a generic model for turning single-node accelerators into ASIC clouds while optimizing $ per op/s and W per op/s.

Theoretical and Practical Relevance

ASIC clouds are a promising technology for certain workloads, such as video transcoding, cryptocurrency mining, and neural network inference. I expect to see more of these in the future.


  • Two trends in computing in the past decade: the rise of planet-scale cloud workloads, and the turn toward specialized hardware as transistor scaling slows.
  • ASIC clouds can satisfy demand given these two trends.
  • How to design optimal ASIC clouds?
    • Determine ASIC parameters, server parameters, datacenter layout parameters
    • Minimize total cost of ownership (TCO) while accounting for non-recurring engineering (NRE) costs
  • Applications: Cryptocurrency mining, video transcoding, neural network inference service


  • Model:
    1. Replicated Compute Accelerators (RCA)
    2. ASIC core:
      • multiple RCAs
      • DRAM controllers
      • On-ASIC network
      • Control plane that dispatches work to the RCAs over the on-ASIC network
      • Interface to server: eDRAM, HyperTransport
    3. ASIC server: 1U-tall, 19-inch-wide, rack-mounted blades built on printed circuit boards (PCBs)
      • Multiple ASIC cores
      • Power-supply unit (PSU)
      • General-purpose control processor: FPGA, microcontroller, possibly CPU
      • On-PCB network: could be 4-pin SPI interface, high-bandwidth HyperTransport, RapidIO or QPI links
      • Interface to rack: could be PCI-e (e.g. Convey HC-1 and HC-2), commodity 1/10/40 GigE interfaces, or high-speed point-to-point 10–20 Gbps serial links (e.g. MS Catapult SL3).
      • Heat sinks: Flip-chip designs have heat sinks on each chip, and wire-bonded QFNs have heat sinks on the PCB backside
    4. ASIC server rack:
      • 42 ASIC servers
      • Fans
    5. ASIC machine room:
      • Many ASIC server racks
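The five-level hierarchy above can be sketched as a simple roll-up of throughput and power from RCA to rack. All classes, fields, and numbers below are illustrative assumptions, not the paper's model:

```python
# Minimal sketch of the ASIC Cloud hierarchy: RCA -> ASIC -> server -> rack.
# Every parameter value here is a made-up placeholder.
from dataclasses import dataclass

@dataclass
class RCA:
    ops_per_s: float   # throughput of one replicated compute accelerator
    watts: float       # power of one RCA

@dataclass
class ASIC:
    rca: RCA
    n_rcas: int
    overhead_w: float  # DRAM controllers, on-ASIC network, control plane

    @property
    def ops_per_s(self): return self.rca.ops_per_s * self.n_rcas
    @property
    def watts(self): return self.rca.watts * self.n_rcas + self.overhead_w

@dataclass
class Server:
    asic: ASIC
    n_asics: int
    psu_efficiency: float  # wall power = DC power / PSU efficiency

    @property
    def ops_per_s(self): return self.asic.ops_per_s * self.n_asics
    @property
    def watts(self): return self.asic.watts * self.n_asics / self.psu_efficiency

@dataclass
class Rack:
    server: Server
    n_servers: int = 42   # 1U servers in a standard 42U rack

    @property
    def ops_per_s(self): return self.server.ops_per_s * self.n_servers
    @property
    def watts(self): return self.server.watts * self.n_servers

rack = Rack(Server(ASIC(RCA(ops_per_s=1e9, watts=0.5), n_rcas=8, overhead_w=1.0),
                   n_asics=10, psu_efficiency=0.9))
print(f"{rack.ops_per_s:.3g} op/s, {rack.watts:.1f} W wall power")
```

A machine room would be one more multiplicative level on top of `Rack`, plus cooling and power-distribution overheads.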
  • Optimization:
    1. Start with the RTL design of an RCA.
    2. Use Synopsys IC Compiler and PrimeTime to determine RCA specs (power density, performance density, critical paths @ nominal voltage)
    3. Use ANSYS Icepak (CFD) to determine heat-sink dimensions, fin {thickness, number, material}, and fan air volume under maximum power dissipation.
    4. Use power density and CFD to optimize how many RCAs per lane and how many ASICs per lane (and thus the RCAs per ASIC).
    5. Repeat, optimizing for $ per op/s and W per op/s.
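The optimize-and-repeat loop in steps 1–5 amounts to a Pareto search over the design space. Below is a minimal sketch of such a search; the cost, power, and thermal functions are made-up stand-ins for the Synopsys and Icepak outputs, and the parameter ranges are arbitrary:

```python
# Sweep RCAs-per-ASIC and ASICs-per-lane, drop thermally infeasible designs,
# and keep the Pareto frontier over ($ per op/s, W per op/s).
from itertools import product

RCA_OPS, RCA_W = 1e9, 0.5   # assumed per-RCA specs from RTL synthesis
MAX_LANE_W = 60.0           # assumed heat-sink limit per lane (stand-in for CFD)
LANE_FIXED_COST = 50.0      # assumed per-lane PCB/PSU/heat-sink cost

def evaluate(rcas_per_asic, asics_per_lane):
    """Return ($ per op/s, W per op/s) for one lane, or None if it overheats."""
    ops = RCA_OPS * rcas_per_asic * asics_per_lane
    watts = RCA_W * rcas_per_asic * asics_per_lane + 2.0 * asics_per_lane
    dollars = (5.0 * rcas_per_asic * asics_per_lane
               + 20.0 * asics_per_lane + LANE_FIXED_COST)
    if watts > MAX_LANE_W:          # stand-in for the Icepak feasibility check
        return None
    return dollars / ops, watts / ops

points = {(r, a): m for r, a in product(range(1, 65), range(1, 9))
          if (m := evaluate(r, a)) is not None}

# A design is Pareto-optimal if no other design is at least as good on both
# metrics and strictly better on one.
pareto = [cfg for cfg, (c, w) in points.items()
          if not any(c2 <= c and w2 <= w and (c2, w2) != (c, w)
                     for c2, w2 in points.values())]
print(sorted(pareto))
```

With real tool outputs in place of the stand-in functions, the surviving configurations trade $ per op/s against W per op/s, and the TCO model picks among them.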


  • Try on real-world use cases.
  • Consult with "industry veterans."


  • When to go ASIC cloud?
    • We propose the two-for-two rule. If the cost per year (i.e. the TCO) for running the computation on an existing cloud exceeds the NRE by 2X, and you can get at least a 2X TCO per op/s improvement, then going ASIC Cloud is likely to save money.
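The two-for-two rule reduces to a simple predicate; the dollar figures in the usage line are hypothetical:

```python
# The two-for-two rule as a predicate. Inputs are estimates you supply:
# annual TCO of running the workload on an existing cloud, the ASIC's NRE
# (design, masks, tooling), and the expected TCO-per-op/s improvement.
def go_asic_cloud(cloud_tco_per_year, nre, tco_per_ops_speedup):
    """True when the two-for-two rule suggests an ASIC Cloud will save money."""
    return cloud_tco_per_year >= 2 * nre and tco_per_ops_speedup >= 2

# e.g. $20M/year on a commodity cloud, $5M NRE, 3x expected TCO/op/s gain:
print(go_asic_cloud(20e6, 5e6, 3))   # True: 20M >= 2 * 5M and 3 >= 2
```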

  • Older processes, such as 40nm, can still be valuable: their lower NRE and wafer costs can yield better $ per op/s than 28nm.
  • ASIC clouds are promising for doing more computation, faster, with less power.
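The point about older process nodes follows from amortization: a chip's cost per op/s is its NRE amortized over volume plus marginal silicon cost, divided by per-chip throughput. A sketch with made-up numbers (not the paper's data):

```python
# Illustrative comparison of process nodes on $ per op/s. The hypothetical
# 28nm chip is faster, but its NRE is much higher; at modest volume the
# older 40nm node wins. All numbers are invented for illustration.
def dollars_per_ops(nre, chips, cost_per_chip, ops_per_chip):
    """Total cost per op/s = (amortized NRE + marginal cost) / throughput."""
    return (nre / chips + cost_per_chip) / ops_per_chip

n40 = dollars_per_ops(nre=2e6, chips=10_000, cost_per_chip=20, ops_per_chip=1e9)
n28 = dollars_per_ops(nre=8e6, chips=10_000, cost_per_chip=25, ops_per_chip=2e9)
print(n40 < n28)   # True: at this volume, 40nm is cheaper per op/s
```

At a large enough volume the NRE term shrinks and the newer node's higher throughput wins, which is why the break-even point depends on deployment scale.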