A parallel hashed Oct-Tree N-body algorithm

From AcaWiki
Jump to: navigation, search


Citation: Micheal S. Warren, John K. Salmon (1993/10) A parallel hashed Oct-Tree N-body algorithm. ACM/IEEE conference on Supercomputing (RSS)
DOI: 10.1145/169627.169640
Semantic Scholar: 10.1145/169627.169640
Download: https://dl.acm.org/doi/10.1145/169627.169640
Tagged: Computer Science (RSS) Parallel Systems (RSS), Scientific Computing (RSS)


Summary:

The N-body problem requires calculating body-to-body interactions to solve exactly. Instead, one can form an algorithm by clustering bodies together, and considering the cluster as a single representative body, with series corrections. However, efficient implementations require an oct-tree data-structure and a novel oct-tree hashing scheme.

Theoretical and practical relevance:

Oct-tree hashing unlocks the door for efficient N-body simulations and other problems in computational physics.



Problem

  1. Simulate the N-body problem
    • "Force" can be gravitational, Coulombic, or anything else that is particle-to-particle dependent.
  2. Exact solution involves interactions. Thus, we need to use approximations.
    • Evaluate the field at every point on a grid.
      • Making the grid high-precision is expensive.
    • Group particles into hierarchical cells, treat the cells as a single bigger particle, with some truncated series corrections. Since the division is hierarchical, recurse within each cell.
      • In general, this is more accurate when the cells are smaller, other cells are far away, and the cell has smaller moments (the moment is a measure of how heavy the cell is far away from its center).
      • How to split particles to maintain precision without sacrificing speed?

Other solutions

  • Appel's method: partitions particles into spherical cells, .
    • Apparently B-H is slower in theory but faster in practice.
  • Barnes-Hut method (B-H): partitions particles into oct-trees, .
    • No error bound before this paper.
    • No efficient implementation before this paper.
  • Fast Multipole Method (FMM):
    • Also fast in theory, slow in practice. The "crossover point" when FMM beats the naïve is 70 thousand particles.

Contributions

  • B-H error bound: the particle division and error bound has to be "data-dependent" or else it will be really high.
    • This paper derives such a bound.
  • But if the division is data-dependent, it is difficult to know which non-local cells are required before computation.
    • Remember on supercomputers, the memory is distributed across tens of thousands of nodes. They use message-passing.
  • Instead, make it easier/faster to retrieve non-local data during computation.
    • Prior work used an Oct-tree. An oct-tree is a 3-D extension of a quad-tree. In a quad-tree, each node has a 2-D coordinate. Each node has 0 or 4 children: one must lie northeast of the parent, one northwest, one southeast, and one southwest. As such, they hierarchically partition 2-D space.
    • Prior work stores this table as a pointer to a block containing child pointers and a data pointer. One has to do many pointer dereferences to get to the leaves.
    • This work turns each coordinate into a key and uses a hashtable to map keys to data. The hashing function alluded to in the title is the main-contribution of this paper.
      • The hash intertwines the digits of their coordinates and truncates the most-significant digits.
      • Binary coordinate representation for oct-tree hashing.png
      • This way, children can access their parent, by just truncating their digits Parents can get to their children by appending some digits (appending 100 gets the upper-deck northeastern child, 001 gets the lower-deck southeastern child, etc.).
      • Hashing uses the least-significant digits, so two points with the same hash (AKA a hash-collision) will be separated by a distance of at least . This the opposite of locality-sensitive hashing. This way, collisions are handled by different machines, which spreads out the load.
      • Counting up in digits iterates over every key in depth-first order. This sequence of points creates a space-filling fractal called, the Z-order curve.
      • Z-curve.svg.png
    • Care must be taken to balance the oct-tree, to have good performance. One can sort the list of keys, and cut it in halves recursively, weighted by how much work we think each node will be.
      • Unfortunately, the Z-order curve has spatial discontinuities, so two points can be close in Z-order, but be far away in space. Maybe use Hilbert curve. So one processor's allocation of cells may interact with a lot of other processors.
    • In practice, iteration in breadth-first order will hide the latency of requesting spatially-disparate children; the requests can go in parallel to (probably) different processors.
  • These techniques can also be used for fluid mechanics.