A parallel hashed Oct-Tree N-body algorithm
Citation: Micheal S. Warren, John K. Salmon (1993/10) A parallel hashed Oct-Tree N-body algorithm. ACM/IEEE conference on Supercomputing (RSS)
DOI (original publisher): 10.1145/169627.169640
Semantic Scholar (metadata): 10.1145/169627.169640
Sci-Hub (fulltext): 10.1145/169627.169640
Internet Archive Scholar (fulltext): A parallel hashed Oct-Tree N-body algorithm
Tagged: Computer Science (RSS) Parallel Systems (RSS), Scientific Computing (RSS)
The N-body problem requires calculating body-to-body interactions to solve exactly. Instead, one can form an algorithm by clustering bodies together, and considering the cluster as a single representative body, with series corrections. However, efficient implementations require an oct-tree data-structure and a novel oct-tree hashing scheme.
Theoretical and practical relevance:
Oct-tree hashing unlocks the door for efficient N-body simulations and other problems in computational physics.
- Simulate the N-body problem
- "Force" can be gravitational, Coulombic, or anything else that is particle-to-particle dependent.
- Exact solution involves interactions. Thus, we need to use approximations.
- Evaluate the field at every point on a grid.
- Making the grid high-precision is expensive.
- Group particles into hierarchical cells, treat the cells as a single bigger particle, with some truncated series corrections. Since the division is hierarchical, recurse within each cell.
- In general, this is more accurate when the cells are smaller, other cells are far away, and the cell has smaller moments (the moment is a measure of how heavy the cell is far away from its center).
- How to split particles to maintain precision without sacrificing speed?
- Evaluate the field at every point on a grid.
- Appel's method: partitions particles into spherical cells, .
- Apparently B-H is slower in theory but faster in practice.
- Barnes-Hut method (B-H): partitions particles into oct-trees, .
- No error bound before this paper.
- No efficient implementation before this paper.
- Fast Multipole Method (FMM):
- Also fast in theory, slow in practice. The "crossover point" when FMM beats the naïve is 70 thousand particles.
- B-H error bound: the particle division and error bound has to be "data-dependent" or else it will be really high.
- This paper derives such a bound.
- But if the division is data-dependent, it is difficult to know which non-local cells are required before computation.
- Remember on supercomputers, the memory is distributed across tens of thousands of nodes. They use message-passing.
- Instead, make it easier/faster to retrieve non-local data during computation.
- Prior work used an Oct-tree. An oct-tree is a 3-D extension of a quad-tree. In a quad-tree, each node has a 2-D coordinate. Each node has 0 or 4 children: one must lie northeast of the parent, one northwest, one southeast, and one southwest. As such, they hierarchically partition 2-D space.
- Prior work stores this table as a pointer to a block containing child pointers and a data pointer. One has to do many pointer dereferences to get to the leaves.
- This work turns each coordinate into a key and uses a hashtable to map keys to data. The hashing function alluded to in the title is the main-contribution of this paper.
- The hash intertwines the digits of their coordinates and truncates the most-significant digits.
- This way, children can access their parent, by just truncating their digits Parents can get to their children by appending some digits (appending 100 gets the upper-deck northeastern child, 001 gets the lower-deck southeastern child, etc.).
- Hashing uses the least-significant digits, so two points with the same hash (AKA a hash-collision) will be separated by a distance of at least . This the opposite of locality-sensitive hashing. This way, collisions are handled by different machines, which spreads out the load.
- Counting up in digits iterates over every key in depth-first order. This sequence of points creates a space-filling fractal called, the Z-order curve.
- Care must be taken to balance the oct-tree, to have good performance. One can sort the list of keys, and cut it in halves recursively, weighted by how much work we think each node will be.
- Unfortunately, the Z-order curve has spatial discontinuities, so two points can be close in Z-order, but be far away in space. Maybe use Hilbert curve. So one processor's allocation of cells may interact with a lot of other processors.
- In practice, iteration in breadth-first order will hide the latency of requesting spatially-disparate children; the requests can go in parallel to (probably) different processors.
- These techniques can also be used for fluid mechanics.