Molecular dynamics on thousands of GPUs

We have extended HOOMD-blue to run on thousands of GPUs. The new domain decomposition scheme was benchmarked on the Titan supercomputer, on Blue Waters, and on the Wilkes cluster with GPUDirect RDMA. With the new capabilities, released in HOOMD-blue 1.0, simulations can now achieve speedups of 50x and more on GPU supercomputers and clusters. Our paper in Computer Physics Communications describes in detail how the scaling and MPI communication have been implemented in the code.