A performance predictor for implementation selection of parallelized static and temporal graph algorithms

Abstract

Abstract Task-based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context of ordered and unordered task-based implementations, that trade-off work-efficiency with parallelism. The monotonicity of convergent graph solutions is the reason behind the trade-off between work-efficiency and parallelism. This trade-off results in variable performance-based choices within and across different machines (CPUs and GPUs), graph algorithms, implementations (ordered, relaxed, and unordered). Input graphs also augment this choice space, with this work analyzing temporally changing graphs in addition to the static graphs explored by prior works. These algorithmic and architectural choices are first explored in this work, and it is seen that different graph workload-input combinations perform optimally on diverse architectural configurations. The resulting choice space is analyzed and this work represents it in the form of characteristic variables that correlate with each choice space. Using these characteristic variables, this work proposes analytical and neural network models to correlate these choice spaces to find the best performing implementation. The variables and the prediction models proposed in this work are also integrated with a state-of-the-art performance predictor on a multiaccelerator setup, and shows geometric performance gains of 54% on a CPU, 14% on a GPU, and 31.5% in a multiaccelerator setup over baseline implementations without performance prediction.

Publication
Concurrency and Computation: Practice and Experience