HELIX: Automatic Parallelization of Irregular Programs for Chip Multiprocessing

Simone Campanoni, Timothy Jones, Glenn Holloway, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks

Code Generation and Optimization (CGO), March, 2012

We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and by using helper threads to prefetch synchronization signals. We have implemented HELIX as part of an optimizing compiler framework that automatically selects and parallelizes loops from general sequential programs. The framework uses an analytical model of loop speedups, combined with profile data, to choose loops to parallelize. On a six-core Intel Core i7-980X, HELIX achieves speedups averaging 2.25, with a maximum of 4.12, for thirteen C benchmarks from SPEC CPU2000.

[ Paper ] [ Talk ]

@inproceedings{Campanoni:2012:HAP:2259016.2259028,
 author = {Simone Campanoni and
           Timothy Jones and 
           Glenn Holloway and 
           Vijay Janapa Reddi and 
           Gu-Yeon Wei and 
           David Brooks},
 title = {HELIX: Automatic Parallelization of Irregular Programs for Chip Multiprocessing},
 booktitle = {Proceedings of the Tenth International Symposium on Code Generation and Optimization},
 series = {CGO '12},
 year = {2012},
 isbn = {978-1-4503-1206-6},
 location = {San Jose, California},
 pages = {84--93},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2259016.2259028},
 doi = {10.1145/2259016.2259028},
 acmid = {2259028},
 publisher = {ACM},
 address = {New York, NY, USA},
}