-
Balance1 (source):
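inner loop gets shorter as the outer index grows (triangular iteration space, as for a triangular matrix)
splitting the outer index range evenly among the threads therefore gives an unbalanced distribution of workload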
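To see how uneven this gets, here is a small Python model, not OpenMP code. It assumes the outer index runs over i = 0..n-1 and the inner loop runs n - i times. It also assumes the outer range is cut into one contiguous, nearly equal-length chunk per thread, which is how static scheduling without a block size typically behaves. Cost is counted in inner-loop iterations. The function name `contiguous_split_loads` is hypothetical.

```python
def contiguous_split_loads(costs, nthreads):
    """Per-thread work when the index range is cut into nthreads contiguous,
    nearly equal-length chunks, with thread t getting chunk t."""
    n = len(costs)
    base, extra = divmod(n, nthreads)
    loads = []
    start = 0
    for t in range(nthreads):
        # The first `extra` threads get one extra iteration each.
        length = base + (1 if t < extra else 0)
        loads.append(sum(costs[start:start + length]))
        start += length
    return loads


# Assumed triangular loop: outer index i = 0..n-1, inner loop runs n - i times.
n = 1000
costs = [n - i for i in range(n)]
print(contiguous_split_loads(costs, 4))  # [218875, 156375, 93875, 31375]
```

With 4 threads, thread 0 does roughly seven times the work of thread 3, so the others sit idle while it finishes.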
-
Balance2 (source):
SCHEDULE(STATIC, blocksize)
partition the index range into blocks of fixed length (blocksize)
blocks are dealt out to the threads in round-robin order
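The round-robin dealing can be modelled in a few lines of Python. This is a sketch, not OpenMP code. It assumes block b (iterations b*blocksize up to (b+1)*blocksize - 1) goes to thread b mod nthreads, and that cost is counted in inner-loop iterations of the same assumed triangular loop (inner trip count n - i). The name `static_blocked_loads` is hypothetical.

```python
def static_blocked_loads(costs, nthreads, blocksize):
    """Per-thread work for SCHEDULE(STATIC, blocksize): the index range is cut
    into blocks of blocksize iterations, and block b goes to thread b % nthreads."""
    loads = [0] * nthreads
    for b, start in enumerate(range(0, len(costs), blocksize)):
        # The slice end may run past len(costs); Python truncates the last block.
        loads[b % nthreads] += sum(costs[start:start + blocksize])
    return loads


# Assumed triangular loop: outer index i = 0..n-1, inner loop runs n - i times.
n = 1000
costs = [n - i for i in range(n)]
print(static_blocked_loads(costs, 4, 10))  # [128875, 126375, 123875, 121375]
```

Small blocks interleave cheap and expensive iterations across all threads, so the loads end up within a few percent of each other. The assignment is still fixed in advance, however, so it cannot react to costs that vary unpredictably.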
-
Balance3 (source):
SCHEDULE(DYNAMIC, blocksize)
partition the index range into blocks of fixed length (blocksize), as above
when a thread finishes a block, it fetches the next unassigned one
important for loops whose time per iteration varies strongly
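The "take the next block when idle" rule can be simulated with a priority queue of thread finish times. This is a Python model, not OpenMP code. It assumes `costs[i]` is the run time of iteration i, that threads never wait between blocks, and that ties go to the lowest thread id. A real OpenMP runtime does not guarantee any particular order. The name `dynamic_blocked_loads` is hypothetical.

```python
import heapq


def dynamic_blocked_loads(costs, nthreads, blocksize):
    """Simulate SCHEDULE(DYNAMIC, blocksize): each thread that becomes idle
    takes the next unassigned block. Returns the busy time of each thread."""
    busy = [0] * nthreads
    # Heap of (time the thread becomes idle, thread id); all start idle at time 0.
    idle = [(0, t) for t in range(nthreads)]
    heapq.heapify(idle)
    for start in range(0, len(costs), blocksize):
        free_at, thread = heapq.heappop(idle)  # earliest-idle thread
        block_cost = sum(costs[start:start + blocksize])
        busy[thread] += block_cost
        heapq.heappush(idle, (free_at + block_cost, thread))
    return busy


# One expensive iteration followed by cheap ones, two threads, blocksize 1.
print(dynamic_blocked_loads([8, 1, 1, 1, 1, 1, 1, 1], 2, 1))  # [8, 7]
```

With the same costs, round-robin dealing of size-1 blocks would give thread 0 iterations 0, 2, 4, 6 (cost 11) and thread 1 the rest (cost 4). The dynamic rule lets thread 1 absorb all the cheap iterations while thread 0 is busy. In general, the busiest and least busy threads then differ by at most one block's cost.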
-
(gives even better load balancing than STATIC in our example)