Introduction
The above trade-off is achieved using Task Granularity, which is defined as the ratio R/C between task computation time (R) and task communication overhead (C). Fine-grain tasks have a small R/C ratio, while coarse-grain tasks have a large one. Since program efficiency correlates directly with granularity, the optimization technique is to cluster sets of fine-grain tasks into coarser-grain partitions.
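As a rough illustration, the granularity test reduces to comparing R/C against a cut-off; the threshold value below is a hypothetical parameter chosen by the scheduler, not a fixed standard:

```python
# A minimal sketch of the granularity test; R and C are per-task estimates
# of computation time and communication overhead, and `threshold` is a
# hypothetical cut-off, not part of the original formulation.

def granularity(R, C):
    """Task granularity: ratio of computation time to communication overhead."""
    return R / C

def is_coarse_grain(R, C, threshold=1.0):
    # When R/C is at least the threshold, communication is amortized by
    # computation and the task can stand alone; otherwise it is a candidate
    # for clustering with neighbouring fine-grain tasks.
    return granularity(R, C) >= threshold
```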
Partitioning algorithms
The goal is to eliminate the overhead caused by intertask communication while maintaining a high degree of parallelism. There are three generic classes:
Critical Path partitioning:
Since fine-grain tasks lying on the critical path of the program must be executed sequentially anyway, we identify the nodes on such a path and cluster them into a partition, which can then be assigned to a PE for execution. This method is called Vertical Layered Partitioning.
Here we start with a fine-grain DAG and sort it into disjoint horizontal layers, such that nodes within a layer can execute in parallel while the precedence constraints between layers are preserved. A critical path of the DAG is then identified, its nodes are clustered into a vertical layer, and those nodes are marked as processed. This step is repeated until no unprocessed nodes remain.
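A minimal sketch of this loop follows, assuming the DAG is given as a dict mapping each node to its successor list and `weight` gives per-node computation times; edge communication costs and paths that pass through already-clustered nodes are ignored for brevity:

```python
# Vertical Layered Partitioning sketch: repeatedly extract a critical
# (heaviest) path over the unprocessed nodes and cluster it into a
# vertical layer, until no unprocessed nodes remain.

def topo_sort(dag, nodes):
    """Topological order via reversed post-order DFS on successor lists."""
    seen, order = set(), []
    def visit(u):
        if u in seen:
            return
        seen.add(u)
        for v in dag.get(u, []):
            visit(v)
        order.append(u)
    for n in nodes:
        visit(n)
    return order[::-1]

def critical_path(dag, weight, unprocessed):
    """Heaviest path restricted to the still-unprocessed nodes."""
    best_len, best_pred = {}, {}
    for u in topo_sort(dag, unprocessed):
        if u not in unprocessed:
            continue
        best_len.setdefault(u, weight[u])
        for v in dag.get(u, []):
            if v in unprocessed and best_len[u] + weight[v] > best_len.get(v, 0):
                best_len[v] = best_len[u] + weight[v]
                best_pred[v] = u
    node = max(best_len, key=best_len.get)   # endpoint of the critical path
    path = [node]
    while node in best_pred:
        node = best_pred[node]
        path.append(node)
    return path[::-1]

def vertical_layered_partition(dag, weight):
    unprocessed = set(weight)
    partitions = []
    while unprocessed:                       # until no unprocessed nodes exist
        path = critical_path(dag, weight, unprocessed)
        partitions.append(path)              # one vertical layer per iteration
        unprocessed -= set(path)
    return partitions
```

Each returned partition is one vertical layer that can be assigned to a single PE.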
Communication Delay elimination partitioning: Here we use the communication overhead as the criterion for clustering tasks into partitions. The approach is to cluster a node together with its successors, provided this does not prolong the overall completion time. Note, however, that clustering sacrifices the concurrency among the successors of the node, since they must now execute sequentially. Hence we apply this technique only when the saving in communication overhead outweighs the cost of serializing parallel tasks, as in the sketch below. An advantage of this scheme is that it provides an upper bound on the number of allocated processors.
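The merge test can be sketched under a simplified delay model; the per-node compute times R, per-edge communication costs C, and the two time estimates below are illustrative assumptions, not the original formulation:

```python
# Decide whether clustering node u with all of its successors pays off.
# R maps nodes to compute times; C maps edges (u, v) to communication costs;
# dag maps each node to its successor list.

def should_cluster(u, dag, R, C):
    succs = dag.get(u, [])
    if not succs:
        return False
    # Finish-time estimate if successors stay on separate PEs: they run in
    # parallel, but each pays its communication delay from u.
    parallel_time = max(C[(u, v)] + R[v] for v in succs)
    # Finish-time estimate if u and all successors share one PE: no
    # communication, but the successors execute sequentially.
    serial_time = sum(R[v] for v in succs)
    return serial_time <= parallel_time    # cluster only if it does not hurt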
Task Duplication partitioning:
Task duplication eliminates communication costs by duplicating tasks among the PEs. This not only removes the communication costs but also preserves the original program parallelism. The disadvantages are increased space requirements and increased synchronization overhead.
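An illustrative sketch of the idea follows; the representation of partitions as node sets (one per PE) is an assumption, and a full scheme would also duplicate a copied task's own predecessors transitively so it can execute locally:

```python
# Duplicate any task whose result is consumed on another PE, so that the
# corresponding inter-PE edge becomes local. `partitions` is a list of node
# sets (one per PE); `dag` maps each node to its successor list.

def duplicate_shared_tasks(partitions, dag):
    home = {n: i for i, part in enumerate(partitions) for n in part}
    duplicated = [set(part) for part in partitions]
    for u, succs in dag.items():
        for v in succs:
            if home[v] != home[u]:
                # Copy u onto v's PE: the edge (u, v) no longer crosses PEs.
                # (A complete scheme would recursively copy u's predecessors.)
                duplicated[home[v]].add(u)
    return duplicated
```

The redundant copies of each task are what trade extra space and synchronization for the eliminated communication.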