SimGrid  3.12
Versatile Simulation of Distributed Systems
Introduction to SimGrid

SimGrid is a toolkit that provides core functionalities for the simulation of distributed applications in heterogeneous distributed environments.

The specific goal of the project is to facilitate research in the area of distributed and parallel application scheduling on distributed computing platforms ranging from simple network of workstations to Computational Grids.

Scenario

The goal of this practical session is to illustrate various usage of the MSG interface. To this end we will use the following simple setting:

Assume we have a (possibly large) bunch of (possibly large) data to process and which originally reside on a server (a.k.a. master). For sake of simplicity, we assume all input file require the same amount of computation. We assume the server can be helped by a (possibly large) set of worker machines. What is the best way to organize the computations ?

Although this looks like a very simple setting it raises several interesting questions:

  • Which algorithm should the master use to send workload?

    The most obvious algorithm would be to send tasks to workers in a round-robin fashion. This is the initial code we provide you.

    A less obvious but probably more efficient approach would be to set up a request mechanism where a client first ask for tasks, which allows the server to decide which request to answer and possibly to send the tasks to the fastest machines. Maybe you can think of a smarter mechanism...

  • How many tasks should the client ask for?

    Indeed, if we set up a request mechanism so that workers only send request whenever they have no more task to process, they are likely to be poorly exploited since they will have to wait for the master to consider their request and for the input data to be transferred. A client should thus probably request a pool of tasks but if it requests too many tasks, it is likely to lead to a poor load-balancing...

  • How is the quality of such algorithm dependent on the platform characteristics and on the task characteristics?

    Whenever the input communication time is very small compared to processing time and workers are homogeneous, it is likely that the round-robin algorithm performs very well. Would it still hold true when transfer time is not negligible and the platform is, say, a volunteer computing system ?

  • The network topology interconnecting the master and the workers may be quite complicated. How does such a topology impact the previous result?

    When data transfers are the bottleneck, it is likely that a good modeling of the platform becomes essential. In this case, you may want to be able to account for complex platform topologies.

  • Do the algorithms depend on a perfect knowledge of this topology?

    Should we still use a flat master worker deployment or should we use a

  • How is such an algorithm sensitive to external workload variation?

    What if bandwidth, latency and power can vary with no warning? Shouldn't you study whether your algorithm is sensitive to such load variations?

  • Although an algorithm may be more efficient than another, how does it interfere with other applications?

    As you can see, this very simple setting may need to evolve way beyond what you initially imagined.

    Premature optimization is the root of all evil. – D.E.Knuth

    Furthermore, writing your own simulator is much harder than you may imagine. This is why you should rely on an established and flexible one.

The following figure is a screenshot of triva visualizing a SimGrid simulation of two master worker applications (one in light gray and the other in dark gray) running in concurrence and showing resource usage over a long period of time.

sc3-description.png
Test

Prerequisites

Of course, you need to install SimGrid before taking this tutorial. Please refer to the relevant Section: Installing Simgrid.

Tutorials

A lot of information on how to install and use Simgrid are provided by the online documentation and by several tutorials:

Recommended Steps

Installing Viva

This software will be useful to make fancy graph or treemap visualizations and get a better understanding of simulations. You will first need to install pajeng:

1 sudo apt-get install git cmake build-essential libqt4-dev libboost-dev freeglut3-dev ;
2 git clone https://github.com/schnorr/pajeng.git
3 cd pajeng && mkdir -p build && cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME && make -j install
4 cd ../../

Then you can install viva.

1 sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
2 git clone https://github.com/schnorr/viva.git
3 cd viva && mkdir -p build_graph && cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME && make -j install
4 cd ../../

Installing Paje

This software provides a Gantt-chart visualization.

1 sudo apt-get install paje.app

Installing Vite

This software provides a Gantt-chart visualization.

1 sudo apt-get install vite

Let's get started

Setting up and Compiling

The corresponding archive with all source files and platform files can be obtained here.

1 tar zxf msg-tuto.tgz
2 cd msg-tuto/src
3 make

As you can see, there is already a nice Makefile that compiles everything for you. Now the tiny example has been compiled and it can be easily run as follows:

1 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1

If you create a single self-content C-file named foo.c, the corresponding program will be simply compiled and linked with SimGrid by typing:

1 make foo

For a more "fancy" output, you can try:

1 ./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer

For a really fancy output, you should use viva/triva:

1 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
2  --cfg=tracing/uncategorized:yes --cfg=viva/uncategorized:uncat.plist
3 LANG=C ; viva simgrid.trace uncat.plist

For a more classical Gantt-Chart visualization, you can produce a Paje trace:

1 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
2  --cfg=tracing/msg/process:yes
3 LANG=C ; Paje simgrid.trace

Alternatively, you can use vite.

1 ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:yes \
2  --cfg=tracing/msg/process:yes --cfg=tracing/basic:yes
3 vite simgrid.trace

Getting Rid of Workers in the Deployment File

In the previous example, the deployment file deployment0.xml is tightly connected to the platform file platform.xml and a worker process is launched on each host:

1 <?xml version='1.0'?>
2 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
3 <platform version="3">
4 
5  <process host="Tremblay" function="master">
6  <argument value="20"/>
7  <argument value="50000000"/>
8  <argument value="1000000"/>
9  <argument value="Jupiter"/>
10  <argument value="Fafard"/>
11  <argument value="Ginette"/>
12  <argument value="Bourassa"/>
13  <argument value="Tremblay"/>
14  </process>
15 
16  <process host="Tremblay" function="worker" on_failure="RESTART"/>
17  <process host="Jupiter" function="worker" on_failure="RESTART"/>
18  <process host="Fafard" function="worker" on_failure="RESTART"/>
19  <process host="Ginette" function="worker" on_failure="RESTART"/>
20  <process host="Bourassa" function="worker" on_failure="RESTART"/>
21 </platform>

This is ok as the platform is rather small but will be painful when using larger platforms. Instead, modify the simulator masterworker0.c into masterworker1.c so that the master launches a worker process on all the other machines at startup. The new deployment file deployment1.xml should thus now simply be:

1 <?xml version='1.0'?>
2 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
3 <platform version="3">
4 
5  <process host="Tremblay" function="master">
6  <argument value="20"/>
7  <argument value="50000000"/>
8  <argument value="1000000"/>
9  </process>
10 </platform>

To this end you may need the following MSG functions (click on the links to see their descriptions):

msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
void *data, msg_host_t host);
Note
It may avoid bugs later to avoid launching a worker on the master host so you probably want to remove it from the host list.

The data field of the MSG_process_create can be used to pass a channel name that will be private between master and workers (e.g., master_name:worker_name). Adding the master_name in the channel name will allow to easily have several masters and a worker per master on each machine. To this end, you may need to use the following functions:

If you are not too familiar with string manipulation in C, you may want to use the following functions (see the C reference for details):

char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);

Setting up a Time Limit Mechanism

In the current version, the number of tasks is defined through the worker arguments. Hence, tasks are created at the very beginning of the simulation. Instead, create tasks as needed and provide a time limit indicating when it stops sending tasks. To this end, you will obviously need to know what time it is:

double MSG_get_clock(void);

Otherwise, a quite effective way of terminating the simulation would be to use some of the following functions:

int MSG_process_killall(int reset_PIDs);

Anyway, the new deployment deployment2.xml file should thus look like this:

1 <?xml version='1.0'?>
2 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
3 <platform version="3">
4  <process host="Tremblay" function="master">
5  <argument value="3600"/>
6  <argument value="50000000"/>
7  <argument value="1000000"/>
8  </process>
9 </platform>

It may also be a good idea to transform most of the XBT_INFO into XBT_DEBUG (e.g., keep the information on the total number of tasks processed). These debug messages can be activated as follows:

1 ./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug

Using the Tracing Mechanism

SimGrid can trace all resource consumption and the outcome can be displayed with viva as illustrated in the section intro_setup. However, when several masters are deployed, it is hard to understand what happens.

1 <?xml version='1.0'?>
2 <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
3 <platform version="3">
4  <process host="Tremblay" function="master">
5  <argument value="3600"/>
6  <argument value="50000000"/>
7  <argument value="10"/>
8  </process>
9  <process host="Fafard" function="master">
10  <argument value="3600"/>
11  <argument value="50000000"/>
12  <argument value="10"/>
13  </process>
14  <process host="Jupiter" function="master">
15  <argument value="3600"/>
16  <argument value="50000000"/>
17  <argument value="10"/>
18  </process>
19 </platform>

So let's use categories to track more precisely who does what and when:

void TRACE_category(const char *category);
void MSG_task_set_category (msg_task_t task, const char *category);

The outcome can then be visualized as follows:

1 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes\
2  --cfg=tracing/categorized:yes --cfg=viva/categorized:viva_cat.plist
3 LANG=C; viva simgrid.trace viva_cat.plist

Right now, you should realize that nothing is behaving like you expect. Most workers are idle even though input data are ridiculous and there are several masters deployed on the platform. Using a Gantt-chart visualization may help:

1 ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:yes \
2  --cfg=tracing/msg/process:yes
3 LANG=C; Paje simgrid.trace

OK, so it should now be obvious that round robin is actually very bad.

Improving the Scheduling

Instead of a round-robin scheduling, let's implement a first-come first-served mechanism. To this end, workers need to send a tiny request first. A possible way to implement such a request with MSG is to send on a specific channel (e.g., the name of the master name) a task with payload 0 and whose attached data is the worker name. This way, the master can keep track of which workers are idle and willing to work.

To know whether it has pending requests, the master can use the following [function][fn:7]:

int MSG_task_listen(const char *alias);

If so, it should get the request and push the corresponding host into a dynar so that they can later be retrieved when sending a real [task][fn:7].

xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
void_f_pvoid_t const free_f);
void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
unsigned long xbt_dynar_length(const xbt_dynar_t dynar);

As you will soon realize, with such simple mechanisms, simple deadlocks will soon appear. They can easily be removed with a simple polling mechanism, hence the need for the following [function][fn:7]:

As you should quickly realize, on the simple previous example, it will double the throughput of the platform but will be quite ineffective when input size of the tasks is not negligible anymore.

From this, many things can easily be added. For example, you could:

  • add a performance measurement mechanism;
  • enable the master to make smart scheduling choices using measurement information;
  • allow workers to have several pending requests so as to overlap communication and computations as much as possible;
  • ...

Using More Elaborate Platforms

SimGrid offers a rather powerful platform modeling mechanism. The src/examples/platforms/ repository comprises a variety of platforms ranging from simple to elaborate. Associated to a good visualization tool to ensure your simulation is meaningful, they can allow you to study to which extent your algorithm scales...

What is the largest number of tasks requiring 50e6 flops and 1e5 bytes that you manage to distribute and process in one hour on g5k.xml (you should use deployment_general.xml)?

TODO: Points to improve for the next time

  • Propose equivalent exercises and skeleton in java.
  • Propose a virtualbox image with everything (simgrid, paje, viva, ...) already set up.
  • Ease the installation on mac OS X (binary installer) and windows.
  • Explain that programming in C or java and having a working development environment is a prerequisite.