Launch Plugin API
Overview
This document describes SLURM launch plugins, which are responsible for launching parallel tasks in SLURM, and the API that defines them. It is intended as a resource for programmers wishing to write their own launch plugin.
const char plugin_name[] = "launch SLURM plugin";
const char plugin_type[] = "launch/slurm";
The part of plugin_type after "launch/" names the launch mechanism:
- aprun: Use Cray's aprun command to launch tasks. Used on Cray systems with ALPS installed.
- poe: Use IBM's poe command to launch tasks. Used on systems with IBM's Parallel Environment (PE) installed.
- runjob: Use IBM's runjob command to launch tasks. Used on BlueGene/Q systems.
- slurm: Use SLURM's default launching infrastructure.
The programmer is urged to study src/plugins/launch/slurm/launch_slurm.c for a sample implementation of a SLURM launch plugin.
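As a rough illustration of the two identification strings above, the sketch below declares them and checks that the type string carries the "launch/" prefix. The helper has_launch_prefix is hypothetical, written only for this example; it is not part of SLURM.

```c
#include <string.h>

/* Identification symbols, as shown in the overview. */
const char plugin_name[] = "launch SLURM plugin";
const char plugin_type[] = "launch/slurm";

/* Hypothetical helper (not a SLURM function): returns 1 if the type
 * string starts with "launch/", 0 otherwise. */
static int has_launch_prefix(const char *type)
{
    static const char prefix[] = "launch/";

    /* sizeof(prefix) - 1 excludes the terminating NUL byte. */
    return strncmp(type, prefix, sizeof(prefix) - 1) == 0;
}
```

A plugin of a different kind, such as one whose type string is "select/linear", would fail this check.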
API Functions
int init (void)
Description:
Called when the plugin is loaded, before any other functions are
called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the SLURM init(), and the SLURM fini() is called before the system's _fini().
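The init and fini pair described above might look roughly like the following. This is a minimal sketch under stated assumptions: the SLURM_SUCCESS and SLURM_ERROR values are defined locally so the example compiles on its own (a real plugin takes them from the SLURM headers), and the launch_scratch buffer is a hypothetical piece of plugin-global state.

```c
#include <stdlib.h>

/* Assumption: local stand-ins for the SLURM return codes. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Hypothetical plugin-global state, allocated in init() and
 * released in fini(). */
static char *launch_scratch = NULL;

/* Called when the plugin is loaded: do global initialization. */
int init(void)
{
    launch_scratch = calloc(1, 256);
    return launch_scratch ? SLURM_SUCCESS : SLURM_ERROR;
}

/* Called when the plugin is removed: clear allocated storage. */
void fini(void)
{
    free(launch_scratch);
    launch_scratch = NULL;
}
```

Resetting the pointer to NULL in fini keeps a later reload of the plugin from seeing a stale pointer.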
int launch_p_setup_srun_opt(char **rest)
Description:
Sets up the srun operation.
Arguments:
rest: extra parameters on the command line not processed by srun.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_handle_multi_prog_verify(int command_pos)
Description:
Called to verify a multi-prog file, if verification is needed.
Arguments:
command_pos: index of the command within opt.argv, where opt is the global options variable.
Returns:
1 if handled, or
0 if not.
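The return convention above (1 if handled, 0 if not) can be sketched as follows. Everything here is an assumption made for illustration: struct sketch_opt is a hypothetical stand-in for srun's global opt variable, and the function name carries a sketch_ prefix to make clear it is not the real plugin entry point.

```c
#include <stddef.h>

/* Hypothetical stand-in for srun's global options variable. */
struct sketch_opt {
    int argc;        /* number of entries in argv */
    char **argv;     /* remaining command-line words */
    int multi_prog;  /* nonzero if a multi-prog file was requested */
};
static struct sketch_opt opt;

/* Returns 1 if this plugin took care of verification, 0 if not. */
int sketch_handle_multi_prog_verify(int command_pos)
{
    if (!opt.multi_prog)
        return 0;  /* nothing to verify */
    if (command_pos < 0 || command_pos >= opt.argc ||
        opt.argv[command_pos] == NULL)
        return 0;  /* no command at that position */

    /* A real plugin would parse and check the file named by
     * opt.argv[command_pos] here. */
    return 1;
}
```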
int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus, void (*signal_function)(int), sig_atomic_t *destroy_job)
Description:
Creates the job step.
Arguments:
job: the job to run.
use_all_cpus: whether to use all CPUs.
signal_function: function that handles incoming signals.
destroy_job: pointer to a global flag indicating whether the job was canceled while allocating.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
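One way the destroy_job flag could be used is to abandon step creation as soon as a signal handler has set it. The sketch below illustrates that idea only; it is not the SLURM implementation. The sketch_job_t type is a hypothetical stand-in for srun_job_t, try_create_step is a made-up helper that succeeds on its third attempt, and the choice of SIGINT and of five attempts is arbitrary.

```c
#include <signal.h>
#include <stdbool.h>

/* Assumption: local stand-ins for the SLURM return codes. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Hypothetical stand-in for srun_job_t. */
typedef struct {
    int step_created;
} sketch_job_t;

/* Hypothetical step-creation attempt: fails twice, then succeeds. */
static int try_create_step(sketch_job_t *job, int attempt)
{
    if (attempt < 3)
        return SLURM_ERROR;
    job->step_created = 1;
    return SLURM_SUCCESS;
}

int sketch_create_job_step(sketch_job_t *job, bool use_all_cpus,
                           void (*signal_function)(int),
                           sig_atomic_t *destroy_job)
{
    (void) use_all_cpus;  /* unused in this sketch */

    /* Install the caller's handler so that a signal arriving during
     * the retries can set *destroy_job. */
    if (signal_function)
        signal(SIGINT, signal_function);

    for (int attempt = 1; attempt <= 5; attempt++) {
        if (*destroy_job)
            return SLURM_ERROR;  /* canceled while allocating */
        if (try_create_step(job, attempt) == SLURM_SUCCESS)
            return SLURM_SUCCESS;
    }
    return SLURM_ERROR;
}
```

Checking the flag at the top of every iteration bounds how long a canceled job keeps retrying.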
int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds, uint32_t *global_rc)
Description:
Launches the job step.
Arguments:
job: the job to launch.
cio_fds: filled-in I/O descriptors.
global_rc: srun global return code.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_step_wait(srun_job_t *job, bool got_alloc)
Description:
Waits for the job to be finished.
Arguments:
job: the job to wait for.
got_alloc: true if the resource allocation was created inside srun.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_step_terminate(void)
Description:
Terminates the job step.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void launch_p_print_status(void)
Description:
Gets and prints the status of the job.
void launch_p_fwd_signal(int signal)
Description:
Forwards a signal to any underlying tasks.
Arguments:
signal: the signal that needs to be sent.
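For a plugin that starts its own launcher processes, forwarding could amount to calling kill() on each tracked process ID. The sketch below assumes that design; the PID table, sketch_track_task, and the sketch_last_delivered counter are all hypothetical and exist only for this example. The parameter is named signo here to avoid shadowing the signal() function from signal.h.

```c
#define _POSIX_C_SOURCE 200809L  /* for kill() under strict -std= modes */
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_MAX_TASKS 8

/* Hypothetical bookkeeping: PIDs of processes started by the plugin. */
static pid_t sketch_task_pids[SKETCH_MAX_TASKS];
static int sketch_task_count = 0;

/* Number of processes the most recent forward reached. */
static int sketch_last_delivered = 0;

/* Hypothetical helper: remember a process so signals can be
 * forwarded to it. Returns 0 on success, -1 if the table is full. */
static int sketch_track_task(pid_t pid)
{
    if (sketch_task_count >= SKETCH_MAX_TASKS)
        return -1;
    sketch_task_pids[sketch_task_count++] = pid;
    return 0;
}

/* Forward signo to every tracked process. */
void sketch_fwd_signal(int signo)
{
    sketch_last_delivered = 0;
    for (int i = 0; i < sketch_task_count; i++) {
        if (kill(sketch_task_pids[i], signo) == 0)
            sketch_last_delivered++;
    }
}
```

Passing signal number 0 to kill() performs only the existence and permission checks, which makes it a convenient way to exercise the loop without disturbing any process.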
Last modified 8 May 2014