Job Submission¶

SbatchMan supports launching jobs using a YAML file or through the Python API. This guide will walk you through both methods, allowing you to manage complex job configurations and launch them efficiently.

The first rule is...

Let SbatchMan do the hard work. In your program, you can print all your results directly to standard output, then fetch and parse them later using the SbatchMan Python API.
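
As a minimal sketch of this workflow, assume your program prints each result as a `key=value` line. The output format and the `parse_results` helper below are illustrative assumptions, not part of SbatchMan. The captured output is given as a plain string, because how you fetch it depends on the SbatchMan Python API.

```python
def parse_results(stdout_text: str) -> dict[str, float]:
    """Parse `key=value` lines from a job's captured standard output."""
    results = {}
    for line in stdout_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Skip lines that are not results (e.g. log messages)
        try:
            results[key.strip()] = float(value)
        except ValueError:
            continue  # Skip lines whose value is not numeric
    return results


# Hypothetical output of a job that prints its results to stdout
captured_stdout = "Loading data...\nruntime_s=12.5\naccuracy=0.93\n"
results = parse_results(captured_stdout)
print(results)
```

Keeping the output format simple and line-based makes the parsing step trivial, whatever tool collects the output.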

Launching Jobs using a YAML File¶

This section explains how to use SbatchMan to launch jobs defined in a YAML file. This is particularly useful for managing complex experiments with multiple configurations and parameters.

Important Note

Any relative path in the YAML file is resolved relative to the directory where the sbatchman launch command is run.

The `launch` Command¶

To launch a batch of jobs, use the launch command with the --file option:

sbatchman launch --file experiments.yaml

Note

The launch command will NOT re-run "identical" jobs unless it is forced to or the previous jobs have been archived.

Tip

Your experiments.yaml may generate thousands of jobs. To avoid issues with SLURM submission limits or flooding the queue, you can set a limit that tells SbatchMan not to queue more than that number of jobs at any time. You can set the limit with the sbatchman set-max-jobs CLI command.

Batch File Structure¶

The batch submission file is a YAML file that defines global variables and a list of job templates. SbatchMan will generate a job for each unique combination of parameters.

Top-Level Keys¶

The file has the following main top-level keys:

sequential: If set to true, jobs are scheduled sequentially.
configs: (optional) Path(s) to configuration file(s). Setting this is equivalent to running sbatchman configure -f <file> --overwrite before the job submission. Note: absolute paths remain unchanged. Relative paths use the location of the including file as their base directory.
variables: Defines global variables applicable to all jobs.
include_variables: (optional) Path(s) to file(s) containing a block of variables (without the variables: top-level key). Variable values are overwritten according to the order in which the files are included. The variables block of the current file is always the last one to be merged. Note: absolute paths remain unchanged. Relative paths use the location of the including file as their base directory.
cluster_name: (optional) The cluster configuration to use.
jobs: A list of job templates.
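
The merge order described for include_variables can be sketched with plain dictionaries. This is only an illustration of the stated rule (later files overwrite earlier ones, and the current file is merged last); the `merge_variables` helper and the variable values are hypothetical and do not reflect SbatchMan's internal code.

```python
def merge_variables(included_blocks: list[dict], current_block: dict) -> dict:
    """Merge variable blocks: later blocks overwrite earlier ones,
    and the current file's block is always merged last."""
    merged = {}
    for block in included_blocks:
        merged.update(block)
    merged.update(current_block)
    return merged


# Hypothetical contents of two included files and of the current file
common = {"trials": [100], "dataset": "datasets/"}
cluster_specific = {"trials": [100, 200]}
current = {"dataset": "datasets/small/"}

merged = merge_variables([common, cluster_specific], current)
print(merged)
```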

The `jobs` Block¶

This is a list where each item defines a job template. Each template can have the following keys:

config: The name of the configuration to use. This can be dynamic, using variables (e.g., gpu_partition_{gpu_number}).
command: A command template for the job.
preprocess: An optional command to run before the main command.
postprocess: An optional command to run after the main command.
variables: A dictionary of variables that apply only to this job template.
cluster_name: Optional parameter that defines the cluster configuration to use.
config_jobs: A list of variations for this job template. Each variation will generate one or more jobs.

command, preprocess and postprocess can also be set at the top level.

Tip

In general, higher-level variables and blocks can be overwritten by redeclaring them for more specific scopes.

Tip

In these commands you can always assume that the following environment variables are set:

- SBATCHMAN_JOB_DIR -> directory where the job data will be saved.
- SBATCHMAN_WD -> directory from which the launch command is run.
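
If the command you launch is a Python program, it can read these environment variables directly. The sketch below is an assumption about how you might use them; the helper names and the fallback to the current directory (for runs outside SbatchMan) are illustrative choices.

```python
import os
from pathlib import Path
from typing import Mapping


def job_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Directory where the job data will be saved."""
    # Fall back to "." when the variable is missing (assumption for local testing)
    return Path(env.get("SBATCHMAN_JOB_DIR", "."))


def launch_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Directory from which the launch command was run."""
    return Path(env.get("SBATCHMAN_WD", "."))


print(f"Job data directory: {job_dir()}")
print(f"Launch directory:   {launch_dir()}")
```

Passing the environment as a parameter keeps the helpers easy to test without modifying the real process environment.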

The `config_jobs` Block¶

Each entry in the config_jobs list defines a specific set of runs for a configuration. The tag key is required; the other keys are optional:

tag: To define the tag to be assigned to generated jobs.
variables: To define or override variables for this specific variation.
command: To provide a command that overrides the job template's command.
preprocess: To provide a command that sets or overrides the preprocess command.
postprocess: To provide a command that sets or overrides the postprocess command.
cluster_name: To define or override the cluster configuration for this specific variation.
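
The override rule across the three scopes (top level, job template, config_jobs entry) can be pictured as a lookup that prefers the most specific scope. The `resolve_setting` helper below is a hypothetical illustration of that rule, not SbatchMan's implementation, and the commands are made up.

```python
def resolve_setting(key: str, top_level: dict, job_template: dict, config_job: dict):
    """Return the value from the most specific scope that defines `key`."""
    for scope in (config_job, job_template, top_level):
        if key in scope:
            return scope[key]
    return None


# Hypothetical settings declared at each level
top_level = {"command": "python run.py", "postprocess": "echo done"}
job_template = {"command": "python custom.py"}
config_job = {"tag": "custom_program1", "command": "python custom_1.py"}

command = resolve_setting("command", top_level, job_template, config_job)
postprocess = resolve_setting("postprocess", top_level, job_template, config_job)
print(command)      # Taken from the config_jobs entry
print(postprocess)  # Inherited from the top level
```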

The `variables` Block¶

This section defines variables that will be used to generate different job configurations. The final set of jobs is the Cartesian product of all applicable variable values.
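
To make the Cartesian product concrete, here is a small sketch using `itertools.product`. The `expand` helper is hypothetical and the variable names are only examples; it shows how two variables with two values each yield four combinations.

```python
from itertools import product


def expand(variables: dict[str, list]) -> list[dict]:
    """Return one dictionary per combination of variable values."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


variables = {
    "float_var": [0.01, 0.001],
    "str_var": ["String1", "String2"],
}

combos = expand(variables)
for combo in combos:
    print(combo)
```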

Variables can be defined in the following ways:

As a list of values:

variables:
  float_var: [0.01, 0.001]
  str_var: ['String1', 'String2']

As a path to a file: SbatchMan will treat each line in the file as a value for the variable.
```
variables:
  dataset: "datasets.txt"
```
As a path to a directory: SbatchMan will treat each file in the directory as a value for the variable.
```
variables:
  dataset: "datasets/"
```
If the datasets/ directory contains:
```
datasets/
  ├── data1.csv
  └── data2.csv
```
Then dataset will have two possible values: absolute/path/to/datasets/data1.csv and absolute/path/to/datasets/data2.csv.
A new variable dataset_filename (in general, *_filename) is generated automatically. It contains only the stem of the file name, in this example data1 and data2.
You can find an example here: https://github.com/ThomasPasquali/SbatchManTutorial/blob/main/yaml_files/jobs/dir_var.yaml
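
The relationship between a directory variable and its `*_filename` companion can be sketched with `pathlib`. This is an illustration under assumptions: the `expand_directory_variable` helper is hypothetical, the sorted order is chosen only to make the output deterministic, and subdirectories are simply skipped.

```python
import tempfile
from pathlib import Path


def expand_directory_variable(name: str, directory: Path) -> list[dict[str, str]]:
    """One entry per file: absolute path plus the `<name>_filename` stem."""
    entries = []
    for path in sorted(directory.iterdir()):
        if path.is_file():
            entries.append({
                name: str(path.resolve()),
                f"{name}_filename": path.stem,  # File name without extension
            })
    return entries


# Build a throwaway datasets/ directory matching the example above
with tempfile.TemporaryDirectory() as tmp:
    datasets = Path(tmp) / "datasets"
    datasets.mkdir()
    (datasets / "data1.csv").touch()
    (datasets / "data2.csv").touch()
    entries = expand_directory_variable("dataset", datasets)

for entry in entries:
    print(entry)
```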

As 'per_cluster':

variables:
  ncpus:
    default: [1] # optional default value
    per_cluster:
      cluster1: [1, 2, 4]
      cluster2: [1, 2, 4, 8, 16]

In this example, the ncpus variable value(s) will be automatically selected based on the cluster name set with the sbatchman set-cluster-name <name> command.
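
The selection logic can be sketched as a dictionary lookup with a fallback. The `select_per_cluster` helper is hypothetical, and falling back to default when the cluster is not listed is an assumption based on the "optional default value" comment above.

```python
def select_per_cluster(spec: dict, cluster_name: str) -> list:
    """Pick the values for `cluster_name`, falling back to `default`."""
    per_cluster = spec.get("per_cluster", {})
    if cluster_name in per_cluster:
        return per_cluster[cluster_name]
    if "default" in spec:
        return spec["default"]
    raise KeyError(f"No values for cluster {cluster_name!r} and no default")


ncpus = {
    "default": [1],
    "per_cluster": {
        "cluster1": [1, 2, 4],
        "cluster2": [1, 2, 4, 8, 16],
    },
}

print(select_per_cluster(ncpus, "cluster2"))
print(select_per_cluster(ncpus, "my_laptop"))  # Not listed, uses the default
```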

As 'map':

variables:
  type_size:
    default: [4] # optional default value
    map:
      float:  [4]
      double: [8]
      uint32: [4]
      int32:  [4]

The substitution wildcard uses the following syntax: {<map_variable>[<variable_to_be_used_as_key>]} (e.g., {type_size[data_type]}). You can specify multiple values in the values array.
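
As a rough sketch of how such a wildcard could be resolved, the code below replaces `{map[key_variable]}` with the mapped value for the current value of the key variable. The `resolve_map_wildcards` helper, the `./bench` command, and the fallback to default for unlisted keys are all assumptions for illustration. For simplicity the sketch only uses the first value of each array.

```python
import re

# Matches wildcards of the form {map_variable[key_variable]}
MAP_PATTERN = re.compile(r"\{(\w+)\[(\w+)\]\}")


def resolve_map_wildcards(template: str, maps: dict, values: dict) -> str:
    """Replace each {map[key_var]} with the value mapped to values[key_var]."""
    def substitute(match: re.Match) -> str:
        map_name, key_var = match.group(1), match.group(2)
        spec = maps[map_name]
        key = values[key_var]
        choices = spec["map"].get(key, spec.get("default"))
        if choices is None:
            raise KeyError(f"{map_name} has no entry for {key!r} and no default")
        return str(choices[0])  # Simplification: single-valued entries only

    return MAP_PATTERN.sub(substitute, template)


maps = {
    "type_size": {
        "default": [4],
        "map": {"float": [4], "double": [8], "uint32": [4], "int32": [4]},
    }
}

command = resolve_map_wildcards(
    "./bench --bytes {type_size[data_type]}", maps, {"data_type": "double"}
)
print(command)
```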

Important Note

DO NOT use an absolute path in the definition of job tags.
SbatchMan internally uses tags as part of paths. Including special characters (e.g., /?*$) in a job tag will break the paths of the directories where SbatchMan stores job results.
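
If you generate tags programmatically, a small check can catch problematic ones before launching. The `is_safe_tag` helper below is hypothetical, and the allowed character set (letters, digits, dot, underscore, hyphen) is a conservative assumption rather than a rule documented by SbatchMan.

```python
import re

# Conservative whitelist: letters, digits, dot, underscore, hyphen
SAFE_TAG = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_tag(tag: str) -> bool:
    """Return True if `tag` can be used safely as a directory name."""
    if tag in {".", ".."}:
        return False  # These would refer to existing directories
    return SAFE_TAG.fullmatch(tag) is not None


for tag in ["flag_--flag1", "custom_program", "/abs/path", "run*1"]:
    print(f"{tag!r}: {'ok' if is_safe_tag(tag) else 'unsafe'}")
```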

The `sequential` Global Flag¶

When this flag is set (sequential: true), SbatchMan ensures that only one job runs at a time, e.g. job0 --> job2 --> job3 --> job1 ...

This can be useful for benchmarking, since it ensures that jobs using the network do not create noise that influences each other.
Another use case is building a target with different Makefile variables, e.g. MYVAR={var} make mytarget.

This is the default behavior when using the local scheduler.

Warning

There is no guarantee about the order in which the jobs run.
For SLURM, this is implemented internally using the --dependency=afterany:$prev_job_id option (PBS has a similar option).
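
The dependency chain can be pictured with a sketch that only builds the sbatch argument lists, without submitting anything. The `build_sbatch_args` helper, the script names, and the job IDs are made up for illustration; SbatchMan's actual submission code may differ.

```python
from typing import Optional


def build_sbatch_args(script: str, prev_job_id: Optional[str]) -> list[str]:
    """Build an sbatch command that waits for the previous job, if any."""
    args = ["sbatch", "--parsable"]  # --parsable makes sbatch print the job ID
    if prev_job_id is not None:
        # afterany: start once the previous job has ended, whatever its status
        args.append(f"--dependency=afterany:{prev_job_id}")
    args.append(script)
    return args


scripts = ["job0.sh", "job1.sh", "job2.sh"]
fake_job_ids = ["1001", "1002", "1003"]  # IDs sbatch would return (made up)

commands = []
prev_job_id = None
for script, job_id in zip(scripts, fake_job_ids):
    commands.append(build_sbatch_args(script, prev_job_id))
    prev_job_id = job_id

for command in commands:
    print(" ".join(command))
```

Because each job depends only on the one submitted just before it, at most one job in the chain is eligible to run at any time.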

The `command`, `preprocess`, `postprocess`, and `check` Blocks¶

You can specify commands to run before and after your main job using the preprocess and postprocess keys. These can be set globally, per experiment, or per tag, and support variable substitution just like command.

command: The main command to run for the job. Its exit code will be used to determine the job status.
preprocess: (Optional) Command to run before command.
postprocess: (Optional) Command to run after command.
check: (Optional) Command to run after postprocess. Its exit status overrides the exit status of command.

Example:

command:      'python train.py --lr {learning_rate} --data {dataset}'
preprocess:   'echo "Starting job with dataset {dataset}"'
postprocess:  'echo "Finished job with dataset {dataset}"'
check:        '[[ -f "train_result_{learning_rate}_{dataset}" ]]'

You can override preprocess, postprocess, and check at any level in the hierarchy, just like command.
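
The way the four commands combine into a final status can be sketched as follows. This is an assumption-laden illustration: the `run_job` helper is hypothetical, and it runs postprocess and check unconditionally, which the documentation above does not specify for the case where command fails.

```python
import subprocess
from typing import Optional


def run_job(
    command: str,
    preprocess: Optional[str] = None,
    postprocess: Optional[str] = None,
    check: Optional[str] = None,
) -> int:
    """Run the commands in order and return the status that decides the outcome."""
    if preprocess:
        subprocess.run(preprocess, shell=True)
    status = subprocess.run(command, shell=True).returncode
    if postprocess:
        subprocess.run(postprocess, shell=True)
    if check:
        # When present, the check command's exit status replaces the command's
        status = subprocess.run(check, shell=True).returncode
    return status


print(run_job("exit 3"))                  # Status comes from command
print(run_job("exit 3", check="exit 0"))  # Status comes from check
```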

Hierarchy and Variable wildcards¶

Dynamic Names: config, tag, command, preprocess, and postprocess names/values can use placeholders (e.g., my_{nGPUs}gpu_config) to automatically generate a list of distinct values.
Overrides: Each experiment can have its own command, preprocess, postprocess, or variables block, which will override settings from higher levels in the hierarchy.

Important Note

The number of values generated for a dynamic name depends on the variables that the name or value references. The following example illustrates this.
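
One way to picture this dependency is to expand a template only over the variables it actually references. The `expand_name` helper below is a hypothetical sketch built on Python's `string.Formatter`, with example values assumed for each variable; it is not SbatchMan's implementation.

```python
from itertools import product
from string import Formatter


def expand_name(template: str, variables: dict[str, list]) -> list[str]:
    """Expand `template` over the variables it references, and only those."""
    used = []
    for _, field, _, _ in Formatter().parse(template):
        if field and field not in used:
            used.append(field)
    return [
        template.format(**dict(zip(used, combo)))
        for combo in product(*(variables[name] for name in used))
    ]


# Example values, assumed for illustration
variables = {
    "nGPUs": [1, 2],
    "trials": [100, 200],
    "flag": ["--flag1", "--flag2"],
}

config_names = expand_name("my_{nGPUs}gpu_config", variables)
tag_names = expand_name("flag_{flag}", variables)
print(config_names)
print(tag_names)
```

A name without placeholders, such as other_config, references no variables and therefore expands to a single value.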

Example¶

Here is a complete example of a YAML batch submission file:

variables:
  dataset: datasets/   # This is a directory; each file name will be used as a value
  nGPUs: gpus.txt      # This is a file; each line is a value
  trials: [100, 200]   # List of explicit values
  flag: ['--flag1', '--flag2']

# Top-level commands
command: python run.py --input {dataset} --runs {trials} --gpus {nGPUs} {flag}
preprocess: echo "Preparing dataset {dataset}"
postprocess: echo "Cleaning up after {dataset}"

jobs:
  - config: my_{nGPUs}gpu_config # Dynamically generate the configuration name
    # Uses the global command and variables
    config_jobs:
      - tag: flag_{flag} # This will run with ['--flag1', '--flag2']

      - tag: custom_flag # This will run with only ['--flag3']
        variables:
          flag: ['--flag3'] # Overwrite top-level flag variable

  - config: other_config
    # Custom variables for other_config
    variables:
      trials: [300, 400] # Overwrite top-level trials variable

    # Custom command and preprocess for other_config
    command: python custom.py --file {dataset} --runs {trials}
    preprocess: echo "Custom preprocess for config custom_exp_{dataset}"
    # Keep top-level postprocessing

    config_jobs:
      - tag: custom_program
        variables:
          dataset: datasets/test/ # Datasets for the custom.py

      - tag: custom_program1
        variables:
          dataset: datasets/test1/
        # Overwrite command only for tag custom_program1
        command: python custom_1.py --file {dataset} --runs {trials}

What will this example run?¶

Configurations: my_{nGPUs}gpu_config

Variables used:

dataset: from datasets/ directory
- Example files: data1.csv, data2.csv → 2 values
nGPUs: from gpus.txt
- Example lines: 1, 2 → 2 values
trials: [100, 200] → 2 values
flag:
- For tag flag_{flag}: ['--flag1', '--flag2'] → 2 values
- For tag custom_flag: ['--flag3'] → 1 value

Combinations:

Tag: flag_{flag} → 2 (datasets) × 2 (GPUs) × 2 (trials) × 2 (flags) = 16 jobs
Tag: custom_flag → 2 (datasets) × 2 (GPUs) × 2 (trials) × 1 (flag) = 8 jobs

✅ Total from this template: 24 jobs

Configuration: other_config

Variables and overrides:

trials: [300, 400]
Custom command and preprocess
Custom datasets variable:
- custom_program: from datasets/test/
  - Example files: testA.csv, testB.csv → 2 values
- custom_program1: from datasets/test1/
  - Example files: sample1.csv, sample2.csv, sample3.csv → 3 values

Combinations:

Tag: custom_program → 2 (datasets) × 2 (GPUs) × 2 (flags) × 2 (trials) = 16 jobs
Tag: custom_program1 → 3 (datasets) × 2 (GPUs) × 2 (flags) × 2 (trials) = 24 jobs

✅ Total from this template: 40 jobs

✅ Grand Total: 64 jobs
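
The arithmetic above can be double-checked with a few lines of Python. The counts are the example values assumed in this section (two files in datasets/, two lines in gpus.txt, and so on); they are not read from a real YAML file.

```python
from math import prod

# Number of values per variable for each tag, as assumed in this section
value_counts = {
    "flag_{flag}":     {"dataset": 2, "nGPUs": 2, "trials": 2, "flag": 2},
    "custom_flag":     {"dataset": 2, "nGPUs": 2, "trials": 2, "flag": 1},
    "custom_program":  {"dataset": 2, "nGPUs": 2, "flag": 2, "trials": 2},
    "custom_program1": {"dataset": 3, "nGPUs": 2, "flag": 2, "trials": 2},
}

# Jobs per tag are the product of the value counts
jobs_per_tag = {tag: prod(counts.values()) for tag, counts in value_counts.items()}
total_jobs = sum(jobs_per_tag.values())

for tag, count in jobs_per_tag.items():
    print(f"{tag}: {count} jobs")
print(f"Total: {total_jobs} jobs")
```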

Launching Jobs with the Python API¶

You can launch jobs programmatically using the Python API. This is useful for integrating SbatchMan into larger workflows or scripts.

Launching a Single Job¶

To launch a single job, use the launch_job function. You need to provide a configuration name and the command to execute.

import sbatchman as sbm

try:
  # Launch a single job using the 'cpu_small' configuration
  job = sbm.launch_job(
    config_name="cpu_small",
    command="python my_script.py --data /path/to/data",
    tag="single_run_test"      # Optional: to group related jobs
    # Cluster name will be automatically detected from your SbatchMan configuration file
  )
  print(f"Successfully launched job {job.job_id} in {job.exp_dir}")
except Exception as e:
  print(f"An error occurred: {e}")

For more details, refer to the API page.

Launching Multiple Jobs from a File¶

The launch_jobs_from_file function takes the path to a YAML file and launches all the jobs defined within it.

from pathlib import Path
import sbatchman as sbm

# Path to your batch jobs file
jobs_file = Path("experiments.yaml")

try:
  # Launch the jobs
  launched_jobs = sbm.launch_jobs_from_file(jobs_file)
  print(f"Successfully launched {len(launched_jobs)} jobs.")
  for job in launched_jobs:
    print(f"  - Job ID: {job.job_id}, Config: {job.config_name}, Tag: {job.tag}")
except Exception as e:
  print(f"An error occurred: {e}")

The dry_run parameter lets you get the list of jobs without submitting them.
This way you can modify the jobs programmatically and submit them later using job_submit(Job).

For more details, refer to the API page.
