Job Submission
SbatchMan supports launching jobs using a YAML file or through the Python API. This guide will walk you through both methods, allowing you to manage complex job configurations and launch them efficiently.
The first rule is...
Let SbatchMan do the hard work. Your program can print all of its results directly to standard output. You can later fetch and parse them easily using the SbatchMan Python API.
Launching Jobs using a YAML File
This section explains how to use SbatchMan to launch jobs defined in a YAML file. This is particularly useful for managing complex experiments with multiple configurations and parameters.
Important Note
Any relative path in the YAML file is resolved relative to the directory where the sbatchman launch command is run.
The launch Command
To launch a batch of jobs, use the launch command with the --file option:
sbatchman launch --file experiments.yaml
Note
The launch command will NOT re-run "identical" jobs unless it is forced to, or unless the previous jobs have been archived.
Tip
Your experiments.yaml may generate thousands of jobs. To avoid issues with SLURM submission limits, or to avoid flooding the queue, you can set a limit that tells SbatchMan not to queue more than that number of jobs at any time. Set the limit with the sbatchman set-max-jobs CLI command.
Batch File Structure
The batch submission file is a YAML file that defines global variables and a list of job templates. SbatchMan will generate a job for each unique combination of parameters.
Top-Level Keys
The file has the following main top-level keys:
- sequential: If set to true, ensures that jobs are scheduled sequentially.
- variables: Defines global variables applicable to all jobs.
- cluster_name: Optional parameter that defines the cluster configuration to use.
- jobs: A list of job templates.
The jobs Block
This is a list where each item defines a job template. Each template can have the following keys:
- config: The name of the configuration to use. This can be dynamic, using variables (e.g., gpu_partition_{gpu_number}).
- command: A command template for the job.
- preprocess: An optional command to run before the main command.
- postprocess: An optional command to run after the main command.
- variables: A dictionary of variables that apply only to this job template.
- cluster_name: Optional parameter that defines the cluster configuration to use.
- config_jobs: A list of variations for this job template. Each variation will generate one or more jobs.
command, preprocess, and postprocess can also be set at the top level.
Tip
In general, higher-level variables and blocks can be overwritten by redeclaring them for more specific scopes.
The config_jobs Block
Each entry in the config_jobs list defines a specific set of runs for a configuration. The tag key is required; the remaining keys are optional:
- tag: The tag assigned to the generated jobs.
- variables: Defines or overrides variables for this specific variation.
- command: A command that overrides the job template's command.
- preprocess: A command that sets or overrides the preprocess command.
- postprocess: A command that sets or overrides the postprocess command.
- cluster_name: Defines or overrides the cluster configuration for this specific variation.
The variables Block
This section defines variables that will be used to generate different job configurations. The final set of jobs is the Cartesian product of all applicable variable values.
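The Cartesian-product expansion can be sketched in plain Python. This is only an illustration of the idea, not SbatchMan's implementation: the expand helper is a hypothetical name and the values are illustrative.

```python
from itertools import product


def expand(variables):
    """Return one dict per combination of the given variable values."""
    names = list(variables)
    value_lists = (variables[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Illustrative variables block: two variables with two values each.
variables = {
    "float_var": [0.01, 0.001],
    "str_var": ["String1", "String2"],
}

combinations = expand(variables)
for combo in combinations:
    print(combo)
print(len(combinations), "combinations")
```

Each dictionary corresponds to one job: adding a third variable with three values would triple the number of combinations.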
Variables can be defined in four ways:
- As a list of values:
  variables:
    float_var: [0.01, 0.001]
    str_var: ['String1', 'String2']
- As a path to a file: SbatchMan will treat each line in the file as a value for the variable.
  variables:
    dataset: "datasets.txt"
- As a path to a directory: SbatchMan will treat each file in the directory as a value for the variable.
  variables:
    dataset: "datasets/"
  If the datasets/ directory contains:
  datasets/
  ├── data1.csv
  └── data2.csv
  Then dataset will have two possible values: absolute/path/to/datasets/data1.csv and absolute/path/to/datasets/data2.csv.
  A new variable dataset_filename (in general *_filename) will be automatically generated. This variable will only contain the stem of the file, in this example: data1 and data2.
  You can find an example here: https://github.com/ThomasPasquali/SbatchManTutorial/blob/main/yaml_files/jobs/dir_var.yaml
- As 'per-cluster':
  variables:
    ncpus:
      default: [1] # 'optional default value'
      per-cluster:
        cluster1: [1, 2, 4]
        cluster2: [1, 2, 4, 8, 16]
  In this example, the ncpus variable value(s) will be automatically selected based on the cluster name set with the sbatchman set-cluster-name <name> command.
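The ways of defining a variable can be mimicked with a short sketch. It is an assumption-laden illustration rather than SbatchMan's actual code: resolve_variable, its parameters, and the sample file contents (mnist, cifar10) are all hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_variable(name, spec, cluster_name=None, base_dir=Path(".")):
    """Return {variable_name: [values]} for one entry of a variables block."""
    if isinstance(spec, list):
        return {name: spec}
    if isinstance(spec, dict):
        # 'per-cluster' form: use the cluster's list, else fall back to 'default'.
        per_cluster = spec.get("per-cluster", {})
        return {name: per_cluster.get(cluster_name, spec.get("default", []))}
    path = (base_dir / spec).resolve()
    if path.is_dir():
        # Directory form: one value per file, plus the paired *_filename stems.
        files = sorted(p for p in path.iterdir() if p.is_file())
        return {
            name: [str(p) for p in files],
            f"{name}_filename": [p.stem for p in files],
        }
    # File form: one value per non-empty line.
    lines = path.read_text().splitlines()
    return {name: [line.strip() for line in lines if line.strip()]}


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "datasets").mkdir()
    (base / "datasets" / "data1.csv").write_text("a\n")
    (base / "datasets" / "data2.csv").write_text("b\n")
    (base / "datasets.txt").write_text("mnist\ncifar10\n")  # made-up values

    from_list = resolve_variable("float_var", [0.01, 0.001])
    from_file = resolve_variable("dataset", "datasets.txt", base_dir=base)
    from_dir = resolve_variable("dataset", "datasets/", base_dir=base)

ncpus_spec = {
    "default": [1],
    "per-cluster": {"cluster1": [1, 2, 4], "cluster2": [1, 2, 4, 8, 16]},
}
on_cluster1 = resolve_variable("ncpus", ncpus_spec, cluster_name="cluster1")
on_unknown = resolve_variable("ncpus", ncpus_spec, cluster_name="other")

print(from_file)
print(from_dir["dataset_filename"])
print(on_cluster1, on_unknown)
```

Note that dataset and dataset_filename are paired index by index: each stem belongs to the file at the same position, so the two do not multiply each other in the Cartesian product.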
Important Note
DO NOT use an absolute path in the definition of job tags.
SbatchMan internally uses tags as part of paths. Including special characters (e.g., /?*$) in a job's tag will corrupt the paths of the directories where SbatchMan stores job results.
The sequential Global Flag
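A quick pre-flight check can catch unsafe tags before launching. The sketch below is not part of SbatchMan: is_safe_tag is a hypothetical helper, and the allow-list (letters, digits, underscore, dash, dot) is a conservative assumption rather than SbatchMan's actual rule.

```python
import re

# Assumption: only letters, digits, '_', '-' and '.' are considered safe.
SAFE_TAG = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_tag(tag):
    """Return True if the tag can be used as a single directory name."""
    return bool(SAFE_TAG.fullmatch(tag)) and tag not in {".", ".."}


candidates = [
    "flag_--flag1",
    "custom_program",
    "/abs/path/data1.csv",  # absolute path: contains '/'
    "run*",
    "lr=$LR",
]
for tag in candidates:
    print(f"{tag!r}: {'ok' if is_safe_tag(tag) else 'unsafe'}")
```

For directory-based variables, using the {dataset_filename} placeholder in a tag instead of {dataset} avoids the problem, since the former expands to the file stem and the latter to an absolute path.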
When this flag is set (sequential: true), SbatchMan will ensure that only one job is running at a time, e.g. job0 --> job2 --> job3 --> job1 ...
This can be useful for benchmarking: for example, it ensures that jobs using the network do not create noise that influences each other's measurements.
Another use case is building a target with different Makefile variables, e.g. MYVAR={var} make mytarget.
This is the default behavior when using the local scheduler.
Warning
There is no guarantee about the order of the jobs.
For SLURM, this is implemented internally using the --dependency=afterany:$prev_job_id option (PBS has a similar option).
The command, preprocess, and postprocess Blocks
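The dependency chain can be pictured with a small sketch that only builds sbatch argument lists and never calls SLURM. The helper name, script names, and job IDs are made up for illustration; in a real submission each ID is only known after the previous sbatch call returns.

```python
def chained_sbatch_commands(scripts, job_ids):
    """Build one sbatch argument list per script, each waiting on the previous job."""
    commands = []
    prev_id = None
    for script, job_id in zip(scripts, job_ids):
        cmd = ["sbatch"]
        if prev_id is not None:
            # afterany: start once the previous job has terminated, whatever its exit state.
            cmd.append(f"--dependency=afterany:{prev_id}")
        cmd.append(script)
        commands.append(cmd)
        prev_id = job_id  # hypothetical ID, as if returned by the scheduler
    return commands


commands = chained_sbatch_commands(
    ["job0.sh", "job1.sh", "job2.sh"],
    [101, 102, 103],
)
for cmd in commands:
    print(" ".join(cmd))
```

Because afterany does not require the previous job to succeed, a failed job does not block the rest of the chain.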
You can specify commands to run before and after your main job using the preprocess and postprocess keys. These can be set globally, per experiment, or per tag, and support variable substitution just like command.
- command: The main command to run for the job.
- preprocess: (Optional) Command to run before the main job.
- postprocess: (Optional) Command to run after the main job.
Example:
command: python train.py --lr {learning_rate} --data {dataset}
preprocess: echo "Starting job with dataset {dataset}"
postprocess: echo "Finished job with dataset {dataset}"
You can override preprocess and postprocess at any level in the hierarchy, just like command.
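The "most specific level wins" rule can be sketched as a simple merge. This is an illustrative model, not SbatchMan's implementation; resolve_settings is a hypothetical helper and the three dictionaries stand in for the top level, a job template, and a config_jobs entry.

```python
def resolve_settings(*levels):
    """Merge settings from least to most specific; later levels win."""
    resolved = {}
    for level in levels:
        for key in ("command", "preprocess", "postprocess"):
            if key in level:
                resolved[key] = level[key]
    return resolved


top_level = {
    "command": "python train.py --lr {learning_rate} --data {dataset}",
    "preprocess": 'echo "Starting job with dataset {dataset}"',
    "postprocess": 'echo "Finished job with dataset {dataset}"',
}
job_template = {"preprocess": 'echo "Template-specific preprocess"'}
config_job = {"command": "python eval.py --data {dataset}"}

settings = resolve_settings(top_level, job_template, config_job)
for key, value in settings.items():
    print(f"{key}: {value}")
```

Here the command comes from the config_jobs entry, the preprocess from the job template, and the postprocess is inherited unchanged from the top level.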
Hierarchy and Variable wildcards
- Dynamic Names: config, tag, command, preprocess, and postprocess names/values can use placeholders (e.g., my_{nGPUs}gpu_config) to automatically generate a list of distinct values.
- Overrides: Each experiment can have its own command, preprocess, postprocess, or variables block, which will override settings from higher levels in the hierarchy.
Important Note
The number of combinations generated for a dynamic name depends on which variables the name/value references. The following example covers this in more detail.
Example
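One way to picture this dependency is to extract the placeholders from a template and expand only over the variables it actually references. The sketch below uses Python's string.Formatter for that; placeholders and expand_name are hypothetical helpers, not SbatchMan functions, and the variable values are illustrative.

```python
from itertools import product
from string import Formatter


def placeholders(template):
    """Return the distinct variable names referenced as {name} in a template."""
    fields = [field for _, field, _, _ in Formatter().parse(template) if field]
    return list(dict.fromkeys(fields))  # drop duplicates, keep order


def expand_name(template, variables):
    """Fill the template once per combination of the variables it references."""
    names = placeholders(template)
    combos = product(*(variables[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


variables = {
    "nGPUs": [1, 2],
    "trials": [100, 200],
    "flag": ["--flag1", "--flag2"],
}

config_names = expand_name("my_{nGPUs}gpu_config", variables)
static_names = expand_name("other_config", variables)
tags = expand_name("flag_{flag}", variables)

print(config_names)
print(static_names)
print(tags)
```

A name with no placeholders yields a single value regardless of how many variables are defined, while a name referencing one variable yields one value per value of that variable.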
Here is a complete example of a YAML batch submission file:
variables:
  dataset: datasets/             # This is a directory; each file name will be used as a value
  nGPUs: gpus.txt                # This is a file; each line is a value
  trials: [100, 200]             # List of explicit values
  flag: ['--flag1', '--flag2']
# Top-level commands
command: python run.py --input {dataset} --runs {trials} --gpus {nGPUs} {flag}
preprocess: echo "Preparing dataset {dataset}"
postprocess: echo "Cleaning up after {dataset}"
jobs:
  - config: my_{nGPUs}gpu_config # Dynamically generate the configuration name
    # Uses the global command and variables
    config_jobs:
      - tag: flag_{flag}         # This will run with ['--flag1', '--flag2']
      - tag: custom_flag         # This will run with only ['--flag3']
        variables:
          flag: ['--flag3']      # Overwrite top-level flag variable
  - config: other_config
    # Custom variables for other_config
    variables:
      trials: [300, 400]         # Overwrite top-level trials variable
    # Custom command and preprocess for other_config
    command: python custom.py --file {dataset} --runs {trials}
    preprocess: echo "Custom preprocess for config custom_exp_{dataset}"
    # Keep top-level postprocessing
    config_jobs:
      - tag: custom_program
        variables:
          dataset: datasets/test/  # Datasets for the custom.py
      - tag: custom_program1
        variables:
          dataset: datasets/test1/
        # Overwrite command only for tag custom_program1
        command: python custom_1.py --file {dataset} --runs {trials}
What will this example run?
Configuration: my_{nGPUs}gpu_config
Variables used:
- dataset: from the datasets/ directory
  - Example files: data1.csv, data2.csv → 2 values
- nGPUs: from gpus.txt
  - Example lines: 1, 2 → 2 values
- trials: [100, 200] → 2 values
- flag:
  - For tag flag_{flag}: ['--flag1', '--flag2'] → 2 values
  - For tag custom_flag: ['--flag3'] → 1 value
Combinations:
- Tag: flag_{flag} → 2 (datasets) × 2 (GPUs) × 2 (trials) × 2 (flags) = 16 jobs
- Tag: custom_flag → 2 (datasets) × 2 (GPUs) × 2 (trials) × 1 (flag) = 8 jobs
✅ Total from this template: 24 jobs
Configuration: other_config
Variables and overrides:
- trials: [300, 400]
- Custom command and preprocess
- Custom dataset variable:
  - custom_program: from datasets/test/
    - Example files: testA.csv, testB.csv → 2 values
  - custom_program1: from datasets/test1/
    - Example files: sample1.csv, sample2.csv, sample3.csv → 3 values
Combinations:
- Tag: custom_program → 2 (datasets) × 2 (GPUs) × 2 (flags) × 2 (trials) = 16 jobs
- Tag: custom_program1 → 3 (datasets) × 2 (GPUs) × 2 (flags) × 2 (trials) = 24 jobs
✅ Total from this template: 40 jobs
✅ Grand Total: 64 jobs
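The arithmetic of this breakdown can be double-checked with a few lines of Python. The sketch only reproduces the multiplication stated in the breakdown, using the example value counts assumed there; it does not run SbatchMan or read any files.

```python
from math import prod

# Number of values per variable for each tag, as assumed in the breakdown.
templates = {
    "my_{nGPUs}gpu_config": {
        "flag_{flag}": {"dataset": 2, "nGPUs": 2, "trials": 2, "flag": 2},
        "custom_flag": {"dataset": 2, "nGPUs": 2, "trials": 2, "flag": 1},
    },
    "other_config": {
        "custom_program": {"dataset": 2, "nGPUs": 2, "flag": 2, "trials": 2},
        "custom_program1": {"dataset": 3, "nGPUs": 2, "flag": 2, "trials": 2},
    },
}

per_tag = {
    tag: prod(counts.values())
    for tags in templates.values()
    for tag, counts in tags.items()
}
per_config = {
    config: sum(prod(counts.values()) for counts in tags.values())
    for config, tags in templates.items()
}
grand_total = sum(per_config.values())

for tag, count in per_tag.items():
    print(f"{tag}: {count} jobs")
print(f"Grand total: {grand_total} jobs")
```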
Launching Jobs with the Python API
You can launch jobs programmatically using the Python API. This is useful for integrating SbatchMan into larger workflows or scripts.
Launching a Single Job
To launch a single job, use the launch_job function. You need to provide a configuration name and the command to execute.
import sbatchman as sbm

try:
    # Launch a single job using the 'cpu_small' configuration
    job = sbm.launch_job(
        config_name="cpu_small",
        command="python my_script.py --data /path/to/data",
        tag="single_run_test",  # Optional: to group related jobs
        # Cluster name will be automatically detected from your SbatchMan configuration file
    )
    print(f"Successfully launched job {job.job_id} in {job.exp_dir}")
except Exception as e:
    print(f"An error occurred: {e}")
For more details, refer to the API page.
Launching Multiple Jobs from a File
The launch_jobs_from_file function takes the path to a YAML file and launches all the jobs defined within it.
from pathlib import Path

import sbatchman as sbm

# Path to your batch jobs file
jobs_file = Path("experiments.yaml")

try:
    # Launch the jobs
    launched_jobs = sbm.launch_jobs_from_file(jobs_file)
    print(f"Successfully launched {len(launched_jobs)} jobs.")
    for job in launched_jobs:
        print(f" - Job ID: {job.job_id}, Config: {job.config_name}, Tag: {job.tag}")
except Exception as e:
    print(f"An error occurred: {e}")
The dry_run parameter lets you get the list of jobs without submitting them.
This way, you can modify the jobs programmatically and submit them later using job_submit(Job).
For more details, refer to the API page.