🧩 SbatchMan Internal API
If you wish, you can interact directly with SbatchMan's internal API.
__all__
module-attribute
__all__ = ['SbatchManError', 'ProjectNotInitializedError', 'ProjectExistsError', 'get_cluster_name', 'get_max_queued_jobs', 'set_max_queued_jobs', 'Job', 'Status', 'init_project', 'SlurmConfig', 'PbsConfig', 'LocalConfig', 'create_local_config', 'create_slurm_config', 'create_pbs_config', 'create_configs_from_file', 'launch_jobs_from_file', 'launch_job', 'job_submit', 'jobs_list', 'jobs_df', 'count_active_jobs', 'archive_jobs', 'delete_jobs', 'archive_job', 'unarchive_job', 'update_jobs_status']
Job
dataclass
Job(config_name: str, cluster_name: str, exp_dir: str, command: str, status: str, scheduler: str, tag: str, job_id: int, queued_timestamp: str, exitcode: Optional[int] = None, preprocess: Optional[str] = None, postprocess: Optional[str] = None, archive_name: Optional[str] = None, variables: Optional[dict[str, Any]] = None, start_timestamp: Optional[str] = None, end_timestamp: Optional[str] = None)
get_job_base_path
get_job_base_path() -> Path
Source code in src/sbatchman/core/job.py
get_job_config
get_job_config() -> BaseConfig
Returns the configuration of the job, specialized to one of SlurmConfig, LocalConfig, or PbsConfig.
Source code in src/sbatchman/core/job.py
get_job_script_path
get_job_script_path() -> Path
Source code in src/sbatchman/core/job.py
get_metadata_path
get_metadata_path() -> Path
Returns the path to the metadata.yaml file for this job. If the job is archived, it will return the path in the archive directory. Otherwise, it returns the path in the active experiments directory.
Source code in src/sbatchman/core/job.py
get_run_time
Returns the job runtime in seconds, or None if the start or end time is not available.
Source code in src/sbatchman/core/job.py
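The runtime calculation described above can be sketched as follows. This is a minimal illustration, not SbatchMan's implementation: it assumes that start_timestamp and end_timestamp are ISO 8601 strings (the timestamp format is not stated in this reference), and run_time_seconds is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional


def run_time_seconds(start_timestamp: Optional[str],
                     end_timestamp: Optional[str]) -> Optional[float]:
    """Return the elapsed seconds between two timestamps, or None if either is missing.

    Assumption: both timestamps are ISO 8601 strings such as "2024-05-01T10:00:00".
    """
    if not start_timestamp or not end_timestamp:
        return None
    start = datetime.fromisoformat(start_timestamp)
    end = datetime.fromisoformat(end_timestamp)
    return (end - start).total_seconds()


print(run_time_seconds("2024-05-01T10:00:00", "2024-05-01T10:02:30"))  # 150.0
print(run_time_seconds("2024-05-01T10:00:00", None))                   # None
```

The same pattern would apply to a queue-time calculation, using the queued and start timestamps instead.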
get_stderr
Returns the contents of the stderr log file for this job, or None if not found.
Source code in src/sbatchman/core/job.py
get_stderr_path
get_stderr_path() -> Path
Source code in src/sbatchman/core/job.py
get_stdout
Returns the contents of the stdout log file for this job, or None if not found.
Source code in src/sbatchman/core/job.py
get_stdout_path
get_stdout_path() -> Path
Source code in src/sbatchman/core/job.py
get_time_in_queue
Returns the time the job spent in the queue in seconds, or None if the job was not queued or its start time is not available.
Source code in src/sbatchman/core/job.py
parse_command_args
Parses the command string if it is a simple CLI command (no pipes, redirections, or shell operators). Returns (executable, args_dict, positional_args) where args_dict maps argument names to values, and positional_args is a list of positional arguments (not associated with any flag).
Source code in src/sbatchman/core/job.py
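To make the (executable, args_dict, positional_args) shape concrete, here is a minimal sketch built on the standard-library shlex module. It is not the SbatchMan implementation: parse_simple_command is a hypothetical name, and the handling of valueless flags (stored as True) and of "--name=value" forms are assumptions chosen for the illustration.

```python
import shlex
from typing import Dict, List, Optional, Tuple, Union

ArgValue = Union[str, bool]


def parse_simple_command(command: str) -> Optional[Tuple[str, Dict[str, ArgValue], List[str]]]:
    """Split a simple CLI command into executable, flag arguments and positionals.

    Returns None when the command looks like it uses pipes, redirections or
    other shell operators, or when it cannot be tokenized.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    # Conservative check: any token containing a shell operator character is rejected.
    if not tokens or any(ch in tok for tok in tokens for ch in "|&;<>"):
        return None

    executable, rest = tokens[0], tokens[1:]
    args: Dict[str, ArgValue] = {}
    positional: List[str] = []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok.startswith("-") and len(tok) > 1:
            name = tok.lstrip("-")
            if "=" in name:                      # --lr=0.01
                name, value = name.split("=", 1)
                args[name] = value
            elif i + 1 < len(rest) and not rest[i + 1].startswith("-"):
                args[name] = rest[i + 1]         # --epochs 10
                i += 1
            else:
                args[name] = True                # --verbose (assumed: flag without value)
        else:
            positional.append(tok)
        i += 1
    return executable, args, positional


print(parse_simple_command("python train.py --epochs 10 --lr=0.01 --verbose"))
print(parse_simple_command("cat log.txt | grep error"))  # None
```

One limitation of this sketch: a flag followed directly by a positional argument is read as a flag with a value, and negative numbers are read as flags, because the command line alone does not disambiguate them.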
write_job_id
write_job_id()
Updates the job_id in the metadata.yaml file. This is used to update the job_id after the job has been submitted.
Source code in src/sbatchman/core/job.py
write_job_status
write_job_status()
Updates the status in the metadata.yaml file.
Source code in src/sbatchman/core/job.py
write_metadata
write_metadata(override_status=True)
Saves the current job state to its metadata.yaml file.
Source code in src/sbatchman/core/job.py
ProjectExistsError
ProjectExistsError(message='SbatchMan root present already. Enjoy using SbatchMan!')
Bases: SbatchManError
Raised when the SbatchMan root directory is already present.
Source code in src/sbatchman/exceptions.py
ProjectNotInitializedError
ProjectNotInitializedError(message="SbatchMan root not found. Please run 'sbatchman init' or specify a directory.")
Bases: SbatchManError
Raised when the SbatchMan root directory cannot be found.
Source code in src/sbatchman/exceptions.py
archive_job
Archives a single active job to the named archive, creating the archive if it does not already exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job | Job | The Job instance to archive. | required |
| archive_name | str | Name of the destination archive. The directory | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the job's experiment directory cannot be resolved. |
| FileNotFoundError | If the source experiment directory does not exist. |
Source code in src/sbatchman/core/jobs_manager.py
archive_jobs
archive_jobs(archive_name: str, overwrite: bool = False, cluster_name: Optional[str] = None, config_name: Optional[str] = None, tag: Optional[str] = None, status: Optional[List[Status]] = None) -> List[Job]
Archives jobs matching the filter criteria.
Source code in src/sbatchman/core/jobs_manager.py
count_active_jobs
count_active_jobs() -> int
Counts the number of jobs that are currently queued or running by querying squeue.
Returns:
| Type | Description |
|---|---|
| int | The number of jobs with QUEUED or RUNNING status. |
Source code in src/sbatchman/core/jobs_manager.py
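As a rough illustration of counting active jobs from squeue output, the sketch below separates the query from the counting so the latter can be tried without a cluster. It is not SbatchMan's code: the function names are hypothetical, and it assumes that Slurm's PENDING state corresponds to the QUEUED status mentioned above.

```python
import getpass
import subprocess
from collections import Counter

# Assumption: Slurm's PENDING maps to QUEUED, and RUNNING maps to RUNNING.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def count_active_states(squeue_output: str) -> int:
    """Count the lines of `squeue -h -o %T` output that name an active state."""
    states = Counter(line.strip() for line in squeue_output.splitlines() if line.strip())
    return sum(count for state, count in states.items() if state in ACTIVE_STATES)


def query_squeue() -> str:
    """Ask squeue for the states of the current user's jobs (requires Slurm)."""
    # -h drops the header, -u restricts to one user, -o "%T" prints only the job state.
    result = subprocess.run(
        ["squeue", "-h", "-u", getpass.getuser(), "-o", "%T"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Canned output, so the example runs without a scheduler.
sample = "RUNNING\nPENDING\nPENDING\nCOMPLETING\n"
print(count_active_states(sample))  # 3
```

On a machine with Slurm installed, count_active_states(query_squeue()) would combine the two steps.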
create_configs_from_file
Parses a YAML file to create a list of job configurations.
This function reads a YAML configuration file, processes variables, and generates a list of configuration objects.
The YAML file structure should be as follows:
- An optional variables section at the root to define substitution variables. These can be single values, lists, or file paths with wildcards (glob patterns).
- Cluster names as top-level keys.
- Each cluster must define a scheduler (e.g., 'slurm').
- Each cluster can have a default_conf dictionary to specify common parameters for all jobs on that cluster.
- Each cluster must have a configs list, where each item is a dictionary representing a job configuration.
The function expands configurations based on the variables used. If a configuration's name or parameters reference variables that are lists or expand from wildcards, it creates a Cartesian product of all possible variable combinations, generating a distinct configuration for each one.
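The Cartesian-product expansion can be sketched with itertools.product. This is an illustration only, not SbatchMan's implementation: expand_variables is a hypothetical helper, the variable names and values are made up, and the "{name}" placeholder syntax is an assumption (the substitution syntax is not shown in this reference). The sketch also expands every variable, whereas the function described above expands only the variables a configuration actually references.

```python
from itertools import product
from typing import Any, Dict, List


def expand_variables(variables: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one dict per combination of variable values."""
    names = list(variables)
    # Scalars are wrapped in one-element lists so they pass through the product unchanged.
    value_lists = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


variables = {"dataset": ["mnist", "cifar"], "ntasks": [1, 2, 4], "partition": "gpu"}
combos = expand_variables(variables)
print(len(combos))  # 6 (2 datasets x 3 task counts x 1 partition)

# Each combination yields a distinct configuration name.
config_names = ["run_{dataset}_{ntasks}".format(**combo) for combo in combos]
print(config_names[:3])  # ['run_mnist_1', 'run_mnist_2', 'run_mnist_4']
```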
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Path | The path to the YAML configuration file. | required |
| overwrite | bool | If True, indicates that existing configurations with the same name can be overwritten. Defaults to False. | False |
Returns:
| Type | Description |
|---|---|
| List[BaseConfig] | A list of fully resolved configuration objects (e.g., SlurmConfig) created from the file. |
Raises:
| Type | Description |
|---|---|
| ConfigurationError | If the file is not found, contains invalid YAML, or does not adhere to the required structure (e.g., missing 'scheduler' key, root is not a dictionary). |
Source code in src/sbatchman/core/config_manager.py
create_local_config
create_local_config(name: str, cluster_name: Optional[str] = None, env: Optional[List[str]] = None, modules: Optional[List[str]] = None, time: Optional[str] = None, overwrite: bool = False) -> LocalConfig
Creates and saves a configuration file for local execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the configuration. | required |
| cluster_name | Optional[str] | The name of the cluster this configuration belongs to. Defaults to the system's hostname. | None |
| env | Optional[List[str]] | A list of environment variables to set. | None |
| modules | Optional[List[str]] | A list of modules to load in sbatch scripts before running commands. | None |
| time | Optional[str] | Walltime (e.g., 01-00:00:00). | None |
| overwrite | bool | If True, overwrite an existing configuration with the same name. | False |
Returns:
| Type | Description |
|---|---|
| LocalConfig | The newly created LocalConfig object. |
Source code in src/sbatchman/core/config_manager.py
create_pbs_config
create_pbs_config(name: str, cluster_name: Optional[str] = None, queue: Optional[str] = None, cpus: Optional[int] = None, mem: Optional[str] = None, walltime: Optional[str] = None, env: Optional[List[str]] = None, custom_headers: Optional[List[str]] = None, overwrite: bool = False) -> PbsConfig
Creates and saves a PBS configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the configuration. | required |
| cluster_name | Optional[str] | The name of the cluster this configuration belongs to. Defaults to the system's hostname. | None |
| queue | Optional[str] | The PBS queue to submit the job to. | None |
| cpus | Optional[int] | The number of CPUs to request. | None |
| mem | Optional[str] | The amount of memory to request (e.g., "16gb", "100mb"). | None |
| walltime | Optional[str] | The maximum wall time for the job (e.g., "24:00:00"). | None |
| env | Optional[List[str]] | A list of environment variables to set. | None |
| overwrite | bool | If True, overwrite an existing configuration with the same name. | False |
| custom_headers | Optional[List[str]] | Custom scheduler headers (e.g., ['#SBATCH --my_header=my_value']) | None |
Returns:
| Type | Description |
|---|---|
| PbsConfig | The newly created PbsConfig object. |
Source code in src/sbatchman/core/config_manager.py
create_slurm_config
create_slurm_config(name: str, cluster_name: Optional[str] = None, partition: Optional[str] = None, nodes: Optional[str] = None, ntasks: Optional[str] = None, tasks_per_node: Optional[int] = None, cpus_per_task: Optional[int] = None, mem: Optional[str] = None, account: Optional[str] = None, time: Optional[str] = None, gpus: Optional[int] = None, constraint: Optional[str] = None, nodelist: Optional[Union[str, List[str]]] = None, exclude: Optional[List[str]] = None, qos: Optional[str] = None, reservation: Optional[str] = None, exclusive: Optional[bool] = False, modules: Optional[List[str]] = None, env: Optional[List[str]] = None, custom_headers: Optional[List[str]] = None, overwrite: bool = False) -> SlurmConfig
Creates and saves a SLURM configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the configuration. | required |
| cluster_name | Optional[str] | The name of the cluster this configuration belongs to. Defaults to the system's hostname. | None |
| partition | Optional[str] | The SLURM partition (queue) to submit the job to. | None |
| nodes | Optional[str] | The number of nodes to request. | None |
| ntasks | Optional[str] | The number of tasks to run. | None |
| tasks_per_node | Optional[int] | The number of tasks per node. | None |
| cpus_per_task | Optional[int] | The number of CPUs to request per task. | None |
| mem | Optional[str] | The amount of memory to request (e.g., "16G", "100M"). | None |
| account | Optional[str] | The account to charge for the job. | None |
| time | Optional[str] | The maximum wall time for the job (e.g., "01-00:00:00"). | None |
| gpus | Optional[int] | The number of GPUs to request. | None |
| constraint | Optional[str] | Specific features required for the job's nodes. | None |
| nodelist | Optional[Union[str, List[str]]] | A specific list of nodes to use (either a string or a list of strings to concatenate using "," as separator). | None |
| exclude | Optional[List[str]] | A specific list of nodes NOT to use. | None |
| qos | Optional[str] | The Quality of Service for the job. | None |
| reservation | Optional[str] | The reservation to use for the job. | None |
| exclusive | Optional[bool] | Enables the --exclusive flag (may not work on some clusters). | False |
| modules | Optional[List[str]] | Modules to load with | None |
| env | Optional[List[str]] | A list of environment variables to set. | None |
| overwrite | bool | If True, overwrite an existing configuration with the same name. | False |
| custom_headers | Optional[List[str]] | Custom scheduler headers (e.g., ['#SBATCH --my_header=my_value']) | None |
Returns:
| Type | Description |
|---|---|
| SlurmConfig | The newly created SlurmConfig object. |
Source code in src/sbatchman/core/config_manager.py
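A call to create_slurm_config might look like the sketch below. The keyword names come from the signature above; every value (configuration name, partition, node names, module name) is made up for illustration, and the "NAME=value" form of the env entries is an assumption. The sketch also assumes that the functions listed in __all__ are importable from the top-level sbatchman package and that a SbatchMan project has already been initialized.

```python
from typing import Any, Dict

# Keyword arguments named after the create_slurm_config signature.
# All values are illustrative placeholders, not recommended settings.
SLURM_KWARGS: Dict[str, Any] = {
    "name": "gpu_small",
    "partition": "gpu",
    "cpus_per_task": 4,
    "mem": "16G",
    "time": "01-00:00:00",             # one day, in days-hours:minutes:seconds form
    "gpus": 1,
    "nodelist": ["node01", "node02"],  # documented as joined with "," when given as a list
    "modules": ["cuda"],
    "env": ["OMP_NUM_THREADS=4"],      # assumed "NAME=value" form
    "overwrite": True,
}


def create_example_config():
    """Create the example configuration (needs SbatchMan installed and a project)."""
    # Imported lazily so this file can be loaded without SbatchMan installed.
    import sbatchman as sbm

    return sbm.create_slurm_config(**SLURM_KWARGS)
```

Keeping the arguments in a dictionary makes it easy to reuse the same settings for several configurations that differ only in name or partition.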
delete_jobs
delete_jobs(cluster_name: Optional[str] = None, config_name: Optional[str] = None, tag: Optional[str] = None, id: Optional[int] = None, archive_name: Optional[str] = None, archived: bool = False, not_archived: bool = False, status: Optional[List[Status]] = None, variables: Optional[Dict[str, Any]] = None) -> int
Deletes jobs matching the filter criteria.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cluster_name | Optional[str] | Filter by cluster name. | None |
| config_name | Optional[str] | Filter by configuration name. | None |
| tag | Optional[str] | Filter by tag. | None |
| id | Optional[int] | Filter by id (only one will be deleted). | None |
| archive_name | Optional[str] | If provided, only delete jobs from this archive. | None |
| archived | bool | If True, delete only archived jobs. | False |
| not_archived | bool | If True, delete only active jobs. | False |
| status | Optional[List[Status]] | Filter jobs by status. | None |
| variables | Optional[Dict[str, Any]] | Filter jobs by variable values. | None |
Returns:
| Type | Description |
|---|---|
| int | The number of deleted jobs. |
Source code in src/sbatchman/core/jobs_manager.py
job_submit
job_submit(job: Job, force: bool = False, previous_job_id: Optional[int] = None, ignore_archived: bool = False, ignore_conf_in_dup_check: bool = False, ignore_commands_in_dup_check: bool = False)
Source code in src/sbatchman/core/launcher.py
jobs_df
jobs_df(cluster_name: Optional[str] = None, config_name: Optional[str] = None, tag: Optional[str] = None, include_archived: bool = False) -> DataFrame
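Returns a pandas DataFrame of jobs, with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cluster_name | Optional[str] | Filter by cluster name. | None |
| config_name | Optional[str] | Filter by configuration name. | None |
| tag | Optional[str] | Filter by tag. | None |
| include_archived | bool | If True, include archived jobs in the DataFrame. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | A pandas DataFrame containing job metadata. |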
Source code in src/sbatchman/core/jobs_manager.py
jobs_list
jobs_list(cluster_name: Optional[str] = None, config_name: Optional[str] = None, tag: Optional[str] = None, status: Optional[List[Status]] = None, archive_name: Optional[str] = None, from_active: bool = True, from_archived: bool = False, update_jobs: bool = True, variables: Optional[Dict[str, Any]] = None) -> List[Job]
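Lists active and/or archived jobs, with optional filtering. Updates the status of active jobs by default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cluster_name | Optional[str] | Filter by cluster name. | None |
| config_name | Optional[str] | Filter by configuration name. | None |
| tag | Optional[str] | Filter by tag. | None |
| status | Optional[List[Status]] | Filter by a set of Status. | None |
| archive_name | Optional[str] | If provided, only include jobs from this archive. | None |
| from_active | bool | If True, include active jobs. | True |
| from_archived | bool | If True, include archived jobs. | False |
| update_jobs | bool | If True, update the status of active jobs before listing. | True |
| variables | Optional[Dict[str, Any]] | Filter by variable values. | None |
Returns:
| Type | Description |
|---|---|
| List[Job] | A list of Job objects matching the filter criteria. |
Raises:
| Type | Description |
|---|---|
| ArchiveExistsError | If an archive with the specified name already exists and overwrite is False. |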
Source code in src/sbatchman/core/jobs_manager.py
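The "None means no filter" convention shared by the listing and deletion functions can be sketched as below. This is not SbatchMan's code: jobs are modelled as plain dictionaries that borrow field names from the Job dataclass, the status strings are placeholders (the members of Status are not listed in this reference), and matching a variables filter on all of its key/value pairs is an assumption.

```python
from typing import Any, Dict, List, Optional


def filter_jobs(jobs: List[Dict[str, Any]],
                cluster_name: Optional[str] = None,
                config_name: Optional[str] = None,
                tag: Optional[str] = None,
                status: Optional[List[str]] = None,
                variables: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Keep the jobs that satisfy every filter that is not None."""
    selected = []
    for job in jobs:
        if cluster_name is not None and job["cluster_name"] != cluster_name:
            continue
        if config_name is not None and job["config_name"] != config_name:
            continue
        if tag is not None and job["tag"] != tag:
            continue
        if status is not None and job["status"] not in status:
            continue
        if variables is not None:
            job_vars = job.get("variables") or {}
            # Assumed semantics: every requested key must be present with an equal value.
            if any(k not in job_vars or job_vars[k] != v for k, v in variables.items()):
                continue
        selected.append(job)
    return selected


jobs = [
    {"cluster_name": "c1", "config_name": "gpu", "tag": "demo", "status": "COMPLETED",
     "variables": {"dataset": "mnist"}},
    {"cluster_name": "c1", "config_name": "cpu", "tag": "demo", "status": "FAILED",
     "variables": {"dataset": "cifar"}},
    {"cluster_name": "c2", "config_name": "gpu", "tag": "notag", "status": "COMPLETED",
     "variables": None},
]
print(len(filter_jobs(jobs)))                                    # 3
print(len(filter_jobs(jobs, cluster_name="c1", tag="demo")))     # 2
print(len(filter_jobs(jobs, variables={"dataset": "mnist"})))    # 1
```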
launch_job
launch_job(config_name: str, command: str, cluster_name: Optional[str] = None, tag: str = 'notag', preprocess: Optional[str] = None, postprocess: Optional[str] = None, force: bool = False, previous_job_id: Optional[int] = None, variables: Optional[Dict[str, Any]] = None, dry_run: bool = False, max_queued_jobs: Optional[int] = None, ignore_archived: bool = False, ignore_conf_in_dup_check: bool = False, ignore_commands_in_dup_check: bool = False) -> Job
Launches an experiment based on a configuration name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_name | str | The name of the configuration to use. | required |
| command | str | The command to run for this job. | required |
| cluster_name | Optional[str] | If not provided, the global cluster name is used. | None |
| tag | str | A tag for this experiment run, used in the directory structure. | 'notag' |
| preprocess | Optional[str] | A command to run before the main command. | None |
| postprocess | Optional[str] | A command to run after the main command. | None |
| previous_job_id | Optional[int] | If set, the job is launched only after the previous job is done. | None |
| max_queued_jobs | Optional[int] | If set, submission waits while the queue holds this many jobs. | None |
Returns:
| Type | Description |
|---|---|
| Job | A Job object representing the launched job. |
Raises:
| Type | Description |
|---|---|
| ConfigurationError | If there is a mismatch in cluster names or if the cluster name is not set. |
| ClusterNameNotSetError | If the cluster name is not set globally and not provided. |
| ConfigurationNotFoundError | If the configuration file does not exist. |
| JobSubmitError | If there is an error during job submission. |
Source code in src/sbatchman/core/launcher.py
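The max_queued_jobs behavior (wait before submitting while the queue is full) can be pictured as a simple polling loop. The sketch below is an assumption about the general technique, not SbatchMan's implementation: wait_for_queue_slot is a hypothetical helper, the 30-second polling interval is arbitrary, and the job counter and sleep function are injected so the example runs without a scheduler.

```python
import time
from typing import Callable


def wait_for_queue_slot(count_active: Callable[[], int],
                        max_queued_jobs: int,
                        poll_seconds: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until fewer than max_queued_jobs jobs are active; return the number of waits."""
    polls = 0
    while count_active() >= max_queued_jobs:
        sleep(poll_seconds)
        polls += 1
    return polls


# Simulated queue that drains from 3 active jobs to 1; sleeping is replaced by a no-op.
readings = iter([3, 2, 1])
waits = wait_for_queue_slot(lambda: next(readings), max_queued_jobs=2, sleep=lambda s: None)
print(waits)  # 2
```

In real use, count_active would be a function that queries the scheduler, such as count_active_jobs from this API.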
launch_jobs_from_file
launch_jobs_from_file(jobs_file_path: Path, force: bool = False, dry_run: bool = False, filter_tags: Optional[List[str]] = None, filter_variables: Optional[Dict[str, Any]] = None, ignore_archived: bool = False, ignore_conf_in_dup_check: bool = False, ignore_commands_in_dup_check: bool = False) -> List[Job]
Launches jobs based on a YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| jobs_file_path | Path | Path to the YAML file containing job definitions. | required |
| force | bool | If True, will overwrite existing jobs with the same configuration. | False |
| dry_run | bool | If True, will return the list of jobs but will not launch them (warning: won't work for sequential jobs). | False |
| filter_tags | Optional[List[str]] | If provided, only launch jobs whose tag matches one of these values. | None |
| filter_variables | Optional[Dict[str, Any]] | If provided, only launch jobs where variables match all key=value pairs. | None |
Returns:
| Type | Description |
|---|---|
| List[Job] | A list of Job objects representing the launched jobs. |
Raises:
| Type | Description |
|---|---|
| ConfigurationError | If the jobs file is not found or has invalid syntax. |
Source code in src/sbatchman/core/launcher.py
unarchive_job
unarchive_job(job: Job) -> None
Moves a single archived job back to the active experiments directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job | Job | The Job instance to restore. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the job has no archive_name or no exp_dir. |
| FileNotFoundError | If the archived experiment directory does not exist. |
Source code in src/sbatchman/core/jobs_manager.py
update_jobs_status
update_jobs_status() -> int
Updates the status of active jobs on the current cluster by querying the scheduler.
Returns:
| Type | Description |
|---|---|
| int | The number of jobs whose status was updated. |
Source code in src/sbatchman/core/jobs_manager.py