Campaigns

Do you need to run multiple applications/benchmarks together?
If so, this is what you need!

Key Features

- Multi-cluster execution
- Multi-application execution
- Resume interrupted campaigns
- Automatic job monitoring
- Centralized logging
- Results collection

Note

Multi-cluster means that SbatchMan runs the same campaign once for each given cluster name. It does NOT run campaigns on remote systems via SSH. This is useful when your cluster is a collection of machines with diverse characteristics.

CLI Example

sbatchman campaign campaign.yaml -c cluster-a -c cluster-b

YAML Configuration Format

apps:
  - name: my_app
    # Required
    # Unique application identifier
    # Type: string

    dir: ./path/to/application_wd
    # Required
    # Working directory of the application
    # Type: string (path)

    blocking: false
    # Optional
    # If true, waits for all jobs to finish before moving to next app
    # Type: boolean
    # Default: false

    cluster_whitelist:
      - cluster-a
      - cluster-b
    # Optional
    # Only execute on these clusters
    # Type: list[string]

    cluster_blacklist:
      - cluster-c
    # Optional
    # Skip execution on these clusters
    # Type: list[string]

    configs:
      - configs.yaml
      - compile_configs.yaml
    # Optional
    # SbatchMan configuration files
    # Relative to `dir`
    # Type: list[string] or string

    steps:
      - name: compile
        # Required
        # Unique step name within the application
        # Type: string

        script: "rm -rf bin; mkdir bin"
        # Optional (if not set, `jobs` is required)
        # Shell command executed before launching jobs
        # Type: string

        jobs: compile_jobs.yaml
        # Optional (if not set, `script` is required)
        # SbatchMan jobs YAML file
        # Relative to `dir`
        # Type: string

        on_fails: terminate
        # Optional
        # Failure handling policy
        # Values:
        #   - terminate : stop entire campaign
        #   - continue  : continue to next step
        #   - skip      : skip remaining steps and move to next app
        # Default: terminate

      - name: experiments
        jobs: jobs.yaml
        on_fails: continue

  - name: 

Check out this example: https://github.com/ThomasPasquali/SbatchManTutorial/blob/main/campaign.yaml
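
The constraints written in the comments of the format above can be checked before launching anything. The following is a minimal sketch, not part of SbatchMan: `validate_campaign` is a hypothetical helper, and it assumes the campaign file has already been parsed into a plain Python dictionary. It only checks what the comments state: required `name` and `dir`, unique names, at least one of `script` or `jobs` per step, and a known `on_fails` value.

```python
# Hypothetical helper, not a SbatchMan API: checks a parsed campaign dict
# against the rules stated in the YAML comments above.
VALID_ON_FAILS = {"terminate", "continue", "skip"}


def validate_campaign(campaign: dict) -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    seen_apps = set()
    for app in campaign.get("apps", []):
        name = app.get("name")
        if not name:
            problems.append("app without a name")
            continue
        if name in seen_apps:
            problems.append(f"duplicate app name: {name}")
        seen_apps.add(name)
        if not app.get("dir"):
            problems.append(f"{name}: missing required key 'dir'")

        seen_steps = set()
        for step in app.get("steps", []):
            step_name = step.get("name")
            label = f"{name}/{step_name}"
            if not step_name:
                problems.append(f"{name}: step without a name")
            elif step_name in seen_steps:
                problems.append(f"{label}: duplicate step name")
            seen_steps.add(step_name)
            # A step must provide a script, a jobs file, or both.
            if "script" not in step and "jobs" not in step:
                problems.append(f"{label}: needs 'script' or 'jobs'")
            # The documented default policy is 'terminate'.
            policy = step.get("on_fails", "terminate")
            if policy not in VALID_ON_FAILS:
                problems.append(f"{label}: unknown on_fails value {policy!r}")
    return problems
```

Calling `validate_campaign` on a campaign whose step has neither `script` nor `jobs` returns one message for that step; a valid campaign returns an empty list.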


Simplified Execution Flow

for cluster in clusters:
  # sbatchman set-cluster-name <cluster>

  for app in apps:
    # cd app working directory
    # sbatchman init
    # sbatchman configure -f <config files>

    for step in app.steps:
      # <step script>
      # sbatchman launch -f <step jobs>

Results structure

path/to/app1
├── ...
└── SbatchMan

path/to/app2
├── ...
└── SbatchMan

Each SbatchMan directory may contain results from multiple clusters.
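
As a small sketch of how the layout above can be located after a campaign, the hypothetical helper below (not a SbatchMan function) returns the `SbatchMan` directory of every application working directory that has one. It assumes only what the tree shows: the directory is named `SbatchMan` and sits directly inside the app's `dir`.

```python
from pathlib import Path


def sbatchman_dirs(app_dirs):
    """Return the SbatchMan directory of each app working dir that has one."""
    found = []
    for app_dir in app_dirs:
        candidate = Path(app_dir) / "SbatchMan"
        # Skip apps that have not produced a SbatchMan directory yet.
        if candidate.is_dir():
            found.append(candidate)
    return found
```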

Tip

Multiple SbatchMan directories can be merged! This way, you can access all jobs at once. To automate data collection, check out the Results page.