
Results Collection and Parsing

Tip

First, if you need to collect results from multiple remote machines, check out the Fetch page.

Once you have all your results in one SbatchMan project, you can use the jobs_list() or jobs_to_dataframe() Python APIs to retrieve them.

Example

import re
from typing import List

import sbatchman as sbm

jobs: List[sbm.Job] = sbm.jobs_list(status=[sbm.Status.COMPLETED])
# `jobs` now holds ALL non-archived completed jobs

# Alternative: collect the results directly into a pandas DataFrame

def job_filter(job: sbm.Job) -> bool:
    return not job.get_stdout() and job.cluster_name != 'cluster-I-dont-want'

def extract_problem_size(job: sbm.Job) -> dict:
    exe, positional, kwargs = job.parse_command_args()
    return {
        "executable": exe,
        "size": kwargs.get("size"),
    }

def extract_flops(job: sbm.Job) -> dict:
    stdout = job.get_stdout()
    m = re.search(r"FLOPS:\s*([0-9.eE+-]+)", stdout)
    if not m:
        return {}
    return {
        "flops": float(m.group(1))
    }

df = sbm.jobs_to_dataframe(
    status=[sbm.Status.COMPLETED],
    job_filter=job_filter,
    extractors=[
        extract_problem_size,
        extract_flops,
    ],
    include_job_fields=True,
    include_job_variables=True,
)
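
The regular expression used in extract_flops can be checked on plain strings, without SbatchMan. The sketch below is a minimal, self-contained version of that parsing step. The name parse_flops and the sample outputs are hypothetical and introduced only for illustration; they are not part of the SbatchMan API. The guards against empty output and malformed numbers are assumptions about what a job's stdout might contain.

```python
import re
from typing import Dict, Optional

# Same pattern as in extract_flops.
FLOPS_RE = re.compile(r"FLOPS:\s*([0-9.eE+-]+)")


def parse_flops(stdout: Optional[str]) -> Dict[str, float]:
    """Return {"flops": value} if a parsable FLOPS line is found, else {}."""
    if not stdout:  # None or empty string: nothing to parse
        return {}
    m = FLOPS_RE.search(stdout)
    if m is None:
        return {}
    try:
        return {"flops": float(m.group(1))}
    except ValueError:
        # The character class also matches non-numbers such as "1.2.3" or "-".
        return {}


# Hypothetical sample outputs, for illustration only.
sample_ok = "setup done\nFLOPS: 1.25e+09\n"
sample_missing = "setup done\n"

print(parse_flops(sample_ok))
print(parse_flops(sample_missing))
```

Returning an empty dictionary on failure mirrors extract_flops: the job simply contributes no flops value instead of raising an error.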

The resulting pandas.DataFrame columns are the union of:

- User-defined: executable, size, flops
- YAML variables: all variables used in the jobs' wildcards {var_name}
- Metadata: config_name, cluster_name, status, tag, job_id, exitcode, archive_name, sbm_queue_time_s, sbm_run_time_s
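
To make the idea of a "union of columns" concrete, the sketch below merges the dictionaries returned by several extractors into one row per job. It uses plain dictionaries in place of sbm.Job objects and only illustrates the concept; it is not SbatchMan's actual implementation. The helper build_rows, the fake job records, and the choice to fill missing keys with None are all assumptions.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Extractor = Callable[[Record], Record]


def build_rows(jobs: List[Record], extractors: List[Extractor]) -> List[Record]:
    """Merge every extractor's output into one row per job (hypothetical helper)."""
    rows: List[Record] = []
    for job in jobs:
        row: Record = {}
        for extract in extractors:
            row.update(extract(job))  # later extractors overwrite equal keys
        rows.append(row)
    # Union of all keys, in first-seen order, so every row has the same columns.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return [{col: row.get(col) for col in columns} for row in rows]


def size_of(job: Record) -> Record:
    return {"size": job["size"]}


def flops_of(job: Record) -> Record:
    # Like extract_flops, return {} when there is nothing to report.
    return {"flops": job["flops"]} if "flops" in job else {}


# Stand-ins for jobs: plain dicts instead of sbm.Job objects.
fake_jobs = [
    {"size": 1024, "flops": 2.0e9},
    {"size": 2048},  # this job reported no FLOPS value
]

rows = build_rows(fake_jobs, [size_of, flops_of])
for row in rows:
    print(row)
```

An extractor that returns an empty dictionary for some jobs still contributes its column as long as at least one job produced a value. When a list of dictionaries with differing keys is passed to pandas.DataFrame, the missing entries appear as NaN.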