When submitting jobs,
- you might want to specify a job name, which you can later use to identify the job easily. This is done with bsub -J jobname. The job name complements the job ID and is not necessarily unique.
- you may want to specify the number of cores to use with the -n switch. For MPI jobs, you can currently request at most 32 cores per job (PROCLIMIT). Other queues have no per-job PROCLIMIT, but use different QJOB_LIMITs. The QJOB_LIMIT specifies the total number of cores a queue can use.
- you may want to use job groups. This makes it easier to manage multiple jobs. The job group does not have to exist before you submit the job. To submit a job to the job group blue_jobs, use bsub -g /blue_jobs -q queue-name job-script. Job groups may also be nested, which means that you can have sub-groups, for example /blue_jobs/navy and /blue_jobs/cobalt. To stop all cobalt jobs, you would then use bstop -g /blue_jobs/cobalt 0.
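The submission options above can be combined in a single call. The following is a minimal sketch rather than a site-specific recipe: the job name, core count, queue name and script name are hypothetical placeholders, and the run helper is an illustrative addition that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical placeholders; replace them with your own values.
JOB_NAME=cobalt_run1
CORES=8
QUEUE=queue-name
GROUP=/blue_jobs/cobalt
SCRIPT=job-script

# Submit with a job name (-J), a core count (-n) and a nested job group (-g).
run bsub -J "$JOB_NAME" -n "$CORES" -g "$GROUP" -q "$QUEUE" "$SCRIPT"

# Later, stop every job in the cobalt sub-group (job ID 0 addresses all of them).
run bstop -g "$GROUP" 0
```

Run it once as a dry run to check the generated command lines, then set DRY_RUN=0 to submit for real.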
For monitoring jobs, you might find the following useful:
- bhist displays a summary of the pending, suspended and running time of jobs for the user who invoked the command. As with bjobs, you can use -u all to see information on all users' jobs and -l for verbose output.
- bpeek job-id allows you to look at the output the job has produced so far.
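In a script, the job ID needed by bhist and bpeek can be taken from the message bsub prints on submission. The sketch below assumes that message has the usual LSF form "Job <12345> is submitted to queue <name>."; the helper name job_id_from_bsub and the sample message are illustrative, not part of LSF.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's submission message (read from stdin).
# Assumed message form: Job <12345> is submitted to queue <name>.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Illustrative sample message; on the cluster, pipe real bsub output instead:
#   JOB_ID=$(bsub -q queue-name job-script | job_id_from_bsub)
JOB_ID=$(echo 'Job <12345> is submitted to queue <queue-name>.' | job_id_from_bsub)

# Print the monitoring commands; remove the echo to run them on the cluster.
echo bhist -l "$JOB_ID"   # verbose history of the job
echo bpeek "$JOB_ID"      # output the job has produced so far
```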
If you need specific details about the queue configuration, ask your system administrator or take a look at the world-readable configuration file at /lustre/soft/x86_64/lsf/conf/lsbatch/tuegrid/configdir/lsb.queues.
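To look up the limits mentioned earlier without reading the whole file, the configuration can be filtered with awk. This is a rough sketch that assumes the standard lsb.queues layout, where each queue is a "Begin Queue" ... "End Queue" section containing KEY = value lines; the function name queue_limits is made up for this example, and a "-" marks a limit that is not set in a section.

```shell
#!/bin/sh
# Print QUEUE_NAME, PROCLIMIT and QJOB_LIMIT for every queue section of a file.
queue_limits() {
    awk -F'=' '
        /^[ \t]*#/ { next }                        # skip comment lines
        {                                          # split "KEY = value # note"
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
            sub(/#.*/, "", val)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        /^[ \t]*Begin[ \t]+Queue/ { name = "-"; procl = "-"; qjob = "-"; next }
        key == "QUEUE_NAME" { name = val }
        key == "PROCLIMIT"  { procl = val }
        key == "QJOB_LIMIT" { qjob = val }
        /^[ \t]*End[ \t]+Queue/ { print name, "PROCLIMIT=" procl, "QJOB_LIMIT=" qjob }
    ' "$1"
}

CONF=/lustre/soft/x86_64/lsf/conf/lsbatch/tuegrid/configdir/lsb.queues
if [ -r "$CONF" ]; then
    queue_limits "$CONF"
else
    echo "cannot read $CONF" >&2
fi
```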
Every job has an associated job file, which can be found in /lustre/soft/x86_64/lsf/work/tuegrid/logdir/info.