The cluster uses Platform LSF as its queueing system (currently version 7.0.1). PDF manuals for this version are available on the cluster at /soft/x86_64/lsf/7.0/doc. You might also want to try the following links for comprehensive documentation and a quick reference sheet. Note, however, that both links cover version 6.
To submit a job, you have to:
- Log in to hpc-uni.uni-tuebingen.de with your ZDV account.
- Choose the right queue for your job. To list all queues you can submit to, type bqueues -l -u your-zdv-id. Currently, we have the following queues:
  - SHORT (up to 15 minutes): Used for testing
  - NORMAL (up to 72 hours): Used for average jobs (default queue)
  - MPI (up to 72 hours): Used for parallel jobs using the Message Passing Interface
  - EXTRALONG (up to 1 month): Used for very long jobs
  - VERYLONG (up to 3 months): ZDV says you should not use this queue (we don't know why)
- Submit your job to the chosen queue with bsub -q queue-name job-script. Every job is automatically assigned a unique job ID.
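The submission steps above can be sketched as a short shell session. This is only an illustration: the script name hello.sh and the job name hello are made up, it assumes your ZDV id matches your login name ($USER), and the -J (job name) and -o (output file, with %J expanding to the job ID) options are standard bsub options that this page does not otherwise cover.

```shell
#!/bin/sh
# Write a tiny test job script (hello.sh is a hypothetical name).
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Running on $(hostname)"
EOF
chmod +x hello.sh

# The guard lets this sketch run harmlessly on machines without LSF.
if command -v bsub >/dev/null 2>&1; then
    # List the queues you may submit to (assumes ZDV id == login name).
    bqueues -l -u "$USER"
    # Submit to the SHORT queue, which is meant for testing.
    bsub -q SHORT -J hello -o hello.%J.out ./hello.sh
else
    echo "bsub not found: run this on the cluster login node"
fi
```

Starting with the SHORT queue is a cheap way to check that a script runs at all before committing it to NORMAL or a longer queue.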
After submitting your jobs, you might want to:
- Check the status of your own jobs with the command bjobs. To see jobs from all users, use bjobs -u all. If your jobs are in a pending state (PEND), use bjobs -p to see the pending reason(s). If you use the -l switch on bjobs, you get verbose output.
- Change queued jobs with the command bmod. You can replace the whole job-script by typing bmod -Z new-job-script job-id. If you only want to change a single job option, for example -J, use bmod -J new-value job-id. To reset the -J option to its default submitted value (undoing a bmod), use bmod -Jn job-id.
- Suspend and resume a running job with bstop job-id and bresume job-id. To kill a running job, use bkill job-id.
- Check the load for all cluster nodes with lsload -l. If you want to monitor the load over time (with periodic updates), you might want to use lsmon.
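The commands above all take a job ID, so it helps to capture it at submission time. The following is a minimal sketch, assuming bsub prints its usual confirmation line of the form "Job <1234> is submitted to queue <SHORT>."; the helper job_id_from_bsub and the job name renamed-job are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's confirmation line on stdin,
# e.g. "Job <1234> is submitted to queue <SHORT>."
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# The guard lets this sketch run harmlessly on machines without LSF.
if command -v bsub >/dev/null 2>&1; then
    job_id=$(bsub -q SHORT sleep 60 | job_id_from_bsub)
    bjobs -l "$job_id"              # verbose status of this job
    bjobs -p "$job_id"              # pending reason(s), if still PEND
    bmod -J renamed-job "$job_id"   # may be refused once the job is running
    bkill "$job_id"                 # clean up the test job
else
    echo "LSF not found: run this on the cluster login node"
fi
```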
Most of the above commands have GUI counterparts, e.g. xbsub, xlsmon and xlsbatch.