
Singularity

What is Singularity?

Singularity is a container system for high performance computing. It allows users to create a reproducible environment for their applications (it is even possible to containerize whole operating systems). This makes it easier to develop applications locally and later deploy them on the server, where they will work exactly the same. Especially in the context of scientific work, ensuring reproducibility allows sharing programs between researchers as containers can simply be duplicated.

Creating custom containers

If you only want to run or download an existing container, you can continue reading at Running Containers on BinAC.

In order to build custom containers you need a Linux system with root privileges. Tests with the Windows Subsystem for Linux have been unsuccessful, but you can run Linux in a virtual machine instead.

Installation

First of all you need to install Singularity 3 on your machine (most package managers only ship older versions of Singularity):

https://www.sylabs.io/guides/3.1/user-guide/quick_start.html#quick-installation-steps
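
For orientation, the steps from the linked guide for a Debian-based system boil down to roughly the following (version number, package names and the use of GOPATH may have changed since the 3.1 guide, so follow the guide itself if anything fails):

# build dependencies (Debian/Ubuntu package names)
$ sudo apt-get update && sudo apt-get install -y build-essential libssl-dev \
    uuid-dev libgpgme11-dev squashfs-tools libseccomp-dev pkg-config

# Singularity 3 is written in Go, so install a recent Go toolchain first (see the guide)

# download the source into your GOPATH, then compile and install
$ export VERSION=3.1.0
$ mkdir -p $GOPATH/src/github.com/sylabs && cd $GOPATH/src/github.com/sylabs
$ wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz
$ tar -xzf singularity-${VERSION}.tar.gz && cd singularity
$ ./mconfig && make -C builddir && sudo make -C builddir install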

Building a simple container

With the command

$ sudo singularity build --sandbox <containername> library://centos

we create a mutable container with a complete CentOS installation in our current working directory. You should now find a folder called <containername> there.

Using exec --writable and shell --writable we can already run commands inside the container, just as if we were running CentOS on our own machine.

We could, for example, update the installed packages with the package manager

$ sudo singularity exec --writable <containername> yum -y update

and open a shell to download and install ClustalW2.

$ sudo singularity shell --writable <containername>

In the shell type:

$ yum -y install epel-release wget tar && yum -y groupinstall 'Development Tools'
# download, build and install ClustalW2
$ cd /
$ wget www.clustal.org/download/current/clustalw-2.1.tar.gz
$ tar xzf clustalw-2.1.tar.gz
$ cd clustalw-2.1
$ ./configure && make && make install
$ cd .. && rm -rf /clustalw-2.1 && rm clustalw-2.1.tar.gz
# directories that can later serve as mount points for input and output data
$ mkdir /input
$ mkdir /output
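
Once the installation has finished, leave the container shell again:

$ exit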

ClustalW2 is used to calculate multiple sequence alignments for DNA and proteins.

Finally we can create an immutable image of our container:

$ sudo singularity build <immutable_container>.sif <mutable_container>

We can execute ClustalW2 with the following command:

$ singularity exec <immutable_container>.sif clustalw2

 

Now that you have a feel for how Singularity works, it is time to introduce a better way to build and execute containers.

Definition Files

When using Singularity on production systems it is recommended to build containers from definition files. Not only does this make the process less labour-intensive, it also ensures that all containers built from the same definition file are identical.

A file with this content

Bootstrap: library
From: centos

%post
    yum -y install epel-release wget tar
    yum -y groupinstall 'Development Tools'

    # Install ClustalW
    cd /
    wget www.clustal.org/download/current/clustalw-2.1.tar.gz
    tar xzf clustalw-2.1.tar.gz
    cd clustalw-2.1
    ./configure && make && make install
    cd .. && rm -rf /clustalw-2.1 && rm clustalw-2.1.tar.gz

    # Input and output directory
    mkdir /input
    mkdir /output

%runscript
    clustalw2 "$@"

will lead to the same container we manually built earlier. Use

$ sudo singularity build <new_container>.sif <definition_file>

to build it.

Definition files are split into sections that represent different stages of the build process. For example, %post lists the commands that are executed immediately after the installation of the base operating system.

The only new part is the %runscript section. The commands listed there are executed when the container is started with singularity run.

Now,

$ singularity run <new_container>.sif

is enough to execute ClustalW2.
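
Since the runscript forwards "$@", any additional arguments are passed straight on to clustalw2, for example:

$ singularity run <new_container>.sif -align -infile=<your_sequences>.fasta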

See https://www.sylabs.io/guides/3.0/user-guide/definition_files.html#definition-files for all the options that can be set in a definition file.
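
Two commonly used additional sections are %environment, for variables that should be set at runtime, and %labels, for metadata about the container. A minimal sketch (the values are just examples):

%environment
    export LC_ALL=C

%labels
    Author <your_name>
    Version 1.0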

Running containers on BinAC

Since the Singularity module is not loaded by default, we first have to locate and load it using module avail and module load.

$ module avail
    devel/singularity/3.x.x
$ module load devel/singularity/3.x.x

 

Next, we need to upload the container from our machine onto BinAC or download one from an external service. A simple way to upload files is scp.

$ scp <containername>.sif <LoginID>@<binac_address>:<workspace>

It is important that the container is stored in a workspace directory; containers in home directories won't be executable. If you don't have a workspace yet, create one.
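
Workspaces on BinAC are managed with the workspace tools. A new workspace can be allocated roughly like this (the name and the lifetime in days are only examples; see the BinAC documentation for the exact limits):

$ ws_allocate my_workspace 30
$ ws_find my_workspace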

Alternatively you can download existing containers from DockerHub or SingularityHub. After changing into your workspace directory

$ cd <workspace>

you can download a container using the schemes docker://<user>/<container_name>:<tag> for DockerHub or shub://<user>/<container_name>:<tag> for SingularityHub. For example:

$ singularity pull docker://docker/whalesay:latest
$ singularity pull shub://GodloveD/lolcow:latest

Some biocontainers are only available on quay.io. Use the DockerHub scheme for those: docker://quay.io/biocontainers/<container_name>:<tag>.

Actually running containers

We will now execute the custom container we built earlier. In case you didn't follow that part of the tutorial, you can also pull the container from docker://fbartusch/clustalw2:latest.

First we'll download and unpack a small sample dataset.
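
The download location of the dataset is not reproduced here; with a placeholder URL, and assuming the archive unpacks to the example_input directory used below, the step looks like this:

$ wget <dataset_url>/example_input.tar.gz
$ tar xzf example_input.tar.gz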

After that we can compute multiple sequence alignments (in this case Hemoglobin Alpha sequences of different species):

$ singularity exec clustalw2_latest.sif clustalw2 \
    -align -type=protein \
    -infile=./example_input/protein/example.fasta \
    -outfile=./example_protein.fasta.aln \
    -newtree=./example_protein.fasta.dnd

and view the results using cat

$ cat example_protein.fasta.aln

 

As you can see from this example, Singularity mounts the current working directory, so files can be accessed via ./<name>. While this is not an issue during testing, you should use absolute paths in real applications.
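
Additional host directories can be mapped into the container with --bind (or -B), which is the usual way to work with absolute paths; the /input and /output directories we created in the container make convenient mount points. A sketch (the host paths are placeholders and must exist):

$ singularity exec \
    --bind <workspace>/example_input/protein:/input,<workspace>/results:/output \
    clustalw2_latest.sif clustalw2 -align -type=protein \
    -infile=/input/example.fasta -outfile=/output/example_protein.fasta.aln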

Batch jobs

Calculating alignments on the test dataset didn't take long, but long-running tasks are automatically terminated to protect the login nodes of the cluster from overload. Therefore it is necessary to hand program execution over to a scheduler running in the background, whose job it is to allocate resources on the cluster. We can do that using batch jobs. A jobscript for our ClustalW2 example could look like this:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:01:00
module load devel/singularity/3.x.x
workspace=<workspace_directory>
singularity exec $workspace/clustalw2_latest.sif clustalw2 \
    -align -type=protein  \
    -infile=$workspace/example_input/protein/example.fasta \
    -outfile=$workspace/example_protein.fasta.aln \
    -newtree=$workspace/example_protein.fasta.dnd

 

We can pass it to the scheduler using

$ qsub -q tiny <jobscript>
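
qsub prints the ID of the submitted job; its state can then be checked with qstat:

$ qstat -u <LoginID>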

Using GPUs

In most cases Singularity containers can use GPUs just like any other program built for GPU usage.

First we will start an interactive job and probe for available GPUs outside a container.

$ qsub -I -q tiny -l nodes=1:ppn=1:gpus=1
$ module load devel/singularity/3.x.x
$ nvidia-smi

To check GPU availability within a container, we open a shell in the container and execute the same command there. We should get the same output.

$ [...]
$ singularity shell --nv clustalw2_latest.sif
$ nvidia-smi

The parameter --nv makes the NVIDIA driver available within the container.

Although the system driver is available within the container, each container still needs to provide its own CUDA installation and other GPU-related software.

Last but not least, a small example of training a machine learning model with containers and GPUs:

# download container
$ singularity pull docker://tensorflow/tensorflow:1.12.0-gpu-py3
# download tensorflow models repository
$ wget github.com/tensorflow/models/archive/v1.11.tar.gz
$ tar xzf v1.11.tar.gz
# start an interactive job
$ qsub -I -q tiny -l nodes=1:ppn=1:gpus=1
$ module load devel/singularity/3.x.x
# train the cifar10 model
$ singularity exec --nv tensorflow_1.12.0-gpu-py3.sif \
    python models-1.11/tutorials/image/cifar10/cifar10_multi_gpu_train.py
# gpu usage can be watched from a second shell using nvidia-smi
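
Before starting the training it can be worth checking that TensorFlow inside the container actually sees the GPU. With the container pulled above, a quick check could look like this:

$ singularity exec --nv tensorflow_1.12.0-gpu-py3.sif \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"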