Fusion with AWS EKS and S3 object storage

Fusion streamlines the deployment of Nextflow pipelines in a Kubernetes cluster because it removes the need to configure and maintain a shared file system in the cluster.

Kubernetes config

You need to create a namespace and a service account in your Kubernetes cluster to run the jobs submitted by the pipeline execution.

The following manifest shows the bare minimum configuration.

---
apiVersion: v1
kind: Namespace
metadata:
  name: fusion-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: fusion-demo
  name: fusion-sa
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<YOUR ACCOUNT ID>:role/fusion-demo-role"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: fusion-demo
  name: fusion-role
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/status", "pods/log", "pods/exec"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: fusion-demo
  name: fusion-rolebind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: fusion-role
subjects:
  - kind: ServiceAccount
    name: fusion-sa

The AWS IAM role should provide read-write permission to the S3 bucket used as the pipeline work directory. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<YOUR-BUCKET>"]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectTagging",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::<YOUR-BUCKET>/*"],
      "Effect": "Allow"
    }
  ]
}

In the above policy, replace <YOUR-BUCKET> with a bucket name of your choice.
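
As an illustration, the policy above can be generated for a given bucket with a few lines of Python. This is only a sketch: the function name s3_work_dir_policy and the bucket name my-fusion-bucket are made up for the example and are not part of Fusion or AWS tooling. It makes explicit that s3:ListBucket applies to the bucket ARN itself, while the object-level actions apply to the objects under it (the /* resource).

```python
import json


def s3_work_dir_policy(bucket: str) -> dict:
    """Build the read-write policy shown above for a single bucket.

    Hypothetical helper for illustration only.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action: the resource is the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: the resource is every key in the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                    "s3:DeleteObject",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-fusion-bucket" is a placeholder; use your own bucket name.
    print(json.dumps(s3_work_dir_policy("my-fusion-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor or saved to a file and attached to the role with the tool of your choice.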

Also, make sure that the role defines a trust relationship similar to this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<YOUR ACCOUNT ID>:oidc-provider/oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>:aud": "sts.amazonaws.com",
          "oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>:sub": "system:serviceaccount:fusion-demo:fusion-sa"
        }
      }
    }
  ]
}
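
The same OIDC issuer string appears three times in the trust relationship (once in the Principal and twice in the condition keys), and all three must use the same region and cluster ID. The following Python sketch builds the document from a single set of inputs so the values cannot drift apart. The function name eks_trust_policy and the sample account ID, region, and cluster ID are placeholders invented for this example; the namespace and service account defaults are the ones used in the manifest above.

```python
import json


def eks_trust_policy(
    account_id: str,
    region: str,
    cluster_id: str,
    namespace: str = "fusion-demo",
    service_account: str = "fusion-sa",
) -> dict:
    """Build the trust relationship shown above.

    Hypothetical helper for illustration only.
    """
    # The OIDC issuer is derived once and reused everywhere it is needed.
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{cluster_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:aud": "sts.amazonaws.com",
                        # Only this service account in this namespace may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All three values below are placeholders, not real identifiers.
    doc = eks_trust_policy("123456789012", "eu-west-2", "EXAMPLE0123456789")
    print(json.dumps(doc, indent=2))
```

If you change the namespace or service account name in the Kubernetes manifest, pass the new values here as well, since the sub condition must match them exactly.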

Nextflow configuration

The minimal Nextflow configuration looks like the following:

wave.enabled = true
fusion.enabled = true
process.executor = 'k8s'
k8s.context = '<YOUR K8S CLUSTER CONTEXT>'
k8s.namespace = 'fusion-demo'
k8s.serviceAccount = 'fusion-sa'

In the above snippet, replace <YOUR K8S CLUSTER CONTEXT> with the name of a context from your Kubernetes config, and save the snippet to a file named nextflow.config in the pipeline launch directory.

Then launch the pipeline execution with the usual run command:

nextflow run <YOUR PIPELINE SCRIPT> -w s3://<YOUR-BUCKET>/work

Replace <YOUR PIPELINE SCRIPT> with the URI of your pipeline Git repository and <YOUR-BUCKET> with the name of your S3 bucket, the same one that the IAM role grants access to.

For best performance, set up SSD volumes as the temporary directory. See the SSD storage section for details.