Michele Orlandi

Developer and architect focused on distributed solutions for the next unprecedented, game-changing, yet-to-be-announced, highly-anticipated hype

28 April 2022

End-to-end infra node installation of OCS on OCP

Overview

This guide aims to provide a fairly quick procedure for the task at hand and is based mostly on Red Hat's training lab.

It tries to include all the steps, starting from an actual installation of the OCP cluster on AWS and putting OCS on top of that, up to a performance test that can be run at the end to verify that we have a working environment.

📌 NOTE
We explicitly install the OCS components directly on infrastructure nodes; this leaves the 3 worker nodes available for any applications that will eventually make use of this storage. Monitoring applications are also moved to these infra nodes.

Openshift installation on AWS cloud

Verify your AWS environment

We need to check that our AWS account is configured correctly:

export AWS_PROFILE=[your AWS username goes here]
aws configure get region

Example output:

eu-central-1
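
As an extra, optional sanity check you can confirm which account and identity the profile actually resolves to; aws sts get-caller-identity is a standard call that only needs valid credentials:

aws sts get-caller-identity --output table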

Create your install-config.yaml

openshift-install create install-config --log-level info

Example output:

? SSH Public Key <none>
? Platform  [Use arrows to move, enter to select, type to filter, ? for more help]
> aws
  azure
  gcp
  openstack
  ovirt
  vsphere

This will create a file such as the one below that will be used during the installation:

apiVersion: v1
baseDomain: example.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: CLUSTERID
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-central-1
publish: External
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"...","email":"..."},"quay.io":{"auth":"...","email":"..."},"registry.connect.redhat.com":{"auth":"...","email":"..."},"registry.redhat.io":{"auth":"...","email":"..."}}}'
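
If you need to customize the compute nodes before launching the installation, this is the moment to edit the file. As a purely illustrative example (the field layout follows the compute/platform/aws section of the YAML above; the instance type shown is just a placeholder), the worker instance type can be overridden like this:

compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.2xlarge   # illustrative only: pick an instance type that fits your workloads
  replicas: 3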

Actual installation of the cluster

Back up the install-config.yaml before you begin the installation

cp install-config.yaml install-config.default.yaml

⚠️ WARNING
openshift-install will remove this file when the installation starts

We are now ready to begin the installation:

openshift-install create cluster --log-level debug

Example output:

DEBUG OpenShift Installer 4.6.16                   
DEBUG Built from commit 8a1ec01353e68cb6ebb1dd890d684f885c33145a 
DEBUG Fetching Metadata...                         
DEBUG Loading Metadata...                          
DEBUG   Loading Cluster ID...   
...

You can check instance creation from the AWS side of things:

instances created from the AWS console

If all goes well you should get this:

Example output:

...
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: kube-apiserver oauth endpoint https://10.0.140.185:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance) 
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2 
DEBUG Cluster is initialized                       
INFO Waiting up to 10m0s for the openshift-console route to be created... 
DEBUG Route found in openshift-console namespace: console 
DEBUG Route found in openshift-console namespace: downloads 
DEBUG OpenShift console route is created           
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/$USER/install/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.CLUSTERID.example.com 
INFO Login to the console with user: "kubeadmin", and password: "***********" 
DEBUG Time elapsed per stage:                      
DEBUG     Infrastructure: 7m4s                     
DEBUG Bootstrap Complete: 9m59s                    
DEBUG                API: 7s                       
DEBUG  Bootstrap Destroy: 59s                      
DEBUG  Cluster Operators: 16m48s                   
INFO Time elapsed: 34m58s              

We can also check the installation progress as soon as the API is up:

export KUBECONFIG=/home/$USER/install/auth/kubeconfig
oc get co

Example output in case of successful installation:

NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.16    True 	     False         False	  2m59s
cloud-credential                           4.6.16    True        False         False	  29m
cluster-autoscaler                         4.6.16    True        False         False	  22m
config-operator                            4.6.16    True        False         False	  23m
console                                    4.6.16    True        False         False	  12m
csi-snapshot-controller                    4.6.16    True        False         False	  23m
dns                                        4.6.16    True        False         False	  22m
etcd                                       4.6.16    True        False         False	  22m
image-registry                             4.6.16    True        False         False	  17m
ingress                                    4.6.16    True        False         False	  16m
insights                                   4.6.16    True        False         False	  23m
kube-apiserver                             4.6.16    True        False         False	  21m
kube-controller-manager                    4.6.16    True        False         False	  21m
kube-scheduler                             4.6.16    True        False         False	  20m
kube-storage-version-migrator              4.6.16    True        False         False	  16m
machine-api                                4.6.16    True        False         False	  17m
machine-approver                           4.6.16    True        False         False	  23m
machine-config                             4.6.16    True        False         False	  21m
marketplace                                4.6.16    True        False         False	  22m
monitoring                                 4.6.16    True        False         False	  15m
network                                    4.6.16    True        False         False	  24m
node-tuning                                4.6.16    True        False         False	  23m
openshift-apiserver                        4.6.16    True        False         False	  18m
openshift-controller-manager               4.6.16    True        False         False	  22m
openshift-samples                          4.6.16    True        False         False	  17m
operator-lifecycle-manager                 4.6.16    True        False         False	  22m
operator-lifecycle-manager-catalog         4.6.16    True        False         False	  22m
operator-lifecycle-manager-packageserver   4.6.16    True        False         False	  18m
service-ca                                 4.6.16    True        False         False	  23m
storage                                    4.6.16    True        False         False	  23m
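
Instead of re-running oc get co by hand, you can also block until every cluster operator reports Available; a small convenience using oc wait:

oc wait clusteroperators --all --for=condition=Available=True --timeout=30m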

Deploy your storage backend using the OCS operator

Scale OCP cluster and add new infra worker nodes

In this section, you will first validate that the OCP environment has 2 or 3 worker nodes before adding 3 more nodes, dedicated to OCS resources. The NAME of your OCP nodes will be different from those shown below.

oc get nodes -l node-role.kubernetes.io/worker -l '!node-role.kubernetes.io/master'

Example output:

NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-0-129-119.eu-central-1.compute.internal   Ready    worker   17m   v1.19.0+e49167a
ip-10-0-185-158.eu-central-1.compute.internal   Ready    worker   20m   v1.19.0+e49167a
ip-10-0-209-48.eu-central-1.compute.internal    Ready    worker   17m   v1.19.0+e49167a

Now you are going to add 3 more OCP infra nodes to the cluster using machinesets.

oc get machinesets -n openshift-machine-api

This will show you the existing machinesets that were used to create the 2 or 3 worker nodes already in the cluster. There is a machineset for each of the 3 AWS Availability Zones (AZ). NOTE: In the case of only 2 workers, one of the machinesets will not have any machines created (i.e., DESIRED=0).

Example output:

NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
CLUSTERID-ltkvj-worker-eu-central-1a   1         1         1       1           32m
CLUSTERID-ltkvj-worker-eu-central-1b   1         1         1       1           32m
CLUSTERID-ltkvj-worker-eu-central-1c   1         1         1       1           32m

Next, create new MachineSets that will in turn create storage-specific nodes for your OCP cluster in the AWS AZs.

We are now ready to load two important variables for our OCS deployment.

CLUSTERID=$(oc get machineset -n openshift-machine-api -o jsonpath='{.items[0].metadata.labels.machine\.openshift\.io/cluster-api-cluster}')
RHCOS=$(aws ec2 describe-images --filters "Name=name,Values=rhcos-4*" --query 'sort_by(Images, &CreationDate)[-1].ImageId' --output text)

Having taken inspiration from here, we will now create 3 new MachineSets that will run storage-specific infra nodes for your OCP cluster:

curl -s https://raw.githubusercontent.com/mikelo/mikelo.github.io/master/ocs/cluster-workerocs-eu-central-1-infra.yaml | sed -e "s/CLUSTERID/${CLUSTERID}/g" | sed -e "s/RHCOS/${RHCOS}/g" | oc apply -f -
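
The hosted file is the source of truth here; as a rough, trimmed sketch (an assumption reconstructed from the node labels and taints seen later in this guide, not a copy of the actual file), each of the three MachineSets looks something like this, with CLUSTERID and RHCOS substituted by the sed commands above:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: CLUSTERID-workerocs-eu-central-1a
  namespace: openshift-machine-api
  labels:
    machine.openshift.io/cluster-api-cluster: CLUSTERID
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: CLUSTERID
      machine.openshift.io/cluster-api-machineset: CLUSTERID-workerocs-eu-central-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: CLUSTERID
        machine.openshift.io/cluster-api-machineset: CLUSTERID-workerocs-eu-central-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/infra: ""                  # makes the node show up as infra,worker
          cluster.ocs.openshift.io/openshift-storage: ""     # the OCS label the storage cluster selects on
      taints:
      - key: node.ocs.openshift.io/storage                   # keeps ordinary workloads off the storage nodes
        value: "true"
        effect: NoSchedule
      providerSpec:
        value:
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          kind: AWSMachineProviderConfig
          ami:
            id: RHCOS                                        # latest RHCOS AMI looked up above
          instanceType: m5.4xlarge
          placement:
            availabilityZone: eu-central-1a
            region: eu-central-1
          # subnet, security groups and IAM profile omitted for brevity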

Check that you have new machines created.

oc get machines -n openshift-machine-api | egrep 'NAME|workerocs'

Example output:

NAME                                             PHASE     TYPE         REGION         ZONE            AGE
$CLUSTERID-ltkvj-workerocs-eu-central-1a-7lrkm   Running   m5.4xlarge   eu-central-1   eu-central-1a   6m26s
$CLUSTERID-ltkvj-workerocs-eu-central-1b-jzsnz   Running   m5.4xlarge   eu-central-1   eu-central-1b   6m26s
$CLUSTERID-ltkvj-workerocs-eu-central-1c-hkj8n   Running   m5.4xlarge   eu-central-1   eu-central-1c   6m26s

They will be in a Provisioning PHASE at first and eventually in a Running PHASE. NOTE: the workerocs machines use the AWS EC2 instance type m5.4xlarge, which has 16 vCPUs and 64 GiB of memory.

Now check that the new machines have been added to the OCP cluster.

oc get machinesets -n openshift-machine-api | egrep 'NAME|workerocs'      

Example output:

NAME                                       DESIRED   CURRENT   READY   AVAILABLE   AGE
$CLUSTERID-ltkvj-workerocs-eu-central-1a   1         1         1       1           7m22s
$CLUSTERID-ltkvj-workerocs-eu-central-1b   1         1         1       1           7m22s
$CLUSTERID-ltkvj-workerocs-eu-central-1c   1         1         1       1           7m22s

Check the nodes as shown below:

oc get nodes -l node-role.kubernetes.io/worker -l '!node-role.kubernetes.io/master'

Example output:

NAME                                            STATUS   ROLES          AGE     VERSION
ip-10-0-129-119.eu-central-1.compute.internal   Ready    worker         45m     v1.19.0+e49167a
ip-10-0-138-77.eu-central-1.compute.internal    Ready    infra,worker   3m17s   v1.19.0+e49167a
ip-10-0-181-225.eu-central-1.compute.internal   Ready    infra,worker   3m16s   v1.19.0+e49167a
ip-10-0-185-158.eu-central-1.compute.internal   Ready    worker         48m     v1.19.0+e49167a
ip-10-0-200-230.eu-central-1.compute.internal   Ready    infra,worker   3m19s   v1.19.0+e49167a
ip-10-0-209-48.eu-central-1.compute.internal    Ready    worker         45m     v1.19.0+e49167a

Installing the OCS operator

In this section you will be using the three new infra nodes to deploy OCS 4 using the OCS Operator from OperatorHub.

Start with creating the openshift-storage namespace.

oc create namespace openshift-storage

You must add the monitoring label to this namespace. This is required to get prometheus metrics and alerts for the OCP storage dashboards. To label the openshift-storage namespace use the following command:

oc label namespace openshift-storage "openshift.io/cluster-monitoring=true"

📌 NOTE
The creation of the openshift-storage namespace, and the monitoring label added to this namespace, can also be done during the OCS operator installation using the Openshift Web Console.

In the Openshift Web Console, navigate to the Operators -> OperatorHub menu.

OCP OperatorHub

Now type openshift container storage in the Filter by keyword… box.

OCP OperatorHub filter on OpenShift Container Storage Operator

Select OpenShift Container Storage Operator and then select Install.

OCP OperatorHub Install OpenShift Container Storage

On the next screen make sure the settings are as shown in this figure.

OCP Subscribe to OpenShift Container Storage

Click Install.

Now you can go back to your terminal window to check the progress of the installation.

oc -n openshift-storage get csv

Example output:

NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.6.0   OpenShift Container Storage   4.6.0                Succeeded

Please wait until the operator PHASE changes to Succeeded.

🔥 CAUTION
This will mark that the installation of your operator was successful. Reaching this state can take several minutes.

You will now also see new operator pods in the openshift-storage namespace:

oc -n openshift-storage get pods

Example output:

NAME                                    READY   STATUS    RESTARTS   AGE
noobaa-operator-88798865f-hlwtt         1/1     Running   0          6m57s
ocs-metrics-exporter-5495fd48b9-xzxpm   1/1     Running   0          6m57s
ocs-operator-6fcc5f798f-gdkrx           1/1     Running   0          6m57s
rook-ceph-operator-8659478f5-qhghs      1/1     Running   0          6m57s

Now switch back to your Openshift Web Console for the remainder of the OCS 4 installation.

Select View Operator in figure below to get to the OCS configuration screen.

View Operator in openshift-storage namespace

OCS configuration screen

On the top of the OCS configuration screen, scroll over to the right and click on Storage Cluster and then click on Create Storage Cluster to the far right. If you do not see Create Storage Cluster refresh your browser window.

Create Storage Cluster

The Create Storage Cluster screen will display.

Create Storage Cluster default settings

Leave the default selection of Internal, gp2, 2 TiB and Encryption Disabled.

Create a new storage cluster

There should be 3 worker nodes already selected; these are the nodes that had the OCS label applied in the last section. Execute the command below and make sure they are all selected.

oc get nodes --show-labels | grep ocs |cut -d' ' -f1
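
Alternatively, if the MachineSets applied the usual OCS node label (cluster.ocs.openshift.io/openshift-storage, an assumption worth verifying in the --show-labels output above), a label selector gives the same list without the grep:

oc get nodes -l cluster.ocs.openshift.io/openshift-storage -o name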

Then click on the button Create below the dialog box with the 3 workers selected with a checkmark.

You can watch the deployment using the Openshift Web Console by going back to the Openshift Container Storage Operator screen and selecting All instances.

Please wait until all Pods are marked as Running in the CLI or until you see all instances shown below as Ready Status in the Web Console as shown in the following diagram:

OCS instance overview after cluster install is finished

oc -n openshift-storage get pods

Output when the cluster installation is finished

NAME                                                                READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-875xd                                              3/3     Running     0          23m
csi-cephfsplugin-bncsj                                              3/3     Running     0          23m
csi-cephfsplugin-hjv77                                              3/3     Running     0          23m
csi-cephfsplugin-lch4m                                              3/3     Running     0          23m
csi-cephfsplugin-provisioner-6cfdc4bfbb-cklxs                       6/6     Running     0          23m
csi-cephfsplugin-provisioner-6cfdc4bfbb-krkq5                       6/6     Running     0          23m
csi-cephfsplugin-wtp4v                                              3/3     Running     0          23m
csi-rbdplugin-7clqf                                                 3/3     Running     0          23m
csi-rbdplugin-8nllt                                                 3/3     Running     0          23m
csi-rbdplugin-d267h                                                 3/3     Running     0          23m
csi-rbdplugin-provisioner-b46dd5c7-vd58q                            6/6     Running     0          23m
csi-rbdplugin-provisioner-b46dd5c7-z8mx6                            6/6     Running     0          23m
csi-rbdplugin-tdj8f                                                 3/3     Running     0          23m
csi-rbdplugin-wp65b                                                 3/3     Running     0          23m
noobaa-core-0                                                       1/1     Running     0          19m
noobaa-db-0                                                         1/1     Running     0          19m
noobaa-endpoint-86cc5df669-ffqj2                                    1/1     Running     0          16m
noobaa-operator-698746cd47-sp6w9                                    1/1     Running     0          17h
ocs-metrics-exporter-78bc44687-pg4hk                                1/1     Running     0          17h
ocs-operator-6d99bc6787-d7m9d                                       1/1     Running     0          17h
rook-ceph-crashcollector-ip-10-0-147-230-7cbf854757-chlgs           1/1     Running     0          20m
rook-ceph-crashcollector-ip-10-0-175-8-5779d5d5df-p6hkl             1/1     Running     0          21m
rook-ceph-crashcollector-ip-10-0-209-53-7ccc4cc785-wjxzd            1/1     Running     0          21m
rook-ceph-drain-canary-128c383c26627b938ab0fd7f47f58d33-665pbsg     1/1     Running     0          19m
rook-ceph-drain-canary-84c954eec459013180f78efd0a35792c-7b6qdnj     1/1     Running     0          19m
rook-ceph-drain-canary-ip-10-0-175-8.eu-central-1.compute.intrh526  1/1     Running     0          19m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-756df8b4kp9kr     1/1     Running     0          18m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-64585764bbg6b     1/1     Running     0          18m
rook-ceph-mgr-a-5c74bb4b85-5x26g                                    1/1     Running     0          20m
rook-ceph-mon-a-746b5457c-hlh7n                                     1/1     Running     0          21m
rook-ceph-mon-b-754b99cfd-xs9g4                                     1/1     Running     0          21m
rook-ceph-mon-c-7474d96f55-qhhb6                                    1/1     Running     0          20m
rook-ceph-operator-59f7fb95d6-sdjd8                                 1/1     Running     0          17h
rook-ceph-osd-0-7d45696497-jwgb7                                    1/1     Running     0          19m
rook-ceph-osd-1-6f49b665c7-gxq75                                    1/1     Running     0          19m
rook-ceph-osd-2-76ffc64cd-9zg65                                     1/1     Running     0          19m
rook-ceph-osd-prepare-ocs-deviceset-gp2-0-data-0-9977n-49ngd        0/1     Completed   0          20m
rook-ceph-osd-prepare-ocs-deviceset-gp2-1-data-0-nnmpv-z8vq6        0/1     Completed   0          20m
rook-ceph-osd-prepare-ocs-deviceset-gp2-2-data-0-mtbtj-xrj2n        0/1     Completed   0          20m

The great thing about operators and OpenShift is that the operator has the intelligence about the deployed components built in. And, because of the relationship between the CustomResource and the operator, you can check the status by looking at the CustomResource itself. When you went through the UI dialogs, an instance of a StorageCluster was ultimately created in the back-end:

oc get storagecluster -n openshift-storage

The installation also creates a set of StorageClasses; listing them once the cluster installation is finished (oc -n openshift-storage get sc) shows the new OCS classes alongside the default gp2 ones:

NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)                 kubernetes.io/aws-ebs                   Delete          WaitForFirstConsumer   true                   107m
gp2-csi                       ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   107m
ocs-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   40m
ocs-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   40m
openshift-storage.noobaa.io   openshift-storage.noobaa.io/obc         Delete          Immediate              false                  34m

You can check the status of the storage cluster with the following:

oc get storagecluster -n openshift-storage ocs-storagecluster -o jsonpath='{.status.phase}{"\n"}'

If it says Ready, you can continue.
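
If you are scripting the installation, a small polling loop on that same jsonpath doubles as a wait:

until [ "$(oc get storagecluster -n openshift-storage ocs-storagecluster -o jsonpath='{.status.phase}')" = "Ready" ]; do
  echo "StorageCluster not Ready yet, waiting..."
  sleep 30
done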

Getting to know the Storage Dashboards

You can now also check the status of your storage cluster with the OCS specific Dashboards that are included in your Openshift Web Console. You can reach this by clicking on Overview on your left navigation bar, then selecting Persistent Storage on the top navigation bar of the content page.

Location of OCS Dashboards

📌 NOTE
If you just finished your OCS 4 deployment it could take 5-10 minutes for your Dashboards to fully populate. Different versions of OCP 4 may have minor differences in Dashboard sections and naming of Dashboards.

Storage Dashboard after successful storage installation

     
1. Health: Quick overview of the general health of the storage cluster
2. Details: Overview of the deployed storage cluster version and backend provider
3. Inventory: List of all the resources that are used and offered by the storage system
4. Events: Live overview of all the changes that are being done affecting the storage cluster
5. Utilization: Overview of the storage cluster usage and performance

OCS ships with a Dashboard for the Object Store service as well. From the Overview click on the Object Service on the top navigation bar of the content page.

OCS Multi-Cloud-Gateway Dashboard after successful installation

     
1. Health: Quick overview of the general health of the Multi-Cloud-Gateway
2. Details: Overview of the deployed MCG version and backend provider, including a link to the MCG Console
3. Buckets: List of all the ObjectBuckets that are offered and the ObjectBucketClaims connected to them
4. Resource Providers: Shows the list of configured Resource Providers that are available as backing storage in the MCG
5. Counters: Shows the current numbers of reads and writes issued against each provider
6. Events: Live overview of all the changes that are being done affecting the MCG

Once this is all healthy, you will be able to use the three new StorageClasses created during the OCS 4 install.

You can see these three StorageClasses from the Openshift Web Console by expanding the Storage menu in the left navigation bar and selecting Storage Classes. You can also run the command below:

oc -n openshift-storage get sc

Please make sure the three storage classes are available in your cluster before proceeding.

📌 NOTE
The NooBaa pod uses the ocs-storagecluster-ceph-rbd storage class to create a PVC that is mounted into the db container.
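
As a quick smoke test of the new classes, you can create and delete a throwaway PVC against one of them; the PVC name below is just an example:

cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-smoke-test
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd
EOF
oc -n default get pvc rbd-smoke-test
oc -n default delete pvc rbd-smoke-test

Because ocs-storagecluster-ceph-rbd uses Immediate volume binding, the claim should reach Bound without needing a consuming pod.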

Using the Rook-Ceph toolbox to check on the Ceph backing storage

Since the Rook-Ceph toolbox is not shipped with OCS, we need to deploy it manually.

You can patch the OCSInitialization ocsinit using the following command line:

oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD ceph df

Example output

RAW STORAGE:
    CLASS     SIZE      AVAIL       USED        RAW USED     %RAW USED 
    ssd       6 TiB     6.0 TiB     101 MiB      3.1 GiB          0.05 
    TOTAL     6 TiB     6.0 TiB     101 MiB      3.1 GiB          0.05 
 
POOLS:
    POOL                                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    ocs-storagecluster-cephblockpool                1      33 MiB          63     100 MiB         0       1.7 TiB 
    ocs-storagecluster-cephfilesystem-metadata      2     2.2 KiB          22      96 KiB         0       1.7 TiB 
    ocs-storagecluster-cephfilesystem-data0         3         0 B           0         0 B         0       1.7 TiB 
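
The same toolbox pod can answer most of the usual Ceph health questions, for example:

oc rsh -n openshift-storage $TOOLS_POD ceph status
oc rsh -n openshift-storage $TOOLS_POD ceph osd tree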

Finally, we are ready to move the monitoring applications to the infra nodes as well. This will enable us to free up resources for applications running on the worker nodes and, ultimately, minimize the number of entitlements necessary to keep the cluster up and running.

oc apply -f https://raw.githubusercontent.com/mikelo/mikelo.github.io/master/ocs/cluster-monitoring-configmap.storage.yaml

Monitor the status of the newly applied configuration:

oc get pod -o wide -n openshift-monitoring

Example output:

NAME                                           READY   STATUS    RESTARTS   AGE     IP             NODE                                            NOMINATED NODE   READINESS GATES
alertmanager-main-0                            5/5     Running   0          42s     10.128.4.13    ip-10-0-223-56.eu-central-1.compute.internal    <none>           <none>
alertmanager-main-1                            5/5     Running   0          69s     10.130.2.14    ip-10-0-189-227.eu-central-1.compute.internal   <none>           <none>
alertmanager-main-2                            5/5     Running   0          95s     10.131.2.20    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
cluster-monitoring-operator-79b8bcd7d7-cmb56   2/2     Running   3          6h33m   10.128.0.4     ip-10-0-223-47.eu-central-1.compute.internal    <none>           <none>
grafana-649bb46c47-vkvqq                       2/2     Running   0          91s     10.131.2.22    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
kube-state-metrics-76ff46f884-ntgnx            3/3     Running   0          99s     10.131.2.16    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
node-exporter-2pc52                            2/2     Running   0          6h22m   10.0.194.74    ip-10-0-194-74.eu-central-1.compute.internal    <none>           <none>
node-exporter-7sqnf                            2/2     Running   0          5h48m   10.0.189.227   ip-10-0-189-227.eu-central-1.compute.internal   <none>           <none>
node-exporter-9sfvw                            2/2     Running   0          6h28m   10.0.132.238   ip-10-0-132-238.eu-central-1.compute.internal   <none>           <none>
node-exporter-b8df5                            2/2     Running   0          6h22m   10.0.129.227   ip-10-0-129-227.eu-central-1.compute.internal   <none>           <none>
node-exporter-bdbv9                            2/2     Running   0          6h22m   10.0.166.96    ip-10-0-166-96.eu-central-1.compute.internal    <none>           <none>
node-exporter-d549q                            2/2     Running   0          5h48m   10.0.223.56    ip-10-0-223-56.eu-central-1.compute.internal    <none>           <none>
node-exporter-m7ghp                            2/2     Running   0          6h28m   10.0.167.222   ip-10-0-167-222.eu-central-1.compute.internal   <none>           <none>
node-exporter-rxpvm                            2/2     Running   0          5h48m   10.0.139.155   ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
node-exporter-tbl7v                            2/2     Running   0          6h28m   10.0.223.47    ip-10-0-223-47.eu-central-1.compute.internal    <none>           <none>
openshift-state-metrics-97b67f7bf-2gnbt        3/3     Running   0          99s     10.131.2.17    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
prometheus-adapter-5b95948cbb-td86s            1/1     Running   0          94s     10.131.2.21    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
prometheus-adapter-5b95948cbb-tdh7x            1/1     Running   0          84s     10.130.2.13    ip-10-0-189-227.eu-central-1.compute.internal   <none>           <none>
prometheus-k8s-0                               6/6     Running   1          51s     10.130.2.15    ip-10-0-189-227.eu-central-1.compute.internal   <none>           <none>
prometheus-k8s-1                               6/6     Running   1          89s     10.131.2.23    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
prometheus-operator-d4b8885b9-h6x9b            2/2     Running   0          99s     10.131.2.18    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
telemeter-client-769bccbc99-t7fsl              3/3     Running   0          96s     10.131.2.19    ip-10-0-139-155.eu-central-1.compute.internal   <none>           <none>
thanos-querier-bf7898547-r8gjd                 5/5     Running   0          6h20m   10.129.2.4     ip-10-0-129-227.eu-central-1.compute.internal   <none>           <none>
thanos-querier-bf7898547-rk7cd                 5/5     Running   0          6h20m   10.131.0.10    ip-10-0-166-96.eu-central-1.compute.internal    <none>

Note that pods gradually start moving to the infra nodes. Each monitoring component in the applied configuration carries a nodeSelector for the infra nodes and tolerations for their taints; here's a snippet for one of them:

    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: infra
        value: reserved
        effect: NoExecute
      - key: node.ocs.openshift.io/storage
        value: "true"
        effect: NoSchedule
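
To double-check where the monitoring pods actually land once the new configuration is rolled out, you can print each pod next to its node:

oc -n openshift-monitoring get pods -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName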

Test an OCP application deployment using a CephFS volume

In this section the ocs-storagecluster-cephfs StorageClass will be used by an OCP application, a database StatefulSet, to create RWO (ReadWriteOnce) persistent storage.

In this case we are following this blog.

📌 NOTE
We added a few tweaks to the YAML file and the testing scripts to make it work for our cluster.

Start by creating a new project:

oc new-project sysbench

Then use the MySQL/sysbench YAML to create the new StatefulSet.

oc apply -f https://raw.githubusercontent.com/mikelo/mikelo.github.io/master/ocs/OCS-FS-STS.yaml
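
The storage-relevant part of that YAML is the volumeClaimTemplates section of the StatefulSet. A trimmed sketch (reconstructed from the PVC names, sizes and storage class visible below, not a copy of the hosted file) looks roughly like this:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-ocs-fs
spec:
  serviceName: mysql-ocs-fs
  replicas: 20
  # selector and the two-container pod template omitted for brevity
  volumeClaimTemplates:
  - metadata:
      name: mysql-ocs-fs-data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: ocs-storagecluster-cephfs
      resources:
        requests:
          storage: 15Gi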

Check that the PVC is created.

oc get pvc

Example output:

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
mysql-ocs-fs-data-mysql-ocs-fs-0    Bound    pvc-0194b560-1593-4ae7-a527-9331e35e28c1   15Gi       RWO            ocs-storagecluster-cephfs   21s
mysql-ocs-fs-data-mysql-ocs-fs-1    Bound    pvc-c76a39da-40de-4b07-a836-7e20a15fb565   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-10   Bound    pvc-117ad8ae-5774-4891-b1fd-5fcfd90d52bc   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-11   Bound    pvc-67ce4daa-50f2-4e8b-b9ea-36b93710e8d7   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-12   Bound    pvc-f8cb5c94-7c56-406c-97fe-f9255d60b273   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-13   Bound    pvc-f5ac6748-65ba-45d0-8ce6-1d0aeeef48e9   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-14   Bound    pvc-472e4dfe-3fc7-4e84-8ab7-8113eefa34c9   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-15   Bound    pvc-904de214-e551-4034-965e-ce3862ff3c97   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-16   Bound    pvc-9e513ce0-0542-4e58-98ab-480bd27689c2   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-17   Bound    pvc-8123f988-ad05-4c13-a539-b9d215f6af6b   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-18   Bound    pvc-3397e003-fb04-45d3-b949-1aa9cd20e33f   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-19   Bound    pvc-c054e10f-22af-4ac6-96e4-8cbba1303d29   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-2    Bound    pvc-37444f2b-17cf-4198-8501-6638f0a9ef1c   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-3    Bound    pvc-065a8160-911c-4b77-ab91-3c9b92c8726d   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-4    Bound    pvc-94a5112d-9102-4a1f-be7a-b4d77897d719   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-5    Bound    pvc-4a84fee5-f132-465a-95fd-258ec1478067   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-6    Bound    pvc-c6037407-1bc1-46cb-8605-50351b40a812   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-7    Bound    pvc-c5106001-76f0-42d4-b864-9e667f234bb7   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-8    Bound    pvc-307b0056-3039-4752-9295-f871f9298f38   15Gi       RWO            ocs-storagecluster-cephfs   20s
mysql-ocs-fs-data-mysql-ocs-fs-9    Bound    pvc-e855f29c-99a8-43b2-bb6e-c46c4f730c48   15Gi       RWO            ocs-storagecluster-cephfs   20s

This step could take 5 or more minutes. Wait until all 20 Pods are in Running STATUS with READY 2/2 as shown below.

oc get pods

Example output:

NAME              READY   STATUS    RESTARTS   AGE
mysql-ocs-fs-0    2/2     Running   0          3m40s
mysql-ocs-fs-1    2/2     Running   0          3m40s
mysql-ocs-fs-10   2/2     Running   0          3m40s
mysql-ocs-fs-11   2/2     Running   0          3m40s
mysql-ocs-fs-12   2/2     Running   0          3m40s
mysql-ocs-fs-13   2/2     Running   0          3m40s
mysql-ocs-fs-14   2/2     Running   0          3m39s
mysql-ocs-fs-15   2/2     Running   0          3m39s
mysql-ocs-fs-16   2/2     Running   0          3m39s
mysql-ocs-fs-17   2/2     Running   0          3m39s
mysql-ocs-fs-18   2/2     Running   0          3m39s
mysql-ocs-fs-19   2/2     Running   0          3m39s
mysql-ocs-fs-2    2/2     Running   0          3m40s
mysql-ocs-fs-3    2/2     Running   0          3m40s
mysql-ocs-fs-4    2/2     Running   0          3m40s
mysql-ocs-fs-5    2/2     Running   0          3m40s
mysql-ocs-fs-6    2/2     Running   0          3m40s
mysql-ocs-fs-7    2/2     Running   0          3m40s
mysql-ocs-fs-8    2/2     Running   0          3m40s
mysql-ocs-fs-9    2/2     Running   0          3m40s

Once the deployment is complete you can now test the application and the persistent storage on Ceph.

for pod in $(oc get pods | grep mysql | awk '{print $1}'); do echo "pod $pod"; oc rsh -c mysql-ocs-fs $pod mysql -uroot -ppassword -h localhost sysbench -e "SELECT count(1) from sbtest10;"; done

Example output:

pod mysql-ocs-fs-0
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------+
| count(1) |
+----------+
|  1000000 |
+----------+
pod mysql-ocs-fs-1
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------+
| count(1) |
+----------+
|  1000000 |
+----------+
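
To turn this into the performance test mentioned in the overview, you can drive sysbench against the prepared sbtest tables from inside one of the pods. This is only a sketch under assumptions: the second container in each pod is assumed to be a sysbench client (the container name sysbench below is a guess, check the StatefulSet YAML), and the schema is assumed to have been prepared with 10 tables of 1,000,000 rows, as the count above suggests:

oc rsh -c sysbench mysql-ocs-fs-0 sysbench oltp_read_write \
  --mysql-host=127.0.0.1 --mysql-user=root --mysql-password=password \
  --mysql-db=sysbench --tables=10 --table-size=1000000 \
  --threads=4 --time=120 run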

We’re now going to check again that our storage is actually being used as expected:

TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD ceph df

Example output

RAW STORAGE:
    CLASS     SIZE      AVAIL       USED       RAW USED     %RAW USED
    ssd       6 TiB     5.9 TiB     88 GiB       91 GiB          1.49
    TOTAL     6 TiB     5.9 TiB     88 GiB       91 GiB          1.49

POOLS:
    POOL                                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    ocs-storagecluster-cephblockpool                1      57 MiB          68     170 MiB         0       1.7 TiB
    ocs-storagecluster-cephfilesystem-metadata      2     191 MiB         213     572 MiB      0.01      1.7 TiB
    ocs-storagecluster-cephfilesystem-data0         3      29 GiB      13.17k      88 GiB      1.68      1.7 TiB

This guide ends here. It has intentionally been kept short in order to provide simple and quick end-to-end steps for getting a working OCS solution running.
