Opencast performance

Uploading big files took an excessive amount of time: a single 50 GB upload could take hours. We therefore ran some tests to compare the time required when the shared disks are mounted with the kernel client instead of fuse.

https://clouddocs.web.cern.ch/file_shares/manual_cephfs.html

Test file:

From Zone Storage Path
ocworker-prod-08 cern-geneva-c Local /storage/opencast

curl --progress-bar --location --output CERN-FOOTAGE-2020-055-001.mov 'https://videos.cern.ch/api/files/44fa70f3-ca7a-4af4-843d-f1661d16e2c8/CERN-FOOTAGE-2020-055-001.mov?versionId=89da7b1a-965e-411c-ab2e-74d296276bbc&download'

Downloading:

File system Incoming rate (Mbps)
local 1228.81
fuse 424.35
kernel 1177.21

Test file properties:

Type Size Duration Frame rate (fps) Bit rate (bps) Resolution Frame count
Apple ProRes (iCodec Pro) 68.2 GB 11:36 25 782865090 3840x2160 17418
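To put the download rates in perspective, the following minimal sketch estimates how long the 68.2 GB test file would take to transfer at each measured rate. It assumes decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits per second) and a constant rate for the whole transfer; the function name is only illustrative.

```python
# Estimate the transfer time of the test file at each measured incoming rate.
# Assumptions: 1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s, constant rate.
FILE_SIZE_GB = 68.2

RATES_MBPS = {
    "local": 1228.81,
    "fuse": 424.35,
    "kernel": 1177.21,
}


def transfer_minutes(size_gb: float, rate_mbps: float) -> float:
    """Return the transfer time in minutes for size_gb at rate_mbps."""
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return size_megabits / rate_mbps / 60


for name, rate in RATES_MBPS.items():
    print(f"{name:>6}: {transfer_minutes(FILE_SIZE_GB, rate):5.1f} min")
```

Under these assumptions the file needs a bit over 7 minutes on local disk or the kernel mount, and a bit over 21 minutes on fuse.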

Ingest and encoding operations

Test Local

Operation Started Finished Total
Created 11:34:00 AM
WF Starts 11:51:07 AM 0h17m
Inspect 11:51:23 AM 12:02:00 PM 11m
Encode 720p 12:02:01 PM 12:26:55 PM 24m
Encode 1080p 12:27:03 PM 12:59:22 PM 32m
Encode 480p 12:59:35 PM 1:21:52 PM 22m
Encode 360p 1:22:06 PM 1:44:19 PM 22m
Encode 2160p 1:44:33 PM 2:43:19 PM 1h01m
Exec. time ~2h55m

Test CEPH fuse

Operation Started Finished Total
Created 12:53:00 AM
WF Starts 2:42:18 AM 1h49m
Inspect 2:42:27 AM 3:30:27 AM 48m
Encode 720p 3:30:27 AM 3:57:16 AM 27m
Encode 1080p 3:57:25 AM 4:31:49 AM 34m
Encode 480p 4:32:02 AM 4:56:55 AM 24m
Encode 360p 4:57:08 AM 5:21:01 AM 24m
Encode 2160p 5:21:15 AM 6:21:56 AM 1h0m
Exec. time 4:06:39

Test CEPH kernel

Operation Started Finished Total
Created 9:59:00 AM
WF Starts 10:16:23 AM 17m
Inspect 10:16:28 AM 10:28:35 AM 12m
Encode 720p 10:28:36 AM 10:54:45 AM 26m
Encode 1080p 10:54:53 AM 11:28:33 AM 33m
Encode 480p 11:28:46 AM 11:51:58 AM 23m
Encode 360p 11:52:12 AM 12:14:35 PM 22m
Encode 2160p 12:14:48 PM 1:14:25 PM 1h
Exec. time 2:58:22
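The Total column in these tables is the difference between the Started and Finished timestamps. As a rough sketch, the helper below recomputes that difference from two 12-hour timestamps; it assumes a step never lasts 24 hours or more, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%I:%M:%S %p"  # 12-hour clock, e.g. "2:42:27 AM"


def elapsed(started: str, finished: str) -> timedelta:
    """Return finished - started, assuming the step lasts less than 24 hours."""
    start = datetime.strptime(started, TIME_FORMAT)
    end = datetime.strptime(finished, TIME_FORMAT)
    if end < start:  # the step ran past midnight
        end += timedelta(days=1)
    return end - start


# Inspect step of the fuse test.
print(elapsed("2:42:27 AM", "3:30:27 AM"))
# Inspect step of the local test, which crosses noon.
print(elapsed("11:51:23 AM", "12:02:00 PM"))
```

The tables report whole minutes, so a recomputed value can differ from the listed total by the rounding of the seconds.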

Comparison table

Operation Local CEPH fuse CEPH kernel
Ingest 0h17m 1h47m 17m
Inspect 11m 44m 12m
Encode 360p 22m 24m 23m
Encode 480p 22m 25m 22m
Encode 720p 24m 27m 25m
Encode 1080p 32m 34m 32m
Encode 2160p 1h01m 1h1m 1h1m
Exec time ~2h55m 4h06m 2h58m

Network rates (iptraf-ng)

Environment Local (kbps) CEPH fuse (kbps) CEPH kernel (kbps)
Test Local 542593.52 63879.06 547853.46

System setup

From fuse to kernel


  file {'/mnt/opencast_share':
    ensure => directory,
  }
  -> cephfs {'/mnt/opencast_share':
    cluster     => $ceph_opencast_cluster,
    remote_path => $ceph_opencast_remote,
    id          => $ceph_opencast_id,
    type        => 'kernel'
  }

/etc/fstab entry for the fuse mount:

none    /mnt/opencast_share     fuse.ceph       ceph.id=opencast_test_01_rw,ceph.conf=/etc/ceph/dwight.conf,ceph.client_mountpoint=/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev

Kernel mount as reported by mount:

188.184.86.25:6790,188.184.94.56:6790,188.185.66.208:6790:/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e on /mnt/opencast_share type ceph (rw,noatime,seclabel,name=opencast_test_01_rw,secret=<hidden>,acl)
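After the switch it can be useful to confirm which client actually serves the share. The sketch below classifies a line of mount output by its filesystem type: the kernel client reports type ceph, as in the line above. Treating any type that starts with fuse.ceph as the fuse client is an assumption, and the function name is hypothetical.

```python
import re

# One line of `mount` output: "<source> on <target> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>.*)\)$"
)


def ceph_mount_kind(mount_line: str) -> str:
    """Classify a `mount` output line as 'kernel', 'fuse' or 'other'."""
    match = MOUNT_LINE.match(mount_line.strip())
    if match is None:
        raise ValueError(f"unrecognised mount line: {mount_line!r}")
    fstype = match.group("fstype")
    if fstype == "ceph":
        return "kernel"
    if fstype.startswith("fuse.ceph"):  # assumption: how ceph-fuse mounts appear
        return "fuse"
    return "other"


kernel_line = (
    "188.184.86.25:6790,188.184.94.56:6790,188.185.66.208:6790:"
    "/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e "
    "on /mnt/opencast_share type ceph "
    "(rw,noatime,seclabel,name=opencast_test_01_rw,secret=<hidden>,acl)"
)
print(ceph_mount_kind(kernel_line))
```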

Files transfer operation

File copy during ingest operation

Environment Total (Mbps) pps Incoming (kbps) pps Outgoing (Mbps) pps
Prod fuse 1342.07 6881 1625.13 3503 1340.45 3378
Test kernel 4738.49 98373 19493.28 43018 4719.00 55354
  • https://clouddocs.web.cern.ch/file_shares/share_types.html
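The file transfer table mixes units: the incoming rate is given in kbps while the total and outgoing rates are in Mbps. A short sketch, assuming 1 Mbps = 1000 kbps, shows that the three columns are consistent once the incoming rate is converted.

```python
# Rates copied from the file transfer table:
# (total in Mbps, incoming in kbps, outgoing in Mbps).
RATES = {
    "Prod fuse": (1342.07, 1625.13, 1340.45),
    "Test kernel": (4738.49, 19493.28, 4719.00),
}


def recomputed_total_mbps(incoming_kbps: float, outgoing_mbps: float) -> float:
    """Add incoming and outgoing traffic after converting kbps to Mbps."""
    return incoming_kbps / 1000 + outgoing_mbps


for env, (total, incoming, outgoing) in RATES.items():
    recomputed = recomputed_total_mbps(incoming, outgoing)
    print(f"{env}: reported {total:.2f} Mbps, recomputed {recomputed:.2f} Mbps")
```

In both rows almost all of the traffic is outgoing, and the recomputed total agrees with the reported one to within rounding.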

Conclusions

  • Kernel mount is roughly 5 to 10 times faster than fuse: throughput during the ingest operation rose from 65311.39-97738.33 kbps (fuse) to 598169.62 kbps (kernel)

  • Ingest track improved. The time required goes from 1h50m-2h30m to 17m

  • The "heavy" inspect operation is improved. The time required goes from 45m-60m to 12m

  • Kernel file copy is 2 to 3 times faster than fuse.

  • Bug detected in CentOS Stream 8 when using the CEPH kernel mount. It was reported to the Storage Group, then debugged and reported upstream by Dan van der Ster:

      • CERN Central Jira CRM-4180, "cephfs: disable async dirops on stream 8 kernel 358": https://its.cern.ch/jira/browse/CRM-4180

      • Ceph tracker (Linux kernel client) Bug #54013, "centos stream 8 kernel 358: async dirops causes Cannot write: Operation not permitted": https://tracker.ceph.com/issues/54013
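The speed-up quoted in the first conclusion can be recomputed from the measured rates. This is only a sketch of the arithmetic, using the ingest rates from the conclusions and the rates from the network rates table; the function name is illustrative.

```python
def speedup(kernel_kbps: float, fuse_kbps: float) -> float:
    """Return how many times faster the kernel mount is than fuse."""
    return kernel_kbps / fuse_kbps


# Rates during the ingest operation, taken from the conclusions.
KERNEL_INGEST_KBPS = 598169.62
FUSE_INGEST_KBPS = (65311.39, 97738.33)

for fuse_rate in FUSE_INGEST_KBPS:
    print(f"ingest: x{speedup(KERNEL_INGEST_KBPS, fuse_rate):.1f}")

# Rates from the network rates (iptraf-ng) table.
print(f"iptraf-ng: x{speedup(547853.46, 63879.06):.1f}")
```

With these figures the ratio lies between about 6 and 9, inside the 5 to 10 range stated above.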

Actions

Set SELinux to permissive mode

getenforce
Permissive

  class { 'selinux':
    mode => 'permissive',
    type => 'targeted',
  }

[root@ocworker-test-2 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

Select the appropriate ceph type


    file {'/mnt/opencast_share':
      ensure => directory,
      owner  => 'opencast',
      group  => 'opencast',
      mode   => '0775',
    }
    -> cephfs {'/mnt/opencast_share':
      cluster     => $ceph_opencast_cluster,
      remote_path => $ceph_opencast_remote,
      id          => $ceph_opencast_id,
      type        => 'kernel',
      # mount_options => 'wsync,rw',
    }

The mount options (wsync,rw) have been applied by default since the bug was reported, so mount_options does not need to be set.

Ensure that purge is included in the pluginsync_filter list of the YAML declarations: frontend.yaml, worker.yaml and ingest.yaml


pluginsync_filter_enable: True
pluginsync_filter:
  - purge

Unmount the shares before the Puppet update


systemctl stop opencast
umount -f -l /mnt/opencast_share
umount -f -l /mnt/media_share
puppet agent -tv

On the workers, also unmount:

umount -f -l /mnt/master_share

A host reboot is recommended.
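The update steps can be summarised as an ordered plan per host role. The sketch below only builds and prints the commands (a dry run) and does not execute anything. Running every unmount, including master_share on workers, before the Puppet agent is an assumption drawn from the heading above, and the function name is hypothetical.

```python
import shlex
from typing import List

COMMON_SHARES = ["/mnt/opencast_share", "/mnt/media_share"]
WORKER_ONLY_SHARES = ["/mnt/master_share"]


def update_plan(is_worker: bool) -> List[List[str]]:
    """Return the commands to run, in order, for one host (dry run only)."""
    shares = COMMON_SHARES + (WORKER_ONLY_SHARES if is_worker else [])
    plan = [["systemctl", "stop", "opencast"]]
    # Forced, lazy unmount of every share before Puppet changes the mounts.
    plan += [["umount", "-f", "-l", share] for share in shares]
    plan.append(["puppet", "agent", "-tv"])
    return plan


for command in update_plan(is_worker=True):
    print(shlex.join(command))
```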