Opencast performance
Uploading large files was taking excessively long: a single 50 GB file upload could take hours. We therefore ran some tests to compare the time required when the shared disks are mounted with the CephFS kernel client instead of FUSE.
https://clouddocs.web.cern.ch/file_shares/manual_cephfs.html
Test file:
From | Zone | Storage | Path |
---|---|---|---|
ocworker-prod-08 | cern-geneva-c | Local | /storage/opencast |
curl --progress-bar --location -o CERN-FOOTAGE-2020-055-001.mov 'https://videos.cern.ch/api/files/44fa70f3-ca7a-4af4-843d-f1661d16e2c8/CERN-FOOTAGE-2020-055-001.mov?versionId=89da7b1a-965e-411c-ab2e-74d296276bbc&download'
Downloading:
File system | Incoming rate (Mbps) |
---|---|
local | 1228.81 |
fuse | 424.35 |
kernel | 1177.21 |
Type | Size | Duration (mm:ss) | Frame rate (fps) | Bit rate (bps) | Resolution | Frame count |
---|---|---|---|---|---|---|
Apple ProRes (iCodec Pro) | 68.2 GB | 11:36 | 25 | 782865090 | 3840x2160 | 17418 |
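As a rough consistency check on the test file properties above, the size can be estimated from the bit rate and the duration, and the duration from the frame count and the frame rate. This is only a sketch: it assumes the bit rate is expressed in bits per second and that 1 GB means 10^9 bytes, neither of which is stated in the table.

```python
# Values copied from the test file table above.
# Assumptions: bit rate is in bits per second, 1 GB = 10**9 bytes.
BIT_RATE_BPS = 782_865_090
DURATION_S = 11 * 60 + 36  # 11:36
FRAME_RATE_FPS = 25
FRAME_COUNT = 17_418


def estimated_size_gb(bit_rate_bps: int, duration_s: float) -> float:
    """Estimate the file size in GB from a constant bit rate and a duration."""
    return bit_rate_bps * duration_s / 8 / 1e9


def duration_from_frames(frame_count: int, frame_rate_fps: float) -> float:
    """Duration in seconds implied by the frame count and frame rate."""
    return frame_count / frame_rate_fps


size_gb = estimated_size_gb(BIT_RATE_BPS, DURATION_S)
frames_duration_s = duration_from_frames(FRAME_COUNT, FRAME_RATE_FPS)

print(f"Estimated size: {size_gb:.1f} GB")
print(f"Duration from frame count: {frames_duration_s:.2f} s")
```

Both estimates land close to the values reported in the table (68.2 GB and 11:36), so the listed properties are mutually consistent under these assumptions.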
Ingest and encoding operations
Test Local
Operation | Started | Finished | Total |
---|---|---|---|
Created | 11:34:00 AM | | |
WF Starts | 11:51:07 AM | | 17m |
Inspect | 11:51:23 AM | 12:02:00 PM | 11m |
Encode 720p | 12:02:01 PM | 12:26:55 PM | 24m |
Encode 1080p | 12:27:03 PM | 12:59:22 PM | 32m |
Encode 480p | 12:59:35 PM | 1:21:52 PM | 22m |
Encode 360p | 1:22:06 PM | 1:44:19 PM | 22m |
Encode 2160p | 1:44:33 PM | 2:43:19 PM | 1h01m |
Exec. time | | | 2h55m |
Test CephFS fuse
Operation | Started | Finished | Total |
---|---|---|---|
Created | 12:53:00 AM | | |
WF Starts | 2:42:18 AM | | 1h49m |
Inspect | 2:42:27 AM | 3:30:27 AM | 48m |
Encode 720p | 3:30:27 AM | 3:57:16 AM | 27m |
Encode 1080p | 3:57:25 AM | 4:31:49 AM | 34m |
Encode 480p | 4:32:02 AM | 4:56:55 AM | 24m |
Encode 360p | 4:57:08 AM | 5:21:01 AM | 24m |
Encode 2160p | 5:21:15 AM | 6:21:56 AM | 1h0m |
Exec. time | | | 4:06:39 |
Test CephFS kernel
Operation | Started | Finished | Total |
---|---|---|---|
Created | 9:59:00 AM | | |
WF Starts | 10:16:23 AM | | 17m |
Inspect | 10:16:28 AM | 10:28:35 AM | 12m |
Encode 720p | 10:28:36 AM | 10:54:45 AM | 26m |
Encode 1080p | 10:54:53 AM | 11:28:33 AM | 33m |
Encode 480p | 11:28:46 AM | 11:51:58 AM | 23m |
Encode 360p | 11:52:12 AM | 12:14:35 PM | 22m |
Encode 2160p | 12:14:48 PM | 01:14:25 PM | 1h |
Exec. time | | | 2:58:22 |
Comparison table
Operation | Local | CEPH fuse | CEPH kernel |
---|---|---|---|
Ingest | 17m | 1h47m | 17m |
Inspect | 11m | 44m | 12m |
Encode 360p | 22m | 24m | 23m |
Encode 480p | 22m | 25m | 22m |
Encode 720p | 24m | 27m | 25m |
Encode 1080p | 32m | 34m | 32m |
Encode 2160p | 1h01m | 1h01m | 1h01m |
Exec time | ~2h55m | 4h06m | 2h58m |
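The per-operation durations in the tables above can be recomputed from the Started and Finished timestamps. The following is a minimal sketch, assuming 12-hour timestamps in the `H:MM:SS AM/PM` form used in the tables and operations that never last longer than 24 hours; the helper name `elapsed` is ours. The sample rows are taken from the CephFS kernel test.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%I:%M:%S %p"  # e.g. "10:16:28 AM"


def elapsed(started: str, finished: str) -> timedelta:
    """Return the time between two 12-hour clock timestamps.

    If the finish time is earlier than the start time, the operation is
    assumed to have run past midnight, so one day is added.
    """
    start = datetime.strptime(started, TIME_FORMAT)
    end = datetime.strptime(finished, TIME_FORMAT)
    if end < start:
        end += timedelta(days=1)
    return end - start


# Rows copied from the CephFS kernel test table.
rows = [
    ("Inspect", "10:16:28 AM", "10:28:35 AM"),
    ("Encode 720p", "10:28:36 AM", "10:54:45 AM"),
    ("Encode 2160p", "12:14:48 PM", "01:14:25 PM"),
]

for operation, started, finished in rows:
    print(f"{operation}: {elapsed(started, finished)}")
```

Durations computed this way can differ by a minute or so from the values in the tables, which are rounded to whole minutes.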
Network rates (iptraf-ng)
Environment | Local (kbps) | CEPH fuse (kbps) | CEPH kernel (kbps) |
---|---|---|---|
Test Local | 542593.52 | 63879.06 | 547853.46 |
System setup
From fuse to kernel
file { '/mnt/opencast_share':
  ensure => directory,
}
-> cephfs { '/mnt/opencast_share':
  cluster     => $ceph_opencast_cluster,
  remote_path => $ceph_opencast_remote,
  id          => $ceph_opencast_id,
  type        => 'kernel',
}
/etc/fstab
FUSE entry (before):
none /mnt/opencast_share fuse.ceph ceph.id=opencast_test_01_rw,ceph.conf=/etc/ceph/dwight.conf,ceph.client_mountpoint=/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev
Kernel mount as reported by mount (after):
188.184.86.25:6790,188.184.94.56:6790,188.185.66.208:6790:/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e on /mnt/opencast_share type ceph (rw,noatime,seclabel,name=opencast_test_01_rw,secret=<hidden>,acl)
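To confirm which client a host is actually using after the change, the filesystem type of the mount point can be read from `/proc/mounts`. The sketch below is illustrative only: the function name `mount_kind` is ours, the sample text is hand-written in `/proc/mounts` format (the FUSE line and its mount point `/mnt/example_fuse` are hypothetical), and it assumes that a kernel mount reports the type `ceph` while a FUSE mount reports a type starting with `fuse`.

```python
from typing import Optional


def mount_kind(mounts_text: str, mountpoint: str) -> Optional[str]:
    """Classify a mount point from /proc/mounts-style text.

    Returns "kernel" for type "ceph", "fuse" for any type starting with
    "fuse", the raw filesystem type otherwise, or None if the mount point
    is not listed. The last matching line wins, as with stacked mounts.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]
    if fstype is None:
        return None
    if fstype == "ceph":
        return "kernel"
    if fstype.startswith("fuse"):
        return "fuse"
    return fstype


# Hand-written sample in /proc/mounts format (not captured from a host).
SAMPLE = """\
188.184.86.25:6790,188.184.94.56:6790,188.185.66.208:6790:/volumes/_nogroup/90e16532-f3e1-4c90-85a8-11e70572914e /mnt/opencast_share ceph rw,noatime,name=opencast_test_01_rw,acl 0 0
ceph-fuse /mnt/example_fuse fuse.ceph-fuse rw,noatime 0 0
"""

print(mount_kind(SAMPLE, "/mnt/opencast_share"))
print(mount_kind(SAMPLE, "/mnt/example_fuse"))
```

On a real host the first argument would be the contents of `/proc/mounts`, for example `open("/proc/mounts").read()`.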
File transfer operation
File copy during ingest operation
Environment | Total (Mbps) | Total pps | Incoming (kbps) | Incoming pps | Outgoing (Mbps) | Outgoing pps |
---|---|---|---|---|---|---|
Prod fuse | 1342.07 | 6881 | 1625.13 | 3503 | 1340.45 | 3378 |
Test kernel | 4738.49 | 98373 | 19493.28 | 43018 | 4719.00 | 55354 |
- https://clouddocs.web.cern.ch/file_shares/share_types.html
Conclusions
- The kernel mount is 5 to 10 times faster than FUSE: throughput during the ingest operation went from 65311.39-97738.33 kbps to 598169.62 kbps.
- Ingest improved: the time required went from 1h50m-2h30m to 17m.
- The "heavy" inspect operation improved: the time required went from 45m-60m to 12m.
- Kernel file copy is 2 to 3 times faster than FUSE.
- A bug was detected in CentOS Stream 8 when using the CephFS kernel mount. It was reported to the Storage Group, then debugged and reported by Dan van der Ster: CERN Central Jira CRM-4180, "cephfs: disable async dirops on stream 8 kernel 358" (https://its.cern.ch/jira/browse/CRM-4180), and Ceph bug #54013, "centos stream 8 kernel 358: async dirops causes Cannot write: Operation not permitted" (https://tracker.ceph.com/issues/54013).
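The speed-up factor quoted for the ingest operation can be derived directly from the throughput figures in the first conclusion. This is a small worked example; the helper name `speedup` is ours and the values are the ones quoted above.

```python
def speedup(before_kbps: float, after_kbps: float) -> float:
    """Ratio of the new throughput to the old throughput."""
    return after_kbps / before_kbps


# Ingest throughput figures from the conclusions (kbps).
FUSE_LOW_KBPS = 65311.39
FUSE_HIGH_KBPS = 97738.33
KERNEL_KBPS = 598169.62

best_case = speedup(FUSE_LOW_KBPS, KERNEL_KBPS)    # vs. the slowest fuse run
worst_case = speedup(FUSE_HIGH_KBPS, KERNEL_KBPS)  # vs. the fastest fuse run

print(f"Kernel vs fuse ingest throughput: {worst_case:.1f}x to {best_case:.1f}x")
```

With these figures the ratio falls between roughly 6x and 9x, which is in line with the stated range.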
Actions
Set SELinux to permissive mode
getenforce
Permissive
class { 'selinux':
  mode => 'permissive',
  type => 'targeted',
}
[root@ocworker-test-2 ~]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 33
Select the appropriate Ceph mount type
file { '/mnt/opencast_share':
  ensure => directory,
  owner  => 'opencast',
  group  => 'opencast',
  mode   => '0775',
}
-> cephfs { '/mnt/opencast_share':
  cluster     => $ceph_opencast_cluster,
  remote_path => $ceph_opencast_remote,
  id          => $ceph_opencast_id,
  type        => 'kernel',
  #mount_options => 'wsync,rw'
}
These mount options have been applied by default since the bug was reported, so the mount_options line can stay commented out.
Ensure that the purge entry is included in the YAML declarations (frontend.yaml, worker.yaml, ingest.yaml):
pluginsync_filter_enable: True
pluginsync_filter:
- purge
Unmount the shares before the Puppet update
systemctl stop opencast
umount -f -l /mnt/opencast_share
umount -f -l /mnt/media_share
puppet agent -tv
On the worker, also unmount:
umount -f -l /mnt/master_share
A host reboot is recommended.