Hi there! Lately it has been so hot that I'm worn out even though I barely move.
Our in-house servers run a Kubernetes cluster, and I had been using TopoLVM as the CSI Driver that provides Persistent Volumes (PVs).
TopoLVM itself was very stable and gave me no trouble (the maintainers also responded to my issues right away, which I appreciate), but after a year of operating it, the areas where it falls short as a standalone Dynamic Volume Provisioner had become hard to ignore.
The main problem is that PVs cannot easily be moved between Nodes. TopoLVM carves an LVM Logical Volume out of a Node and provides it as a PV, so it has no replication. As a result, a Pod bound to a PVC always has to run on the specific Node that holds its Logical Volume.
To move a Pod bound to a PVC to another Node, its PV must be usable on the destination Node. With TopoLVM alone, moving a PV requires a manual migration, so during maintenance such as upgrades I had given up on moving Pods and simply stopped them.
With a year behind me and more nodes added, I decided to graduate from TopoLVM and try my hand at distributed storage.
- Evaluating cloud-native distributed storage
- Longhorn
- OpenEBS (briefly)
- Migration
Evaluating cloud-native distributed storage
I prioritized two things when choosing distributed storage this time.
- Easy to operate
  - The goal here is only to make k8s comfortable, not to run the so-called ultimate distributed storage. The easier, the better.
  - I want to add and maintain nodes without having to think about the CSI Driver or PVs.
  - For example, carving out a dedicated device on each Node is a hassle.
- Usable with only 2 nodes (at least in a degraded state)
  - I have 3 Nodes right now, so it has to stay usable when degraded to 2.
  - Being able to Drain cleanly with all 3 Nodes up is a must.
After a quick look at Rook, OpenEBS, and Longhorn from the CNCF lineup, I tried out Longhorn and OpenEBS, which looked simple and easy to operate, and settled on Longhorn.
Longhorn
Longhorn is a distributed block storage system developed by Rancher Labs, and it is now a CNCF Incubating project. That gives some peace of mind, and it also comes with a Web UI.
How it works
I can't really operate something without knowing how it works, so I took a rough look. From here on, every sentence carries an implicit "(probably)" at the end.
Longhorn stores Replicas, the actual data behind a PV, under a configured path on each Node. It achieves redundancy by placing these Replicas on at least the configured number of Nodes.
Providing the PV to a Pod is the job of instance-manager-e, which runs on each Node, together with the Node itself. instance-manager-e acts as the iSCSI (pronounced "scuzzy") Target used to read and write the PV, and behind it performs Reads / Writes against each Replica (instance-manager-r). When the Node opens an iSCSI session to this iSCSI Target, a device for the PV appears on the Node, and the container uses that device as its Volume.
Roughly, it looks like this:
[Pod] <-- local mount --> [Node (iSCSI initiator)] <-- iSCSI session --> [instance-manager-e Pod (iSCSI target)] <-- TCP --> [instance-manager-r on the Node holding the Replica]
Replica
A Replica's data is a block device formatted with ext4 or xfs, stored in the form of differencing disks. A directory is created per PVC under the configured directory, and the data is saved there.
$ sudo tree /var/lib/longhorn
/var/lib/longhorn
├── engine-binaries
│   └── longhornio-longhorn-engine-v1.3.1
│       └── longhorn
├── longhorn-disk.cfg
└── replicas
    ├── pvc-a62e9db1-35be-44da-9f2c-03222ebd889c-6446a662
    │   ├── revision.counter
    │   ├── volume-head-001.img
    │   ├── volume-head-001.img.meta
    │   ├── volume.meta
    │   ├── volume-snap-3adb6421-bc79-40c1-ad02-994a63b280fb.img
    │   └── volume-snap-3adb6421-bc79-40c1-ad02-994a63b280fb.img.meta
...
These differencing disks appear to be in a custom format, so they cannot simply be mounted with the mount command. The data can still be extracted via longhorn-engine.
Because everything is simply stored under a directory like this, it works without any special Node setup, which makes life easy *1!
instance-manager-r and instance-manager-e in detail
Once Longhorn is installed, one instance-manager-r Pod and one instance-manager-e Pod start on each Node.
instance-manager-r (r for Replica) performs Reads / Writes on Replicas in response to Read / Write requests from instance-manager-e (e for Engine).
From reading the source code, it appears that a Read request from the Engine goes to a single Replica chosen in round-robin fashion, while a Write request goes to all Replicas and the Engine waits until every one of them has finished writing.
- Read: longhorn-engine/replicator.go at 18fa3c566aedb4080dc50add7479a1a266aa8cc7 · longhorn/longhorn-engine · GitHub
- Write: longhorn-engine/multi_writer_at.go at 18fa3c566aedb4080dc50add7479a1a266aa8cc7 · longhorn/longhorn-engine · GitHub
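To make the read/write pattern above concrete, here is a minimal Python sketch (probably). It is not Longhorn's code: the `Replica` and `Engine` classes are hypothetical in-memory stand-ins, and the real Engine talks to Replicas over TCP rather than calling them directly.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


class Replica:
    """Hypothetical in-memory replica; a real one sits behind a TCP server."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.reads = 0  # number of reads served, to observe the round robin

    def read_at(self, offset, length):
        self.reads += 1
        return bytes(self.data[offset:offset + length])

    def write_at(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf


class Engine:
    """Reads from one replica at a time, writes to all of them."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next_reader = cycle(self.replicas)  # round-robin iterator
        self._pool = ThreadPoolExecutor(max_workers=len(self.replicas))

    def read_at(self, offset, length):
        # A read is served by a single replica, chosen in turn.
        return next(self._next_reader).read_at(offset, length)

    def write_at(self, offset, buf):
        # A write fans out to every replica and returns only after
        # all of them have finished (errors are re-raised here).
        futures = [self._pool.submit(r.write_at, offset, buf)
                   for r in self.replicas]
        for future in futures:
            future.result()
```

The trade-off this illustrates: reads get spread across Replicas, while write latency is bounded by the slowest Replica.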
Communication between the Engine and the Replicas appears to use a custom protocol over TCP. I suspect many people assume this part is iSCSI as well.
Where the Replica starts its TCP server: longhorn-engine/dataserver.go at 18fa3c566aedb4080dc50add7479a1a266aa8cc7 · longhorn/longhorn-engine · GitHub
iSCSI and the Node side in detail
As mentioned above, the instance-manager-e Pod presents itself externally as an iSCSI Target that provides PVs. Each Node then acts as an iSCSI initiator *2 and opens sessions to the instance-manager-e Pod running on that Node.
$ sudo iscsiadm -m node
10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-90fd4ba4-e2c5-40ed-a1fe-9f0f5290753f
10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-be3d8c6f-9605-404f-a28b-a4e8785f5d12
10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2
$ sudo iscsiadm -m session
tcp: [1] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-90fd4ba4-e2c5-40ed-a1fe-9f0f5290753f (non-flash)
tcp: [2] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-be3d8c6f-9605-404f-a28b-a4e8785f5d12 (non-flash)
tcp: [3] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2 (non-flash)
$ sudo iscsiadm -m host
tcp: [2] 10.100.0.216,[<empty>],<empty> <empty>
tcp: [3] 10.100.0.216,[<empty>],<empty> <empty>
tcp: [4] 10.100.0.216,[<empty>],<empty> <empty>
The addresses shown by node and session are those of the instance-manager-e Pod, and the address shown by host is the one on the network interface the Node uses to reach the Pod. You can see that the Node opens one session to the Engine per PVC.
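If you want to check the "one session per PVC" observation mechanically, a small Python sketch like the following could do it. The `parse_sessions` helper and its regular expression are my own assumptions, written against the session output shown in this post, not against any documented iscsiadm output format.

```python
import re

# Matches lines such as:
# tcp: [1] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-... (non-flash)
SESSION_LINE = re.compile(
    r"^tcp: \[(?P<sid>\d+)\] (?P<portal>[\d.]+:\d+),\d+ "
    r"iqn\.2019-10\.io\.longhorn:(?P<volume>\S+)"
)


def parse_sessions(output):
    """Return (session id, portal, volume name) for each Longhorn session."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_LINE.match(line.strip())
        if match:
            sessions.append(
                (int(match["sid"]), match["portal"], match["volume"])
            )
    return sessions


# Session lines copied from the output above.
SAMPLE = """\
tcp: [1] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-90fd4ba4-e2c5-40ed-a1fe-9f0f5290753f (non-flash)
tcp: [2] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-be3d8c6f-9605-404f-a28b-a4e8785f5d12 (non-flash)
tcp: [3] 10.100.0.253:3260,1 iqn.2019-10.io.longhorn:pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2 (non-flash)
"""

sessions = parse_sessions(SAMPLE)
portals = {portal for _, portal, _ in sessions}
volumes = {volume for _, _, volume in sessions}
```

With the sample above, `portals` holds a single address (the instance-manager-e Pod) while `volumes` holds one entry per PVC.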
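Incidentally, iSCSI is reportedly used because it came out of an evaluation of ways to provide a block device from userspace.
At this point, a device through which the PV can be read and written appears on the Node. That device then gets mounted in several places and ends up visible to the Pod as a Volume, as shown below.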
$ mount
...
/dev/longhorn/pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2 on /var/lib/kubelet/pods/d674d860-744b-4dd7-84b7-235d3f2eb01b/volumes/kubernetes.io~csi/pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2/mount type ext4 (rw,relatime)
...
$ sudo crictl inspect ac0a56409c28d
...
{
  "destination": "/var/lib/mysql",
  "type": "bind",
  "source": "/var/lib/kubelet/pods/d674d860-744b-4dd7-84b7-235d3f2eb01b/volumes/kubernetes.io~csi/pvc-e5036d59-a3b6-4350-8bbd-72f8b8800ed2/mount",
  "options": [
    "rbind",
    "rprivate",
    "rw"
  ]
},
...
Looked at this way, the mechanism is surprisingly simple and easy to follow!
Behavior during Drain
I installed Longhorn on a cluster of three nodes, 01 through 03, and drained a Node to see how it behaves.
With a Pod using a PVC on 03 and replicas on 02 and 03, I drained 03. The Drain was held back by a PDB for a while, then a replacement Pod started on 02 and the Drain completed successfully.
The Replica, on the other hand, was not created on 01 until several minutes after the Drain (so for those minutes the volume was not redundant).
This seems to be the intended behavior. The longhorn-manager log right after the Drain said that Replica Replenishment would happen several minutes later.
time="2022-08-29T17:28:34Z" level=debug msg="Replica replenishment is delayed until 2022-08-29 17:33:10 +0000 UTC"
After actually waiting those few minutes, a new Replica was created on 01.
time="2022-08-29T17:33:11Z" level=info msg="Cannot find a reusable failed replicas"
time="2022-08-29T17:33:11Z" level=debug msg="A new replica pvc-2dcbd883-26d7-44a4-ace4-5e94978a2440-r-2b7f57c0 will be replenished during rebuilding" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=02 owner=02 state=attached volume=pvc-2dcbd883-26d7-44a4-ace4-5e94978a2440
time="2022-08-29T17:33:11Z" level=debug msg="Schedule replica to node 01" dataDirectoryName=pvc-2dcbd883-26d7-44a4-ace4-5e94978a2440-d1ac5f6a disk=c25e7a5c-4d6a-4ab0-a229-c63316142ed2 diskPath=/var/lib/longhorn/ replica=pvc-2dcbd883-26d7-44a4-ace4-5e94978a2440-r-2b7f57c0
Apparently the few minutes of waiting before adding a Replica are there to see whether the Failed Replica comes back and can be Reused *3.
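The wait-then-replenish behavior described above can be modeled in a few lines of Python. This is only my mental model of what the logs suggest, not Longhorn's actual logic: the `replenish_action` function, its parameters, and the 5-minute interval used in the example are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def replenish_action(failed_at, now, wait_interval, failed_replica_back):
    """Decide what to do about a missing replica (hypothetical model).

    Returns "reuse" if the failed replica has come back and can be reused,
    "wait" while the wait interval has not yet elapsed, and "create" once
    it has elapsed without the failed replica returning.
    """
    if failed_replica_back:
        return "reuse"
    if now < failed_at + wait_interval:
        return "wait"
    return "create"


# Example with made-up values: a replica fails, and we check twice.
failed_at = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
wait_interval = timedelta(minutes=5)  # assumed value, adjustable in settings

early = replenish_action(
    failed_at, failed_at + timedelta(seconds=30), wait_interval, False
)
late = replenish_action(
    failed_at, failed_at + timedelta(minutes=6), wait_interval, False
)
```

In this model `early` is "wait" and `late` is "create", which matches the "delayed until" log line followed a few minutes later by "Schedule replica to node 01".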
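So Drain goes smoothly, and redundancy is restored a few minutes later. Being able to take a Node down without worrying about Volumes is really convenient!
Incidentally, when a node goes down, Longhorn also appears to wait a few minutes and, if the node does not come back, move the Replica to a new Node *4.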
OpenEBS (briefly)
I also tried Jiva from OpenEBS, a CNCF sandbox project. Its architecture is largely the same as Longhorn's.
However, the documentation was thin; every PVC spawns (number of Replicas + 1) Pods *5, which made me worry about wasted resources and operational overhead; and Node Drain got stuck and did not work properly *6. It looked harder to operate than Longhorn, so I dropped it. cStor also looked painful judging from its documentation on what to do when a Pool is lost, and MayaStor still felt short on features.
Migration
So, time to migrate the PVs from TopoLVM to Longhorn. Nothing seems to do this magically, but there were not that many PVs to migrate, so I decided to push through by hand.
- Delete the Pod that currently mounts the volume
- Create a temporary migration PVC with the same spec as the existing PVC
- Start a throwaway Pod that mounts both the old PV and the migration PV
- Sync the data between the PVs with rsync
# rsync -a /before/ /after
- Delete the migration Pod
- Delete the old PVC and create a new PVC with the same name, specifying longhorn
- Start another throwaway Pod that mounts the migration PV and the new PV
- Sync the data with rsync
- Delete the migration Pod and PVC
- Bring back the Pod that was deleted
Sheer willpower.
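For reference, the throwaway Pod used in the copy steps could be generated with a sketch like this. Everything beyond the `/before` and `/after` mount paths and the rsync command is an assumption on my part: the `rsync_pod` helper, the Pod and PVC names, and the `alpine:3` image (which needs rsync installed at startup) are hypothetical placeholders.

```python
import json


def rsync_pod(name, src_pvc, dst_pvc, image="alpine:3"):
    """Build a Pod manifest that copies /before (src_pvc) to /after (dst_pvc)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # run the copy once, then stop
            "containers": [{
                "name": "rsync",
                "image": image,
                # Assumed image has no rsync, so install it before copying.
                "command": [
                    "sh", "-c",
                    "apk add --no-cache rsync && rsync -a /before/ /after",
                ],
                "volumeMounts": [
                    {"name": "before", "mountPath": "/before"},
                    {"name": "after", "mountPath": "/after"},
                ],
            }],
            "volumes": [
                {"name": "before",
                 "persistentVolumeClaim": {"claimName": src_pvc}},
                {"name": "after",
                 "persistentVolumeClaim": {"claimName": dst_pvc}},
            ],
        },
    }


# Hypothetical names: old PVC "mysql-data", temporary PVC "mysql-data-tmp".
manifest = rsync_pod("pv-migrate", "mysql-data", "mysql-data-tmp")
manifest_json = json.dumps(manifest, indent=2)
```

Since kubectl accepts JSON manifests, printing `manifest_json` and piping it to `kubectl apply -f -` should be enough to start the Pod. The same helper covers both copies by swapping the PVC names for the second pass.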
With that, I was able to move every TopoLVM PVC to Longhorn, and the cluster became one where draining a Node is no big deal.
Even after introducing Longhorn, trouble kept coming (for reasons unrelated to Longhorn): Nodes went offline, and a botched upgrade caused a storm of Unschedulable Pods. Through all of it, Longhorn kept two Replicas available with no maintenance on my part and properly moved the Pods using PVCs to Healthy Nodes.
Longhorn is above all easy to operate, and I highly recommend it!
*1: Officially, using a dedicated disk is recommended https://longhorn.io/docs/1.3.1/best-practices/#node-and-disk-setup
*2: Longhorn's setup requires open-iscsi to be installed on each Node
*3: Adjustable in the settings. https://longhorn.io/docs/1.3.0/references/settings/#replica-replenishment-wait-interval
*4: Normally, StatefulSet Pods are not rescheduled unless the Node is deleted manually (Statefulset pods are not evicted when we shutdown the node manually · Issue #103175 · kubernetes/kubernetes · GitHub). As a countermeasure, Longhorn has a feature that deletes the StatefulSet's Pods so that they can be rescheduled automatically together with their Volumes https://longhorn.io/docs/1.3.1/references/settings/#pod-deletion-policy-when-node-is-down
*5: The +1 is the Controller (the Engine in Longhorn terms). The iSCSI initiator opens its session to the address of the Service tied to the Controller Pod. It is neat in a k8s-native way, but...
*6: There is a feature for moving replicas away, but apparently it cannot work with 3 nodes and 2 replicas. feat(operator): automate movement replicas when node is removed from cluster by shubham14bajpai · Pull Request #97 · openebs/jiva-operator · GitHub