
Follow-up edits for PR#3355
ahardin-rh committed Dec 19, 2016
1 parent 6f2c374 commit 692be2c
Showing 1 changed file with 140 additions and 82 deletions.
222 changes: 140 additions & 82 deletions admin_guide/high_availability.adoc
IP failover manages a pool of Virtual IP (VIP) addresses on a set of nodes. Ever
The VIPs must be routable from outside the cluster.
====

IP failover monitors a port on each VIP to determine whether the port is
reachable on the node. If the port is not reachable, the VIP will not be
assigned to the node. If the port is set to `0`, this check is suppressed.
An administrator-supplied xref:check-notify[*check* script] can perform any
additional testing that is needed.

IP failover uses link:http://www.keepalived.org/[*Keepalived*] to host a set of
externally accessible VIP addresses on a set of hosts. Each VIP is only serviced
by a single host at a time. *Keepalived* uses the VRRP protocol to determine
which host (from the set of hosts) will service which VIP. If a host becomes
unavailable or if the service that *Keepalived* is watching does not respond,
the VIP is switched to another host from the set. Thus, a VIP is always serviced
as long as a host is available.

The administrator can provide a script via the `--notify-script=` option, which
is called whenever the state changes. *Keepalived* is in *MASTER* state when it
is servicing the VIP, in *BACKUP* state when another node is servicing the VIP,
or in *FAULT* state when the *check* script fails. The
xref:check-notify[*notify* script] is called with the new state whenever the
state changes.

{product-title} supports the creation of an IP failover deployment
configuration by running the `oadm ipfailover` command. The IP failover
deployment configuration specifies the set of VIP addresses and the set of
nodes on which to service them. A cluster can have multiple IP failover
deployment configurations, with each managing its own set of unique VIP
addresses. Each node in the IP failover configuration runs an IP failover pod,
and this pod runs *Keepalived*.
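
For example, a command along the following lines (the configuration name, node
label, and addresses are illustrative) creates an IP failover deployment
configuration that manages two VIPs on the nodes labeled `ha-router=primary`:

[source,bash]
----
$ oadm ipfailover ipf-ha-router \
    --selector="ha-router=primary" \
    --virtual-ips=10.1.1.100-101 \
    --watch-port=80 \
    --replicas=2 \
    --create
----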

When using VIPs to access a pod with host networking (for example, a router),
the application pod should be running on all nodes that are running the
ipfailover pods. This enables any of the ipfailover nodes to become the master
and service the VIPs when needed. If application pods are not running on all
nodes with ipfailover, either some ipfailover nodes never service the VIPs or
some application pods never receive any traffic. Use the same selector and
replication count for both the ipfailover and the application pods to avoid
this mismatch.
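
To keep the two in step, a sketch such as the following (the labels, counts,
and addresses are examples only) creates the router and the ipfailover
configuration with the same selector and replica count:

[source,bash]
----
$ oadm router ha-router --replicas=2 --selector="ha-router=primary" \
    --service-account=router
$ oadm ipfailover ipf-ha-router --replicas=2 --selector="ha-router=primary" \
    --virtual-ips=10.1.1.100-101 --watch-port=80 --create
----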

When using VIPs to access a service, any of the nodes can be in the ipfailover
set of nodes, since the service is reachable on all nodes (no matter where the
application pod is running). Any of the ipfailover nodes can become master at
any time. The service can either use external IPs and a service port or it can
use a nodePort.

When using external IPs in the service definition, the VIPs are set to the
external IPs, and the ipfailover monitoring port is set to the service port.
When using a node port, the port is open on every node in the cluster, and the
service load balances traffic from whichever node currently hosts the VIP. In
this case, the ipfailover monitoring port is set to the `nodePort` in the
service definition.
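
As an illustrative sketch (the service name, external IP, node label, and port
are hypothetical), the following adds an external IP to an existing service and
creates an ipfailover configuration whose VIP matches the external IP and whose
monitoring port matches the service port:

[source,bash]
----
$ oc patch svc/myservice -p '{"spec": {"externalIPs": ["10.1.1.100"]}}'
$ oadm ipfailover ipf-ha-myservice \
    --selector="ha-service=true" \
    --virtual-ips=10.1.1.100 \
    --watch-port=8080 \
    --replicas=2 \
    --create
----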

[IMPORTANT]
====
Each VIP in the set may end up being served by a different node.
====

[[check-notify]]
=== Check and Notify Scripts

The *keepalived* port monitoring feature supports one or two scripts that are
run on a configured interval to verify that the application is available. The
default script uses a simple
xref:../install_config/install/prerequisites.adoc#required-ports[TCP
connection] to verify that the application is running. This test is suppressed
when the monitoring port is set to `0`. The administrator can supply an
additional script that does the needed verification. For example, the script can
test a web server by issuing a request and verifying the response. The script
must exit with `0` for *PASS* and `1` for *FAIL*.
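
For instance, a *check* script that probes a local web server could look like
the following sketch (the URL and port are assumptions for illustration only):

[source,bash]
----
#!/bin/bash
# Probe the application; exit 0 (PASS) if it responds, 1 (FAIL) otherwise.
if curl --silent --fail --max-time 2 http://localhost:8080/healthz > /dev/null; then
    exit 0
else
    exit 1
fi
----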

The administrator provides the additional script via the
`--check-script=<script>` option. By default, the check is done every two
seconds, but this can be changed using the `--check-interval=<seconds>` option.
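
For example (the path and interval are illustrative, and the script must
already be present in the pod at the given path):

[source,bash]
----
$ oadm ipfailover ipf-ha-router \
    --virtual-ips=10.1.1.100-101 \
    --check-script=/etc/keepalive/mycheckscript.sh \
    --check-interval=5 \
    --create
----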

For each VIP, *keepalived* keeps the state of the node. The VIP on the node may
be in *MASTER*, *BACKUP*, or *FAULT* state. All VIPs on the node that are not in
the *FAULT* state participate in the negotiation to decide which will be
*MASTER* for the VIP. All of the losers enter the *BACKUP* state. When the
*check* script on the *MASTER* fails, the VIP enters the *FAULT* state and
triggers a renegotiation. When the *BACKUP* fails, the VIP enters the *FAULT*
state. When the *check* script passes again on a VIP in the *FAULT* state, it
exits *FAULT* and negotiates for *MASTER*. The resulting state is either
*MASTER* or *BACKUP*.

The administrator can provide an optional *notify* script, which is called
whenever the state changes. A sample script follows the parameter list below.
*Keepalived* passes the following three parameters to the script:

* `$1` - "GROUP"|"INSTANCE"
* `$2` - Name of the group or instance
* `$3` - The new state ("MASTER"|"BACKUP"|"FAULT")
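
A minimal *notify* script that only records state transitions might look like
this sketch (the log file location is an assumption):

[source,bash]
----
#!/bin/bash
# $1 = "GROUP" or "INSTANCE", $2 = group or instance name, $3 = new state
echo "$(date) keepalived $1 $2 entered state $3" >> /tmp/ha-notify.log
exit 0
----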

These scripts run in the IP failover pod and use the pod's file system, not the
host file system. The options require the full path to the script. The
administrator must make the script available in the pod and must extract the
results from running the *notify* script. The recommended approach for
providing the scripts is to use a
xref:../dev_guide/configmaps.adoc#dev-guide-configmaps[ConfigMap].

The full path names of the *check* and *notify* scripts are added to the
*keepalived* configuration file, *_/etc/keepalived/keepalived.conf_*, which is
loaded every time *keepalived* starts. The scripts can be added to the pod with
a ConfigMap as follows.

. Create the desired script and create a ConfigMap to hold it. The script
has no input arguments and must return `0` for *OK* and `1` for *FAIL*.
+
The check script, *_mycheckscript.sh_*:
+
[source,bash]
----
#!/bin/bash
# Whatever tests are needed
# E.g., send request and verify response
exit 0
----

. Create the ConfigMap:
+
----
$ oc create configmap mycustomcheck --from-file=mycheckscript.sh
----

. There are two approaches to adding the script to the pod: use `oc` commands or
edit the deployment configuration.

.. Using `oc` commands:
+
[source,bash]
----
$ oc env dc/ipf-ha-router \
OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
$ oc volume dc/ipf-ha-router --add --overwrite \
--mount-path=/etc/keepalive \
--source='{"configMap": { "name": "mycustomcheck"}}'
----
.. Editing the *ipf-ha-router* deployment configuration:
+
... Use `oc edit dc ipf-ha-router` to edit the router deployment configuration
with a text editor.
+
[source,yaml]
----
...
name: config-volume
...
----
<1> In the `spec.container.env` field, add the `OPENSHIFT_HA_CHECK_SCRIPT`
environment variable to point to the mounted script file.
<2> Add the `spec.container.volumeMounts` field to create the mount point.
<3> Add a new `spec.volumes` field to mention the ConfigMap.

... Save the changes and exit the editor. This restarts *ipf-ha-router*.

[[kepalived-multicast]]
=== Keepalived Multicast

{product-title}'s IP failover internally uses *keepalived*.

[IMPORTANT]
====
Ensure that *multicast* is enabled on the nodes labeled above and that they
can accept network traffic for 224.0.0.18 (the VRRP multicast IP address).
====

Before starting the *keepalived* daemon, the startup script verifies the
`iptables` rule that allows multicast traffic to flow. If there is no such
rule, the startup script creates a new rule and adds it to the `iptables`
configuration. Where this new rule is added to the `iptables` configuration
depends on the `--iptables-chain=` option. If an `--iptables-chain=` option is
specified, the rule is added to the specified chain. Otherwise, the rule is
added to the `INPUT` chain.
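
For illustration, the rule that allows the VRRP multicast traffic is along the
lines of the following (shown for the default `INPUT` chain; the exact rule
text that the startup script adds may differ):

[source,bash]
----
# Accept VRRP multicast traffic destined for 224.0.0.18 on the INPUT chain
iptables -I INPUT -d 224.0.0.18/32 -j ACCEPT
----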

[IMPORTANT]
====
Expand All @@ -360,7 +418,7 @@ The `iptables` rule must be present whenever there is one or more *keepalived* d

The `iptables` rule can be removed after the last *keepalived* daemon terminates. The rule is not automatically removed.

You can manually manage the `iptables` rule on each of the nodes. It only gets created when none is present (as long as ipfailover is not created with the `--iptables-chain=""` option).

|===
| Option | Variable Name | Default | Notes

|`--watch-port`
|`OPENSHIFT_HA_MONITOR_PORT`
|80
|The ipfailover pod tries to open a TCP connection to this port on each VIP. If a connection is established, the service is considered to be running. If this port is set to `0`, the test always passes.

|`--interface`
|`OPENSHIFT_HA_NETWORK_INTERFACE`
|
|The interface name for ipfailover to use to send VRRP traffic. By default, `eth0` is used.

|`--replicas`
|`OPENSHIFT_HA_REPLICA_COUNT`
|2
|Number of replicas to create. This must match the `spec.replicas` value in the ipfailover deployment configuration.

|`--virtual-ips`
|`OPENSHIFT_HA_VIRTUAL_IPS`
|
|The list of IP address ranges to replicate. This must be provided. (For example, 1.2.3.4-6,1.2.3.9.)
See xref:../admin_guide/high_availability.adoc#ha-vrrp-id-offset[this discussion] for more details.

|`--vrrp-id-offset`
|`OPENSHIFT_HA_VRRP_ID_OFFSET`
|0
|See xref:../admin_guide/high_availability.adoc#ha-vrrp-id-offset[VRRP ID Offset] discussion for more details.

|`--iptables-chain`
|`OPENSHIFT_HA_IPTABLES_CHAIN`
|INPUT
|The name of the `iptables` chain where the `iptables` rule that allows VRRP traffic is automatically added. If the value is not set, an `iptables` rule is not added. If the chain does not exist, it is not created.

|`--check-script`
|`OPENSHIFT_HA_CHECK_SCRIPT`
|
|Full path name in the pod file system of a script that is periodically run to verify that the application is operating. See xref:../admin_guide/high_availability.adoc#check-notify[this discussion] for more details.

|`--check-interval`
|`OPENSHIFT_HA_CHECK_INTERVAL`
|2
|The period, in seconds, that the check script is run.

|`--notify-script`
|`OPENSHIFT_HA_NOTIFY_SCRIPT`
|
|Full path name in the pod file system of a script that is run whenever the state changes. See xref:../admin_guide/high_availability.adoc#check-notify[this discussion] for more details.

|===
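
The options map to the environment variables that are set on the ipfailover
deployment configuration, as used in the ConfigMap procedure above. For
example, you can list or override them on an existing configuration with
`oc env` (the deployment configuration name is illustrative):

[source,bash]
----
$ oc env dc/ipf-ha-router --list
$ oc env dc/ipf-ha-router OPENSHIFT_HA_CHECK_INTERVAL=3
----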

[[ha-vrrp-id-offset]]
=== VRRP ID Offset

Each ipfailover pod managed by the ipfailover deployment configuration (one pod per node or replica) runs a *keepalived* daemon. As more ipfailover deployment configurations are configured, more pods are created and more daemons join the common VRRP negotiation. This negotiation is done by all the *keepalived* daemons, and it determines which nodes service which VIPs.

Internally, *keepalived* assigns a unique vrrp-id to each VIP. The negotiation uses this set of vrrp-ids; when a decision is made, the VIP corresponding to the winning vrrp-id is serviced on the winning node.
In non-cloud clusters, ipfailover and xref:../architecture/core_concepts/pods_an
The approach is to specify an `ingressIPNetworkCIDR` range and then use the same range in creating the ipfailover configuration.

Since ipfailover can support a maximum of 255 VIPs for the entire cluster, the `ingressIPNetworkCIDR` needs to be a `/24` network or smaller.
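
For illustration, if the `ingressIPNetworkCIDR` were set to `172.46.0.0/24`,
the matching ipfailover configuration could be created over VIPs from that same
range (the name, label, and range below are illustrative):

[source,bash]
----
$ oadm ipfailover ipf-ha-ingress \
    --selector="ha-ingress=true" \
    --virtual-ips=172.46.0.1-100 \
    --replicas=2 \
    --create
----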
