The pacemaker_remote service, supported in Pacemaker version 1.1.10 and greater, allows nodes not running the cluster stack (pacemaker + corosync) to integrate into the cluster and have the cluster manage their resources just as if they were real cluster nodes. This means that pacemaker clusters are now capable of managing both the launch of virtual environments (KVM/LXC) and the resources that live within those virtual environments, without requiring the virtual environments to run pacemaker or corosync.
cluster-node
- A baremetal hardware node running the High Availability stack (pacemaker + corosync)
remote-node
- A virtual guest node running the pacemaker_remote service.
pacemaker_remote
- A service daemon capable of performing remote application management within virtual guests (KVM and LXC) in both pacemaker cluster environments and standalone (non-cluster) environments. This service is an enhanced version of pacemaker's local resource management daemon (LRMD) that is capable of managing and monitoring LSB, OCF, upstart, and systemd resources on a guest remotely. It also allows most of pacemaker's CLI tools (crm_mon, crm_resource, crm_master, crm_attribute, etc.) to work natively on remote-nodes.
LXC
- A Linux Container defined by the libvirt-lxc Linux container driver. http://libvirt.org/drvlxc.html
"I want a pacemaker cluster to manage virtual machine resources, but I also want pacemaker to be able to manage the resources that live within those virtual machines."
No 16-node corosync member limits to deal with. That isn't to say remote-nodes can scale indefinitely, but the expectation is that remote-nodes scale horizontally much further than cluster-nodes. Other than the quorum limitation, these remote-nodes behave just like cluster-nodes with respect to resource management. The cluster is fully capable of managing and monitoring resources on each remote-node. You can build constraints against remote-nodes, put them in standby, or do whatever else you'd expect to be able to do with normal cluster-nodes, as sketched below. They even show up in crm_mon output just as cluster-nodes do.
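For example, a remote-node can be targeted by location constraints or placed in standby exactly like a cluster-node. A minimal sketch, assuming a remote-node named guest1 and a resource named webserver (both names illustrative):

# pcs constraint location webserver prefers guest1
# pcs cluster standby guest1
# pcs cluster unstandby guest1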
I want to isolate and limit the system resources (cpu, memory, filesystem) a cluster resource can consume without using virtual machines.
Put an authkey at the path /etc/pacemaker/authkey on every cluster-node and virtual machine. This secures remote communication and authentication.
dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
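If the /etc/pacemaker directory does not exist yet, create it before generating the key, then copy the resulting directory to every virtual machine. A minimal sketch (the guest address is a placeholder):

# mkdir -p /etc/pacemaker
# scp -r /etc/pacemaker root@<virtual-machine-address>:/etc/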
Install the pacemaker_remote packages on every virtual machine, enable pacemaker_remote on startup, and open a hole in the firewall for TCP port 3121.
yum install pacemaker-remote resource-agents
systemctl enable pacemaker_remote
# If you just want to see this work, disable iptables and ip6tables on most distros.
# You may have to put selinux in permissive mode as well for the time being.
firewall-cmd --add-port 3121/tcp --permanent
Give each virtual machine a static network address and unique hostname
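A minimal way to do this inside each guest, assuming a RHEL/Fedora-style network-scripts setup and using illustrative values (the KVM walk-through later in this document shows a fuller version):

# echo guest1 > /etc/hostname
# hostname guest1
# cat << END >> /etc/sysconfig/network-scripts/ifcfg-eth0
BOOTPROTO=none
IPADDR0=192.168.122.10
PREFIX0=24
END
# systemctl restart network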
Tell pacemaker to launch the virtual machine and to treat that virtual machine as a remote-node capable of running resources, using the "remote-node" meta-attribute.
# pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" config="vm-guest1.xml" meta remote-node=guest1
<primitive class="ocf" id="vm-guest1" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="vm-guest-instance_attributes">
    <nvpair id="vm-guest1-instance_attributes-hypervisor" name="hypervisor" value="qemu:///system"/>
    <nvpair id="vm-guest1-instance_attributes-config" name="config" value="guest1.xml"/>
  </instance_attributes>
  <operations>
    <op id="vm-guest1-interval-30s" interval="30s" name="monitor"/>
  </operations>
  <meta_attributes id="vm-guest1-meta_attributes">
    <nvpair id="vm-guest1-meta_attributes-remote-node" name="remote-node" value="guest1"/>
  </meta_attributes>
</primitive>
Last updated: Wed Mar 13 13:52:39 2013
Last change: Wed Mar 13 13:25:17 2013 via crmd on node1
Stack: corosync
Current DC: node1 (24815808) - partition with quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ node1 guest1 ]

vm-guest1     (ocf::heartbeat:VirtualDomain): Started node1
# pcs resource create webserver apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
# pcs constraint location webserver prefers guest1
Last updated: Wed Mar 13 13:52:39 2013
Last change: Wed Mar 13 13:25:17 2013 via crmd on node1
Stack: corosync
Current DC: node1 (24815808) - partition with quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ node1 guest1 ]

vm-guest1     (ocf::heartbeat:VirtualDomain): Started node1
webserver     (ocf::heartbeat:apache):        Started guest1
Option | Default | Description
---|---|---
remote-node | <none> | The name of the remote-node this resource defines. This both enables the resource as a remote-node and defines the unique name used to identify the remote-node. If no other parameters are set, this value will also be assumed as the hostname to connect to at port 3121. WARNING: This value cannot overlap with any resource or node IDs.
remote-port | 3121 | Configure a custom port to use for the guest connection to pacemaker_remote.
remote-addr | remote-node value used as hostname | The IP address or hostname to connect to if the remote-node's name is not the hostname of the guest.
remote-connect-timeout | 60s | How long before a pending guest connection will time out.
The pacemaker_remote daemon and the cluster communicate over TCP port 3121 using TLS with PSK encryption and authentication. This means both the cluster-node and remote-node must share the same private key. By default this key must be placed at /etc/pacemaker/authkey on both cluster-nodes and remote-nodes. The default key location and port can be overridden with the environment variables shown below.
#==#==# Pacemaker Remote
# Use a custom directory for finding the authkey.
PCMK_authkey_location=/etc/pacemaker/authkey
#
# Specify a custom port for Pacemaker Remote connections
PCMK_remote_port=3121
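On Fedora/RHEL-based systems these variables are typically placed in the pacemaker sysconfig file (/etc/sysconfig/pacemaker). If you change the port there, set the remote-port meta-attribute on the cluster side to the same value so both ends agree.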
What this tutorial is:
This tutorial is an in-depth walk-through of how to get pacemaker to manage a KVM guest instance and integrate that guest into the cluster as a remote-node.
What this tutorial is not:
This tutorial is not a realistic deployment scenario. The steps shown here are meant to get users familiar with the concept of remote-nodes as quickly as possible.
WARNING:
These actions will open a significant security threat to machines exposed to the outside world.
# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# systemctl disable iptables.service
# systemctl disable ip6tables.service
# rm '/etc/systemd/system/basic.target.wants/iptables.service'
# rm '/etc/systemd/system/basic.target.wants/ip6tables.service'
# systemctl stop iptables.service
# systemctl stop ip6tables.service
# yum install -y pacemaker corosync pcs resource-agents
# export corosync_addr=`ip addr | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/g`
# echo $corosync_addr
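On a host attached to libvirt's default 192.168.122.0/24 network, for example, this would print something like the following (the exact value depends on which interface is listed last):

192.168.122.0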
# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# sed -i.bak "s/.*\tbindnetaddr:.*/bindnetaddr:\ $corosync_addr/g" /etc/corosync/corosync.conf
# cat << END >> /etc/corosync/corosync.conf
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
END
# pcs cluster start
# pcs status corosync

Membership information
    Nodeid      Votes  Name
1795270848          1  example-host (local)
# pcs status

Last updated: Thu Mar 14 12:26:00 2013
Last change: Thu Mar 14 12:25:55 2013 via crmd on example-host
Stack: corosync
Current DC:
Version: 1.1.10
1 Nodes configured, unknown expected votes
0 Resources configured.
# pcs status

Last updated: Thu Mar 14 12:28:23 2013
Last change: Thu Mar 14 12:25:55 2013 via crmd on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.8-9b13ea1
1 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ example-host ]
# pcs cluster stop
# yum install -y kvm libvirt qemu-system qemu-kvm bridge-utils virt-manager
# systemctl enable libvirtd.service
export remote_hostname=guest1
export remote_ip=192.168.122.10
export remote_gateway=192.168.122.1

yum remove -y NetworkManager

rm -f /etc/hostname
cat << END >> /etc/hostname
$remote_hostname
END

hostname $remote_hostname

cat << END >> /etc/sysconfig/network
HOSTNAME=$remote_hostname
GATEWAY=$remote_gateway
END

sed -i.bak "s/.*BOOTPROTO=.*/BOOTPROTO=none/g" /etc/sysconfig/network-scripts/ifcfg-eth0
cat << END >> /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR0=$remote_ip
PREFIX0=24
GATEWAY0=$remote_gateway
DNS1=$remote_gateway
END

systemctl restart network
systemctl enable network.service
systemctl enable sshd
systemctl start sshd

echo "checking connectivity"
ping www.google.com
# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# firewall-cmd --add-port 3121/tcp --permanent
On the HOST machine, run these commands to generate an authkey and copy it to the /etc/pacemaker folder on both the host and guest.
# mkdir /etc/pacemaker
# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
# scp -r /etc/pacemaker root@192.168.122.10:/etc/
On the GUEST, install the pacemaker-remote package and enable the daemon to run at startup. In the commands below you will notice the pacemaker and pacemaker-remote packages are being installed. The pacemaker package is not required; the only reason it is being installed for this tutorial is that it contains the Dummy resource agent we will use later to test the remote-node.
# yum install -y pacemaker pacemaker-remote resource-agents
# systemctl enable pacemaker_remote.service
# systemctl start pacemaker_remote.service
# systemctl status pacemaker_remote

pacemaker_remote.service - Pacemaker Remote Service
          Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled)
          Active: active (running) since Thu 2013-03-14 18:24:04 EDT; 2min 8s ago
        Main PID: 1233 (pacemaker_remot)
          CGroup: name=systemd:/system/pacemaker_remote.service
                  └─1233 /usr/sbin/pacemaker_remoted

Mar 14 18:24:04 guest1 systemd[1]: Starting Pacemaker Remote Service...
Mar 14 18:24:04 guest1 systemd[1]: Started Pacemaker Remote Service.
Mar 14 18:24:04 guest1 pacemaker_remoted[1233]: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
# cat << END >> /etc/hosts
192.168.122.10    guest1
END
# telnet guest1 3121
Trying 192.168.122.10...
Connected to guest1.
Escape character is '^]'.
Connection closed by foreign host.
# telnet guest1 3121
Trying 192.168.122.10...
telnet: connect to address 192.168.122.10: No route to host
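If the connection fails like this, check on the guest that pacemaker_remoted is actually listening and that the firewall port is open. A quick sanity check, assuming iproute2 and firewalld are available on the guest:

# ss -lntp | grep 3121
# firewall-cmd --list-ports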
# pcs cluster start
Last updated: Thu Mar 14 16:41:22 2013
Last change: Thu Mar 14 16:41:08 2013 via crmd on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.10
1 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ example-host ]
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# cat << END >> /etc/hosts
192.168.122.10    guest1
END
The VirtualDomain resource agent is used for the management of the virtual machine. This agent requires the virtual machine's xml config to be dumped to a file on disk. To do this, pick out the name of the virtual machine you just created from the output of this list.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     guest1                         shut off
# virsh dumpxml guest1 > /root/guest1.xml
# pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" config="/root/guest1.xml" meta remote-node=guest1
Last updated: Fri Mar 15 09:30:30 2013
Last change: Thu Mar 14 17:21:35 2013 via cibadmin on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ example-host guest1 ]

Full list of resources:

 vm-guest1     (ocf::heartbeat:VirtualDomain): Started example-host
# pcs resource create FAKE1 ocf:pacemaker:Dummy
# pcs resource create FAKE2 ocf:pacemaker:Dummy
# pcs resource create FAKE3 ocf:pacemaker:Dummy
# pcs resource create FAKE4 ocf:pacemaker:Dummy
# pcs resource create FAKE5 ocf:pacemaker:Dummy
Full list of resources:

 vm-guest1     (ocf::heartbeat:VirtualDomain): Started example-host
 FAKE1  (ocf::pacemaker:Dummy): Started guest1
 FAKE2  (ocf::pacemaker:Dummy): Started guest1
 FAKE3  (ocf::pacemaker:Dummy): Started example-host
 FAKE4  (ocf::pacemaker:Dummy): Started guest1
 FAKE5  (ocf::pacemaker:Dummy): Started example-host
# pcs constraint location FAKE3 prefers guest1
Full list of resources:

 vm-guest1     (ocf::heartbeat:VirtualDomain): Started example-host
 FAKE1  (ocf::pacemaker:Dummy): Started guest1
 FAKE2  (ocf::pacemaker:Dummy): Started guest1
 FAKE3  (ocf::pacemaker:Dummy): Started guest1
 FAKE4  (ocf::pacemaker:Dummy): Started example-host
 FAKE5  (ocf::pacemaker:Dummy): Started example-host
# kill -9 `pidof pacemaker_remoted`
Last updated: Fri Mar 15 11:00:31 2013
Last change: Fri Mar 15 09:54:16 2013 via cibadmin on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
7 Resources configured.

Online: [ example-host ]
OFFLINE: [ guest1 ]

Full list of resources:

 vm-guest1     (ocf::heartbeat:VirtualDomain): Started example-host
 FAKE1  (ocf::pacemaker:Dummy): Stopped
 FAKE2  (ocf::pacemaker:Dummy): Stopped
 FAKE3  (ocf::pacemaker:Dummy): Stopped
 FAKE4  (ocf::pacemaker:Dummy): Started example-host
 FAKE5  (ocf::pacemaker:Dummy): Started example-host

Failed actions:
    guest1_monitor_30000 (node=example-host, call=3, rc=7, status=complete): not running
Last updated: Fri Mar 15 11:03:17 2013
Last change: Fri Mar 15 09:54:16 2013 via cibadmin on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
7 Resources configured.

Online: [ example-host guest1 ]

Full list of resources:

 vm-guest1     (ocf::heartbeat:VirtualDomain): Started example-host
 FAKE1  (ocf::pacemaker:Dummy): Started guest1
 FAKE2  (ocf::pacemaker:Dummy): Started guest1
 FAKE3  (ocf::pacemaker:Dummy): Started guest1
 FAKE4  (ocf::pacemaker:Dummy): Started example-host
 FAKE5  (ocf::pacemaker:Dummy): Started example-host

Failed actions:
    guest1_monitor_30000 (node=example-host, call=3, rc=7, status=complete): not running
The pacemaker_remote daemon allows nearly all the pacemaker tools (crm_resource, crm_mon, crm_attribute, crm_master) to work on remote nodes natively.
Try it: run crm_mon or pcs status on the guest after pacemaker has integrated the remote-node into the cluster. These tools just work. This means resource agents such as master/slave resources, which need access to tools like crm_master, work seamlessly on the remote-nodes. A quick check is sketched below.
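A quick way to see this, assuming the guest1 node from this tutorial is Online: log into the guest and run the same status tools you would use on a cluster-node.

# ssh root@guest1
# crm_mon -1     # one-shot cluster status, identical to the view from example-host
# pcs status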
What this tutorial is:
This tutorial demonstrates how pacemaker_remote can be used with Linux containers (managed by libvirt-lxc) to run cluster resources in an isolated environment.
What this tutorial is not:
This tutorial is not a realistic deployment scenario. The steps shown here are meant to introduce users to the concept of managing Linux container environments with Pacemaker.
# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# firewall-cmd --add-port 3121/tcp --permanent
# systemctl disable iptables.service
# systemctl disable ip6tables.service
# rm '/etc/systemd/system/basic.target.wants/iptables.service'
# rm '/etc/systemd/system/basic.target.wants/ip6tables.service'
# systemctl stop iptables.service
# systemctl stop ip6tables.service
# yum install -y pacemaker pacemaker-remote corosync pcs resource-agents
# export corosync_addr=`ip addr | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/g`
# echo $corosync_addr
# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
# sed -i.bak "s/.*\tbindnetaddr:.*/bindnetaddr:\ $corosync_addr/g" /etc/corosync/corosync.conf
# cat << END >> /etc/corosync/corosync.conf
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
END
# pcs cluster start
# pcs status corosync

Membership information
    Nodeid      Votes  Name
1795270848          1  example-host (local)
# pcs status

Last updated: Thu Mar 14 12:26:00 2013
Last change: Thu Mar 14 12:25:55 2013 via crmd on example-host
Stack: corosync
Current DC:
Version: 1.1.10
1 Nodes configured, unknown expected votes
0 Resources configured.
# pcs status

Last updated: Thu Mar 14 12:28:23 2013
Last change: Thu Mar 14 12:25:55 2013 via crmd on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.8-9b13ea1
1 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ example-host ]
# pcs cluster stop
# yum install -y libvirt libvirt-daemon-lxc wget
# systemctl enable libvirtd
# mkdir /root/lxc/
# cd /root/lxc/
# wget https://raw.github.com/davidvossel/pcmk-lxc-autogen/master/lxc-autogen
# chmod 755 lxc-autogen
# ./lxc-autogen
# virsh net-list
Name                 State      Autostart     Persistent
________________________________________________________
default              active     yes           yes

# virsh net-dumpxml default | grep -e "ip address="
<ip address='192.168.122.1' netmask='255.255.255.0'>
# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
# pcs cluster start
Last updated: Thu Mar 14 16:41:22 2013
Last change: Thu Mar 14 16:41:08 2013 via crmd on example-host
Stack: corosync
Current DC: example-host (1795270848) - partition WITHOUT quorum
Version: 1.1.10
1 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ example-host ]
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# pcs resource create container1 VirtualDomain force_stop="true" hypervisor="lxc:///" config="/root/lxc/lxc1.xml" meta remote-node=lxc1
# pcs resource create container2 VirtualDomain force_stop="true" hypervisor="lxc:///" config="/root/lxc/lxc2.xml" meta remote-node=lxc2
# pcs resource create container3 VirtualDomain force_stop="true" hypervisor="lxc:///" config="/root/lxc/lxc3.xml" meta remote-node=lxc3
Last updated: Mon Mar 18 17:15:46 2013
Last change: Mon Mar 18 17:15:26 2013 via cibadmin on guest1
Stack: corosync
Current DC: example-host (175810752) - partition WITHOUT quorum
Version: 1.1.10
4 Nodes configured, unknown expected votes
6 Resources configured.

Online: [ example-host lxc1 lxc2 lxc3 ]

Full list of resources:

 container3     (ocf::heartbeat:VirtualDomain): Started example-host
 container1     (ocf::heartbeat:VirtualDomain): Started example-host
 container2     (ocf::heartbeat:VirtualDomain): Started example-host
# pcs resource create FAKE1 ocf:pacemaker:Dummy
# pcs resource create FAKE2 ocf:pacemaker:Dummy
# pcs resource create FAKE3 ocf:pacemaker:Dummy
# pcs resource create FAKE4 ocf:pacemaker:Dummy
# pcs resource create FAKE5 ocf:pacemaker:Dummy
Last updated: Mon Mar 18 17:31:54 2013
Last change: Mon Mar 18 17:31:05 2013 via cibadmin on example-host
Stack: corosync
Current DC: example-host (175810752) - partition WITHOUT quorum
Version: 1.1.10
4 Nodes configured, unknown expected votes
11 Resources configured.

Online: [ example-host lxc1 lxc2 lxc3 ]

Full list of resources:

 container3     (ocf::heartbeat:VirtualDomain): Started example-host
 container1     (ocf::heartbeat:VirtualDomain): Started example-host
 container2     (ocf::heartbeat:VirtualDomain): Started example-host
 FAKE1  (ocf::pacemaker:Dummy): Started lxc1
 FAKE2  (ocf::pacemaker:Dummy): Started lxc2
 FAKE3  (ocf::pacemaker:Dummy): Started lxc3
 FAKE4  (ocf::pacemaker:Dummy): Started lxc1
 FAKE5  (ocf::pacemaker:Dummy): Started lxc2
# ls lxc1-filesystem/var/run/
Dummy-FAKE4.state Dummy-FAKE.state
# ps -A | grep -e pacemaker_remote*
 9142 pts/2    00:00:00 pacemaker_remot
10148 pts/4    00:00:00 pacemaker_remot
10942 pts/6    00:00:00 pacemaker_remot
# kill -9 9142
Revision History

Revision | Date
---|---
Revision 1-0 | Tue Mar 19 2013
Revision 2-0 | Tue May 13 2013