renay****@ybb*****
Tue, 17 Mar 2015 14:59:37 JST
Fukuda-san,

Thanks for your work. This is Yamauchi.

It just occurred to me that in an earlier mail in this thread I answered as quoted below. Is that OK in your setup?

>>>>>> 2) Heartbeat 3.0.6 + latest Pacemaker : OK
>>>>>>
>>>>>> It appears that Heartbeat also has to be the latest release, 3.0.6.
>>>>>> * http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/cceeb47a7d8f

Judging by the version in the crm_mon output below, you are running Pacemaker 1.1.12.
Heartbeat 3.0.6 needs to be paired with a fairly recent Pacemaker.

># crm_mon -rfA
>
>Last updated: Tue Mar 17 14:14:39 2015
>Last change: Tue Mar 17 14:01:43 2015
>Stack: heartbeat
>Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>Version: 1.1.12-561c4cf

I think you probably need at least the following change (or anything later):
https://github.com/ClusterLabs/pacemaker/commit/f2302da063d08719d28367d8e362b8bfb0f85bf3

That's all.

----- Original Message -----
>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>To: Hideo Yamauchi <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>Date: 2015/3/17, Tue 14:38
>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>
>
>Yamauchi-san,
>
>Thanks for your work. This is Fukuda.
>
>Did you mean that I should add -x to the shebang line of stonith-helper?
>I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
>
>crm_mon shows the same state as before.
>
>
># crm_mon -rfA
>
>Last updated: Tue Mar 17 14:14:39 2015
>Last change: Tue Mar 17 14:01:43 2015
>Stack: heartbeat
>Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>Version: 1.1.12-561c4cf
>2 Nodes configured
>8 Resources configured
>
>Online: [ lbv1.beta.com lbv2.beta.com ]
>
>Full list of resources:
>
> Resource Group: HAvarnish
>     vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>     varnishd (lsb:varnish): Started lbv1.beta.com
> Resource Group: grpStonith1
>     Stonith1-1 (stonith:external/stonith-helper): Stopped
>     Stonith1-2 (stonith:external/xen0): Stopped
> Resource Group: grpStonith2
>     Stonith2-1 (stonith:external/stonith-helper): Stopped
>     Stonith2-2 (stonith:external/xen0): Stopped
> Clone Set: clone_ping [ping]
>     Started: [ lbv1.beta.com lbv2.beta.com ]
>
>Node Attributes:
>* Node lbv1.beta.com:
>    + default_ping_set : 100
>* Node lbv2.beta.com:
>    + default_ping_set : 100
>
>Migration summary:
>* Node lbv2.beta.com:
>   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
>* Node lbv1.beta.com:
>   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
>
>Failed actions:
>    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
>    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms
>
>I looked through the other logs.
>
>This is from when Heartbeat started:
>
># less /var/log/pm_logconv.out
>Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
>Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
>Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
>Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
>Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
>Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
>Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
>Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)
>
># less /var/log/error
>Mar 17 14:12:20 lbv1 crmd[13269]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error
>
>And here is syslog grepped for "stonith":
>
>Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 13266)
>Mar 17 14:11:34 lbv1 stonithd[13266]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
>Mar 17 14:11:40 lbv1 stonithd[13266]: notice: setup_cib: Watching for stonith topology changes
>Mar 17 14:11:40 lbv1 stonithd[13266]: notice: unpack_config: On loss of CCM Quorum: Ignore
>Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
>Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
>Mar 17 14:12:04 lbv1 stonithd[13266]: notice: xml_patch_version_check: Versions did not change in patch 0.5.0
>Mar 17 14:12:20 lbv1 stonithd[13266]: notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
>Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
>Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
>Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed: 2 ]
>
>
>Best regards.
>
>
>
>2015-03-17 13:32 <renay****@ybb*****>:
>
>>Fukuda-san,
>>
>>Thanks for your work. This is Yamauchi.
>>
>>So it looks like the problem is in the start operation of stonith-helper.
>>
>>If you put
>>
>>#!/bin/bash -x
>>
>>at the top of stonith-helper and start the cluster, you may be able to find something.
>>
>>By the way, I think stonith-helper's own log should also be written out somewhere...
>>
>>That's all.
>>
>>----- Original Message -----
>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>To: Hideo Yamauchi <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>Date: 2015/3/17, Tue 12:31
>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>
>>>
>>>Yamauchi-san,
>>>cc: Matsushima-san
>>>
>>>Hello, this is Fukuda.
>>>
>>>xen0 is in the same directory.
>>>
>>># pwd
>>>/usr/local/heartbeat/lib/stonith/plugins/external
>>>
>>># ls
>>>drac5          ibmrsa         kdumpcheck  riloe           vmware
>>>dracmc-telnet  ibmrsa-telnet  libvirt     ssh             xen0
>>>hetzner        ipmi           nut         stonith-helper  xen0-ha
>>>hmchttp        ippower9258    rackpdu     vcenter
>>>
>>>Best regards.
>>>
>>>
>>>
>>>2015-03-17 10:53 GMT+09:00 <renay****@ybb*****>:
>>>
>>>>Fukuda-san,
>>>>cc: Matsushima-san
>>>>
>>>>Thanks for your work. This is Yamauchi.
>>>>
>>>>>There was no output on standard output or standard error.
>>>>>
>>>>>Is something wrong with stonith-helper?
>>>>>Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
>>>>>stonith-helper is located here:
>>>>>/usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>
>>>>Is xen0 also in this directory?
>>>>If it isn't, that is a problem, so please try copying the stonith-helper file into the same directory as xen0,
>>>>keeping its attributes (permissions and so on) unchanged.
>>>>
>>>>If it works after that, the problem lies in the pm_extras installation.
>>>>
>>>>That's all.
>>>>
>>>>----- Original Message -----
>>>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>To: Hideo Yamauchi <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>Date: 2015/3/17, Tue 10:31
>>>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>
>>>>>
>>>>>Yamauchi-san,
>>>>>cc: Matsushima-san
>>>>>
>>>>>Good morning, this is Fukuda.
>>>>>Thank you for the crm example.
>>>>>
>>>>>I adapted it to our environment right away.
>>>>>
>>>>>$ cat test.crm
>>>>>### Cluster Option ###
>>>>>property \
>>>>>    no-quorum-policy="ignore" \
>>>>>    stonith-enabled="true" \
>>>>>    startup-fencing="false" \
>>>>>    stonith-timeout="710s" \
>>>>>    crmd-transition-delay="2s"
>>>>>
>>>>>### Resource Default ###
>>>>>rsc_defaults \
>>>>>    resource-stickiness="INFINITY" \
>>>>>    migration-threshold="1"
>>>>>
>>>>>### Group Configuration ###
>>>>>group HAvarnish \
>>>>>    vip_208 \
>>>>>    varnishd
>>>>>
>>>>>group grpStonith1 \
>>>>>    Stonith1-1 \
>>>>>    Stonith1-2
>>>>>
>>>>>group grpStonith2 \
>>>>>    Stonith2-1 \
>>>>>    Stonith2-2
>>>>>
>>>>>### Clone Configuration ###
>>>>>clone clone_ping \
>>>>>    ping
>>>>>
>>>>>### Fencing Topology ###
>>>>>fencing_topology \
>>>>>    lbv1.beta.com: Stonith1-1 Stonith1-2 \
>>>>>    lbv2.beta.com: Stonith2-1 Stonith2-2
>>>>>
>>>>>### Primitive Configuration ###
>>>>>primitive vip_208 ocf:heartbeat:IPaddr2 \
>>>>>    params \
>>>>>        ip="192.168.17.208" \
>>>>>        nic="eth0" \
>>>>>        cidr_netmask="24" \
>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>    op monitor interval="5s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>
>>>>>primitive varnishd lsb:varnish \
>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>
>>>>>primitive ping ocf:pacemaker:ping \
>>>>>    params \
>>>>>        name="default_ping_set" \
>>>>>        host_list="192.168.17.254" \
>>>>>        multiplier="100" \
>>>>>        dampen="1" \
>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>
>>>>>primitive Stonith1-1 stonith:external/stonith-helper \
>>>>>    params \
>>>>>        pcmk_reboot_retries="1" \
>>>>>        pcmk_reboot_timeout="40s" \
>>>>>        hostlist="lbv1.beta.com" \
>>>>>        dead_check_target="192.168.17.132 10.0.17.132" \
>>>>>        standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>        run_online_check="yes" \
>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>
>>>>>primitive Stonith1-2 stonith:external/xen0 \
>>>>>    params \
>>>>>        pcmk_reboot_timeout="60s" \
>>>>>        hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
>>>>>        dom0="xen0.beta.com" \
>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>
>>>>>primitive Stonith2-1 stonith:external/stonith-helper \
>>>>>    params \
>>>>>        pcmk_reboot_retries="1" \
>>>>>        pcmk_reboot_timeout="40s" \
>>>>>        hostlist="lbv2.beta.com" \
>>>>>        dead_check_target="192.168.17.133 10.0.17.133" \
>>>>>        standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>        run_online_check="yes" \
>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>
>>>>>primitive Stonith2-2 stonith:external/xen0 \
>>>>>    params \
>>>>>        pcmk_reboot_timeout="60s" \
>>>>>        hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
>>>>>        dom0="xen0.beta.com" \
>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>
>>>>>### Resource Location ###
>>>>>location HA_location-1 HAvarnish \
>>>>>    rule 200: #uname eq lbv1.beta.com \
>>>>>    rule 100: #uname eq lbv2.beta.com
>>>>>
>>>>>location HA_location-2 HAvarnish \
>>>>>    rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
>>>>>
>>>>>location HA_location-3 grpStonith1 \
>>>>>    rule -INFINITY: #uname eq lbv1.beta.com
>>>>>
>>>>>location HA_location-4 grpStonith2 \
>>>>>    rule -INFINITY: #uname eq lbv2.beta.com
>>>>>
>>>>>
>>>>>When I loaded this, the messages were different from yesterday's.
>>>>>The ping-related messages are gone.
>>>>>
>>>>># crm_mon -rfA
>>>>>Last updated: Tue Mar 17 10:21:28 2015
>>>>>Last change: Tue Mar 17 10:21:09 2015
>>>>>Stack: heartbeat
>>>>>Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>Version: 1.1.12-561c4cf
>>>>>2 Nodes configured
>>>>>8 Resources configured
>>>>>
>>>>>
>>>>>Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>>
>>>>>Full list of resources:
>>>>>
>>>>> Resource Group: HAvarnish
>>>>>     vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
>>>>>     varnishd (lsb:varnish): Started lbv1.beta.com
>>>>> Resource Group: grpStonith1
>>>>>     Stonith1-1 (stonith:external/stonith-helper): Stopped
>>>>>     Stonith1-2 (stonith:external/xen0): Stopped
>>>>> Resource Group: grpStonith2
>>>>>     Stonith2-1 (stonith:external/stonith-helper): Stopped
>>>>>     Stonith2-2 (stonith:external/xen0): Stopped
>>>>> Clone Set: clone_ping [ping]
>>>>>     Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>>
>>>>>Node Attributes:
>>>>>* Node lbv1.beta.com:
>>>>>    + default_ping_set : 100
>>>>>* Node lbv2.beta.com:
>>>>>    + default_ping_set : 100
>>>>>
>>>>>Migration summary:
>>>>>* Node lbv2.beta.com:
>>>>>   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>* Node lbv1.beta.com:
>>>>>   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>
>>>>>Failed actions:
>>>>>    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
>>>>>    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
>>>>>
>>>>>
>>>>>Here is the log from /var/log/ha-debug:
>>>>>
>>>>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
>>>>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
>>>>>IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
>>>>>
>>>>>There was no output on standard output or standard error.
>>>>>
>>>>>Is something wrong with stonith-helper?
>>>>>Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
>>>>>stonith-helper is located here:
>>>>>/usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>>
>>>>>
>>>>>Best regards.
>>>>>
>>>>>
>>>>>
>>>>>2015-03-17 9:45 GMT+09:00 <renay****@ybb*****>:
>>>>>
>>>>>>Fukuda-san,
>>>>>>
>>>>>>Good morning. This is Yamauchi.
>>>>>>
>>>>>>Just in case, here is an excerpt from an example I have on hand that uses multiple STONITH devices.
>>>>>>(When you actually use it, watch out for line breaks introduced by the mail.)
>>>>>>
>>>>>>The example below is a configuration for the Pacemaker 1.1 series:
>>>>>>for nodea, STONITH is executed with prmStonith1-1 first, then prmStonith1-2;
>>>>>>for nodeb, STONITH is executed with prmStonith2-1 first, then prmStonith2-2.
>>>>>>
>>>>>>The STONITH agents themselves are stonith-helper and ssh.
>>>>>>
>>>>>>
>>>>>>(snip)
>>>>>>### Group Configuration ###
>>>>>>group grpStonith1 \
>>>>>>    prmStonith1-1 \
>>>>>>    prmStonith1-2
>>>>>>
>>>>>>group grpStonith2 \
>>>>>>    prmStonith2-1 \
>>>>>>    prmStonith2-2
>>>>>>
>>>>>>### Fencing Topology ###
>>>>>>fencing_topology \
>>>>>>    nodea: prmStonith1-1 prmStonith1-2 \
>>>>>>    nodeb: prmStonith2-1 prmStonith2-2
>>>>>>(snip)
>>>>>>primitive prmStonith1-1 stonith:external/stonith-helper \
>>>>>>    params \
>>>>>>        pcmk_reboot_retries="1" \
>>>>>>        pcmk_reboot_timeout="40s" \
>>>>>>        hostlist="nodea" \
>>>>>>        dead_check_target="192.168.28.60 192.168.28.70" \
>>>>>>        standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>        run_online_check="yes" \
>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>
>>>>>>primitive prmStonith1-2 stonith:external/ssh \
>>>>>>    params \
>>>>>>        pcmk_reboot_timeout="60s" \
>>>>>>        hostlist="nodea" \
>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>
>>>>>>primitive prmStonith2-1 stonith:external/stonith-helper \
>>>>>>    params \
>>>>>>        pcmk_reboot_retries="1" \
>>>>>>        pcmk_reboot_timeout="40s" \
>>>>>>        hostlist="nodeb" \
>>>>>>        dead_check_target="192.168.28.61 192.168.28.71" \
>>>>>>        standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>        run_online_check="yes" \
>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>
>>>>>>primitive prmStonith2-2 stonith:external/ssh \
>>>>>>    params \
>>>>>>        pcmk_reboot_timeout="60s" \
>>>>>>        hostlist="nodeb" \
>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>(snip)
>>>>>>location rsc_location-grpStonith1-2 grpStonith1 \
>>>>>>    rule -INFINITY: #uname eq nodea
>>>>>>location rsc_location-grpStonith2-3 grpStonith2 \
>>>>>>    rule -INFINITY: #uname eq nodeb
>>>>>>
>>>>>>
>>>>>>That's all.
>>>>>
>>>>>--
>>>>>
>>>>>ELF Systems
>>>>>Masamichi Fukuda
>>>>>mail to: masamichi_fukud****@elf-s*****
>>>>
>>>>
>>>>_______________________________________________
>>>>Linux-ha-japan mailing list
>>>>Linux****@lists*****
>>>>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>>>>
>>>
>>>
>>>--
>>>
>>>ELF Systems
>>>Masamichi Fukuda
>>>mail to: masamichi_fukud****@elf-s*****
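The stonithd lines quoted in the 14:38 mail show two things in sequence: `Performing: stonith -t external/stonith-helper -S`, then `failed to exec "stonith"`. That suggests the `stonith` command-line tool itself could not be started from stonithd's environment. This is a separate question from where the stonith-helper plugin file lives. The Python sketch below checks both. It tests whether `stonith` resolves on a given PATH (using `shutil.which`), and whether `stonith-helper` and `xen0` are present and executable in the plugin directory used in this thread. The extra search directory `/usr/local/heartbeat/sbin` is an assumption about where a `/usr/local/heartbeat` prefix build would put `stonith`; it is not confirmed anywhere in the thread.

```python
import os
import shutil
import stat
from pathlib import Path
from typing import Dict, List, Optional

# Plugin directory used in this thread; adjust if your prefix differs.
PLUGIN_DIR = Path("/usr/local/heartbeat/lib/stonith/plugins/external")

# ASSUMPTION: a /usr/local/heartbeat prefix build is expected to install the
# `stonith` CLI here. Verify this on your own system.
EXTRA_SBIN = "/usr/local/heartbeat/sbin"


def find_stonith(path_value: str) -> Optional[str]:
    """Return the full path of `stonith` if it is executable on path_value."""
    return shutil.which("stonith", path=path_value)


def plugin_status(plugin_dir: Path, names: List[str]) -> Dict[str, str]:
    """Report 'ok', 'missing' or 'not executable' for each named plugin file."""
    status: Dict[str, str] = {}
    for name in names:
        plugin = plugin_dir / name
        if not plugin.is_file():
            status[name] = "missing"
        elif not (plugin.stat().st_mode & stat.S_IXUSR):
            status[name] = "not executable"  # owner execute bit is not set
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("stonith on current PATH:", find_stonith(current))
    print("stonith with", EXTRA_SBIN, "added:",
          find_stonith(os.pathsep.join([current, EXTRA_SBIN])))
    print("plugins:", plugin_status(PLUGIN_DIR, ["stonith-helper", "xen0"]))
```

Run it as root in an environment as close as possible to the one Heartbeat gives its child processes. The PATH of an interactive login shell can differ from the PATH stonithd inherits.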
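Both crm_mon snapshots in this thread show `fail-count=1000000` next to `migration-threshold=1` in the Migration summary. Pacemaker represents the score INFINITY as 1000000, and by default a failed start sets the fail-count to INFINITY. Each stonith-helper resource is therefore kept off the node where its start failed, and the `-INFINITY` location rules already keep it off the other node, which is consistent with the `Stopped` state shown. It stays that way until the failure is cleaned up, for example with `crm_resource --cleanup`. The sketch below is a hypothetical helper, not part of any Pacemaker tool. It parses that section and splits operation keys such as `Stonith1-1_start_0` into resource, action and interval; only the common action names listed in its regular expression are recognised.

```python
import re
from typing import List, NamedTuple, Tuple

# Pacemaker represents the score INFINITY as 1000000.
PCMK_INFINITY = 1_000_000

NODE_RE = re.compile(r"^\*\s*Node\s+(?P<node>\S+):\s*$")
MIGRATION_RE = re.compile(
    r"^\s*(?P<rsc>\S+):\s+migration-threshold=(?P<threshold>\d+)"
    r"\s+fail-count=(?P<fail_count>\d+)"
)
# Operation keys have the form <resource>_<action>_<interval-ms>.
# This sketch only recognises the common action names listed here.
OP_KEY_RE = re.compile(
    r"^(?P<rsc>.+)_(?P<action>start|stop|monitor|promote|demote|notify"
    r"|migrate_to|migrate_from|reload)_(?P<interval>\d+)$"
)


class FailCount(NamedTuple):
    node: str
    resource: str
    threshold: int
    fail_count: int

    @property
    def blocked(self) -> bool:
        """True once fail-count has reached migration-threshold on this node."""
        return self.fail_count >= self.threshold


def parse_migration_summary(text: str) -> List[FailCount]:
    """Collect per-node fail counts from crm_mon's 'Migration summary' lines."""
    results: List[FailCount] = []
    node = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match["node"]
            continue
        rsc_match = MIGRATION_RE.match(line)
        if node and rsc_match:
            results.append(FailCount(node, rsc_match["rsc"],
                                     int(rsc_match["threshold"]),
                                     int(rsc_match["fail_count"])))
    return results


def split_op_key(op_key: str) -> Tuple[str, str, int]:
    """Split e.g. 'Stonith1-1_start_0' into ('Stonith1-1', 'start', 0)."""
    match = OP_KEY_RE.match(op_key)
    if match is None:
        raise ValueError(f"unrecognised operation key: {op_key!r}")
    return match["rsc"], match["action"], int(match["interval"])


if __name__ == "__main__":
    sample = """Migration summary:
* Node lbv2.beta.com:
   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
* Node lbv1.beta.com:
   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
"""
    for entry in parse_migration_summary(sample):
        shown = "INFINITY" if entry.fail_count >= PCMK_INFINITY else entry.fail_count
        print(entry.node, entry.resource, "fail-count:", shown, "blocked:", entry.blocked)
    print(split_op_key("Stonith1-1_start_0"))
```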
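Yamauchi-san's explanation that nodea is fenced with prmStonith1-1 first and prmStonith1-2 second comes from the `fencing_topology` directive. After each `node:` entry, the listed devices are tried as successive levels. The Python sketch below turns such a crm snippet into an ordered per-node list of levels, so the intended order can be compared against the device names actually defined. It is an illustrative parser, not crmsh's own, and it makes two assumptions. First, it treats comma-separated names as a single level, which should be checked against the crmsh documentation. Second, it ignores forms without a `node:` prefix.

```python
import re
from typing import Dict, List


def parse_fencing_topology(crm_text: str) -> Dict[str, List[List[str]]]:
    """Map each node to its fencing levels, in the order they are tried."""
    # Join backslash-continued lines so the directive sits on one line.
    joined = re.sub(r"\\[ \t]*\n", " ", crm_text)
    match = re.search(r"^fencing_topology[ \t]+(.*)$", joined, re.MULTILINE)
    if match is None:
        return {}
    topology: Dict[str, List[List[str]]] = {}
    node = None
    for token in match.group(1).split():
        if token.endswith(":"):
            node = token[:-1]
            topology[node] = []
        elif node is not None:
            # ASSUMPTION: comma-joined devices form a single level.
            topology[node].append(token.split(","))
    return topology


if __name__ == "__main__":
    config = """fencing_topology \\
    lbv1.beta.com: Stonith1-1 Stonith1-2 \\
    lbv2.beta.com: Stonith2-1 Stonith2-2
"""
    for target, levels in parse_fencing_topology(config).items():
        for index, devices in enumerate(levels, start=1):
            print(f"{target}: level {index} -> {', '.join(devices)}")
```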