[Linux-ha-jp] STONITH errors during split-brain


renay****@ybb***** renay****@ybb*****
Tue, 17 Mar 2015 22:38:22 JST


Fukuda-san,

Good evening, Yamauchi here.

Incidentally, if possible, it might help isolate the problem to check what happens when you drop external/stonith-helper and run with external/xen0 alone.
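
For example, the fencing pieces of the test.crm quoted below might shrink to something like this (an untested sketch; resource and node names are taken from that config, with the Stonith*-1 primitives and their group entries simply removed):

group grpStonith1 \
    Stonith1-2

group grpStonith2 \
    Stonith2-2

fencing_topology \
    lbv1.beta.com: Stonith1-2 \
    lbv2.beta.com: Stonith2-2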

That's all.



----- Original Message -----
> From: "renay****@ybb*****" <renay****@ybb*****>
> To: "linux****@lists*****" <linux****@lists*****>
> Cc: 
> Date: 2015/3/17, Tue 22:28
> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
> 
> Fukuda-san,
> 
> Good evening, Yamauchi here.
> 
> It looks like nothing has changed...
> 
> For now, sometime tomorrow I will try, on RHEL, the combination of:
> 
> Heartbeat 3.0.6
> the latest Pacemaker
> 
> with a similar configuration (the resource will be Dummy, and external/xen0 will be external/ssh instead) to see whether stonith-helper works.
> 
> # If I could see the output from stonith-helper's -x trace, it would be easier to narrow the problem down...
> 
> 
> That's all.
> 
> 
> 
> ----- Original Message -----
>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>> Date: 2015/3/17, Tue 21:24
>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>> 
>> 
>> Yamauchi-san,
>> 
>> Good evening, this is Fukuda.
>> Thank you for the pointer to the latest version.
>> 
>> I installed it right away.
>> 
>> Here is the state after startup.
>> 
>> The failed actions appear unchanged.
>> 
>> 
>> 
>> # crm_mon -rfA
>> Last updated: Tue Mar 17 21:03:49 2015
>> Last change: Tue Mar 17 20:30:58 2015
>> Stack: heartbeat
>> Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
>> Version: 1.1.12-e32080b
>> 2 Nodes configured
>> 8 Resources configured
>> 
>> 
>> Online: [ lbv1.beta.com lbv2.beta.com ]
>> 
>> Full list of resources:
>> 
>>  Resource Group: HAvarnish
>>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
>>      varnishd   (lsb:varnish):  Started lbv1.beta.com
>>  Resource Group: grpStonith1
>>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
>>      Stonith1-2 (stonith:external/xen0):        Stopped
>>  Resource Group: grpStonith2
>>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
>>      Stonith2-2 (stonith:external/xen0):        Stopped
>>  Clone Set: clone_ping [ping]
>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>> 
>> Node Attributes:
>> * Node lbv1.beta.com:
>>     + default_ping_set                  : 100
>> * Node lbv2.beta.com:
>>     + default_ping_set                  : 100
>> 
>> Migration summary:
>> * Node lbv1.beta.com:
>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:39 2015'
>> * Node lbv2.beta.com:
>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:32 2015'
>> 
>> Failed actions:
>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:37 2015', queued=0ms, exec=1085ms
>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=18, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:30 2015', queued=0ms, exec=1061ms
>> 
>> 
>> 
>> 
>> Here are the logs.
>> 
>> 
>> # less /var/log/ha-debug
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Pacemaker support: yes
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: File /etc/ha.d//haresources exists.
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: This file is not used because pacemaker is enabled
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Core dumps could be lost if multiple dumps occur.
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: **************************
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Configuration validated. Starting heartbeat 3.0.6
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: heartbeat: version 3.0.6
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Heartbeat generation: 1423534116
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: seed is -1702799346
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound send socket to device: eth1
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: set SO_REUSEADDR
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound receive socket to device: eth1
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
>> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'up'
>> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Link lbv2.beta.com:eth1 up.
>> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status up
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Comm_now_up(): updating status to active
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'active'
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: debug: get_delnodelist: delnodelist=
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4250]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 4250)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4246]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 4246)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4249]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 4249)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4245]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 4245)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4248]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 4248)
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4247]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 4247)
>> Mar 17 21:02:47 lbv1.beta.com ccm: [4245]: info: Hostname: lbv1.beta.com
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client ccm is set to 1024
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client attrd is set to 1024
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client stonith-ng is set to 1024
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status active
>> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client cib is set to 1024
>> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [15:17]
>> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [19:21]
>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client crmd is set to 1024
>> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [24:26]
>> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [26:28]
>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [30:32]
>> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
>> 
>> 
>> 
>> # less /var/log/error
>> 
>> Mar 17 21:02:47 lbv1 attrd[4249]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>> Mar 17 21:02:48 lbv1 attrd[4249]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>> Mar 17 21:02:53 lbv1 stonith-ng[4247]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>> Mar 17 21:02:53 lbv1 stonith-ng[4247]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
>> Mar 17 21:03:39 lbv1 crmd[4250]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
>> 
>> # cat syslog | egrep 'Mar 17 21:03|Mar 17 21:02' | egrep 'heartbeat|stonith|pacemaker|error'
>> Mar 17 21:03:24 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-115.bz2
>> Mar 17 21:03:27 lbv1 crmd[4250]:   notice: run_graph: Transition 0 (Complete=15, Pending=0, Fired=0, Skipped=16, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-115.bz2): Stopped
>> Mar 17 21:03:29 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-116.bz2
>> Mar 17 21:03:34 lbv1 crmd[4250]:   notice: run_graph: Transition 1 (Complete=8, Pending=0, Fired=0, Skipped=12, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Stopped
>> Mar 17 21:03:37 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>> Mar 17 21:03:37 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>> Mar 17 21:03:37 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-117.bz2
>> Mar 17 21:03:39 lbv1 stonith-ng[4247]:   notice: log_operation: Operation 'monitor' [4377] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
>> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ Performing: stonith -t external/stonith-helper -S ]
>> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ failed to exec "stonith" ]
>> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ failed:  2 ]
>> Mar 17 21:03:39 lbv1 crmd[4250]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
>> Mar 17 21:03:40 lbv1 crmd[4250]:   notice: run_graph: Transition 2 (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-117.bz2): Stopped
>> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
>> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
>> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
>> Mar 17 21:03:42 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-118.bz2
>> Mar 17 21:03:42 lbv1 IPaddr2(vip_208)[4448]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
>> Mar 17 21:03:47 lbv1 crmd[4250]:   notice: run_graph: Transition 3 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-118.bz2): Complete
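>> 
>> (One observation on the lines above: "failed to exec "stonith"" together with "failed:  2" looks like an exec failure, 2 being ENOENT, i.e. the stonith CLI may not be reachable from stonithd's environment. A rough, untested check; the /usr/local/heartbeat prefix is assumed from the paths in this thread:)
>> 
>> # is the stonith CLI visible at all?
>> which stonith
>> # hypothetical install location under this prefix:
>> ls -l /usr/local/heartbeat/sbin/stonith
>> # retry the exact invocation stonithd logged:
>> stonith -t external/stonith-helper -S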
>> 
>> Thank you in advance.
>> 
>> Regards
>> 
>> 
>> 
>> On Tue, 17 Mar 2015 at 18:31, <renay****@ybb*****> wrote:
>> 
>> Fukuda-san,
>>> 
>>> Good evening, Yamauchi here.
>>> 
>>> Since no tag has been applied, today's latest version is:
>>> 
>>>  * https://github.com/ClusterLabs/pacemaker/tree/e32080b460f81486b85d08ec958582b3e72d858c
>>> 
>>> 
>>> You can download it via [Download ZIP] on the right-hand side.
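>>> 
>>> (If git is handier than the ZIP, the same tree can be fetched by checking out that commit directly; a plain-git sketch:)
>>> 
>>> git clone https://github.com/ClusterLabs/pacemaker.git
>>> cd pacemaker
>>> git checkout e32080b460f81486b85d08ec958582b3e72d858c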
>>> 
>>> That's all.
>>> 
>>> 
>>> ----- Original Message -----
>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>> To: "renay****@ybb*****" <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>> Date: 2015/3/17, Tue 18:07
>>>> Subject: STONITH errors during split-brain
>>>> 
>>>> 
>>>> Yamauchi-san,
>>>> 
>>>> 
>>>> Hello, this is Fukuda.
>>>> 
>>>> 
>>>> I looked here:
>>>> https://github.com/ClusterLabs/pacemaker/tags
>>>> 
>>>> 
>>>> 
>>>> pacemaker 1.1.12 561c4cf appears to be the latest.
>>>> Sorry to trouble you, but could you tell me where to find anything newer than that?
>>>> 
>>>> 
>>>> Thank you in advance.
>>>> 
>>>> 
>>>> Regards
>>>> 
>>>> 
>>>> 
>>>> On Tuesday, 17 March 2015, <renay****@ybb*****> wrote:
>>>> 
>>>> Fukuda-san,
>>>>> 
>>>>> Hello, Yamauchi here.
>>>>> 
>>>>> Yes, it is too old.
>>>>> 
>>>>> Pacemaker support for Heartbeat 3.0.6 is surprisingly recent.
>>>>> Please install something newer. (Again, you will need to build from source; see the sketch below...)
>>>>> 
>>>>> 
>>>>> 
>>>>> It is available from the upstream GitHub:
>>>>>  * https://github.com/ClusterLabs/pacemaker
>>>>> 
>>>>> 
>>>>> In some cases the latest master can throw errors; if that happens, I think it is best to walk back toward older revisions.
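>>>>> 
>>>>> (A typical from-source build, as a rough sketch; the --prefix value is an assumption matching the /usr/local/heartbeat paths seen later in this thread:)
>>>>> 
>>>>> ./autogen.sh
>>>>> ./configure --prefix=/usr/local/heartbeat
>>>>> make
>>>>> make install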
>>>>> 
>>>>> That's all.
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>> Date: 2015/3/17, Tue 16:06
>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>> 
>>>>>> 
>>>>>> Yamauchi-san,
>>>>>> 
>>>>>> Hello, this is Fukuda.
>>>>>> 
>>>>>> In an earlier mail you advised me to install the latest heartbeat and pacemaker,
>>>>>> so this time I installed heartbeat 3.0.6 and pacemaker 1.1.12:
>>>>>> 
>>>>>> heartbeat configuration: Version = "3.0.6"
>>>>>> pacemaker configuration: Version = 1.1.12 (Build: 561c4cf)
>>>>>> 
>>>>>> Does this mean pacemaker is still too old?
>>>>>> 
>>>>>> Sorry for the trouble, and thank you in advance.
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, 17 Mar 2015 at 14:59, <renay****@ybb*****> wrote:
>>>>>> 
>>>>>> Fukuda-san,
>>>>>>> 
>>>>>>> Hello, Yamauchi here.
>>>>>>> 
>>>>>>> It just occurred to me: in an earlier exchange you replied as below. Is that still the case?
>>>>>>> 
>>>>>>> 
>>>>>>>>>>>>>  2) Heartbeat 3.0.6 + latest Pacemaker: OK
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Apparently, Heartbeat also needs to be the latest 3.0.6:
>>>>>>>>>>>>>  * http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/cceeb47a7d8f
>>>>>>> 
>>>>>>> Judging by the version in the crm_mon output below, yours is 1.1.12.
>>>>>>> Combining with Heartbeat 3.0.6 requires quite a recent Pacemaker.
>>>>>>> 
>>>>>>>> # crm_mon -rfA
>>>>>>>> 
>>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
>>>>>>>> Last change: Tue Mar 17 14:01:43 2015
>>>>>>>> Stack: heartbeat
>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>> 
>>>>>>> You probably need at least everything from this change onward:
>>>>>>> 
>>>>>>> https://github.com/ClusterLabs/pacemaker/commit/f2302da063d08719d28367d8e362b8bfb0f85bf3
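>>>>>>> 
>>>>>>> (One way to check whether a given checkout already contains that change is plain git; a small sketch, run inside the pacemaker clone:)
>>>>>>> 
>>>>>>> git merge-base --is-ancestor f2302da063d08719d28367d8e362b8bfb0f85bf3 HEAD \
>>>>>>>   && echo "commit included" || echo "checkout predates the fix"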
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> That's all.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>> Date: 2015/3/17, Tue 14:38
>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yamauchi-san,
>>>>>>>> 
>>>>>>>> Hello, this is Fukuda.
>>>>>>>> 
>>>>>>>> Do you mean I should add -x to the shebang line of stonith-helper?
>>>>>>>> I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
>>>>>>>> 
>>>>>>>> crm_mon looks the same as before.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> # crm_mon -rfA
>>>>>>>> 
>>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
>>>>>>>> Last change: Tue Mar 17 14:01:43 2015
>>>>>>>> Stack: heartbeat
>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>>> 2 Nodes configured
>>>>>>>> 8 Resources configured
>>>>>>>> 
>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>> 
>>>>>>>> Full list of resources:
>>>>>>>> 
>>>>>>>>  Resource Group: HAvarnish
>>>>>>>>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
>>>>>>>>      varnishd   (lsb:varnish):  Started lbv1.beta.com
>>>>>>>>  Resource Group: grpStonith1
>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
>>>>>>>>      Stonith1-2 (stonith:external/xen0):        Stopped
>>>>>>>>  Resource Group: grpStonith2
>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
>>>>>>>>      Stonith2-2 (stonith:external/xen0):        Stopped
>>>>>>>>  Clone Set: clone_ping [ping]
>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>> 
>>>>>>>> Node Attributes:
>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>     + default_ping_set                  : 100
>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>     + default_ping_set                  : 100
>>>>>>>> 
>>>>>>>> Migration summary:
>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
>>>>>>>> 
>>>>>>>> Failed actions:
>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms
>>>>>>>> 
>>>>>>>> I looked for other logs as well.
>>>>>>>> 
>>>>>>>> This is from heartbeat startup.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> # less /var/log/pm_logconv.out
>>>>>>>> Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
>>>>>>>> Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
>>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> # less /var/log/error
>>>>>>>> Mar 17 14:12:20 lbv1 crmd[13269]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This is syslog grepped for stonith:
>>>>>>>> 
>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 13266)
>>>>>>>> Mar 17 14:11:34 lbv1 stonithd[13266]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:   notice: setup_cib: Watching for stonith topology changes
>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:  warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:  warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
>>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]:   notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
>>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]:   notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
>>>>>>>> Mar 17 14:12:04 lbv1 stonithd[13266]:   notice: xml_patch_version_check: Versions did not change in patch 0.5.0
>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:   notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
>>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ failed:  2 ]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you in advance.
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, 17 Mar 2015 at 13:32, <renay****@ybb*****> wrote:
>>>>>>>> 
>>>>>>>> Fukuda-san,
>>>>>>>>> 
>>>>>>>>> Hello, Yamauchi here.
>>>>>>>>> 
>>>>>>>>> So it seems the problem is in stonith-helper's start.
>>>>>>>>> 
>>>>>>>>> If you put
>>>>>>>>> 
>>>>>>>>> #!/bin/bash -x
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> at the top of stonith-helper and start the cluster, we may learn something; see the sketch below.
>>>>>>>>> 
>>>>>>>>> Incidentally, I would expect stonith-helper's own log to show up somewhere as well...
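>>>>>>>>> 
>>>>>>>>> (If the -x trace does not surface in the cluster logs, one untested trick is to persist it to a file from inside the script, e.g. as the first lines of stonith-helper; the trace file path is, of course, arbitrary:)
>>>>>>>>> 
>>>>>>>>> #!/bin/bash -x
>>>>>>>>> # hypothetical: keep the xtrace output even if stonithd discards stderr
>>>>>>>>> exec 2>>/tmp/stonith-helper.trace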
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> That's all.
>>>>>>>>> 
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>>>> Date: 2015/3/17, Tue 12:31
>>>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yamauchi-san
>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>> 
>>>>>>>>>> Hello, this is Fukuda.
>>>>>>>>>> 
>>>>>>>>>> xen0 is indeed in the same directory.
>>>>>>>>>> 
>>>>>>>>>> # pwd
>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external
>>>>>>>>>> 
>>>>>>>>>> # ls
>>>>>>>>>> drac5          ibmrsa         kdumpcheck      riloe           vmware
>>>>>>>>>> dracmc-telnet  ibmrsa-telnet  libvirt         ssh             xen0
>>>>>>>>>> hetzner        ipmi           nut             stonith-helper  xen0-ha
>>>>>>>>>> hmchttp        ippower9258    rackpdu         vcenter
>>>>>>>>>> 
>>>>>>>>>> Thank you in advance.
>>>>>>>>>> 
>>>>>>>>>> Regards
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 2015-03-17 10:53 GMT+09:00, <renay****@ybb*****> wrote:
>>>>>>>>>> 
>>>>>>>>>> Fukuda-san
>>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>>> 
>>>>>>>>>>> Hello, Yamauchi here.
>>>>>>>>>>> 
>>>>>>>>>>>> There was no output on stdout or stderr.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is something wrong with stonith-helper?
>>>>>>>>>>>> Since stonith-helper is just a shell script, I had not paid much attention to how it was installed.
>>>>>>>>>>>> stonith-helper is located here:
>>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>>>>>>>> 
>>>>>>>>>>> Is xen0 also in this directory?
>>>>>>>>>>> If it is not, that is a problem; please try copying the stonith-helper file, preserving its attributes, into the same directory as xen0, as in the sketch below.
>>>>>>>>>>> 
>>>>>>>>>>> If it works after that, it means there is a problem with the pm_extras installation.
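>>>>>>>>>>> 
>>>>>>>>>>> (For example, something like this untested sketch; the destination is the path from this thread, while the source path is a hypothetical placeholder for wherever pm_extras was unpacked:)
>>>>>>>>>>> 
>>>>>>>>>>> # -a preserves mode, ownership and timestamps
>>>>>>>>>>> cp -a /path/to/pm_extras/stonith-helper \
>>>>>>>>>>>       /usr/local/heartbeat/lib/stonith/plugins/external/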
>>>>>>>>>>> 
>>>>>>>>>>> That's all.
>>>>>>>>>>> 
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>>>>>> Date: 2015/3/17, Tue 10:31
>>>>>>>>>>>> Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Yamauchi-san
>>>>>>>>>>>> cc: Matsushima-san
>>>>>>>>>>>> 
>>>>>>>>>>>> Good morning, this is Fukuda.
>>>>>>>>>>>> Thank you for the crm example.
>>>>>>>>>>>> 
>>>>>>>>>>>> I adapted it to our environment right away.
>>>>>>>>>>>> 
>>>>>>>>>>>> $ cat test.crm
>>>>>>>>>>>> ### Cluster Option ###
>>>>>>>>>>>> property \
>>>>>>>>>>>>     no-quorum-policy="ignore" \
>>>>>>>>>>>>     stonith-enabled="true" \
>>>>>>>>>>>>     startup-fencing="false" \
>>>>>>>>>>>>     stonith-timeout="710s" \
>>>>>>>>>>>>     crmd-transition-delay="2s"
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Resource Default ###
>>>>>>>>>>>> rsc_defaults \
>>>>>>>>>>>>     resource-stickiness="INFINITY" \
>>>>>>>>>>>>     migration-threshold="1"
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Group Configuration ###
>>>>>>>>>>>> group HAvarnish \
>>>>>>>>>>>>     vip_208 \
>>>>>>>>>>>>     varnishd
>>>>>>>>>>>> 
>>>>>>>>>>>> group grpStonith1 \
>>>>>>>>>>>>     Stonith1-1 \
>>>>>>>>>>>>     Stonith1-2
>>>>>>>>>>>> 
>>>>>>>>>>>> group grpStonith2 \
>>>>>>>>>>>>     Stonith2-1 \
>>>>>>>>>>>>     Stonith2-2
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Clone Configuration ###
>>>>>>>>>>>> clone clone_ping \
>>>>>>>>>>>>     ping
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Fencing Topology ###
>>>>>>>>>>>> fencing_topology \
>>>>>>>>>>>>     lbv1.beta.com: Stonith1-1 Stonith1-2 \
>>>>>>>>>>>>     lbv2.beta.com: Stonith2-1 Stonith2-2
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Primitive Configuration ###
>>>>>>>>>>>> primitive vip_208 ocf:heartbeat:IPaddr2 \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         ip="192.168.17.208" \
>>>>>>>>>>>>         nic="eth0" \
>>>>>>>>>>>>         cidr_netmask="24" \
>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>     op monitor interval="5s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive varnishd lsb:varnish \
>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive ping ocf:pacemaker:ping \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         name="default_ping_set" \
>>>>>>>>>>>>         host_list="192.168.17.254" \
>>>>>>>>>>>>         multiplier="100" \
>>>>>>>>>>>>         dampen="1" \
>>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive Stonith1-1 stonith:external/stonith-helper \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>         hostlist="lbv1.beta.com" \
>>>>>>>>>>>>         dead_check_target="192.168.17.132 10.0.17.132" \
>>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive Stonith1-2 stonith:external/xen0 \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>         hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
>>>>>>>>>>>>         dom0="xen0.beta.com" \
>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive Stonith2-1 stonith:external/stonith-helper \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         pcmk_reboot_retries="1" \
>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>         hostlist="lbv2.beta.com" \
>>>>>>>>>>>>         dead_check_target="192.168.17.133 10.0.17.133" \
>>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>>         run_online_check="yes" \
>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>> 
>>>>>>>>>>>> primitive Stonith2-2 stonith:external/xen0 \
>>>>>>>>>>>>     params \
>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>         hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
>>>>>>>>>>>>         dom0="xen0.beta.com" \
>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Resource Location ###
>>>>>>>>>>>> location HA_location-1 HAvarnish \
>>>>>>>>>>>>     rule 200: #uname eq lbv1.beta.com \
>>>>>>>>>>>>     rule 100: #uname eq lbv2.beta.com
>>>>>>>>>>>> 
>>>>>>>>>>>> location HA_location-2 HAvarnish \
>>>>>>>>>>>>     rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
>>>>>>>>>>>> 
>>>>>>>>>>>> location HA_location-3 grpStonith1 \
>>>>>>>>>>>>     rule -INFINITY: #uname eq lbv1.beta.com
>>>>>>>>>>>> 
>>>>>>>>>>>> location HA_location-4 grpStonith2 \
>>>>>>>>>>>>     rule -INFINITY: #uname eq lbv2.beta.com
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> After loading this, the messages differ from yesterday's;
>>>>>>>>>>>> the ping messages are gone.
>>>>>>>>>>>> 
>>>>>>>>>>>> # crm_mon -rfA
>>>>>>>>>>>> Last updated: Tue Mar 17 10:21:28 2015
>>>>>>>>>>>> Last change: Tue Mar 17 10:21:09 2015
>>>>>>>>>>>> Stack: heartbeat
>>>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
>>>>>>>>>>>> Version: 1.1.12-561c4cf
>>>>>>>>>>>> 2 Nodes configured
>>>>>>>>>>>> 8 Resources configured
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>>> 
>>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>> 
>>>>>>>>>>>>  Resource Group: HAvarnish
>>>>>>>>>>>>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
>>>>>>>>>>>>      varnishd   (lsb:varnish):  Started lbv1.beta.com
>>>>>>>>>>>>  Resource Group: grpStonith1
>>>>>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
>>>>>>>>>>>>      Stonith1-2 (stonith:external/xen0):        Stopped
>>>>>>>>>>>>  Resource Group: grpStonith2
>>>>>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
>>>>>>>>>>>>      Stonith2-2 (stonith:external/xen0):        Stopped
>>>>>>>>>>>>  Clone Set: clone_ping [ping]
>>>>>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
>>>>>>>>>>>> 
>>>>>>>>>>>> Node Attributes:
>>>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>>>     + default_ping_set                  : 100
>>>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>>>     + default_ping_set                  : 100
>>>>>>>>>>>> 
>>>>>>>>>>>> Migration summary:
>>>>>>>>>>>> * Node lbv2.beta.com:
>>>>>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>>>>>>>> * Node lbv1.beta.com:
>>>>>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
>>>>>>>>>>>> 
>>>>>>>>>>>> Failed actions:
>>>>>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
>>>>>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Here is the /var/log/ha-debug log.
>>>>>>>>>>>> 
>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
>>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
>>>>>>>>>>>> 
>>>>>>>>>>>> There was no output on stdout or stderr.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is something wrong with stonith-helper?
>>>>>>>>>>>> Since stonith-helper is just a shell script, I had not paid much attention to how it was installed.
>>>>>>>>>>>> stonith-helper is located here:
>>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2015-03-17 9:45 GMT+09:00, <renay****@ybb*****> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Fukuda-san,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Good morning, Yamauchi here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Just in case, here is an excerpt from an example I have at hand that uses multiple stonith resources.
>>>>>>>>>>>>> (In practice, be careful with the line breaks.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The example below is a configuration for the PM 1.1 series:
>>>>>>>>>>>>> on nodea, stonith is executed in the order prmStonith1-1, then prmStonith1-2;
>>>>>>>>>>>>> on nodeb, stonith is executed in the order prmStonith2-1, then prmStonith2-2.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The stonith plugins themselves are helper and ssh.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>> ### Group Configuration ###
>>>>>>>>>>>>> group grpStonith1 \
>>>>>>>>>>>>> prmStonith1-1 \
>>>>>>>>>>>>> prmStonith1-2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> group grpStonith2 \
>>>>>>>>>>>>> prmStonith2-1 \
>>>>>>>>>>>>> prmStonith2-2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### Fencing Topology ###
>>>>>>>>>>>>> fencing_topology \
>>>>>>>>>>>>> nodea: prmStonith1-1 prmStonith1-2 \
>>>>>>>>>>>>> nodeb: prmStonith2-1 prmStonith2-2
>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>> primitive prmStonith1-1 stonith:external/stonith-helper \
>>>>>>>>>>>>> params \
>>>>>>>>>>>>> pcmk_reboot_retries="1" \
>>>>>>>>>>>>> pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>> hostlist="nodea" \
>>>>>>>>>>>>> dead_check_target="192.168.28.60 192.168.28.70" \
>>>>>>>>>>>>> standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>>>>>>>> run_online_check="yes" \
>>>>>>>>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> primitive prmStonith1-2 stonith:external/ssh \
>>>>>>>>>>>>> params \
>>>>>>>>>>>>> pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>> hostlist="nodea" \
>>>>>>>>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> primitive prmStonith2-1 stonith:external/stonith-helper \
>>>>>>>>>>>>> params \
>>>>>>>>>>>>> pcmk_reboot_retries="1" \
>>>>>>>>>>>>> pcmk_reboot_timeout="40s" \
>>>>>>>>>>>>> hostlist="nodeb" \
>>>>>>>>>>>>> dead_check_target="192.168.28.61 192.168.28.71" \
>>>>>>>>>>>>> standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
>>>>>>>>>>>>> run_online_check="yes" \
>>>>>>>>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> primitive prmStonith2-2 stonith:external/ssh \
>>>>>>>>>>>>> params \
>>>>>>>>>>>>> pcmk_reboot_timeout="60s" \
>>>>>>>>>>>>> hostlist="nodeb" \
>>>>>>>>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>>> op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>> location rsc_location-grpStonith1-2 grpStonith1 \
>>>>>>>>>>>>> rule -INFINITY: #uname eq nodea
>>>>>>>>>>>>> location rsc_location-grpStonith2-3 grpStonith2 \
>>>>>>>>>>>>> rule -INFINITY: #uname eq nodeb
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's all.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 