Cluster option problems
For some reason, verify started reporting the following errors:
crm(live)configure# verify
WARNING: IPaddr2_1: specified timeout 5s for monitor_0 is smaller than the advised 20s
WARNING: drbddisk_2: default timeout 20s for start is smaller than the advised 240
WARNING: drbddisk_2: default timeout 20s for stop is smaller than the advised 100
WARNING: Filesystem_3: default timeout 20s for start is smaller than the advised 60
WARNING: Filesystem_3: default timeout 20s for stop is smaller than the advised 60
ERROR: cib-bootstrap-options: attribute default-resource-failure-stickiness does not exist
ERROR: cib-bootstrap-options: attribute short-resource-names does not exist
ERROR: cib-bootstrap-options: attribute transition-idle-timeout does not exist
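The WARNINGs can be ignored, but the attributes named in the ERROR lines really cannot be set.
I searched for a fix and found nothing on Japanese-language sites, but the following thread on an overseas site turned up: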
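To separate the ignorable warnings from the blocking errors mechanically, the verify output can be filtered with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the sample text is a shortened copy of the output above.

```python
import re

# Shortened copy of the `verify` output from this post.
VERIFY_OUTPUT = """\
WARNING: IPaddr2_1: specified timeout 5s for monitor_0 is smaller than the advised 20s
WARNING: drbddisk_2: default timeout 20s for start is smaller than the advised 240
ERROR: cib-bootstrap-options: attribute default-resource-failure-stickiness does not exist
ERROR: cib-bootstrap-options: attribute short-resource-names does not exist
ERROR: cib-bootstrap-options: attribute transition-idle-timeout does not exist
"""

# Matches only the "attribute <name> does not exist" errors.
MISSING_ATTR = re.compile(r"^ERROR: (\S+): attribute (\S+) does not exist$")


def rejected_attributes(text):
    """Return the attribute names that verify reports as nonexistent."""
    names = []
    for line in text.splitlines():
        match = MISSING_ATTR.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def count_by_level(text):
    """Count WARNING and ERROR lines separately."""
    counts = {"WARNING": 0, "ERROR": 0}
    for line in text.splitlines():
        level = line.split(":", 1)[0].strip()
        if level in counts:
            counts[level] += 1
    return counts
```

Calling rejected_attributes() on the full output gives exactly the three names that need attention; the timeout warnings are counted but otherwise left alone.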
Pb with rpms for epel-5/x86_64 | Linux-HA | Users
Well, there are no official fixes available here. You have these
options: wait until Andrew makes new rpms or compile the
pacemaker yourself. Or apply the patches to the installed files.
The patch which fixes this problem is here:
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/042548a451fc
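Apparently the advice is to apply a patch, but the referenced changeset deals with the case where dc-version and cluster-infrastructure are missing, not the attributes in my errors...
With no better option, and only half believing this approach would work, I imitated the patch and added the attributes that were reported as nonexistent to /usr/lib/python2.4/site-packages/crm/cibconfig.py.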
# diff -C0 cibconfig.py cibconfig.py_orig
*** cibconfig.py 2010-11-04 18:22:54.000000000 +0900
--- cibconfig.py_orig 2010-11-04 18:19:57.000000000 +0900
***************
*** 1208 ****
! l += ("dc-version","cluster-infrastructure","last-lrm-refresh","default-resource-failure-stickiness","short-resource-names","transition-idle-timeout")
--- 1208 ----
! l += ("dc-version","cluster-infrastructure","last-lrm-refresh")
[root@z151 crm]#
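Somehow the errors went away, and the attribute values can now be set!
I copied the file to the other node (the secondary) as well.
The errors no longer appear on the secondary either.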
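My guess at why this works: the crm shell seems to check each attribute name against a tuple of known names, and the edited line extends that tuple. The Python below is a hypothetical stand-in for that check, not the actual code in cibconfig.py. Only the two tuples of extra names come from the diff; KNOWN_FROM_METADATA and both function names are assumptions made for illustration.

```python
# Illustrative subset of names the shell already accepts (assumed, not
# taken from cibconfig.py).
KNOWN_FROM_METADATA = (
    "no-quorum-policy",
    "default-resource-stickiness",
    "stonith-enabled",
)

# The tuple on line 1208 before and after the edit, as shown in the diff.
ORIGINAL_EXTRA = ("dc-version", "cluster-infrastructure", "last-lrm-refresh")
PATCHED_EXTRA = ORIGINAL_EXTRA + (
    "default-resource-failure-stickiness",
    "short-resource-names",
    "transition-idle-timeout",
)


def allowed_attributes(extra):
    """Build the full whitelist the same way the patched line does."""
    l = tuple(KNOWN_FROM_METADATA)
    l += extra
    return l


def unknown_attributes(configured, allowed):
    """Return configured names that are missing from the whitelist."""
    return [name for name in configured if name not in allowed]


configured = [
    "no-quorum-policy",
    "dc-version",
    "short-resource-names",
    "transition-idle-timeout",
]
```

With ORIGINAL_EXTRA the last two names are reported as unknown; with PATCHED_EXTRA nothing is. Note that this only explains why the shell stops complaining. It says nothing about whether the cluster itself acts on these attributes.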
[root@z151 crm]# scp cibconfig.py 192.168.11.181:/usr/lib/python2.4/site-packages/crm/cibconfig.py
Address 192.168.11.181 maps to z152.drbd, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
root@192.168.11.181's password:
cibconfig.py 100% 85KB 85.3KB/s 00:00
[root@z152 crm]# crm configure verify
WARNING: IPaddr2_1: specified timeout 5s for monitor_0 is smaller than the advised 20s
WARNING: drbddisk_2: default timeout 20s for start is smaller than the advised 240
WARNING: drbddisk_2: default timeout 20s for stop is smaller than the advised 100
WARNING: Filesystem_3: default timeout 20s for start is smaller than the advised 60
WARNING: Filesystem_3: default timeout 20s for stop is smaller than the advised 60
(Addendum) SUSE Linux Enterprise High Availability Extension
p. 18
The resource-failure-stickiness cluster option has been replaced by the migration-threshold cluster option.
- Add the following cluster settings
# crm configure
crm(live)configure# property no-quorum-policy="ignore"
crm(live)configure# no-quorum-policy="ignore"
ERROR: syntax: no-quorum-policy=ignore
crm(live)configure# property no-quorum-policy="ignore"
crm(live)configure# property default-resource-stickiness="200"
crm(live)configure# commit
# crm configure show
node $id="4b6dd4d0-21f5-cd4c-2ca1-f8a45c55b02f" z151.drbd
node $id="c7c991cf-414b-8daa-6e5d-8c4e2e4bb197" z152.drbd
primitive Filesystem_3 ocf:heartbeat:Filesystem \
        op monitor interval="120s" timeout="60s" \
        params device="/dev/drbd0" directory="/mnt"
primitive IPaddr2_1 ocf:heartbeat:IPaddr2 \
        op monitor interval="5s" timeout="5s" \
        params ip="192.168.11.182" cidr_netmask="24" nic="eth0" broadcast="192.168.11.255"
primitive drbddisk_2 ocf:linbit:drbd \
        op monitor interval="10" \
        params drbd_resource="r0"
primitive httpd_4 lsb:httpd \
        op monitor interval="10s"
group group_1 IPaddr2_1 drbddisk_2 Filesystem_3 httpd_4
location cli-standby-group_1 group_1 \
        rule $id="cli-standby-rule-group_1" -inf: #uname eq z151.drbd
location rsc_location_group_1 group_1 \
        rule $id="preferred_location_group_1" 100: #uname eq z151.drbd
property $id="cib-bootstrap-options" \
        symmetric-cluster="true" \
        no-quorum-policy="stop" \
        default-resource-stickiness="0" \
        default-resource-failure-stickiness="0" \
        stonith-enabled="false" \
        stonith-action="reboot" \
        startup-fencing="true" \
        stop-orphan-resources="true" \
        stop-orphan-actions="true" \
        remove-after-stop="false" \
        short-resource-names="true" \
        transition-idle-timeout="5min" \
        default-action-timeout="20s" \
        is-managed-default="true" \
        cluster-delay="60s" \
        pe-error-series-max="-1" \
        pe-warn-series-max="-1" \
        pe-input-series-max="-1" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat"
I then added fstype="ext3" (in the params of Filesystem_3), along with the following ms, colocation, and order definitions:
ms ms_drbd0 drbddisk_2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation c_group_1 inf: group_1 ms_drbd0:Master
order o_drbd_befor_group_1 inf: ms_drbd0:promote group_1:start
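Along the way, crm_mon kept showing a large number of "call=11, rc=1, status=complete unknown error" errors no matter what I did, but for some reason restarting heartbeat on both nodes cleared them.
But then a new error appeared... _| ̄|〇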
Failed actions:
    drbddisk_2:0_promote_0 (node=z151.drbd, call=10, rc=-2, status=Timed Out): unknown exec error
Nov 05 18:03:22 z151.drbd pengine: [6945]: ERROR: create_notification_boundaries: Creating boundaries for ms_drbd0
Yet again, restarting heartbeat once more fixed it. What was that all about...
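When errors like this keep changing, it helps to pull the fields out of a "Failed actions" entry so that runs can be compared. Here is a small Python sketch; the regular expression is my own reading of the line format seen in this post, not an official crm_mon grammar, and the names are made up for illustration.

```python
import re

# Pattern inferred from the crm_mon lines in this post (an assumption,
# not a documented format).
FAILED_ACTION = re.compile(
    r"^\s*(?P<operation>\S+) \(node=(?P<node>[^,]+), call=(?P<call>-?\d+), "
    r"rc=(?P<rc>-?\d+), status=(?P<status>[^)]+)\): (?P<reason>.+)$"
)


def parse_failed_action(line):
    """Return the fields of one failed-action line, or None if it does not match."""
    match = FAILED_ACTION.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["call"] = int(fields["call"])
    fields["rc"] = int(fields["rc"])
    return fields


SAMPLE = (
    "drbddisk_2:0_promote_0 (node=z151.drbd, call=10, rc=-2, "
    "status=Timed Out): unknown exec error"
)
```

For SAMPLE this yields the operation drbddisk_2:0_promote_0 on node z151.drbd with rc -2 and status "Timed Out", which makes it easy to see that the promote timed out rather than being rejected outright.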
crm(live)configure# ms ms_drbd0 drbddisk_2 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ERROR: drbddisk_2 already in use at group_1
crm(live)configure#
The shell complained as shown above, so I removed drbddisk_2 from group_1.
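The rule behind that error is that a primitive can belong to only one container, either a group or an ms, not both. A rough Python check for this on crm-style configuration text might look like the following. It is a sketch that only understands the simple group and ms lines used in this post; the function names are hypothetical.

```python
def containers(config_text):
    """Map each child resource to the group/ms objects that reference it."""
    used = {}
    for line in config_text.splitlines():
        words = line.split()
        if len(words) < 3 or words[0] not in ("group", "ms"):
            continue
        owner = words[1]
        for word in words[2:]:
            # Stop at a line continuation or at the start of meta/params.
            if word in ("\\", "meta", "params") or "=" in word:
                break
            used.setdefault(word, []).append(owner)
    return used


def conflicts(config_text):
    """Return resources that are referenced by more than one container."""
    return {rsc: owners for rsc, owners in containers(config_text).items()
            if len(owners) > 1}


# Group and ms lines as they were when the error appeared.
BEFORE = r"""
group group_1 IPaddr2_1 drbddisk_2 Filesystem_3 httpd_4
ms ms_drbd0 drbddisk_2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
"""

# The same lines after drbddisk_2 was taken out of the group.
AFTER = r"""
group group_1 IPaddr2_1 Filesystem_3 httpd_4
ms ms_drbd0 drbddisk_2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
"""
```

conflicts(BEFORE) points at drbddisk_2 being claimed by both group_1 and ms_drbd0, while conflicts(AFTER) comes back empty.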
Last updated: Fri Nov 5 10:10:10 2010
Stack: Heartbeat
Current DC: z152.drbd (c7c991cf-414b-8daa-6e5d-8c4e2e4bb197) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ z151.drbd z152.drbd ]
Resource Group: group_1
    IPaddr2_1 (ocf::heartbeat:IPaddr2): Started z152.drbd
    drbddisk_2 (ocf::heartbeat:drbd): Stopped
    Filesystem_3 (ocf::heartbeat:Filesystem): Stopped
    httpd_4 (lsb:httpd): Stopped
Failed actions:
    drbddisk_2_monitor_0 (node=z152.drbd, call=3, rc=6, status=complete): not configured
    drbddisk_2_monitor_0 (node=z151.drbd, call=3, rc=6, status=complete): not configured
In the end, the configuration came out as follows:
# crm configure
crm(live)configure# show
node $id="4b6dd4d0-21f5-cd4c-2ca1-f8a45c55b02f" z151.drbd
node $id="c7c991cf-414b-8daa-6e5d-8c4e2e4bb197" z152.drbd
primitive Filesystem_3 ocf:heartbeat:Filesystem \
        op monitor interval="120s" timeout="60s" \
        params device="/dev/drbd0" directory="/mnt" fstype="ext3" options="noatime"
primitive IPaddr2_1 ocf:heartbeat:IPaddr2 \
        op monitor interval="5s" timeout="5s" \
        params ip="192.168.11.182" cidr_netmask="24" nic="eth0" broadcast="192.168.11.255"
primitive drbddisk_2 ocf:linbit:drbd \
        op monitor interval="10s" \
        params drbd_resource="r0"
primitive httpd_4 lsb:httpd \
        op monitor interval="10s"
group group_1 IPaddr2_1 Filesystem_3 httpd_4
ms ms_drbd0 drbddisk_2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location cli-standby-group_1 group_1 \
        rule $id="cli-standby-rule-group_1" -inf: #uname eq z151.drbd
location rsc_location_group_1 group_1 \
        rule $id="preferred_location_group_1" 100: #uname eq z151.drbd
colocation c_group_1 inf: group_1 ms_drbd0:Master
order o_drbd_befor_group_1 inf: ms_drbd0:promote group_1:start
property $id="cib-bootstrap-options" \
        symmetric-cluster="true" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="0" \
        default-resource-failure-stickiness="0" \
        stonith-enabled="false" \
        stonith-action="reboot" \
        startup-fencing="true" \
        stop-orphan-resources="true" \
        stop-orphan-actions="true" \
        remove-after-stop="false" \
        short-resource-names="true" \
        transition-idle-timeout="5min" \
        default-action-timeout="20s" \
        is-managed-default="true" \
        cluster-delay="60s" \
        pe-error-series-max="-1" \
        pe-warn-series-max="-1" \
        pe-input-series-max="-1" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat"
The crm_mon output looks good too:
============
Last updated: Fri Nov 5 18:29:46 2010
Stack: Heartbeat
Current DC: z152.drbd (c7c991cf-414b-8daa-6e5d-8c4e2e4bb197) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ z151.drbd z152.drbd ]
Resource Group: group_1
    IPaddr2_1 (ocf::heartbeat:IPaddr2): Started z152.drbd
    Filesystem_3 (ocf::heartbeat:Filesystem): Started z152.drbd
    httpd_4 (lsb:httpd): Started z152.drbd
Master/Slave Set: ms_drbd0
    Masters: [ z152.drbd ]
    Slaves: [ z151.drbd ]
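"Looks good" here means two things: every resource in group_1 is Started, and they all run on the node where ms_drbd0 is Master, which is what the colocation constraint asks for. A Python sketch of that check is below. The parsing patterns are inferred from the crm_mon text in this post and the function names are hypothetical.

```python
import re

# Patterns inferred from the crm_mon output in this post (assumptions).
STARTED = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)\s*$")
MASTERS = re.compile(r"^\s*Masters:\s*\[\s*(.*?)\s*\]")


def started_nodes(text):
    """Map each started resource to the node it runs on."""
    result = {}
    for line in text.splitlines():
        match = STARTED.match(line)
        if match:
            result[match.group(1)] = match.group(3)
    return result


def master_nodes(text):
    """Return the nodes listed on the Masters line."""
    for line in text.splitlines():
        match = MASTERS.match(line)
        if match:
            return match.group(1).split()
    return []


def group_follows_master(text):
    """True if at least one resource is started and all run on a master node."""
    started = started_nodes(text)
    masters = master_nodes(text)
    return bool(started) and bool(masters) and all(
        node in masters for node in started.values()
    )


# Resource section of the final crm_mon output.
CRM_MON = """\
Resource Group: group_1
    IPaddr2_1 (ocf::heartbeat:IPaddr2): Started z152.drbd
    Filesystem_3 (ocf::heartbeat:Filesystem): Started z152.drbd
    httpd_4 (lsb:httpd): Started z152.drbd
Master/Slave Set: ms_drbd0
    Masters: [ z152.drbd ]
    Slaves: [ z151.drbd ]
"""
```

For the final output, group_follows_master(CRM_MON) is true: all three group members and the DRBD master are on z152.drbd.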