From nagatu2 @ gmail.com Wed Nov 28 21:30:54 2007 From: nagatu2 @ gmail.com (=?ISO-2022-JP?B?GyRCMUpERRsoQg==?=) Date: Wed, 28 Nov 2007 21:30:54 +0900 Subject: [postgresforest-users 98] Re: =?iso-2022-jp?b?GyRCJSolcyVpJSQlcyVqJSslUCVqQ2YlKCVpITwbKEI=?= =?iso-2022-jp?b?GyRCISEbKEJhcHBseUxvZ1JlY29yZCgpIGZhaWxlZC4=?= =?iso-2022-jp?b?GyRCJEskRCQkJEYbKEI=?= Message-ID: <4304b9d50711280430l67528e2er9fd88598290af11e@mail.gmail.com> お世話になっております。 永津と申します。 以前下記のような形で質問させて頂きました。 > オンラインリカバリ実行中のエラーについて質問があります。 > > マシン2台を用いて運用テストを行っています。 補足として、GSCをそれぞれのマシンに配置して冗長化の構成をとっております。 > それぞれにPostgresSQL4.0.2とPostgresSQL8.1.6をインストールしています。 > > 1つのデータベースに対して、毎秒約9000件のレコードをInsertしている状態で > オンラインリカバリを開始した際、phase1,phase2においてエラー > org.postgresforest.tool.recovery.RecoveryException: applyLogRecord() > failed.が発生しました。 > この後、start_vmまでオンラインリカバリを継続、終了いたしましたが、 > データの差異は改善されませんでした。 > > エラーは同じ状況で再現します。 > また、phase1終了後の時点でInsertを開始した場合は、 > 正常にリカバリを終了します。 上記の質問について次のようなご回答を頂きました。 > 長妻です。 > > 基本的にオンラインリカバリは、トランザクション(特に更新)が > 少ない時間帯に行っていただくことを想定しています。 > 例えばWEBアプリケーションのバックエンドであれば、 > 夜間帯のトランザクションが少ない時間帯、 > なおかつバッチなどが走っていない状態で・・・ > といった具合です。 > 秒間9000件の更新はなかなか激しいですね(^^; > ちなみに、Phase1の時間に更新量が多いと、リカバリー最中に > vmが停止している時間が長くなってしまいます。 > > > さて、それはともかく、このエラーについてはこちらでも > 調べてみましたが、今のところ根本的な原因がわかりません。 > 追加で頂いたログも含め、同じ内容を2度INSERTしているという > 現象面だけは見えているのですが・・・ > リカバリマネージャの不具合の可能性もあるため > もう少しこちらでも調べてみるつもりです。 > > もし永津さんのほうでもなにかわかりましたら、 > 情報をいただけると助かります。 チューニングをしてPostgresSQLパフォーマンスの向上を行う、 レコードのinsert件数をデータベース全体で約1200件/秒まで下げる、 などの方法を試みましたが、同様のエラーが発生しています。 そちらの調査では、何か進展がありましたでしょうか。 -debugオプションを追加してリカバリコマンドを実行したときに出力された メッセージログを今回追記いたします。 $ /usr/local/forest40/bin/ForestRecovery.sh -user forest -pass forest -gsc_url jdbc:postgresql://192.168.0.165:5432/gsc -phase1 0 1 forest_db -debug 2007-11-28 20:50:03.722 DEBUG: 2 GSC(s) found. 2007-11-28 20:50:03.729 NOTICE: RECOVERING: 2007-11-28 20:50:03.729 NOTICE: FROM [0]jdbc:postgresql://192.168.0.165:5432/forest_db 2007-11-28 20:50:03.729 NOTICE: TO [1]jdbc:postgresql://192.168.0.134:5432/forest_db 2007-11-28 20:50:03.729 DEBUG: jdbc:postgresql://192.168.0.165:5432/forest_db 2007-11-28 20:50:03.729 DEBUG: jdbc:postgresql://192.168.0.134:5432/forest_db 2007-11-28 20:50:03.736 DEBUG: Backend PID=17602 2007-11-28 20:50:03.736 NOTICE: SERVER STATUS: 2007-11-28 20:50:03.737 NOTICE: Src: Server 0, Status 1 2007-11-28 20:50:03.738 NOTICE: Dst: Server 1, Status -1 2007-11-28 20:50:03.738 NOTICE: RECOVERY_MODE = 1 2007-11-28 20:50:03.739 NOTICE: RECOVERY / INITIALIZATION 2007-11-28 20:50:03.74 NOTICE: TABLE(S) TO COPY: 2007-11-28 20:50:03.74 NOTICE: test_table1 2007-11-28 20:50:03.74 NOTICE: test_table2 2007-11-28 20:50:03.74 NOTICE: test_table3 2007-11-28 20:50:03.74 NOTICE: test_table4 2007-11-28 20:50:03.74 NOTICE: RECOVERY / PHASE 1: start. 2007-11-28 20:50:03.741 NOTICE: GSC CHECK: 2007-11-28 20:50:03.742 NOTICE: forest_servdb, forest_db: OK 2007-11-28 20:50:03.743 NOTICE: forest_tablepartdtl, test_table1: OK 2007-11-28 20:50:03.744 NOTICE: forest_tablepartdtl, test_table2: OK 2007-11-28 20:50:03.744 NOTICE: forest_tablepartdtl, test_table3: OK 2007-11-28 20:50:03.745 NOTICE: forest_tablepartdtl, test_table4: OK 2007-11-28 20:50:08.332 NOTICE: Log table already exists. 2007-11-28 20:50:08.333 NOTICE: Log table already exists. 2007-11-28 20:50:08.333 NOTICE: Log table already exists. 2007-11-28 20:50:08.334 DEBUG: beginRecoveryLogging: test_table1 2007-11-28 20:50:08.334 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.335 DEBUG: CREATE TRIGGER test_table1_logger AFTER INSERT OR UPDATE OR DELETE ON test_table1 FOR EACH ROW EXECUTE PROCEDURE postgresforest.logger() 2007-11-28 20:50:08.335 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:08.336 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:08.336 DEBUG: Logger started, table => test_table1 2007-11-28 20:50:08.336 DEBUG: beginRecoveryLogging: test_table2 2007-11-28 20:50:08.336 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.336 DEBUG: CREATE TRIGGER test_table2_logger AFTER INSERT OR UPDATE OR DELETE ON test_table2 FOR EACH ROW EXECUTE PROCEDURE postgresforest.logger() 2007-11-28 20:50:08.336 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:08.337 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:08.337 DEBUG: Logger started, table => test_table2 2007-11-28 20:50:08.338 DEBUG: beginRecoveryLogging: test_table3 2007-11-28 20:50:08.338 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.338 DEBUG: CREATE TRIGGER test_table3_logger AFTER INSERT OR UPDATE OR DELETE ON test_table3 FOR EACH ROW EXECUTE PROCEDURE postgresforest.logger() 2007-11-28 20:50:08.338 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:08.339 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:08.339 DEBUG: Logger started, table => test_table3 2007-11-28 20:50:08.339 DEBUG: beginRecoveryLogging: test_table4 2007-11-28 20:50:08.339 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.339 DEBUG: CREATE TRIGGER test_table4_logger AFTER INSERT OR UPDATE OR DELETE ON test_table4 FOR EACH ROW EXECUTE PROCEDURE postgresforest.logger() 2007-11-28 20:50:08.339 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:08.34 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:08.34 DEBUG: Logger started, table => test_table4 2007-11-28 20:50:08.34 NOTICE: COPYING ALL TABLES: start. 2007-11-28 20:50:08.341 NOTICE: CREATE SNAPSHOT: start. 2007-11-28 20:50:08.341 DEBUG: getSnapshotXid: test_table1 2007-11-28 20:50:08.341 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.341 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:08.385 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:08.385 NOTICE: COPYING test_table1 2007-11-28 20:50:08.386 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:08.386 DEBUG: copyTableToFile(): test_table1 to /tmp/test_table1.db 2007-11-28 20:50:08.386 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:50:08.386 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:20.459 DEBUG: copyTableToFile(): done. 2007-11-28 20:50:20.459 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:20.459 NOTICE: done. 2007-11-28 20:50:20.459 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:50:20.459 NOTICE: COPYING test_table2 2007-11-28 20:50:20.459 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:20.459 DEBUG: copyTableToFile(): test_table2 to /tmp/test_table2.db 2007-11-28 20:50:20.459 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:32.554 DEBUG: copyTableToFile(): done. 2007-11-28 20:50:32.554 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:32.554 NOTICE: done. 2007-11-28 20:50:32.554 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:50:32.554 NOTICE: COPYING test_table3 2007-11-28 20:50:32.554 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:32.554 DEBUG: copyTableToFile(): test_table3 to /tmp/test_table3.db 2007-11-28 20:50:32.554 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:44.606 DEBUG: copyTableToFile(): done. 2007-11-28 20:50:44.606 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:44.606 NOTICE: done. 2007-11-28 20:50:44.606 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:50:44.607 NOTICE: COPYING test_table4 2007-11-28 20:50:44.607 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:50:44.607 DEBUG: copyTableToFile(): test_table4 to /tmp/test_table4.db 2007-11-28 20:50:44.607 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:50:56.662 DEBUG: copyTableToFile(): done. 2007-11-28 20:50:56.662 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:50:56.662 NOTICE: done. 2007-11-28 20:50:56.662 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:50:56.662 DEBUG: Snapshot xid=71776536 2007-11-28 20:50:56.662 NOTICE: CREATE SNAPSHOT: done. 2007-11-28 20:50:56.662 NOTICE: REBUILD FROM SNAPSHOT: start. 2007-11-28 20:51:01.677 DEBUG: rebuildTableFromFile(): test_table1 from /tmp/test_table1.db 2007-11-28 20:51:01.677 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:51:01.677 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:51:08.339 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:51:08.342 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:51:08.344 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:51:08.35 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:51:28.678 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:51:28.678 DEBUG: rebuildTableFromFile(): done. 2007-11-28 20:51:28.678 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:51:33.002 DEBUG: rebuildTableFromFile(): test_table2 from /tmp/test_table2.db 2007-11-28 20:51:33.002 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:51:33.002 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:52:07.916 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:52:07.917 DEBUG: rebuildTableFromFile(): done. 2007-11-28 20:52:07.917 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:52:12.239 DEBUG: rebuildTableFromFile(): test_table3 from /tmp/test_table3.db 2007-11-28 20:52:12.239 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:52:12.239 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:52:58.378 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:52:58.378 DEBUG: rebuildTableFromFile(): done. 2007-11-28 20:52:58.378 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:53:02.695 DEBUG: rebuildTableFromFile(): test_table4 from /tmp/test_table4.db 2007-11-28 20:53:02.696 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been created. 2007-11-28 20:53:02.696 DEBUG: CancelThread is going to wait 60 secs. 2007-11-28 20:53:44.636 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db disabled. 2007-11-28 20:53:44.636 DEBUG: rebuildTableFromFile(): done. 2007-11-28 20:53:44.636 DEBUG: CancelThread: pid=17602, url=jdbc:postgresql://192.168.0.165:5432/forest_db has been finished without a cancel command. 2007-11-28 20:53:45.046 NOTICE: REBUILD FROM SNAPSHOT: done. 2007-11-28 20:53:45.046 NOTICE: COPYING ALL TABLES: done. 2007-11-28 20:53:45.046 NOTICE: APPLYING LOG RECORD(S): start. 2007-11-28 20:53:45.245 DEBUG: getMaxMinTimestamp(): Xid [71776538...71842194] (input) 2007-11-28 20:53:45.392 DEBUG: getMaxMinTimestamp(): Timestamp [2007-11-28 20:50:04.853027...2007-11-28 20:50:04.853027] (result) 2007-11-28 20:53:45.392 DEBUG: openLogRecords(): xid=[71776538...71842194] 2007-11-28 20:53:46.241 DEBUG: # of log records = 65460 org.postgresforest.tool.recovery.RecoveryException: applyLogRecord() failed. - /* 71776540/2007-11-28 20:50:04.853027 */ INSERT INTO test_table4(measure,login_id,ph_id,chs,aia_v1,conv_v1,aia_v2,conv_v2,rc,rc_ms,updt,correct,v_exe) values ('2006-05-15 02:00:00','testsah28 ','testsah28 ','1 ',null,'0',null,null,'2006-05-15 02:00:00','0','2006-05-15 02:00:00','0','0') at org.postgresforest.tool.recovery.RecoveryManager.applyLogRecord(Unknown Source) at org.postgresforest.tool.recovery.ForestRecovery.applyAllRecords(Unknown Source) at org.postgresforest.tool.recovery.ForestRecovery.RecoveryPhase1(Unknown Source) at org.postgresforest.tool.recovery.ForestRecovery.main(Unknown Source) $ また、障害インスタンス(オンラインリカバリ実行インスタンス)のサーバログには 以下のようなメッセージが出力されていました。 LOG: unexpected EOF on client connection ERROR: duplicate key violates unique constraint "test_table4_pkey" STATEMENT: /* 71776540/2007-11-28 20:50:04.853027 */ INSERT INTO test_table4(measure,login_id,ph_id,chs,aia_v1,conv_v1,aia_v2,conv_v2,rc,rc_ms,updt,correct,v_exe) values ('2006-05-15 02:00:00','testsah28 ','testsah28 ','1 ',null,'0',n ull,null,'2006-05-15 02:00:00','0','2006-05-15 02:00:00','0','0') LOG: unexpected EOF on client connection LOG: unexpected EOF on client connection