半瓶内容

 results 1 - 1 of about 1 for 新年遭遇data guard故障之案例处理. (0.294 seconds) 

新年遭遇data guard故障之案例处理

新年上班第一天就忙于处理一起data guard故障,最近太忙了,今天抽空整理一下。

话说那时是2010年第一个工作日,早上来发现备份库小型机一个目录占满了:

p55a@/home/oracle> df -g

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/fslv01       53.50      0.00  100%     1167    68% /stb_arch

查看可用的存储空间:

p55a@/home/oracle> lsvg
rootvg
backvg

p55a@/home/oracle> lsvg backvg
VOLUME GROUP:       backvg                   VG IDENTIFIER:  000221f30000d600000001141bb7ac12
VG STATE:           active                   PP SIZE:        256 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      546 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                4                        USED PPs:       546 (139776 megabytes)
OPEN LVs:           3                        QUORUM:         2
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

已经没有足够的空间可以分配,而这个/stb_arch是存放主库传过来的归档日志,很显然备库在apply这些归档时出现了阻塞,导致主库过来的归档文件不断积压,最终占满了空间。

先去主库暂停到备库的日志同步:

SQL> show parameter log_archive_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest                     string
log_archive_dest_1                   string      LOCATION=/ora_arch VALID_FOR=(
ALL_LOGFILES,ALL_ROLES) DB_UNI
QUE_NAME=erpdb
log_archive_dest_10                  string
log_archive_dest_2                   string      SERVICE=bkdb VALID_FOR=(ONLIN
E_LOGFILES,PRIMARY_ROLE) DB_UN
IQUE_NAME=bkdb

SQL> alter system set log_archive_dest_state_2=defer;

System altered.

在备库查看DBA_LOGSTDBY_EVENTS视图:

SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTS order by 1 desc;

发现有建立临时表的操作引起了阻塞:

create global temporary table sntempStoreOut090407 on commit preserve rows as
select
g.islcode,idx,g.bcode as bcode1,g.bcode
,g.scode,g.hcode,g.receivecode,g.ihcode,g.ecode
,g.gcode,g.gattr1,g.gattr2,g.gattr3,g.gattr4,
,g.qtx,g.qtp,g.qtpunit,g.qtycount,g.qtyunit,g.qty,g.iodate as iodate,g.ratifydate as ratifydate,g.stockfrom,g.piecewt,g.slstatus
,g.rmb2,upric0 as upric0
,g.rmb0 as rmb0
,g.gtcode1 as gtcode1
,g.slcode
,g.ecode2
,g.busimode1
,busimode2
,bcode as vpreapre
,g.ccode as ccode
,g.gcode as gcode0
,g.sbcode
,g.upislcode
,g.slcodex,g.etrdcode,g.busimode3,g.inoutmode
from
storeslistg g where 1 <> 1

先停止standby,跳过这个事务,kill使用原来的临时表的session,手动执行以上SQL,然后开启standby:

SQL> alter database stop logical standby apply;

Database altered.

SQL> exec dbms_logstdby.skip_transaction(9,27,994864);

PL/SQL procedure successfully completed.

SQL> alter database start logical standby apply;

这样这个引起阻塞的事务就能过去了,但是由于空间占满了,应用仍然有问题,这时可以手动转移走一些比较靠后的不会马上应用的归档,然后有足够的空间后再还原回来,要注意的就是要保证这些文件的属主是Oracle用户。

# cd /tmp/log_2010
# ls -l
total 1025568
-rw-r-----   1 root     system      1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 root     system       110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 root     system        52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 root     system        65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 root     system        56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 root     system      1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 root     system        59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 root     system        57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 root     system    520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 root     system      1545216 Jan 04 08:52 2_125518_640266118.dbf
# chown oracle.oinstall *
# ls -l
total 1025568
-rw-r-----   1 oracle   oinstall    1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 oracle   oinstall     110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 oracle   oinstall      52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 oracle   oinstall      65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 oracle   oinstall      56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 oracle   oinstall    1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 oracle   oinstall      59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 oracle   oinstall      57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 oracle   oinstall  520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 oracle   oinstall    1545216 Jan 04 08:52 2_125518_640266118.dbf
# pwd
/tmp/log_2010
# cp * /stb_arch

之后在备库的同步继续进行,然后又经历了2个临时表的手动干预后,终于腾出了足够的空间,这时可以打开主库的同步:

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

后来又发现备库出现了停滞,而并没有引起阻塞的事务,仔细查看DBA_LOGSTDBY_LOG视图发现有些归档丢失,没有传到备库,去主库找归档时发现已经删除了,这时只能从RMAN中恢复这些文件。根据缺失的sequence来找对应的RMAN备份片:

ERPDB2@/orabak/arch/transmited>rman target /

RMAN> list backup of archivelog from sequence 125504 until sequence 125554 thread 2;

List of Backup Sets
===================

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16862   1.14G      DISK        00:04:57     04-JAN-10
BP Key: 19677   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707403929_16929_1

List of Archived Logs in backup set 16862
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125504  11523639376 04-JAN-10 11523954045 04-JAN-10
. . . . . .
2    125509  11525286806 04-JAN-10 11525631638 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16864   453.25M    DISK        00:01:57     04-JAN-10
BP Key: 19679   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707404542_16931_1

List of Archived Logs in backup set 16864
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125510  11525631638 04-JAN-10 11525981043 04-JAN-10
. . . . . .
2    125522  11526443719 04-JAN-10 11526475093 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16867   401.53M    DISK        00:01:42     04-JAN-10
BP Key: 19682   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T213015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707434218_16935_1

List of Archived Logs in backup set 16867
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125523  11526475093 04-JAN-10 11526613894 04-JAN-10
. . . . . .
2    125554  11529391158 04-JAN-10 11529453386 04-JAN-10

RMAN> exit

从这里的Piece Name可以看出缺少3个备份片文件,找到这3个文件,放到RMAN备份的目的地/orabak/arch/目录下。写到这里我发现《Oracle DBA手记》书稿里一个类似的操作被我写成了放到默认的归档日志路径下,可能是晚上赶稿神志不清吧,道歉先。

然后恢复出来归档日志到指定的目录:

RMAN> run {
2> set archivelog destination to '/orabak/testarch';
3> restore archivelog from sequence 125504 until sequence 125554 thread 2;
4> }

在恢复的过程中会自动把这些归档同时传到备库,无需手动再传。

备库端归档日志全了,就能顺利的同步了。至此这个问题得到解决。这个案例和我在《Oracle DBA手记》里记录的一个案例类似,可互相参看印证。


Leave a Comment