从Cursor Cache中加载执行计划到SPM中,是最常用的一种方式,下面将介绍整个加载的步骤。

  1. 确认sql在Cursor Cache中

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
     select s.CHILD_NUMBER,
    s.PLAN_HASH_VALUE,
    s.HASH_VALUE,
    s.PARSING_SCHEMA_NAME,
    s.LAST_ACTIVE_TIME,
    s.OPTIMIZER_COST,
    s.sql_plan_baseline
    from v$sql s
    where sql_id = '1s6a8wn4p6rym'
    order by s.LAST_ACTIVE_TIME desc;
_*注意,不是所有在v$sql中看到的plan都被加载到SPM中,其中有个规律是,如果plan是相同的,即便是不同的schema,在SPM中都只有一条plan,而且这不同的schema执行同样的sql时,会使用同一个执行计划基线(前提是基线的plan能够在各自的schema中重新生成)_
  1. 使用DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE加载执行计划

    1
    2
    3
    4
    5
    6
    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => '59t5p5xv2bjjy');
    end;
    /
该匿名块会将CURSOR_CACHE中所有的plan都加入到SQL Plan Baseline中,并默认为accepted,如需改变默认效果,请参阅[Oracle Database PL/SQL Packages and Types Reference](http://docs.oracle.com/database/122/ARPLS/DBMS_SPM.htm#GUID-4EFE728B-8A2A-4DF4-ABE6-E7B133CDB5DA)
  1. 查看DBA_SQL_PLAN_BASELINES确认load结果

    1
    2
    3
    4
    5
    SELECT *
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='59t5p5xv2bjjy'
    );
  2. 删除掉不想要的Baseline

    如果能确定不想要的plan(当然,也可以在load的时候进行选择),那么可以通过SPM.DROP_SQL_PLAN_BASELINE删除

    先找到sql_handle,再查找该sql_handle下所有的plan,例如:

    1
    2
    3
    4
    5
    SELECT sql_handle
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='59t5p5xv2bjjy'
    );
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14

    select SIGNATURE,
    SQL_HANDLE,
    SQL_TEXT,
    PLAN_NAME,
    PARSING_SCHEMA_NAME,
    LAST_EXECUTED,
    ENABLED,
    ACCEPTED,
    REPRODUCED,
    EXECUTIONS,
    optimizer_cost
    from dba_sql_plan_baselines;
    where sql_handle ='SQL_fca270c30418ee86';

    确定不需要的plan有很多方式,如果你正在tuning该条sql,那么你对不好的执行计划的cost肯定印象深刻,那么就可以根据optimizer_cost确定PLAN_NAME,也可以通过dbms_xplan.display_sql_plan_baseline来浏览执行计划确定,接下来就可以删除该plan了:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    DECLARE
    v_dropped_plans number;
    BEGIN
    v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
    sql_handle => 'SQL_971c23013e9cf52a',
    plan_name => 'SQL_PLAN_9f71304z9tx9a42ef5257'
    );
    DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
    END;
  1. 演化(evolve)新的plan

    11g默认为已存在Baseline的SQL自动捕捉新的plan,但新的plan为unaccepted的状态,要使用新的plan,需要演化(evolve)为accepted,下面介绍11g的演化方法,12c请参阅MANUALLY EVOLVING SQL PLAN BASELINES IN ORACLE DATABASE 12C RELEASE 2

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    DECLARE  
    l_plans_altered clob;
    BEGIN
    l_plans_altered := dbms_spm.evolve_sql_plan_baseline(
    sql_handle => 'SQL_fca270c30418ee86',
    plan_name => 'SQL_PLAN_gt8mhsc21jvn62e14fda0',
    verify =>'YES',
    commit =>'YES');
    DBMS_OUTPUT.put_line('Plans Altered: ' || l_plans_altered);
    END;
    /

    该匿名块会评估测试执行plan_name为SQL_PLAN_gt8mhsc21jvn62e14fda0的plan,并生成报告,如果它的性能优于现存的accepted的plan,那么它会被自动接受为accepted,否则便不会被接受。

在oracle 12.1之前,是使用EVOLVE_SQL_PLAN_BASELINE函数来演化SQL plan baselines,从12c开始,演化的过程就被基于task的方法代替了,12c的演化过程主要包括以下几个阶段:

  • CREATE_EVOLVE_TASK
  • EXECUTE_EVOLVE_TASK
  • REPORT_EVOLVE_TASK
  • IMPLEMENT_EVOLVE_TASK

下面将演示以下12c执行计划基线演化的过程:

先来创建一个测试表:

1
2
3
4
5
6
7
8
9
10
11
CREATE TABLE spm_test_tab (
id NUMBER,
description VARCHAR2(50)
);

INSERT /*+ APPEND */ INTO spm_test_tab
SELECT level,
'Description for ' || level
FROM dual
CONNECT BY level <= 10000;
COMMIT;

查询一条记录,并查看其执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
SELECT description
FROM spm_test_tab
WHERE id = 99;

SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gat6z1bc6nc2d, child number 0
-------------------------------------
SELECT description FROM spm_test_tab WHERE id = 99

Plan hash value: 1107868462

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 13 (100)| |
|* 1 | TABLE ACCESS FULL| SPM_TEST_TAB | 1 | 40 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1 / SPM_TEST_TAB@SEL$1

Outline Data
-------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
DB_VERSION('12.2.0.1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "SPM_TEST_TAB"@"SEL$1")
END_OUTLINE_DATA
*/

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=99)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - "DESCRIPTION"[VARCHAR2,50]

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

查询sql_id

1
2
3
4
5
6
7
8

SELECT sql_id
FROM v$sql
WHERE plan_hash_value = 1107868462;

SQL_ID
-------------
gat6z1bc6nc2d

将该sql的执行计划从cursor cache中加载到SQL Plan Baseline

1
2
3
4
5
6
7
8
9
10
11
12
13
SET SERVEROUTPUT ON
DECLARE
l_plans_loaded PLS_INTEGER;
BEGIN
l_plans_loaded := DBMS_SPM.load_plans_from_cursor_cache(
sql_id => 'gat6z1bc6nc2d');

DBMS_OUTPUT.put_line('Plans Loaded: ' || l_plans_loaded);
END;
/
Plans Loaded: 1

PL/SQL procedure successfully completed.

在DBA_SQL_PLAN_BASELINES中查询baseline的信息

1
2
3
4
5
6
7
8
9
COLUMN sql_handle FORMAT A20
COLUMN plan_name FORMAT A30

SELECT sql_handle, plan_name, enabled, accepted
FROM dba_sql_plan_baselines;

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

创建一条索引,再次执行该sql,查看执行计划结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
CREATE INDEX spm_test_tab_idx ON spm_test_tab(id);
begin
dbms_stats.gather_table_stats(ownname => 'SYS',
tabname =>'SPM_TEST_TAB',
estimate_percent =>DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
granularity=>'ALL',
cascade => TRUE,
no_invalidate => false);
end;
/

SELECT description
FROM spm_test_tab
WHERE id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 1107868462

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 13 (0)| 00:00:01|
|* 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=99)

Note
-----
- SQL plan baseline "SQL_PLAN_7qxjk7bch8h5tb65c37c8" used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
3 db block gets
52 consistent gets
0 physical reads
0 redo size
365 bytes sent via SQL*Net to client
484 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

可见,在有索引的情况下执行计划依然走了全表扫描,说明SPM已发挥作用,现在查看一下DBA_SQL_PLAN_BASELINES

1
2
3
4
5
6
7
8
SELECT sql_handle, plan_name, enabled, accepted 
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_7b76323ad90440b9';

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5t3652c362 YES NO
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

现在执行计划基线中已经有了一条未被accepted的计划,我们可以查看一下该计划的内容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select 
*
from table(
dbms_xplan.display_sql_plan_baseline(
sql_handle=>'SQL_7b76323ad90440b9',
plan_name=>'SQL_PLAN_7qxjk7bch8h5t3652c362',
format=>'basic +NOTE'));

--------------------------------------------------------------------------------
SQL handle: SQL_7b76323ad90440b9
SQL text: SELECT description FROM spm_test_tab WHERE id = 99
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_7qxjk7bch8h5t3652c362 Plan id: 911393634
Enabled: YES Fixed: NO Accepted: NO Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 2338891031

----------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SPM_TEST_TAB |
| 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX |
----------------------------------------------------------------

可见新的plan是走索引的,现在可以等待维护窗口期间由自动维护任务自动evolve,或者手动进行evolve,现在演示手动evolve,创建一个evolve task

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return VARCHAR2(32767);
BEGIN
l_return := DBMS_SPM.create_evolve_task(sql_handle => 'SQL_7b76323ad90440b9');
DBMS_OUTPUT.put_line('Task Name: ' || l_return);
END;
/
Task Name: TASK_1169

PL/SQL procedure successfully completed.

执行此任务

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return VARCHAR2(32767);
BEGIN
l_return := DBMS_SPM.execute_evolve_task(task_name => 'TASK_1169');
DBMS_OUTPUT.put_line('Execution Name: ' || l_return);
END;
/
Execution Name: EXEC_1281

PL/SQL procedure successfully completed.

查看任务报告

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
SET LONG 1000000 PAGESIZE 1000 LONGCHUNKSIZE 100 LINESIZE 100
SELECT DBMS_SPM.report_evolve_task(task_name => 'TASK_1169', execution_name => 'EXEC_1281') AS output
FROM dual;

OUTPUT
----------------------------------------------------------------------------------------------------
GENERAL INFORMATION SECTION
---------------------------------------------------------------------------------------------

Task Information:
---------------------------------------------
Task Name : TASK_1169
Task Owner : SYS
Execution Name : EXEC_1281
Execution Type : SPM EVOLVE
Scope : COMPREHENSIVE
Status : COMPLETED
Started : 04/24/2017 15:04:17
Finished : 04/24/2017 15:04:25
Last Updated : 04/24/2017 15:04:25
Global Time Limit : 2147483646
Per-Plan Time Limit : UNUSED
Number of Errors : 0
---------------------------------------------------------------------------------------------

SUMMARY SECTION
---------------------------------------------------------------------------------------------
Number of plans processed : 1
Number of findings : 1
Number of recommendations : 1
Number of errors : 0
---------------------------------------------------------------------------------------------

DETAILS SECTION
---------------------------------------------------------------------------------------------
Object ID : 2
Test Plan Name : SQL_PLAN_7qxjk7bch8h5t3652c362
Base Plan Name : SQL_PLAN_7qxjk7bch8h5tb65c37c8
SQL Handle : SQL_7b76323ad90440b9
Parsing Schema : SYS
Test Plan Creator : SYS
SQL Text : SELECT description FROM spm_test_tab WHERE id = 99

Execution Statistics:
-----------------------------
Base Plan Test Plan
---------------------------- ----------------------------
Elapsed Time (s): .000037 .000005
CPU Time (s): .000044 0
Buffer Gets: 5 0
Optimizer Cost: 13 2
Disk Reads: 0 0
Direct Writes: 0 0
Rows Processed: 0 0
Executions: 10 10


FINDINGS SECTION
---------------------------------------------------------------------------------------------

Findings (1):
-----------------------------
1. The plan was verified in 0.03200 seconds. It passed the benefit criterion
because its verified performance was 18.01480 times better than that of the
baseline plan.

Recommendation:
-----------------------------
Consider accepting the plan. Execute
dbms_spm.accept_sql_plan_baseline(task_name => 'TASK_1169', object_id => 2,
task_owner => 'SYS');


EXPLAIN PLANS SECTION
---------------------------------------------------------------------------------------------

Baseline Plan
-----------------------------
Plan Id : 1
Plan Hash Value : 3059496904

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 13 | 00:00:01 |
| * 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 | 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("ID"=99)


Test Plan
-----------------------------
Plan Id : 2
Plan Hash Value : 911393634

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 2 | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | SPM_TEST_TAB | 1 | 25 | 2 | 00:00:01 |
| * 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 | 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=99)

---------------------------------------------------------------------------------------------

报告中建议执行dbms_spm.accept_sql_plan_baseline接受新的plan,但我们应该使用IMPLEMENT_EVOLVE_TASK来接受新的plan

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return NUMBER;
BEGIN
l_return := DBMS_SPM.implement_evolve_task(task_name => 'TASK_1169');
DBMS_OUTPUT.put_line('Plans Accepted: ' || l_return);
END;
/
Plans Accepted: 1

PL/SQL procedure successfully completed.

查看DBA_SQL_PLAN_BASELINES,新的plan应该已经是accepted了

1
2
3
4
5
6
7
8
SELECT sql_handle, plan_name, enabled, accepted 
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_7b76323ad90440b9';

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5t3652c362 YES YES
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

再次执行该sql,查看执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
SET AUTOTRACE TRACE LINESIZE 130

SELECT description
FROM spm_test_tab
WHERE id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 2338891031

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SPM_TEST_TAB | 1 | 25 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=99)

Note
-----
- SQL plan baseline "SQL_PLAN_7qxjk7bch8h5t3652c362" used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
372 bytes sent via SQL*Net to client
484 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

由此可见,新的执行计划基线已被使用。

参考资料:Adaptive SQL Plan Management (SPM) in Oracle Database 12c Release 1 (12.1)

日常管理中经常会碰到需要drop表中字段的情况,如果表中数据很少时还可以直接drop,但是如果是一张大表,那么drop一个列是很耗时且耗资源的事,那么我们可以先逻辑上删除此字段,再在空闲的时候彻底drop掉以释放空间。

注:本文摘自官方文档

##1. Marking Columns Unused

If you are concerned about the length of time it could take to drop column data from all of the rows in a large table, you can use the ALTER TABLE...SET UNUSED statement. This statement marks one or more columns as unused, but does not actually remove the target column data or restore the disk space occupied by these columns. However, a column that is marked as unused is not displayed in queries or data dictionary views, and its name is removed so that a new column can reuse that name. All constraints, indexes, and statistics defined on the column are also removed.

To mark the hiredate and mgr columns as unused, execute the following statement:

1
ALTER TABLE hr.admin_emp SET UNUSED (hiredate, mgr);

You can later remove columns that are marked as unused by issuing an ALTER TABLE...DROP UNUSED COLUMNS statement. Unused columns are also removed from the target table whenever an explicit drop of any particular column or columns of the table is issued.

The data dictionary views USER_UNUSED_COL_TABS, ALL_UNUSED_COL_TABS, or DBA_UNUSED_COL_TABS can be used to list all tables containing unused columns. The COUNT field shows the number of unused columns in the table.

1
2
3
4
5
SELECT * FROM DBA_UNUSED_COL_TABS;

OWNER TABLE_NAME COUNT
--------------------------- --------------------------- -----
HR ADMIN_EMP 2

For external tables, the SET UNUSED statement is transparently converted into an ALTER TABLE DROP COLUMN statement. Because external tables consist of metadata only in the database, the DROP COLUMN statement performs equivalently to the SET UNUSED statement.

##2. Removing Unused Columns

The ALTER TABLE...DROP UNUSED COLUMNS statement is the only action allowed on unused columns. It physically removes unused columns from the table and reclaims disk space.

In the ALTER TABLE statement that follows, the optional clause CHECKPOINT is specified. This clause causes a checkpoint to be applied after processing the specified number of rows, in this case 250. Checkpointing cuts down on the amount of undo logs accumulated during the drop column operation to avoid a potential exhaustion of undo space.

1
ALTER TABLE hr.admin_emp DROP UNUSED COLUMNS CHECKPOINT 250;

上篇文章简要介绍了一下当外键无索引时,更新删除主表的数据会造成子表的锁定,如果此时子表上有事务,那么进行更新删除的session变会等待,等待事件就是enq: TM - contention

外键与 TM enqueue lock 的主要问题是 在早期版本中(9i之前) 当 子表child table上 的外键没有索引时 , 若发生 父表 parent table 上记录被delete 或 update时 , 会在child table上加 share lock, 这会 阻塞 child table 上的DML。
但是从 9i以后的当 子表child table上 的外键没有索引时, 父表parent table上的delete 、update 只在 实际这个DML执行的过程中要求share (TM lmode=4) lock,而不会在整个事务中 都要求保持 child table上的 share lock。

还是先了解一下oracle中的锁模式吧,TM锁和TX锁都属于DML锁,这里介绍的是TM的锁模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Value   Name(s)                    Table method (TM lock)
0 No lock n/a

1 Null lock (NL) Used during some parallel DML operations (e.g. update) by
the pX slaves while the QC is holding an exclusive lock.

2 Sub-share (SS) Until 9.2.0.5/6 "select for update"
Row-share (RS) Since 9.2.0.1/2 used at opposite end of RI during DML
Lock table in row share mode
Lock table in share update mode

3 Sub-exclusive(SX) Update (also "select for update" from 9.2.0.5/6)
Row-exclusive(RX) Lock table in row exclusive mode
Since 11.1 used at opposite end of RI during DML

4 Share (S) Lock table in share mode
Can appear during parallel DML with id2 = 1, in the PX slave sessions
Common symptom of "foreign key locking" (missing index) problem

5 share sub exclusive (SSX) Lock table in share row exclusive mode
share row exclusive (SRX) Less common symptom of "foreign key locking" but likely to be more
frequent if the FK constraint is defined with "on delete cascade."

6 Exclusive (X) Lock table in exclusive mode

share lock就是mode为4的S锁

Summary of Locks Obtained by DML Statements

Summary of Locks Obtained by DML Statements

TM 锁在下列场景中被申请:

  • 在OPS(早期的RAC)中LGWR会以ID1=0 & ID2=0去申请该队列锁来检查 DML_LOCKS 在所有实例中是全0还是全非0
  • 当一个单表或分区 需要做不同的表/分区操作时,ORACLE需要协调这些操作,所以需要申请该队列锁。包括:
  • 启用参考约束 referential constraints
  • 修改约束从DIASABLE NOVALIDATE 到DISABLE VALIDATE
  • 重建IOT
  • 创建视图或者修改ALTER视图时可能需要申请该队列锁
  • 分析表统计信息或validate structure时
  • 一些PDML并行DML操作
  • 所有可能调用kkdllk()函数的操作
  • 太多太多了。。。

下面是各种锁之间的兼容性:

lock

好了,开始动手做个试验吧,试验中我会引用KST trace的内容,关于KST,本文不做介绍,只拿来使用

首先,准备环境,本实验均在11.2.0.4环境下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SQL> conn lp/lp
Connected.
SQL> create table prim(a int,b varchar2(10));

Table created.

SQL> alter table prim add constraint PK_PRIM primary key(a);

Table altered.

SQL> create table child (ca int,cb varchar2(10));

Table created.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a);

Table altered.

SQL> insert into prim values(1,'asdasd');

1 row created.

SQL> insert into prim values(2,'asdasd');

1 row created.

SQL> insert into prim values(3,'asdasd');

1 row created.

SQL> commit;

Commit complete.

这里要说一下,在外键是否存在on delete cascade时锁的获取还有区别,所以我们分别来测试,首先是没有索引没有cascade的情况下,各个语句的锁获取情况

无索引,无cascade

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SQL> select distinct sid from v$mystat;

SID
----------
17

SQL> select pid,spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

PID SPID
---------- ------------
36 2761

SQL> alter system set "_trace_events"='10000-10999:255:36';

System altered.

insert 父表:

1
insert into prim values(5,'asdasd');

查看kst信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select kst.event,kst.sid,kst.pid,kst.function,kst.data from x$trace kst where pid=36 and sid=17;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 3 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 3.3.1150
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00030003-0000047e mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 69dd140000000000 0200000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 0de3140000000000 0810c76a177f0000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=173 seq_num=180 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=173 seq_num=180 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=174 seq_num=181 snap_id=1

可见父表上的插入会获取父表和子表mode3TM锁,TM后跟的是object_id的十六进制,一个TX锁,让我们验证一下:

1
2
3
4
5
6
7
SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 IMODE REQUEST CTIME BLOCK
---------------- -------------- ---- -- ---------- ----------- -------- --------- -------- ---------
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87614 0 3 0 8 0
00000000ACE37CD0 00000000ACE37D48 17 Tx 262162 851 6 0 8 0
00007FD97EFE54E8 00007FD97FFE5548 17 TM 87612 0 3 0 8 0

我们来commit一下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
10704 17 36 ksqrcli ksqrcl: release TX-00030003-0000047e mode=X
10813 17 36 ktudnx ktudnx: dec cnt xid:3.3.1150 nax:0 nbx:0
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=SX
10704 17 36 ksqrcli ksqrcl: release TM-0001563c-00000000 mode=SX
10021 17 36 kcrf_commit_force 2ee3140000000000 2fe3140000000000
10005 17 36 kslwtbctx KSL WAIT BEG [log file sync] 7416/0x1cf8 1368878/0x14e32e 0/0x0 wait_id=175 seq_num=182 snap_id=1
10005 17 36 ksliwat KSL FACILITY WAIT fac#=3 time_waited_csecs=1
10005 17 36 ksliwat KSL POST RCVD poster=11 num=76 loc='ksl2.h LINE:2374 ID:kslpsr' id1=138 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1
10005 17 36 kslwtectx KSL WAIT END [log file sync] 7416/0x1cf8 1368878/0x14e32e 0/0x0 wait_id=175 seq_num=182 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=11126, exc=11126, tot=11126
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=176 seq_num=183 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=176 seq_num=183 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=177 seq_num=184 snap_id=1

可见获得的锁全部一一释放

insert子表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
insert into child values(2,'sadsada');

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 7 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 7.17.835
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00070011-00000343 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=186 seq_num=193 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=186 seq_num=193 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=187 seq_num=194 snap_id=1

可见子表上的插入也会获取父表和子表mode3TM锁

update父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
update prim set a=1 where a=1;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 0fe7140000000000 0200000000000000
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 27e9140000000000 b80fb86a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.10.851
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0004000a-00000353 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=204 seq_num=211 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=204 seq_num=211 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=5, exc=5, tot=5
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=205 seq_num=212 snap_id=1

可以很清楚的看到在执行语句期间,注意仅仅是语句的执行期间,会附加一个mode为4的S锁到子表上,很快便释放了

delete父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
delete from prim where a=4

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 2 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 2.0.1117
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00020000-0000045d mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=227 seq_num=234 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=227 seq_num=234 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=228 seq_num=235 snap_id=1

delete跟update比多了一次S锁的获取和释放,为何呢,是否和删除的行数有关?我们再多删一行试试

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
SQL> delete from prim where a=4 or a=5;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 6 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 6.15.1290
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0006000f-0000050a mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3301000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 5eea140000000000 0100000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 5eea140000000000 e00fc76a177f0000
10812 17 36 kturCRBackoutOneChg 0100000000000000 bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3301000100000000 0100000000000000 0100000000000000
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=236 seq_num=243 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=236 seq_num=243 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=237 seq_num=244 snap_id=1

可以发现除了语句执行时需要获取一次S锁之外,删多少行就要获取多少次S锁,从之前的锁兼容列表就可发现S锁和SX(RX)锁是不兼容的,而SX(RX)是insert update delete获取的锁模式,可以想象如果此时子表上有事务,或者S锁获得了尚未释放的时候,子表要进行事务获取mode为3的SX(RX)锁时,session都会产生等待

看一下此时session获取的锁,记住这次结果,后面会有对比。

1
2
3
4
5
6
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000ACE37CD0 00000000ACE37D48 17 TX 393231 1290 6 0 195 0
00007FD97FFE6520 00007FD97FFE6580 17 TM 87612 0 3 0 195 0

可见语句执行完,已不持有子表上的任何锁

下面来模拟一下等待,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
sid:31
SQL> insert into child values(2,'12312');

1 row created.

SQL> select distinct sid from v$mystat;

SID
----------
31

sid:1169

SQL> update prim set a=1 where a=1;

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 31 TM 87614 0 3 0 143 1
00000000ABE6BE80 00000000ABE6BEF8 31 TX 327713 1114 6 0 143 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87612 0 3 0 76 0
00007FD97FFE54E8 00007FD97FFE5548 31 TM 87612 0 3 0 143 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87614 0 0 4 76 0

此时查一下等待链

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SQL>  --锁源头查找,带对象和sql以及event
SQL> WITH sessions AS
2 (SELECT /*+materialize*/
3 sid,
4 blocking_session,
5 blocking_instance,
6 row_wait_obj#,
7 sql_id,
8 inst_id,
9 event
10 FROM gv$session)
11 SELECT LPAD(' ', 4 * (level - 1)) || s.inst_id || '.' || sid sid,
12 object_name,
13 substr(sql_text, 1, 40) sql_text,
14 event
15 FROM sessions s
16 LEFT OUTER JOIN dba_objects d
17 ON (object_id = row_wait_obj#)
18 LEFT OUTER JOIN gv$sql q
19 ON (s.sql_id = q.SQL_ID and s.inst_id = q.INST_ID)
20 WHERE sid IN (SELECT blocking_session FROM sessions)
21 OR blocking_session IS NOT NULL
22 CONNECT BY PRIOR sid = blocking_session
23 START WITH blocking_session IS NULL;

SID OBJECT_NAM SQL_TEXT EVENT
---------- ---------- ---------------------------------------- ------------------------------
1.31 SQL*Net message from client
1.1169 CHILD update prim set a=1 where a=1 enq: TM - contention

从上面的分析我们知道,无论插入父表和子表,都会获取两张表上的mode3的锁,而mode3的锁和mode4的锁是不兼容的,也就是说此时父表上连插入都无法进行

再开第三个session

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
sid:1167
SQL> insert into prim values(7,'dasd'); --hang住了

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97D39F2A8 00007FD97D39F308 31 TM 87614 0 3 0 476 1
00007FD97D39F2A8 00007FD97D39F308 1167 TM 87614 0 0 3 85 0
00000000ABE6BE80 00000000ABE6BEF8 31 TX 327713 1114 6 0 476 0
00007FD97D39F2A8 00007FD97D39F308 1169 TM 87612 0 3 0 409 0
00007FD97D39F2A8 00007FD97D39F308 31 TM 87612 0 3 0 476 0
00007FD97D39F2A8 00007FD97D39F308 1169 TM 87614 0 0 4 409 0
00007FD97D39F2A8 00007FD97D39F308 1167 TM 87612 0 3 0 85 0

SID OBJECT_NAME SQL_TEXT EVENT
-------------------- ------------------------------ ---------------------------------------- ----------------------------------------
1.31 SQL*Net message from client
1.1169 CHILD update prim set a=1 where a=1 enq: TM - contention
1.1167 CHILD insert into prim values(7,'dasd') enq: TM - contention

可见1167的session被阻塞了。

无索引,有cascade

1
2
3
4
5
6
7
 SQL> alter table child drop constraint FK_CHILD_CA;

Table altered.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a) on delete cascade;

Table altered.

有cascade的时候,仅在delete语句上有所区别,下面仅列出delete语句

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
SQL> delete from prim where a=2 or a=4;

2 rows deleted.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SSX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.20.852
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00040014-00000354 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SX to=SSX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3301000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 bef2140000000000 0100000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 bff2140000000000 e00fc76a177f0000
10812 17 36 kturCRBackoutOneChg 0100000000000000 ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3301000100000000 0100000000000000 0100000000000000
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SX to=SSX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=425 seq_num=432 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=425 seq_num=432 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=426 seq_num=433 snap_id=1

此时会申请一个mode为5的SSX锁,随后即转换为mode为3的SX锁,这也是在语句执行期间获取和转换的,并非事务期间,同样删除多少行就涉及到多少次获取转换,看一下此时锁获得情况

1
2
3
4
5
6
7
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE6520 00007FD97FFE6580 17 TM 87614 0 3 0 131 0
00000000ACE37CD0 00000000ACE37D48 17 TX 262164 852 6 0 131 0
00007FD97FFE6520 00007FD97FFE6580 17 TM 87612 0 3 0 131 0

是不是和没有cascade的时候不同了,这次最终会持有子表上的mode为3的锁,我们再深入的思考一点,SSX锁和SX锁是不兼容的,这样是否就意味着后进行的delete会被先进行的delete阻塞(不同session),好,现在就来模拟一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

sid:1169

SQL> delete from prim where a=1;

0 rows deleted.

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87612 0 3 0 53 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87614 0 3 0 53 0

sid:1167

SQL> delete from prim where a=2; --session hang住了

--查询此刻锁的持有情况
SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE6520 00007FD97FFE6580 1169 TM 87612 0 3 0 85 0
00007FD97FFE6520 00007FD97FFE6580 1167 TM 87612 0 3 0 11 0
00007FD97FFE6520 00007FD97FFE6580 1167 TM 87614 0 0 5 11 0
00007FD97FFE6520 00007FD97FFE6580 1169 TM 87614 0 3 0 85 1

可见1167在请求mode为5的锁,且已被阻塞

查看等待链

1
2
3
4
SID                  OBJECT_NAME                    SQL_TEXT                                 EVENT
-------------------- ------------------------------ ---------------------------------------- ----------------------------------------
1.1169 SQL*Net message from client
1.1167 CHILD delete from prim where a=2 enq: TM - contention

因为delete完毕会持有子表上的SX锁,而SX锁与S锁不兼容,所以delete父表的session也会阻塞update父表的session,因为update会去请求子表的S锁,而此时子表上有SX锁,类似于子表上有事务在进行,这里就不在论述了,徒占篇幅。

有索引,无cascade

我看到有资料说,如果有索引时,对父表的操作,会级联加一个TM RS锁(level 2)到子表上。但我在试验中并未看到,也许是版本差异,我也未去求证,有索引时insert与无索引时在获取锁方面没有区别,这里仅列出update和delete

创建索引:

1
2
3
4
5
6
7
8
9
10
11
SQL> alter table child drop constraint FK_CHILD_CA;

Table altered.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a);

Table altered.

SQL> create index ind_child_ca on child(ca);

Index created.

update父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SQL> update prim set a=6 where a=6;

1 row updated.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 2b01000100000000 0000000000000000 fadc140000000000 0200000000000000
10811 17 36 ktbgcl1 2b01000100000000 0000000000000000 cdf9140000000000 10dfb76a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 3 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 3.28.1117
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0003001c-0000045d mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=471 seq_num=478 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=471 seq_num=478 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=472 seq_num=479 snap_id=1

可见有了索引之后,不再需要在语句级别获取子表上的S锁了

delete父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SQL> delete from prim where a=9;

1 row deleted.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 66f1140000000000 0200000000000000
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 58fa140000000000 78efb76a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.18.854
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00040012-00000356 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 c0f2140000000000 0200000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 58fa140000000000 90eec66a177f0000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=476 seq_num=483 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=476 seq_num=483 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=477 seq_num=484 snap_id=1

与update相同,都持有了子表上的SX锁,而SX与SX是相容的,所以不会再产生锁定问题

1
2
3
4
5
6
7
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87614 0 3 0 182 0
00000000ABD4E7C0 00000000ABD4E838 17 TX 262162 854 6 0 182 0
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87612 0 3 0 182 0

有索引,有cascade

表现与无cascade时相同

总结

  • 外键无索引锁无cascade时,update/delete父表,会在语句级别级联一个mode为4的S锁到子表,其中delete多少行就会级联多少次
  • 外键无索引有cascade时,update父表仍会在语句级别级联mode为4的S锁到子表,delete时会先获取mode为5的SSX锁,在将其转换成mode为3的SX锁,而且删除多少行就会涉及到多少次转换
  • 外键有索引无cascade时,update/delete不会在语句级级联锁到子表,最终会持有父表和子表上的mode为3的SX锁(无索引时只有有cascade的delete时最终会持有子表上的SX锁)
  • 外键有索引有cascade时,与无cascade表现相同

近日,开发负责人反映某生产环境业务处理缓慢,主要业务操作就是修改会员信息,登录查询后发现大量的session正在等待enq: TM - contention,且waiting的语句几乎都是update
session的即时信息没有保留,现在附上ash视图的一些统计信息,可以大概了解一下当时争用的场景

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
SQL> @ash_wait_chains.sql username||':'||program2||event2 session_type='FOREGROUND' sysdate-6/24 sysdate-5/24

-- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )

%This SECONDS AAS
------ ---------- ----------
WAIT_CHAIN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
72% 20073 5.6
-> JSCHPROD:(JDBC Thin Client) ON CPU

8% 2293 .6
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

8% 2141 .6
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) log file parallel write

7% 1879 .5
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR-LNS wait on channel

2% 654 .2
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

1% 288 .1
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

1% 149 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) ON CPU

0% 128 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 112 0
-> JSCHPROD:(JDBC Thin Client) log file sync

0% 86 0
-> JSCHPROD:(JDBC Thin Client) db file scattered read

0% 43 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 37 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 25 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait on LNS

0% 13 0
-> JSCHPROD:(plsqldev.exe) ON CPU

0% 11 0
-> SYS:(plsqldev.exe) ON CPU

0% 10 0
-> JSCHPROD:(JDBC Thin Client) SQL*Net more data from client

0% 9 0
-> JSCHPROD:(JDBC Thin Client) db file sequential read

0% 9 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON
CPU

0% 6 0
-> JSCHPROD:(JDBC Thin Client) read by other session -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 4 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 3 0
-> JSCHPROD:(JDBC Thin Client) SQL*Net more data to client

0% 3 0
-> JSCHPROD:(JDBC Thin Client) buffer busy waits [data block]

0% 3 0
-> SYS:(oraagent.bin) Disk file operations I/O

0% 3 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait for redo copy

0% 2 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - row lock contention

0% 2 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait for redo copy -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 2 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - index contention -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 1 0
-> JSCHPROD:(plsqldev.exe) log file sync -> SYS:(LGWR) log file parallel write

0% 1 0
-> SYS:(plsqldev.exe) Disk file operations I/O

0% 1 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - row lock contention -> JSCHPROD:(JDBC Thin Client) ON CPU


30 rows selected.

可以看到,TM锁的争用很多,再看一份当时awr报告的top10

虽然占DBTIME不多,但本来是很快的操作,短时间内给人的感觉就是业务处理缓慢
查一下当时等待事件的p1,p2,p3的值

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
select ash.SAMPLE_TIME,
ash.EVENT,
ash.SESSION_ID,
ash.BLOCKING_SESSION,
ash.P1TEXT,
ash.P1,
ash.P2TEXT,
ash.p2,
ash.p3text,
ash.p3,
ash.SESSION_STATE,
ash.SQL_OPNAME,
ash.SQL_ID
--ash.*
from v$active_session_history ash
where ash.SAMPLE_TIME >
to_date('20160425 10:00:00', 'yyyymmdd HH24:MI:SS')
and ash.SAMPLE_TIME <
to_date('20160425 12:10:00', 'yyyymmdd HH24:MI:SS')
and ash.WAIT_CLASS <> 'Idle'
and ash.EVENT like 'enq: TM - contention'
order by sample_time desc;

下面是部分结果

1
2
3
4
5
6
7
8
9
10
11
12
13
enq: TM - contention	1828	2401	name|mode	1414332421	object #	110417	table/partition	0	WAITING	UPDATE	ak25v8q8p6fzd
enq: TM - contention 1873 2504 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 2504 1828 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 772 4 name|mode 1414332421 object # 110417 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1781 1828 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 1828 772 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 2401 1828 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 2504 772 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 4 772 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 148 1781 name|mode 1414332420 object # 110428 table/partition 0 WAITING UPDATE 9gd6xhd0xyhph
enq: TM - contention 772 148 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1828 148 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1873 772 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt

可以看到p2的值为产生TM争用的对象id,经过查证,这些object均是session正在更新的表的子表,而且通过v$sql查看update语句均更改了主表的主键,问题到这里已经很明朗了,由于外键没加索引,导致了主表在更新主表主键或删除主表记录时对子表的锁定,而且这张主表被大量的子表引用,此时子表上也同时进行事务处理,所以造成了更新主表的session 不时hang住。

通过对所有子表的外键加索引,消除了争用,检测未加索引的外键语句:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
SELECT TABLE_NAME,
CONSTRAINT_NAME,
CNAME1 || NVL2(CNAME2, ',' || CNAME2, NULL) ||
NVL2(CNAME3, ',' || CNAME3, NULL) ||
NVL2(CNAME4, ',' || CNAME4, NULL) ||
NVL2(CNAME5, ',' || CNAME5, NULL) ||
NVL2(CNAME6, ',' || CNAME6, NULL) ||
NVL2(CNAME7, ',' || CNAME7, NULL) ||
NVL2(CNAME8, ',' || CNAME8, NULL) COLUMNS
FROM (SELECT B.TABLE_NAME,
B.CONSTRAINT_NAME,
MAX(DECODE(POSITION, 1, COLUMN_NAME, NULL)) CNAME1,
MAX(DECODE(POSITION, 2, COLUMN_NAME, NULL)) CNAME2,
MAX(DECODE(POSITION, 3, COLUMN_NAME, NULL)) CNAME3,
MAX(DECODE(POSITION, 4, COLUMN_NAME, NULL)) CNAME4,
MAX(DECODE(POSITION, 5, COLUMN_NAME, NULL)) CNAME5,
MAX(DECODE(POSITION, 6, COLUMN_NAME, NULL)) CNAME6,
MAX(DECODE(POSITION, 7, COLUMN_NAME, NULL)) CNAME7,
MAX(DECODE(POSITION, 8, COLUMN_NAME, NULL)) CNAME8,
COUNT(*) COL_CNT
FROM (SELECT SUBSTR(TABLE_NAME, 1, 30) TABLE_NAME,
SUBSTR(CONSTRAINT_NAME, 1, 30) CONSTRAINT_NAME,
SUBSTR(COLUMN_NAME, 1, 30) COLUMN_NAME,
POSITION
FROM USER_CONS_COLUMNS) A,
USER_CONSTRAINTS B
WHERE A.CONSTRAINT_NAME = B.CONSTRAINT_NAME
AND B.CONSTRAINT_TYPE = 'R'
GROUP BY B.TABLE_NAME, B.CONSTRAINT_NAME) CONS
WHERE COL_CNT > ALL
(SELECT COUNT(*)
FROM USER_IND_COLUMNS I
WHERE I.TABLE_NAME = CONS.TABLE_NAME
AND I.COLUMN_NAME IN (CNAME1, CNAME2, CNAME3, CNAME4, CNAME5,
CNAME6, CNAME7, CNAME8)
AND I.COLUMN_POSITION <= CONS.COL_CNT
GROUP BY I.INDEX_NAME);

这是摘自TOM大师的语句,外键不加索引也是导致死锁的常见原因之一,因此对于主表经常进行更新删除操作的情况,外键一定要加索引。
至于外键未加索引是如何导致锁定的,以及为何加了索引后争用就消失了? 敬请关注enq: TM - contention解决之道——外键无索引导致锁争用 (下)

一次掉电后,DG主库启动alert中报如下错误

1
2
3
4
5
6
7
8
9
Errors in file /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m002_4099.trc  (incident=16328):
ORA-00600: internal error code, arguments: [13011], [6443], [8463760], [14], [8488694], [0], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/min/min/incident/incdir_16328/min_m002_4099_i16328.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m002_4099.trc (incident=16329):
ORA-00600: internal error code, arguments: [kewrose_1], [600], [ORA-00600: internal error code, arguments: [13011], [6443], [8463760], [14], [8488694], [0], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/min/min/incident/incdir_16329/min_m002_4099_i16329.trc

查看trace文件,部分内容如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

*** 2016-07-20 18:21:04.185
*** SESSION ID:(169.43) 2016-07-20 18:21:04.185
*** CLIENT ID:() 2016-07-20 18:21:04.185
*** SERVICE NAME:(SYS$BACKGROUND) 2016-07-20 18:21:04.185
*** MODULE NAME:(MMON_SLAVE) 2016-07-20 18:21:04.185
*** ACTION NAME:(Auto-Purge Slave Action) 2016-07-20 18:21:04.185

Dump continued from file: /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m000_2615.trc
ORA-00600: internal error code, arguments: [13011], [6665], [8395075], [14], [8462787], [0], [], [], [], [], [], []

========= Dump for incident 14804 (ORA 600 [13011]) ========

*** 2016-07-20 18:21:04.186
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=c8taax4bfzsjc) -----
delete from WRH$_RSRC_PLAN tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap and dbid = :dbid) and not exists (select 1 from WRM$_BASELINE b wh
ere (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

MOS中关于ORA-600 [13013]描述

1
2
3
4
5
6
7
Format: ORA-600 [13013] [a] [b] {c} [d] [e] [f]
Arg [a] Passcount
Arg [b] Data Object number
Arg {c} Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Decimal RDBA of block being updated (Typically same as {c})
Arg [f] Code

确定object

1
2
3
4
5
SQL> select owner,object_name,object_type  from dba_objects where object_id=6665;

OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ ------------------------------------------------------------ -------------------
SYS WRH$_RSRC_PLAN TABLE

可见与trc文件中的sql中对象名吻合,都是WRH$_RSRC_PLAN
查看trc文件中的PLAN TABLE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
----- Plan Table -----

============
Plan Table
============
-----------------------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------+-----------------------------------+
| 0 | DELETE STATEMENT | | | | 626 | |
| 1 | DELETE | WRH$_RSRC_PLAN | | | | |
| 2 | FILTER | | | | | |
| 3 | INDEX RANGE SCAN | WRH$_RSRC_PLAN_PK| 94 | 1598 | 8 | 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID | WRM$_BASELINE | 1 | 33 | 2 | 00:00:01 |
| 5 | INDEX RANGE SCAN | WRM$_BASELINE_PK | 1 | | 1 | 00:00:01 |
-----------------------------------------------------------+-----------------------------------+

1.The most common cause of this error is an index corruption. The first step is to check the indexes for corruption, i.e. run

1
2
3
SQL> ANALYZE INDEX WRH$_RSRC_PLAN_PK VALIDATE STRUCTURE;

Index analyzed.

2.If the indexes do not report corruption, further test for corruption the base tables referenced in the execution plan or the statement producing the error:

1
2
3
4
5
SQL> ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

查看生成的trace文件内容

1
2
3
4
5
6
7
8
9
10
*** 2016-07-21 01:31:31.840
*** SESSION ID:(52.5) 2016-07-21 01:31:31.840
*** CLIENT ID:() 2016-07-21 01:31:31.840
*** SERVICE NAME:(SYS$USERS) 2016-07-21 01:31:31.840
*** MODULE NAME:(sqlplus@primary (TNS V1-V3)) 2016-07-21 01:31:31.840
*** ACTION NAME:() 2016-07-21 01:31:31.840

Table/Index row count mismatch
table 2901 : index 3203, 296
Index root = tsn: 1 rdba: 0x0080194a

对于文件内容,简单介绍如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
trace文件中包含:

<description>: tsn: <tablespace number> rdba: <relative dba>

description有以下值:

"row not found in index"
"Table/Index row count mismatch"
"row mismatch in index dba"
"Table row count/Bitmap index bit count mismatch"
"kdavls: kdcchk returns %d when checking cluster dba 0x%08lx objn %d\n"

tsn: Tablespace Number表示的是索引存储的表空间编号。
rdba: 是索引段头相对于数据块的存储地址。

运行以下sql查询索引的文件和块号

1
2
3
4
5
6
7
8
9
SELECT dbms_utility.data_block_address_file(
to_number(trim(leading '0' from
replace('&&rdba','0x','')),'XXXXXXXX')
) AS rfile#,
dbms_utility.data_block_address_block(
to_number(trim(leading '0' from
replace('&&rdba','0x','')),'XXXXXXXX')
) AS block#
FROM dual;

结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SQL> SELECT dbms_utility.data_block_address_file(
2 to_number(trim(leading '0' from
3 replace('&&rdba','0x','')),'XXXXXXXX')
4 ) AS rfile#,
5 dbms_utility.data_block_address_block(
6 to_number(trim(leading '0' from
7 replace('&&rdba','0x','')),'XXXXXXXX')
8 ) AS block#
9 FROM dual;
Enter value for rdba: 0x0080194a
old 3: replace('&&rdba','0x','')),'XXXXXXXX')
new 3: replace('0x0080194a','0x','')),'XXXXXXXX')
old 7: replace('&&rdba','0x','')),'XXXXXXXX')
new 7: replace('0x0080194a','0x','')),'XXXXXXXX')

RFILE# BLOCK#
---------- ----------
2 6474

接下来运行如下查询,定位具体的segment

1
2
3
4
select owner, segment_name, segment_type
from dba_segments
where header_file = <rfile#>
and header_block = <block#>

结果如下:

1
2
3
4
5
6
7
SQL> select owner, segment_name, segment_type
2 from dba_segments
3 where header_file = 2
4 and header_block = 6474;
OWNER SEGMENT_NAME SEGMENT_TYPE
------------------------------ --------------------------------------------------------------------------------- ------------------
SYS WRH$_RSRC_PLAN_PK INDEX

看来仍然是上面分析过的索引,这里是因为表和索引的数据不一致,那么重建索引

1
2
3
4
alter table WRH$_RSRC_PLAN drop constraint WRH$_RSRC_PLAN_PK cascade;

alter table WRH$_RSRC_PLAN
add constraint WRH$_RSRC_PLAN_PK primary key (DBID, SNAP_ID, INSTANCE_NUMBER, SEQUENCE#);

再次分析表,发现错误消失了

1
2
3
SQL> ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE;

Table analyzed.

至此,问题解决

掉电仍是数据库的大敌,而此次并非只在这一张表上有讹误,还有其他一些WRH$表,这些表均属于sysaux表空间,这些表都是awr系列表,写入频繁,掉电容易引起数据的不一致性,但是,正因为是awr数据,并不是特别重要,修复起来也相对容易。

记录一下AWR Performance Tables介绍

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The Oracle10g dynamic performance tables constitute the foundation of sophisticated automations such as Automatic Memory Management  (AMM ) as well as intelligent advisory tools such as ADDM and the SQL Tuning   Advisor.

Remember, the AWR is a core feature of the 10g database kernel and automatically collects and stores important run-time performance information for our historical analysis.

The tables that store this information are prefixed with wrh$ and are very similar in function to the STATSPACK tables. This could make STATSPACK appear somewhat obsolete, although it is still available in the $ORACLE_HOME/rdbms/admin directory.

Unlike the more cumbersome STATSPACK utility, which requires knowledge of the table structure and creation of complex query scripts, the 10g Enterprise Manager (OEM) automatically displays and interprets this valuable time-series performance data.

The wrh$ AWR tables store important historical statistical information about the database in the form of periodic snapshots. Each snapshot is a capture of the in–memory x$ fixed view and other control structures at a certain point in time. Each of the AWR table names is prefixed with wrm$ (Metadata tables), wrh$ (History tables), or wri$ (Advisory tables).

1. The wrm$ tables store metadata information for the Workload Repository.

2. The wrh$ tables store historical data or snapshots.

3. The wri$ tables: These 49 tables store data related to advisory functions.

Once you start using the Broker you must always use the Broker to make any changes to your Data Guard configuration. This means that you must use Grid Control or the Broker CLI DGMGRL to change any Data Guard settings. If you use SQL*Plus to make configuration changes, the Broker will put things back the way it sees the world or this will lead to inconsistencies between the Broker configuration parameters and the database.

一旦开始使用Broker,就必须一直使用它修改DG的配置,这意味着必须使用grid control或者dgmgrl修改DG的配置,如果使用sqlplus修改配置,broker会将配置改回原样,或者导致broker配置参数与数据库之间不一致

此例记录修改保护模式,由最高可用降级为最大性能

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
DGMGRL> show configuration;
Configuration - minconf
Protection Mode: MaxAvailability
Databases:
minstd - Primary database
min - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
DGMGRL> EDIT DATABASE 'minstd' SET PROPERTY 'LogXptMode'='ASYNC';
Property "LogXptMode" updated
DGMGRL> EDIT DATABASE 'min' SET PROPERTY 'LogXptMode'='ASYNC';
Error: ORA-16805: change of LogXptMode property violates overall protection mode
Failed.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;
Succeeded.
DGMGRL> show configuration;
Configuration - minconf
Protection Mode: MaxPerformance
Databases:
minstd - Primary database
min - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
`

使用broker修改会使LOG_ARCHIVE_DEST_2参数变得较为复杂,例如

1
service="minstd", LGWR SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="minstd" net_timeout=30, valid_for=(all_logfiles,primary_role)

而且修改后在sqlplus中show parameter LOG_ARCHIVE_DEST_2仅有主库显示更改之后的状态,而备库还保留原样,但是在show database verbose中是修改好的,仅当在发生切换时,备库变为主库时才会更正参数,可见最好使用broker来管理配置

给某一个客户安装ADG完毕,配置Broker方便日后的管理与切换,碰到些许问题,以作记录

创建完configuration之后,enable的过程很慢,而且状态出现error

主库查询:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DGMGRL> SHOW CONFIGURATION VERBOSE;
Configuration - anxinconf
Protection Mode: MaxAvailability
Databases:
anxin - Primary database
anxinstd - Physical standby database
Error: ORA-16664: unable to receive the result from a database
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: DISABLED
Configuration Status:
ERROR

备库查询:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DGMGRL> SHOW CONFIGURATION VERBOSE;
Configuration - anxinconf
Protection Mode: MaxAvailability
Databases:
anxin - Primary database
anxinstd - Physical standby database
Warning: ORA-16792: configurable property value is inconsistent with database setting
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: DISABLED
Configuration Status:
WARNING

查看备数据库的状态

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
DGMGRL> show database verbose anxinstd;
Database - anxinstd
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Apply Rate: 54.00 KByte/s
Real Time Query: ON
Instance(s):
anxinstd
Warning: ORA-16714: the value of property ArchiveLagTarget is inconsistent with the database setting
Warning: ORA-16714: the value of property LogArchiveMinSucceedDest is inconsistent with the database setting
Warning: ORA-16714: the value of property LogArchiveTrace is inconsistent with the database setting
Warning: ORA-16714: the value of property LogArchiveFormat is inconsistent with the database setting

......

此时主备alert日志均有报错 Fatal NI connect error 12514, connecting to:

主库:

1
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.101)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=anxinstd_DGB)(CID=(PROGRAM=oracle)(HOST=db01)(USER=oracle))))

备库:

1
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.100)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=anxin_DGB)(CID=(PROGRAM=oracle)(HOST=db02)(USER=oracle))))

备库状态报告4个属性值与数据库设置不一致,重新设置

1
2
3
4
5
6
7
8
9
DGMGRL> edit database anxinstd set property LogArchiveFormat='%t_%s_%r.arc';
Property "logarchiveformat" updated
DGMGRL> edit database anxinstd set property ArchiveLagTarget=0;
Property "archivelagtarget" updated
DGMGRL> edit database anxinstd set property LogArchiveTrace=0;
Property "logarchivetrace" updated
DGMGRL> edit database anxinstd set property LogArchiveMinSucceedDest=1;
Property "logarchiveminsucceeddest" updated
DGMGRL> show database verbose anxinstd;

再次查看broker状态

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
DGMGRL> SHOW CONFIGURATION VERBOSE;
Configuration - anxinconf
Protection Mode: MaxAvailability
Databases:
anxin - Primary database
anxinstd - Physical standby database
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

主备状态都正常了

接下来测试切换的时候又出现了问题

1
2
3
4
5
6
7
8
9
DGMGRL> SWITCHOVER TO anxinstd;
Performing switchover NOW, please wait...
Operation requires a connection to instance "anxinstd" on database "anxinstd"
Connecting to instance "anxinstd"...
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Failed.
Warning: You are no longer connected to ORACLE.
connect to instance "anxinstd" of database "anxinstd"

官方文档中有如下描述:

1
2
3
4
5
After performing a switchover using DGMGRL, Data Guard requires a shutdown and startup of both the primary and standby databases. This issue can occur if any necessary entry is missing in the listener.ora file.
DGMGRL is unable to connect to the database after it has been stopped while performing the switchover

To enable DGMGRL to restart instances during the course of broker operations, a service with a specific name must be statically registered with the listener of each instance.
The value for the GLOBAL_DBNAME attribute must be set to a concatenation of db_unique_name_DGMGRL.db_domain in the LISTENER.ORA file.

看一下StaticConnectIdentifier的值

1
2
3
4
DGMGRL> show database anxin StaticConnectIdentifier;
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=anxin_DGMGRL)(INSTANCE_NAME=anxin)(SERVER=DEDICATED)))'
DGMGRL> show database anxinstd StaticConnectIdentifier;
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=anxinstd_DGMGRL)(INSTANCE_NAME=anxinstd)(SERVER=DEDICATED)))'

均去连接一个db_unique_name_DGMGRL.db_domain格式的服务,那么需要在监听里静态注册一个db_unique_name_DGMGRL.db_domain的服务

主:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
xin =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.100)(PORT = 1521))
)
)
SID_LIST_xin=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=anxin)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME=anxin)
)
(SID_DESC=
(GLOBAL_DBNAME=anxin_DGMGRL)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME=anxin)
)
)
)
ADR_BASE_LISTENER = /u01/app/oracle

备:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
xin =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.101)(PORT = 1521))
)
)
SID_LIST_xin=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=anxinstd)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME=anxinstd)
)
(SID_DESC=
(GLOBAL_DBNAME=anxinstd_DGMGRL)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME=anxinstd)
)
)
)
ADR_BASE_LISTENER = /u01/app/oracle

再次测试,问题依旧,莫非和之前alert里的 Fatal NI connect error 12514报错有关?错误信息里的链接描述符均去请求一个db_unique_name_DGB.db_domain的服务,那么再依样添加到静态注册里试试

1
2
3
4
5
6
7
8
9
10
11
12
DGMGRL> switchover to anxinstd;
Performing switchover NOW, please wait...
Operation requires a connection to instance "anxinstd" on database "anxinstd"
Connecting to instance "anxinstd"...
Connected.
New primary database "anxinstd" is opening...
Operation requires startup of instance "anxin" on database "anxin"
Starting instance "anxin"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "anxinstd"

成功了,那么这个db_unique_name_DGB.db_domain的服务究竟是做什么用的呢?

{db_unique_name}_DGB.{db_domain}: This Service is used by the DMON-Processes to communicate between each other
DMON是一个用来管理broker的后台进程,这个进程负责与本地数据库以及远程数据库的DMON进程进行通讯(与远端数据库的DMON进程进行通讯的时候使用的是一个 动态注册 的service name “db_unique_name_DGB.db_domain”)

既然是动态注册,那缘何注册失败呢?
文档 ID 365314.1给出了答案:Database Will Not Register With Listener configured on IP instead of Hostname

将主备的{db_unique_name}_DGB.{db_domain}静态entry删掉,host采用hostname,重启监听测试,switchover成功。

由此可见监听配置里还是采用hostname为好,通过本次事件也解惑了萦绕我心头很久的问题,很多时候建库完毕,使用工具创建动态注册的监听,监听状态里会有很多XDB之类的服务,而我改成静态监听之后(每次都用IP)却没有了之前的自动注册的服务,可见这就是根本原因:Database Will Not Register With Listener configured on IP instead of Hostname

Question: I am getting the ORA-31693 error during a data pump export in RAC:

ORA-31693: Table data object "xxx" failed to load/unload and is being skipped due to error:
ORA-31617: unable to open dump file "/u01/expdp/dump_test.dmp" for write
ORA-19505: failed to identify file "/u01/expdp/dump_test.dmp"
ORA-27037: unable to obtain file status

What is the cause of the ORA-31693 error?

Answer: You can use the oerr utility to look-up the ORA-31693 error:

ORA-31617: unable to open dump file "xxx" for write

Cause: Export was unable to open the export file for writing. This message is usually followed by device messages from the operating system.

Action: Take appropriate action to restore the device.

The root cause of the ORA-31617 is likely in RAC is that this error occurs when parallel parameter is specified in using expdp in RAC databases and same physical directories does not exists in all nodes.

This is a bug in 11g and it is fixed in 11by setting the "cluster" parameter cluster=N :

做DG的切换测试,发现了一些有趣的小问题,寥作记录
v$archived_log中的fal字段意思如下:
Indicates whether the archive log was generated as the result of a FAL request (YES) or not (NO)

  1. 主库当时归档日志状态
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SQL> select DEST_ID,NAME ,SEQUENCE#,STANDBY_DEST,ARCHIVED,APPLIED,DELETED,STATUS,FAL,REGISTRAR,CREATOR from v$archived_log  order by 3;

DEST_ID NAME SEQUENCE# STA ARC APPLIED DEL S FAL REGISTR CREATOR
---------- -------------------------------------------------------------------------------- ---------- --- --- --------- --- - --- ------- -------
2 /u01/app/archive/1_74_903912515.arc 74 NO YES YES NO A YES RFS ARCH
2 /u01/app/archive/1_75_903912515.arc 75 NO YES YES NO A YES RFS ARCH
1 /u01/app/archive/1_76_903912515.arc 76 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_77_903912515.arc 77 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_78_903912515.arc 78 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_79_903912515.arc 79 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_80_903912515.arc 80 NO YES NO NO A NO ARCH ARCH
2 min 80 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_81_903912515.arc 81 NO YES NO NO A NO ARCH ARCH
2 min 81 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_82_903912515.arc 82 NO YES NO NO A NO ARCH ARCH
2 min 82 YES YES YES NO A NO LGWR LGWR
1 /u01/app/archive/1_83_903912515.arc 83 NO YES NO NO A NO ARCH ARCH
2 min 83 YES YES YES NO A NO LGWR LGWR
1 /u01/app/archive/1_84_903912515.arc 84 NO YES NO NO A NO ARCH ARCH
2 min 84 YES YES NO NO A NO LGWR LGWR
  1. 备库当时归档日志状态
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SQL> select DEST_ID,NAME ,SEQUENCE#,STANDBY_DEST,ARCHIVED,APPLIED,DELETED,STATUS,FAL,REGISTRAR,CREATOR from v$archived_log  order by 3;

DEST_ID NAME SEQUENCE# STA ARC APPLIED DEL S FAL REGISTR CREATOR
---------- ---------------------------------------------------------------------------------------------------- ---------- --- --- --------- --- - --- ------- -------
1 /u01/app/archive/1_72_903912515.arc 72 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_73_903912515.arc 73 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_74_903912515.arc 74 NO YES YES NO A NO ARCH ARCH
2 minstd 74 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_75_903912515.arc 75 NO YES YES NO A NO FGRD FGRD
2 minstd 75 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_76_903912515.arc 76 NO YES YES NO A NO ARCH ARCH
2 minstd 76 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_77_903912515.arc 77 NO YES YES NO A NO ARCH ARCH
2 minstd 77 YES YES YES NO A NO LGWR LGWR
2 minstd 78 YES YES NO NO A NO LGWR LGWR
1 /u01/app/archive/1_78_903912515.arc 78 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_79_903912515.arc 79 NO YES YES NO A NO RFS FGRD
2 minstd 79 YES YES NO NO A NO FGRD FGRD
1 /u01/app/archive/1_80_903912515.arc 80 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_81_903912515.arc 81 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_82_903912515.arc 82 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_83_903912515.arc 83 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_84_903912515.arc 84 NO YES IN-MEMORY NO A NO RFS ARCH
  1. 此时进行切换
1
2
3
4
5
6
7
8
9
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;

SWITCHOVER_STATUS
--------------------
TO STANDBY

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY;

Database altered.

主库的alert日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Tue Feb 16 14:55:15 2016
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY [Process Id: 5441] (minstd)
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Waiting for all FAL entries to be archived...
All FAL entries have been archived.
Waiting for potential Physical Standby switchover target to become synchronized...
Active, synchronized Physical Standby switchover target has been identified
Switchover End-Of-Redo Log thread 1 sequence 85 has been fixed
Switchover: Primary highest seen SCN set to 0x0.0x101743
ARCH: Noswitch archival of thread 1, sequence 85
ARCH: End-Of-Redo Branch archival of thread 1 sequence 85
ARCH: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2
ARCH: Standby redo logfile selected for thread 1 sequence 85 for destination LOG_ARCHIVE_DEST_2
Archived Log entry 17 added for thread 1 sequence 85 ID 0x97bd9e7c dest 1:
ARCH: Archiving is disabled due to current logfile archival
Primary will check for some target standby to have received alls redo
Final check for a synchronized target standby. Check will be made once.
LOG_ARCHIVE_DEST_2 is a potential Physical Standby switchover target
Active, synchronized target has been identified
Target has also received all redo

这时主库做了一次日志切换,加入EOR标记,并传输日志到所有备库,然后检查确认所有备库全部接受到所有的redo
这时查询备库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SQL> select DEST_ID,NAME ,SEQUENCE#,STANDBY_DEST,ARCHIVED,APPLIED,DELETED,STATUS,FAL,REGISTRAR,CREATOR from v$archived_log  order by 3;

DEST_ID NAME SEQUENCE# STA ARC APPLIED DEL S FAL REGISTR CREATOR
---------- ---------------------------------------------------------------------------------------------------- ---------- --- --- --------- --- - --- ------- -------
1 /u01/app/archive/1_72_903912515.arc 72 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_73_903912515.arc 73 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_74_903912515.arc 74 NO YES YES NO A NO ARCH ARCH
2 minstd 74 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_75_903912515.arc 75 NO YES YES NO A NO FGRD FGRD
2 minstd 75 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_76_903912515.arc 76 NO YES YES NO A NO ARCH ARCH
2 minstd 76 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_77_903912515.arc 77 NO YES YES NO A NO ARCH ARCH
2 minstd 77 YES YES YES NO A NO LGWR LGWR
2 minstd 78 YES YES NO NO A NO LGWR LGWR
1 /u01/app/archive/1_78_903912515.arc 78 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_79_903912515.arc 79 NO YES YES NO A NO RFS FGRD
2 minstd 79 YES YES NO NO A NO FGRD FGRD
1 /u01/app/archive/1_80_903912515.arc 80 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_81_903912515.arc 81 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_82_903912515.arc 82 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_83_903912515.arc 83 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_84_903912515.arc 84 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_85_903912515.arc 85 NO YES YES NO A NO RFS ARCH

85号日志确实已经收到并且应用,备库alert如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Tue Feb 16 14:52:22 2016
Archived Log entry 19 added for thread 1 sequence 84 ID 0x97bd9e7c dest 1:
Media Recovery Waiting for thread 1 sequence 85 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 85 Reading mem 0
Mem# 0: /u01/app/oradata/min/stdb_redo01.log
Tue Feb 16 14:55:17 2016
RFS[5]: Assigned to RFS process 7837
RFS[5]: Selected log 4 for thread 1 sequence 85 dbid -1749202365 branch 903912515
Tue Feb 16 14:55:17 2016
Archived Log entry 20 added for thread 1 sequence 85 ID 0x97bd9e7c dest 1:
Tue Feb 16 14:55:17 2016
RFS[2]: Possible network disconnect with primary database
Tue Feb 16 14:55:17 2016
RFS[6]: Assigned to RFS process 7782
RFS[6]: Possible network disconnect with primary database
Tue Feb 16 14:55:17 2016
RFS[1]: Possible network disconnect with primary database
Tue Feb 16 14:55:17 2016
RFS[7]: Assigned to RFS process 7835
RFS[7]: Possible network disconnect with primary database
Tue Feb 16 14:55:17 2016
Resetting standby activation ID 2545786492 (0x97bd9e7c)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Continuing
Media Recovery Waiting for thread 1 sequence 86

收到85号日志之后,与主库失去联系,在应用85号日志的时候遇到EOR标记
备库切换为主库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;

SWITCHOVER_STATUS
--------------------
TO PRIMARY

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

Database altered.

SQL> alter database open;

Database altered.

新主库alert

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN
ALTER DATABASE SWITCHOVER TO PRIMARY (min)
Maximum wait for role transition is 15 minutes.
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
CLOSE: all sessions shutdown successfully.
Tue Feb 16 15:00:29 2016
SMON: disabling cache recovery
Backup controlfile written to trace file /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_ora_7726.trc
SwitchOver after complete recovery through change 1054531
Online log /u01/app/oradata/min/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oradata/min/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oradata/min/redo03.log: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 1054529
AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
Switchover: Complete - Database mounted as primary
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN
Tue Feb 16 15:00:42 2016
alter database open
Tue Feb 16 15:00:42 2016
Assigning activation ID 2545831887 (0x97be4fcf)
Thread 1 advanced to log sequence 87 (thread open)
Tue Feb 16 15:00:42 2016
ARC8: Becoming the 'no SRL' ARCH
Thread 1 opened at log sequence 87
Current log# 2 seq# 87 mem# 0: /u01/app/oradata/min/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Feb 16 15:00:42 2016
ARC9: Becoming the 'no SRL' ARCH
Tue Feb 16 15:00:42 2016
ARCa: Becoming the 'no SRL' ARCH
Tue Feb 16 15:00:42 2016
SMON: enabling cache recovery
Archived Log entry 21 added for thread 1 sequence 86 ID 0x97be4fcf dest 1:
Tue Feb 16 15:00:42 2016
ARCt: Becoming the 'no SRL' ARCH
Tue Feb 16 15:00:42 2016
NSA2 started with pid=18, OS id=7850
[7726] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:86759444 end:86759474 diff:30 (0 seconds)
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Starting background process SMCO
Tue Feb 16 15:00:42 2016
idle dispatcher 'D000' terminated, pid = (17, 1)
ARCt: Standby redo logfile selected for thread 1 sequence 86 for destination LOG_ARCHIVE_DEST_2
Tue Feb 16 15:00:42 2016
SMCO started with pid=50, OS id=7852
No Resource Manager plan active
Starting background process QMNC
Tue Feb 16 15:00:42 2016
QMNC started with pid=52, OS id=7856
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open
Tue Feb 16 15:00:43 2016
ARC0: Becoming the 'no SRL' ARCH
Tue Feb 16 15:00:43 2016
ARC1: Becoming the 'no SRL' ARCH
ARC0: Archive log rejected (thread 1 sequence 86) at host 'minstd'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance min - Archival Error. Archiver continuing.
Thread 1 advanced to log sequence 88 (LGWR switch)
Current log# 3 seq# 88 mem# 0: /u01/app/oradata/min/redo03.log
Tue Feb 16 15:00:45 2016
ARC2: Becoming the 'no SRL' ARCH
Archived Log entry 23 added for thread 1 sequence 87 ID 0x97be4fcf dest 1:
Tue Feb 16 15:00:45 2016
ARC3: Becoming the 'no SRL' ARCH
ARC3: Standby redo logfile selected for thread 1 sequence 87 for destination LOG_ARCHIVE_DEST_2
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
LNS: Standby redo logfile selected for thread 1 sequence 88 for destination LOG_ARCHIVE_DEST_2
Tue Feb 16 15:01:05 2016
ARCr: Becoming the 'no SRL' ARCH

新主库打开的时候向备库传输86号日志被拒绝,这里我猜测是一个gap检测的问题,至于为何报错,暂时没搞清楚,我觉得出现了FAL应该是备库检测到gap来请求日志的,但为何拒绝不甚清楚,而在新备库的日志却发现是接受成功的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Tue Feb 16 15:00:42 2016
Using STANDBY_ARCHIVE_DEST parameter default value as /u01/app/archive
RFS[1]: Assigned to RFS process 5901
RFS[1]: Selected log 4 for thread 1 sequence 86 dbid -1749202365 branch 903912515
Tue Feb 16 15:00:42 2016
Archived Log entry 19 added for thread 1 sequence 86 ID 0x97be4fcf dest 1:
Tue Feb 16 15:00:45 2016
RFS[2]: Assigned to RFS process 5905
RFS[2]: Selected log 4 for thread 1 sequence 87 dbid -1749202365 branch 903912515
Tue Feb 16 15:00:45 2016
Archived Log entry 20 added for thread 1 sequence 87 ID 0x97be4fcf dest 1:
Tue Feb 16 15:00:45 2016
Primary database is in MAXIMUM PERFORMANCE mode
RFS[3]: Assigned to RFS process 5907
RFS[3]: Selected log 4 for thread 1 sequence 88 dbid -1749202365 branch 903912515
Tue Feb 16 15:01:36 2016
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE disconnect
Attempt to start background Managed Standby Recovery process (minstd)
Tue Feb 16 15:01:36 2016
MRP0 started with pid=55, OS id=5923
MRP0: Background Managed Standby Recovery process started (minstd)
started logmerger process

可见86,87已经接收到了
下一刻便开始正常传输了,主库日志信息如下:

1
2
3
4
5
6
7
8
9
Tue Feb 16 15:00:45 2016
ARC3: Becoming the 'no SRL' ARCH
ARC3: Standby redo logfile selected for thread 1 sequence 87 for destination LOG_ARCHIVE_DEST_2
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
LNS: Standby redo logfile selected for thread 1 sequence 88 for destination LOG_ARCHIVE_DEST_2
Tue Feb 16 15:01:05 2016
ARCr: Becoming the 'no SRL' ARCH

有一个归档目的地2激活的提示,难道是一开始未激活?
此时再来看一下归档日志信息

新主库:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
SQL> select DEST_ID,NAME ,SEQUENCE#,STANDBY_DEST,ARCHIVED,APPLIED,DELETED,STATUS,FAL,REGISTRAR,CREATOR from v$archived_log  order by 3;

DEST_ID NAME SEQUENCE# STA ARC APPLIED DEL S FAL REGISTR CREATOR
---------- ---------------------------------------------------------------------------------------------------- ---------- --- --- --------- --- - --- ------- -------
1 /u01/app/archive/1_72_903912515.arc 72 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_73_903912515.arc 73 NO YES YES NO A NO ARCH ARCH
2 minstd 74 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_74_903912515.arc 74 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_75_903912515.arc 75 NO YES YES NO A NO FGRD FGRD
2 minstd 75 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_76_903912515.arc 76 NO YES YES NO A NO ARCH ARCH
2 minstd 76 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_77_903912515.arc 77 NO YES YES NO A NO ARCH ARCH
2 minstd 77 YES YES YES NO A NO LGWR LGWR
2 minstd 78 YES YES YES NO A NO LGWR LGWR
1 /u01/app/archive/1_78_903912515.arc 78 NO YES YES NO A NO ARCH ARCH
1 /u01/app/archive/1_79_903912515.arc 79 NO YES YES NO A NO RFS FGRD
2 minstd 79 YES YES YES NO A NO FGRD FGRD
1 /u01/app/archive/1_80_903912515.arc 80 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_81_903912515.arc 81 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_82_903912515.arc 82 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_83_903912515.arc 83 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_84_903912515.arc 84 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_85_903912515.arc 85 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_86_903912515.arc 86 NO YES NO NO A NO ARCH ARCH
2 minstd 86 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_87_903912515.arc 87 NO YES NO NO A NO ARCH ARCH
2 minstd 87 YES YES YES NO A YES ARCH ARCH

24 rows selected.

备库归档目的地86,97两个日志的FAL都是yes,而且在新主库查看切换期间产生的84,85两个日志的applied状态为yes,而在新备库(原主库)查询84,85在新主库(原备库)的applied状态却为NO,很有意思的一个现象

新备库:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SQL> select DEST_ID,NAME ,SEQUENCE#,STANDBY_DEST,ARCHIVED,APPLIED,DELETED,STATUS,FAL,REGISTRAR,CREATOR from v$archived_log  order by 3;

DEST_ID NAME SEQUENCE# STA ARC APPLIED DEL S FAL REGISTR CREATOR
---------- ------------------------------------------------------------ ---------- --- --- --------- --- - --- ------- -------
2 /u01/app/archive/1_74_903912515.arc 74 NO YES YES NO A YES RFS ARCH
2 /u01/app/archive/1_75_903912515.arc 75 NO YES YES NO A YES RFS ARCH
1 /u01/app/archive/1_76_903912515.arc 76 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_77_903912515.arc 77 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_78_903912515.arc 78 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_79_903912515.arc 79 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_80_903912515.arc 80 NO YES YES NO A NO ARCH ARCH
2 min 80 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_81_903912515.arc 81 NO YES YES NO A NO ARCH ARCH
2 min 81 YES YES YES NO A YES ARCH ARCH
1 /u01/app/archive/1_82_903912515.arc 82 NO YES YES NO A NO ARCH ARCH
2 min 82 YES YES YES NO A NO LGWR LGWR
1 /u01/app/archive/1_83_903912515.arc 83 NO YES YES NO A NO ARCH ARCH
2 min 83 YES YES YES NO A NO LGWR LGWR
1 /u01/app/archive/1_84_903912515.arc 84 NO YES YES NO A NO ARCH ARCH
2 min 84 YES YES NO NO A NO LGWR LGWR
1 /u01/app/archive/1_85_903912515.arc 85 NO YES YES NO A NO RFS FGRD
2 min 85 YES YES NO NO A NO FGRD FGRD
1 /u01/app/archive/1_86_903912515.arc 86 NO YES YES NO A NO RFS ARCH
1 /u01/app/archive/1_87_903912515.arc 87 NO YES YES NO A NO RFS ARCH

20 rows selected.

总结:

  1. switchover前后每个主库角色都会切换1次日志(本次实验为准,并不绝对)
  2. 新主库产生的前2个日志是以FAL方式传输到备库
  3. 在原主库查询switcover之前产生的两个日志的applied状态时为NO,而在新的主库(原备库)查询日志的应用状态是为YES的

关于本文中提到的FAL报错的问题,希望广大DBA朋友帮助解惑,如文中还有其他错误之处,还望批评指正


后记:在一套新的DG环境下测试切换,并未出现FAL报错的情况,怀疑是和当时备库的状态有关

0%