兔子先生

探寻计算机的历史与哲学密码

利用DBMS_SCHEDULER在Oracle 11gR2 RAC上执行rman备份

RMAN backup in Oracle 11gR2 RAC is exactly same like RMAN backup in Oracle 11gR2 single node.
The only difference is: Typically, in case of Oracle single node database, we will schedule RMAN scripts with the help of CRON job and it will run according to our convenience, but in case of Oracle RAC if we schedule RMAN script and if unfortunately that RAC node goes down ( where we configured RMAN scripts ), then RMAN backup won’t run obviously.

So, Same strategy will not be work in Oracle RAC node. For RMAN consistent backups use dbms_scheduler & we need to place RMAN scripts in shared directory. ( Or in my case, I have created identical scripts on both cluster node’s )

注意: 需要将脚本放在共享位置或者每个节点的相同位置

看一下rman的备份脚本,此脚本将备份放在ASM中,将日志放在节点本地

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
################################################################
# rman_backup_rac.sh FOR RAC #
# created by lp #
# 2017/03/22 #
# usage: rman_backup_rac.sh <$BACKUP_LEVEL> #
# BACKUP_LEVEL: #
# F: full backup #
# 0: level 0 #
# 1: level 1 #
################################################################


#!/bin/bash
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export RMAN_BAK_LOG_BASE=/home/oracle/DbBackup
export RMAN_BAK_DATA_BASE=+data/NXRAC/backup
export ORACLE_SID=`ps -ef|grep pmon|grep -v ASM|grep -v grep|awk -F'_' '{print $NF}'`
export TIMESTAMP=`date +%Y%m%d%H%M`;

#the destination of rman backuppiece
export RMAN_DATA=${RMAN_BAK_DATA_BASE}/rman

#the destination of rman backup logs
export RMAN_LOG=${RMAN_BAK_LOG_BASE}/logs


if [[ ! -z $1 ]] && echo $1 |grep -Ew "[01F]" >/dev/null 2>&1
then
export RMAN_LEVEL=${1}

# Check rman level
if [ "$RMAN_LEVEL" == "F" ];
then unset INCR_LVL
BACKUP_TYPE=full
else
INCR_LVL="INCREMENTAL LEVEL ${RMAN_LEVEL}"
BACKUP_TYPE=lev${RMAN_LEVEL}
fi
else
echo "${1} wrong argument!" >${RMAN_LOG}/wrong_argument_${TIMESTAMP}.log
exit 1
fi





#the prefix of rman backuppiece
export RMAN_FILE=${RMAN_DATA}/${BACKUP_TYPE}_${TIMESTAMP}

#the logfile of shell script including the rman logs contents
export SSH_LOG=${RMAN_LOG}/${BACKUP_TYPE}_${TIMESTAMP}.log

#the size of backuppiece
export MAXPIECESIZE=4G

####################################################################
# #
# the name of rman logs excluding the file expanded-name, #
# when the shell is complete,the content of this file will be #
# appended to the $SSH_LOG and the rman logfile will be deleted. #
# #
####################################################################
export RMAN_LOG_FILE=${RMAN_LOG}/${BACKUP_TYPE}_${TIMESTAMP}_1


#Check RMAN Backup Path

if ! test -d ${RMAN_LOG}
then
mkdir -p ${RMAN_LOG}
fi

echo "---------------------------------" >>${SSH_LOG}
echo " " >>${SSH_LOG}
echo "Rman Begin to Working ........." >>${SSH_LOG}
echo "Begin time at:" `date` --`date +%Y%m%d%H%M` >>${SSH_LOG}

#Startup rman to backup

$ORACLE_HOME/bin/rman log=${RMAN_LOG_FILE}.log <<EOF
connect target /
run {
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO BACKUPSET;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '${RMAN_DATA}/control_auto_%F';
ALLOCATE CHANNEL 'ch1' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node1';
ALLOCATE CHANNEL 'ch2' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node1';
ALLOCATE CHANNEL 'ch3' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@mode2';
ALLOCATE CHANNEL 'ch4' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node2';
CROSSCHECK ARCHIVELOG ALL;
DELETE NOPROMPT OBSOLETE;
DELETE NOPROMPT EXPIRED BACKUP;
BACKUP AS COMPRESSED BACKUPSET
${INCR_LVL}
DATABASE FORMAT '${RMAN_FILE}_db_%U' TAG '${BACKUP_TYPE}_${TIMESTAMP}';
SQL 'ALTER SYSTEM ARCHIVE LOG CURRENT';
BACKUP FILESPERSET 20 ARCHIVELOG ALL FORMAT '${RMAN_FILE}_arc_%U' TAG '${ORACLE_SID}_arc_${TIMESTAMP}'
DELETE INPUT;
RELEASE CHANNEL ch1;
RELEASE CHANNEL ch2;
RELEASE CHANNEL ch3;
RELEASE CHANNEL ch4;
ALLOCATE CHANNEL ch00 TYPE DISK;
BACKUP
FORMAT '${RMAN_DATA}/cntrl_%U'
CURRENT CONTROLFILE;
RELEASE CHANNEL ch00;
}
exit;
EOF

RC=$?

cat ${RMAN_LOG_FILE}.log >>${SSH_LOG}
echo "Rman Stop working @ time:"`date` `date +%Y%m%d%H%M` >>${SSH_LOG}

if [ $RC -ne "0" ]; then
echo "------ error ------" >>${SSH_LOG}
else
echo "------ Success during RMAN backup peroid------" >>${SSH_LOG}
rm -rf ${RMAN_LOG_FILE}.log
fi

exit

DBMS_SCHEDULER:

Here we are using DBMS_SCHEDULER instead of DBMS_JOB, because DBMS_SCHEDULER is RAC aware.

Before jump into real DBMS_SCHEDULER configuration, we need to focus on an important thing, That:

Both RAC nodes local time zone must be identical with DBMS_SCHEDULER default time.

On all RAC node, Ensure local time zone and set it accordingly.

1
2
[oracle@node2 ]$ cat /etc/sysconfig/clock
ZONE="Asia/Shanghai"

configure default time zone for DBMS_SCHEDULER

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';

VALUE
--------------------------------------------------------------------------------
PRC

SQL> exec dbms_scheduler.set_scheduler_attribute ('DEFAULT_TIMEZONE', 'Asia/Shanghai');

PL/SQL procedure successfully completed.

SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';

VALUE
--------------------------------------------------------------------------------
Asia/Shanghai

Now we need to create credential so that are assigned to DBMS_SCHEDULER jobs so that they can authenticate with a local/remote host operating system or a remote Oracle database.

1
2
3
SQL> exec dbms_scheduler.create_credential(credential_name => 'oracle', username => 'oracle', password => 'oracle');

PL/SQL procedure successfully completed.

Now its time to create DBMS_SCHEDULER job for RMAN incremental level 0 backup, Here in this procedure I am going to create RMAN_INC0_BACKUP job with required attributes.

1
2
3
4
5
6
7
8
9
10
11
12
begin
dbms_scheduler.create_job(
job_name => 'RMAN_INC0_BACKUP',
job_type => 'EXECUTABLE',
job_action => '/bin/sh',
number_of_arguments => 2,
start_date => SYSTIMESTAMP,
credential_name => 'oracle',
auto_drop => FALSE,
enabled => FALSE);
end;
/

Set argument_position & argument_value ( i.e. Path of the RMAN script ) for the same job:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
begin
dbms_scheduler.set_job_argument_value(
job_name => 'RMAN_INC0_BACKUP',
argument_position => 1,
argument_value => '/home/oracle/rman.sh');
end;
/

begin
dbms_scheduler.set_job_argument_value(
job_name => 'RMAN_INC0_BACKUP',
argument_position => 2,
argument_value => 0);
end;
/

Set start_date for the same job, In my case RMAN_INC0_BACKUP job will execute every week on sunday @03am, so job start date and its first run timing would according to my convenience.

1
2
3
4
5
6
7
begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC0_BACKUP',
attribute => 'start_date',
value => trunc(sysdate)+3/24);
end;
/

Test your backup job manually in SQL prompt by instantiating RMAN_INC0_BACKUP job.

1
2
SQL> exec dbms_scheduler.run_job('RMAN_INC0_BACKUP');
PL/SQL procedure successfully completed.

Verify running RMAN backup status by issuing following SQL query, It will show you RMAN backup details with start time & end time.

1
2
3
4
5
6
select SESSION_KEY, INPUT_TYPE, STATUS,
to_char(START_TIME,'mm/dd/yy hh24:mi') start_time,
to_char(END_TIME,'mm/dd/yy hh24:mi') end_time,
elapsed_seconds/3600 hrs
from V$RMAN_BACKUP_JOB_DETAILS
order by session_key;

In case of any error while test run, you can make sure details of error by issuing the following query, OR You can also query to dba_scheduler_job_run_details dictionary view for more details.

1
select JOB_NAME,STATUS,STATE,ERROR#,CREDENTIAL_NAME from dba_scheduler_job_run_details where CREDENTIAL_NAME like 'RMAN%';

After successfully completion of test run, Enable & schedule it by following procedure by setting value to repeat_interval parameter, In my case RMAN_INC0_BACKUP job will execute every week on Sunday @03pm.

1
2
3
4
5
6
7
8
begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC0_BACKUP',
attribute => 'repeat_interval',
value => 'freq=daily;byday=sun;byhour=03');
dbms_scheduler.enable( 'RMAN_INC0_BACKUP' );
end;
/

Ensure dbms_scheduler job details by issuing the following query OR you can also query to dba_scheduler_jobs and dba_scheduler_job_args.

1
SQL> select job_name,enabled,owner, state from dba_scheduler_jobs where job_name in ('RMAN_INC0_BACKUP');

Keep your eye on behavior of dbms_scheduler job by issuing the following query:

1
2
SQL> select job_name,RUN_COUNT,LAST_START_DATE,NEXT_RUN_DATE from dba_scheduler_jobs where job_name in ('RMAN_INC0_BACKUP');
SQL> select * from dba_scheduler_job_args where job_name like 'RMAN%';

In accordance with the above method to create a level 1 backup job RMAN_INC1_BACKUP,The only difference is the repeat_interval

1
2
3
4
5
6
7
8
begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC1_BACKUP',
attribute => 'repeat_interval',
value => 'freq=daily;byday=mon,tue,wed,thu,fri,sat;byhour=03');
dbms_scheduler.enable( 'RMAN_INC1_BACKUP' );
end;
/

Important Note:
DBMS_SCHEDULER is smart enough to start backup on the node where the last backup was successfully executed.

参考:

https://dbatricksworld.com/how-to-backup-oracle-rac-11gr2-database-with-rman-backup-utility-with-the-help-of-dbms_scheduler-part-i-rman-full-database-backup/

http://dbatricksworld.com/how-to-backup-oracle-rac-11gr2-database-with-rman-backup-utility-with-the-help-of-dbms_scheduler-part-ii-rman-incremental-database-backup/

本文主要介绍SPM的原理与运行机制,针对12c做一些补充记录,并添加一些11g和12c里面的知识点,以做备忘。

干预SQL执行计划的历史

SQL的执行效率,取决于它的执行计划是否高效。 优化器的算法是一个平衡,需要收集尽量少的信息,用尽量快的速度试图去得到一个最优的执行计划,这也决定了它不是万能的。 所以Oracle提供了一些辅助手段来“修复”优化器可能产生的错误,并不断改进这些方法。

  • Oracle 8: hint
  • Oracle 8i&9i: stored outline
  • Oracle 10g: sql profile
  • Oracle 11g: sql plan manangement 、adaptive cursor sharing
  • Oracle 12c: sql plan manangement 、adaptive cursor sharing、Adaptive Execution Plans

SQL Plan Management

SPM组成

SQL Plan Management(以下简称SPM)是一种优化器自动管理执行计划的预防机制,它确保数据库仅使用已知的、经过验证的执行计划。在Oracle 11g之前,执行计划一直是作为“运行时”生成的对象存在。虽然oracle提供了一些方法去指导它的生成,但Oracle一直没有试图去保存完整的执行计划。 从11g开始,执行计划就可以作为一类资源被保存下来,允许特定SQL语句只能选择“已知”的执行计划。

同其他方法相比,SPM更加的灵活。如我们所熟知的,一条带有绑定变量的SQL语句,最好的执行计划会根据绑定变量的值而不同,11g以前的方法都无法解决这个问题。在11g中,与adaptive cursor sharing配合,SPM允许你同时接受多个执行计划。执行时,根据不同的变量值,SPM会花费很少的运算从中选择一条最合适的。

SPM有以下三个主要组件:

  • Plan capture

    捕捉并存储与SQL执行计划相关的信息

  • Plan selection

    使用SQL plan baselines选择合适的执行计划以避免性能退化

  • Plan evolution

    向SQL plan baselines增加新的执行计划

SQL Management Base

SMB是数据字典的一部分,位于SYSAUX表空间,其中存储着statement logs, plan histories, SQL plan baselines, and SQL profiles等内容。

Plan History是优化器生成的所有执行计划的总称;SQL Plan Baseline是Plan History里那些被标记为“ACCEPTED”的执行计划的总称,称为SQL执行计划基线;SQL Statement Log是一系列的查询签名(signatures),用于在自动捕获执行计划时辨别重复执行的sql;关系如下图所示:

SQL Management Base

Plan capture(执行计划的捕获)

Automatic Initial Plan Capture(自动捕获)

开启自动捕获只需在初始化参数中将OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES设置为TRUE(默认为FALSE),开启之后数据库就会为每个重复执行的sql创建SQL Plan Baseline,甄别是否是重复执行的SQL就是利用的上文提到的SQL Statement Log,基线包括供优化器重新生成该执行计划的所有信息,例如SQL text, outline, bind variable values, and compilation environment,这个初始的执行计划会被自动标记为“accepted”,如果之后又有新的执行计划生成,那么该执行计划会被加入到Plan History但是会被标记为“unaccepted”。

Oracle Database 12c Release 2 增加了自动捕获时sql过滤的功能,利用DBMS_SPM.CONFIGURE存储过程可以创建automatic capture filter。因此,你可以设置仅仅捕获你想要捕获的SQL,可以从DBA_SQL_MANAGEMENT_CONFIG视图中查询当前的设置:

SPM CONFIG

自动捕获状态下执行计划的匹配算法如下:

  • 如果SQL plan baseline不存在,那么优化器会为该SQL创建SQL plan baseline和Plan History,并将捕获的plan标记为accepted,加入SQL plan baseline
  • 如果SQL plan baseline存在,那么优化器的行为就依赖于解析时生产的基于成本的执行计划
    • 如果该plan与SQL plan baseline中的plan都不匹配,那么优化器将其标记为unaccepted加入Plan History
    • 如果该plan匹配到SQL plan baseline中的plan,那么执行该plan,不对现有的SQL plan baseline与Plan History做任何改动

Manual Plan Capture(手动捕获)

手动加载已存在的执行计划到SPM是最常用的方式,需要注意的是,默认情况下手动load的plan,会被自动标记为”accepted“,创建新的SQL plan baseline或加入到已有的SQL plan baseline中,oracle提供以下5种方式手动加载Execution Plan:

  • From a SQL Tuning Set(DBMS_SPM.LOAD_PLANS_FROM_SQLSET)
  • From the cursor cache (DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE)
  • From the AWR repository (new to Oracle Database 12c Release 2)(DBMS_SPM.LOAD_PLANS_FROM_AWR)
  • Unpacked from a staging table(DBMS_SPM.UNPACK_STGTAB_BASELINE)
  • From existing stored outlines(DBMS_SPM.MIGRATE_STORED_OUTLINE)

Loading Plans into a SQL Plan Baseline

手动load的结果有2种情况,由baseline是否存在而决定:

  • 如果SQL Plan Baseline不存在,则数据库做如下事情
    1. Creates a plan history and plan baseline for the statement
    2. Marks the initial plan for the statement as accepted
    3. Adds the plan to the new baseline
  • 如果SQL Plan Baseline存在,则数据库做如下事情
    1. Marks the loaded plan as accepted
    2. Adds the plan to the plan baseline for the statement without verifying the plan's performance

手动load的plan会被自动标为accepted,因为优化器会假定任何由管理员手动加载的plan都是性能上可接受的;当然,你也可以在load的时候在DBMS_SPM.LOAD_PLANS_FROM_% 函数中将enable设为NO,从而达到禁用的目的。

具体的load过程,本文暂不做讨论,详情请参阅官方文档 Managing SQL Plan Baselines,这里需要说明的一点是从AWR导入执行计划是oracle 12.2才开始加入的,12.2以前要想实现从AWR中导入执行计划,需要先创建SQL Tuning set,并从AWR中将plan load进STS,再使用DBMS_SPM.LOAD_PLANS_FROM_SQLSET将计划加载到SPM中。

下面再谈一下,对于已存在SQL Plan Baseline的SQL,其后续的新的执行计划的捕获情况;

不论使用哪种方式创建了SQL Plan Baseline,后续新的plan都会被加入到Plan History并被标记为”unaccepted“,此行为不依赖于OPTIMIZER_CAPTURE_SQL_PLAN_ BASELINES的设置,新加入的plan不会被使用,直到其经过验证比已accepted的plan性能更好,并演化为accepted加入SQL Plan Baseline。

Plan Selection(执行计划的选择)

SPM通过几个标记来实现对执行计划的控制,这些标记可以在视图DBA_SQL_PLAN_BASELINES中查到:

  • Enabled(控制活动)

    • YES(活动的,但不一定会被使用)
    • NO(禁用的,不活动的,肯定不被使用)
  • Accepted(控制使用)

    • YES(只有 “Enabled” 并且 “Accepted” 的计划才会被选择使用)
    • NO(如果是“Enabled” 那么只有被evolve成“Accepted”才有可能被执行)
  • Fixed(控制优先级)

    • YES(如果是“Enabled”并且“Accepted”,会优先选择这个计划,这个计划会被视为不需要改变的、固定的)
    • NO(普通的计划,无需优先)
  • Reproduced(有效性)

    • YES(优化器可以使用这个计划)
    • NO(计划无效,比如索引被删除)
  • ADAPTIVE(12c引入,标记是否是Adaptive Plans)

    • YES(是一个adaptive plan,在evolve的时候会被考虑,evolve时会测试执行,验证后final plan会变为accepted)
    • NO(adaptive plan被evolve后会被标记为NO)

    先来看一下SQL Plan Selection的决策树:

Decision Tree for SQL Plan Selection

当数据库为一条sql执行硬解析的时候,优化器会生成一个best-cost plan,如果初始化参数OPTIMIZER_USE_SQL_PLAN_BASELINES被设置成为TRUE(默认为TRUE),那么在执行best-cost plan之前会先检查是否存在对应的SQL Plan Baseline,匹配的时候使用SQL语句的signature,signature是一个由SQL文本规范化后的SQL标识符(case insensitivity and with whitespaces removed)(由此可见只要一条sql生成了baseline之后,那么无论大小写改变、空格多少都会被认为是一条sql),这个比较是内存操作,因此开销很小。

如果baseline不存在,就按生成的计划执行。如果baseline存在,那么要查看history里是否有这个计划,如果没有,就将这个计划插入,并标记为ENABLED,NON-ACCEPTED。

在baseline中查看是否有FIXED的计划存在,如果存在,执行FIXED的计划,如果存在多个FIXED的计划,根据统计信息重新计算cost,选择cost小的那个。

如果FIXED的计划不存在,就选择ACCEPTED的计划执行。 如果存在多个ACCEPTED的计划,根据统计信息重新计算cost,选择cost小的那个。

如果因为某些系统的改变(例如索引删除)导致已accepted的计划无法reproducible,那么优化器将使用新生成的best-cost plan,并将其加入plan history标记为unaccepted

* 注意,这里每次重新计算cost的代价不大,因为执行计划是已知的,优化器不必遍历所有的可能,只需根据算法计算出已知计划的cost便可。

* 注意,当sql plan baseline中有Fixed的时候,新生成的执行计划是不会被加入到plan history中的。

Plan Evolution(执行计划演化)

当优化器生成一个新的执行计划后,将其加入到plan history中作为一个unaccepted的plan,它需要被verified之后才可以使用,Verification是一个比较unaccepted和accepted(所有accepted中最小cost的)plan的执行性能的过程,verify的过程是实际执行该plan的过程,通过与accepted的plan比较elapse time, CPU time and buffer gets等性能统计信息来判断新plan的性能,如果新的plan的性能优于原plan或者相差无几,那么该plan会被标记为accepted加入SQL Plan Baseline,否则它仍然是一个unaccepted的plan,但是LAST_VERIFIED属性会被更新为当前时刻,12c之后的Automatic plan evolution过程还会考虑自上次被verify之后超过30天的plan,oracle认为如果系统有所改变,也许之前的plan会有更好的性能,当然这个功能可以通过dbms_spm.alter_sql_plan_baseline来禁止。

12c之前没有自动Evolve的机制,从12.1开始automatic plan evolution由SPM Evolve Advisor来完成,11g时代的DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE被废弃,下面将分别针对11g和12c来介绍Plan Evolution的过程:

11g DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE

11g可以使用Oracle Enterprise Manager或者DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE函数来演化SQL Plan,语法如下:

1
2
3
4
5
6
7
DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE (
sql_handle IN VARCHAR2 := NULL,
plan_name IN VARCHAR2 := NULL,
time_limit IN INTEGER := DBMS_SPM.AUTO_LIMIT,
verify IN VARCHAR2 := 'YES',
commit IN VARCHAR2 := 'YES')
RETURN CLOB;

这里由两个标记控制:

- Verify 
    - YES (验证比较性能)
    - NO (不验证性能)
- Commit
    - YES (演化)
    - NO (只生成报告,不演化)

这里可以通过不同的排列组合,达到不同的效果:

  • 自动接收所有性能更好的执行计划,并生成report (Verify->YES, Commit->YES)
  • 自动接收所有新的执行计划,不验证性能,生成report (Verify->NO, Commit->YES)
  • 比较性能,生成report,人工确认是否演化 (Verify->YES, Commit->NO)

12c SPM Evolve Advisor Task & Manually Evolve Task

SPM Evolve Advisor Task

从12.1开始,自动演化任务由SPM Evolve Advisor每天在维护窗口期内自动执行,SPM Evolve Advisor是一个自动任务(SYS_AUTO_SPM_EVOLVE_TASK),它每天做如下操作:

  1. Locates unaccepted plans
  2. Ranks all unaccepted plans
  3. Performs test executions of as many plans as possible during the maintenance window
  4. Selects the lowest-cost plan to compare against each unaccepted plan
  5. Accepts automatically any unaccepted plan that performs sufficiently better, using a cost-based algorithm, than the existing accepted plan

* 需要注意的是,没有单独针对Automatic SPM Evolve Advisor task的scheduler client,Automatic SQL Tuning Advisor 和 Automatic SPM Evolve Advisor共用一个client,因此它们两个同时启用或同时禁用。

自动任务属性可以通过DBMS_SPM.SET_EVOLVE_TASK_PARAMETER来配置,下面列举几个重要的属性:

1
2
3
4
5
6
7
8
COL PARAMETER_NAME FORMAT a25
COL VALUE FORMAT a42
SELECT PARAMETER_NAME, PARAMETER_VALUE AS "VALUE"
FROM DBA_ADVISOR_PARAMETERS
WHERE ( (TASK_NAME = 'SYS_AUTO_SPM_EVOLVE_TASK') AND
( (PARAMETER_NAME = 'ACCEPT_PLANS') OR
(PARAMETER_NAME LIKE '%ALT%') OR
(PARAMETER_NAME = 'TIME_LIMIT') ) );

12.2的默认值如下(各个属性的意义详见官方文档):

1
2
3
4
5
6
7
PARAMETER_NAME		    VALUE
------------------------- ------------------------------------------
TIME_LIMIT 3600
ALTERNATE_PLAN_LIMIT 10
ALTERNATE_PLAN_SOURCE CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY
ALTERNATE_PLAN_BASELINE EXISTING
ACCEPT_PLANS TRUE

修改属性:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  BEGIN
DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
, parameter => 'TIME_LIMIT'
, value => '1200'
);
DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
, parameter => 'ACCEPT_PLANS'
, value => 'true'
);
DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
, parameter => 'ALTERNATE_PLAN_LIMIT'
, value => '500'
);
END;
/

查看结果:

1
2
3
4
5
6
7
 PARAMETER_NAME            VALUE
------------------------- ------------------------------------------
ALTERNATE_PLAN_LIMIT 500
ALTERNATE_PLAN_SOURCE CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY
ALTERNATE_PLAN_BASELINE EXISTING
ACCEPT_PLANS true
TIME_LIMIT 1200

自动演化的报告可以通过函数DBMS_SPM.REPORT_AUTO_EVOLVE_TASK来查询:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
SQL> SET LONG 1000000 PAGESIZE 1000 LONGCHUNKSIZE 100 LINESIZE 100
SQL> SELECT DBMS_SPM.report_auto_evolve_task FROM dual;

REPORT_AUTO_EVOLVE_TASK
----------------------------------------------------------------------------------------------------
GENERAL INFORMATION SECTION
---------------------------------------------------------------------------------------------

Task Information:
---------------------------------------------
Task Name : SYS_AUTO_SPM_EVOLVE_TASK
Task Owner : SYS
Description : Automatic SPM Evolve Task
Execution Name : EXEC_1173
Execution Type : SPM EVOLVE
Scope : COMPREHENSIVE
Status : COMPLETED
Started : 04/20/2017 22:00:05
Finished : 04/20/2017 22:00:09
Last Updated : 04/20/2017 22:00:09
Global Time Limit : 3600
Per-Plan Time Limit : UNUSED
Number of Errors : 0
---------------------------------------------------------------------------------------------

SUMMARY SECTION
---------------------------------------------------------------------------------------------
Number of plans processed : 0
Number of findings : 0
Number of recommendations : 0
Number of errors : 0
---------------------------------------------------------------------------------------------
Manually Evolve Task

手动演化的过程大致如下图:

Evolving SQL Plan Baselines

  1. Create an evolve task
  2. Optionally, set evolve task parameters(在12.2.0.1中仅TIME_LIMIT有效)
  3. Execute the evolve task
  4. Implement the recommendations in the task
  5. Report on the task outcome

个人认为4、5无绝对先后顺序

具体操作,本文暂不讨论,请浏览Manually Evolving SQL Plan Baselines in Oracle Database 12c Release 2

Managing the SQL Management Base

利用DBMS_SPM.CONFIGURE存储过程可以配置SMB,视图DBA_SQL_MANAGEMENT_CONFIG显示了各个配置项的状态。

参数有如下几个:

  • SPACE_BUDGET_PERCENT(SYSAUX表空间的最大使用率)
  • PLAN_RETENTION_WEEKS(没被使用的plan的最大保留weeks,默认是53周)
  • AUTO_CAPTURE_PARSING_SCHEMA_NAME(自动捕获过滤schema)
  • AUTO_CAPTURE_MODULE
  • AUTO_CAPTURE_MODULE
  • AUTO_CAPTURE_SQL_TEXT(自动捕获指定的sql)

修改SMB磁盘使用限额

1.查询当前配置

1
2
3
4
5
6
7
8
9
10
11
12
13
  
SELECT PARAMETER_NAME, PARAMETER_VALUE AS "%_LIMIT",
( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
WHERE TABLESPACE_NAME = 'SYSAUX' ) AS SYSAUX_SIZE_IN_MB,
PARAMETER_VALUE/100 *
( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
WHERE TABLESPACE_NAME = 'SYSAUX' ) AS "CURRENT_LIMIT_IN_MB"
FROM DBA_SQL_MANAGEMENT_CONFIG
WHERE PARAMETER_NAME = 'SPACE_BUDGET_PERCENT';

PARAMETER_NAME %_LIMIT SYSAUX_SIZE_IN_MB CURRENT_LIMIT_IN_MB
-------------------- ---------- ----------------- -------------------
SPACE_BUDGET_PERCENT 10 211.4375 21.14375

2.修改配额

1
EXECUTE DBMS_SPM.CONFIGURE('space_budget_percent',30);

3.查看结果

1
2
3
4
5
6
7
8
9
10
11
12
  SELECT PARAMETER_NAME, PARAMETER_VALUE AS "%_LIMIT", 
( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
WHERE TABLESPACE_NAME = 'SYSAUX' ) AS SYSAUX_SIZE_IN_MB,
PARAMETER_VALUE/100 *
( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
WHERE TABLESPACE_NAME = 'SYSAUX' ) AS "CURRENT_LIMIT_IN_MB"
FROM DBA_SQL_MANAGEMENT_CONFIG
WHERE PARAMETER_NAME = 'SPACE_BUDGET_PERCENT';

PARAMETER_NAME %_LIMIT SYSAUX_SIZE_IN_MB CURRENT_LIMIT_IN_MB
-------------------- ---------- ----------------- -------------------
SPACE_BUDGET_PERCENT 30 211.4375 63.43125

修改保留策略

1.查看当前配置

1
2
3
4
5
6
7
SQL> SELECT PARAMETER_NAME, PARAMETER_VALUE
2 FROM DBA_SQL_MANAGEMENT_CONFIG
3 WHERE PARAMETER_NAME = 'PLAN_RETENTION_WEEKS';

PARAMETER_NAME PARAMETER_VALUE
------------------------------ ---------------
PLAN_RETENTION_WEEKS 53

2.修改配置

1
EXECUTE DBMS_SPM.CONFIGURE('plan_retention_weeks',105);

3.查看结果

1
2
3
4
5
6
7
SQL> SELECT PARAMETER_NAME, PARAMETER_VALUE
2 FROM DBA_SQL_MANAGEMENT_CONFIG
3 WHERE PARAMETER_NAME = 'PLAN_RETENTION_WEEKS';

PARAMETER_NAME PARAMETER_VALUE
------------------------------ ---------------
PLAN_RETENTION_WEEKS 105

Monitoring SQL plan baselines

视图DBA_SQL_PLAN_BASELINES展示了当前SQL plan baselines的信息:

1
2
3
4
5
6
7
8
9
10
11
select SIGNATURE,
SQL_HANDLE,
SQL_TEXT,
PLAN_NAME,
PARSING_SCHEMA_NAME,
LAST_EXECUTED,
ENABLED,
ACCEPTED,
REPRODUCED,
EXECUTIONS
from dba_sql_plan_baselines;

结果如下:

1
2
3
4
5
6
7
8
9
10
                SIGNATURE SQL_HANDLE           SQL_TEXT                       PLAN_NAME                      PARSING_SC LAST_EXECUTED                       ENA ACC REP EXECUTIONS
----------------------- -------------------- ------------------------------ ------------------------------ ---------- ----------------------------------- --- --- --- ----------
here id=1

1962643652257108320 SQL_1b3cb600d175d160 select /*liu*/ * from test wh SQL_PLAN_1qg5q038rbnb025a3834b SCOTT 13-APR-17 02.23.12.000000 PM YES YES YES 0
ere id=1

1962643652257108320 SQL_1b3cb600d175d160 select /*liu*/ * from test wh SQL_PLAN_1qg5q038rbnb097bbe3d0 HR 13-APR-17 02.35.09.000000 PM YES YES YES 0
ere id=1

需要注意的一点是,该视图中LAST_EXECUTED和EXECUTIONS并不是实时更新的。

函数DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE可以查看baseline中的执行计划,语法如下:

1
2
3
4
5
 DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE (
sql_handle IN VARCHAR2 := NULL,
plan_name IN VARCHAR2 := NULL,
format IN VARCHAR2 := 'TYPICAL')
RETURN dbms_xplan_type_table;

例如:

1
2
3
4
5
6
7
8
select 
*
from table(
dbms_xplan.display_sql_plan_baseline(
sql_handle=>'SQL_5671036d51fd678f',
format=>'basic'));
--或者
select * from table(dbms_xplan.display_sql_plan_baseline('SQL_17574e83c195631c',null,'BASIC +NOTE'));

也可以通过V$SQL视图查看一条SQL是否使用SQL Base Line,如果使用了baseline,那么它的sql_plan_baseline列将会显示plan_name,因此可以连接DBA_SQL_PLAN_BASELINES和V$SQL视图:

1
2
3
4
select s.SQL_ID, s.SQL_TEXT, b.plan_name, b.origin, b.accepted
from dba_sql_plan_baselines b, v$sql s
where s.EXACT_MATCHING_SIGNATURE = b.signature
and s.SQL_PLAN_BASELINE = b.plan_name;

查看v$sql中的sql是否有baseline:

1
2
3
4
5
6
SELECT sql_handle,
plan_name
FROM dba_sql_plan_baselines
WHERE signature IN
( SELECT exact_matching_signature FROM v$sql WHERE sql_id='1s6a8wn4p6rym'
);

Dropping SQL Plan Baselines

1.查询要删除的baseline

1
2
3
4
5
6
7
8
9
10
11
SQL> SELECT SQL_HANDLE, SQL_TEXT, PLAN_NAME, ORIGIN,
2 ENABLED, ACCEPTED
3 FROM DBA_SQL_PLAN_BASELINES
4 WHERE SQL_TEXT LIKE 'SELECT /* repeatable_sql%';

SQL_HANDLE SQL_TEXT PLAN_NAME ORIGIN ENA ACC
-------------------- -------------------- ------------------------------ -------------- --- ---
SQL_b6b0d1c71cd1807b SELECT /* repeatable SQL_PLAN_bdc6jswfd303v2f1e9c20 AUTO-CAPTURE YES YES
_sql */ count(*) fro
m hr.jobs

2.删除sql plan baseline

1
2
3
4
5
6
7
8
9
DECLARE
v_dropped_plans number;
BEGIN
v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
sql_handle => 'SQL_b6b0d1c71cd1807b'
);
DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
END;
/

3.确认删除

1
2
3
4
5
6
SELECT SQL_HANDLE, SQL_TEXT, PLAN_NAME, ORIGIN,
ENABLED, ACCEPTED
FROM DBA_SQL_PLAN_BASELINES
WHERE SQL_TEXT LIKE 'SELECT /* repeatable_sql%';

no rows selected

参考文献:

  1. Oracle 11g 针对SQL性能的新特性(三)- SQL Plan Management
  2. Database SQL Tuning Guide
  3. White Paper:SQL Plan Management in Oracle Database 11g
  4. White Paper:SQL Plan Management with Oracle Database 12c Release 2

LOAD_PLANS_FROM_CURSOR_CACHE函数中有一种语法组合如下:

1
2
3
4
5
6
7
DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (
sql_id IN VARCHAR2,
plan_hash_value IN NUMBER := NULL,
sql_handle IN VARCHAR2, --注意这里
fixed IN VARCHAR2 := 'NO',
enabled IN VARCHAR2 := 'YES')
RETURN PLS_INTEGER;

Oracle对其中sql_handle的解释为:

SQL handle to use in identifying the SQL plan baseline into which the plans are loaded. The sql_handle must denote an existing SQL plan baseline. The use of handle is crucial when the user tunes a SQL statement by adding hints to its text and then wants to load the resulting plan(s) into the SQL plan baseline of the original SQL statement.

意思是说,使用sql_handle可以将cursor cache中的执行计划load进一个已存在的SQL Plan Baseline,这就意味着你可以不需要更改应用的代码,就可以让SQL跑出你期望的执行计划,前提是你有一个已经tuning好的plan在cacahe里,你就可以将这个plan load进原SQL的Baseline,下面将针对此调优方法展开详细论述:

我要进行的实验是要一个原本走索引的sql,在不改变原sql的基础上,让它走全表扫描

  1. 创建测试环境

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    CREATE TABLE spm_test_tab (
    id NUMBER,
    description VARCHAR2(50)
    );

    INSERT /*+ APPEND */ INTO spm_test_tab
    SELECT level,
    'Description for ' || level
    FROM dual
    CONNECT BY level <= 10000;

    COMMIT;
    --创建索引
    CREATE INDEX spm_test_tab_idx ON spm_test_tab(id);
    --收集统计信息
    begin
    dbms_stats.gather_table_stats(ownname => 'SYS',
    tabname =>'SPM_TEST_TAB',
    estimate_percent =>DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    granularity=>'ALL',
    cascade => TRUE,
    no_invalidate => false);
    end;
    /
  2. 执行例句,查看其执行计划

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    	
    SELECT description FROM spm_test_tab WHERE id = 1113;

    SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

    SQL_ID gsttbra9z0ddw, child number 0
    -------------------------------------
    SELECT description FROM spm_test_tab WHERE id = 1113

    Plan hash value: 3121206333

    ------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | | | 2 (100)| |
    | 1 | TABLE ACCESS BY INDEX ROWID| SPM_TEST_TAB | 1 | 25 | 2 (0)| 00:00:01 |
    |* 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

    1 - SEL$1 / SPM_TEST_TAB@SEL$1
    2 - SEL$1 / SPM_TEST_TAB@SEL$1

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    IGNORE_OPTIM_EMBEDDED_HINTS
    OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
    DB_VERSION('11.2.0.4')
    ALL_ROWS
    OUTLINE_LEAF(@"SEL$1")
    INDEX_RS_ASC(@"SEL$1" "SPM_TEST_TAB"@"SEL$1" ("SPM_TEST_TAB"."ID"))
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("ID"=1113)

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

    1 - "DESCRIPTION"[VARCHAR2,50]
    2 - "SPM_TEST_TAB".ROWID[ROWID,10]
  3. 从v$sql中查找语句的sql_id,并使用DBMS_SPM.LOAD_PLAN_FROM_CURSOR_CACHE创建SQL Plan Baseline

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    SYS@DB41 2017/04/26 14:57:55> SELECT SQL_ID,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE  '%1113%';

    SQL_ID SQL_FULLTEXT
    ------------- --------------------------------------------------------------------------------
    gsttbra9z0ddw SELECT description FROM spm_test_tab WHERE id = 1113

    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => 'gsttbra9z0ddw');
    end;
    /

    SELECT SQL_HANDLE,PLAN_NAME,ENABLED,ACCEPTED
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='gsttbra9z0ddw'
    );


    SQL_HANDLE PLAN_NAME ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfed3324c0 YES YES
  4. 使用ALTER_SQL_PLAN_BASELINE禁用原执行计划

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.ALTER_SQL_PLAN_BASELINE(sql_handle => 'SQL_c1c9aa52fd90f3ae',
    plan_name => 'SQL_PLAN_c3kdaabyt1wxfed3324c0',
    attribute_name => 'enabled',
    attribute_value => 'NO');
    end;
    /

    SQL_HANDLE PLAN_NAME ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfed3324c0 NO YES
  5. 加入hint使sql走全表扫描

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    SELECT /*+ FULL(SPM_TEST_TAB) */description FROM  spm_test_tab WHERE  id = 1113;

    SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

    SQL_ID fr5dd23pfbjfz, child number 0
    -------------------------------------
    SELECT /*+ FULL(SPM_TEST_TAB) */description FROM spm_test_tab WHERE
    id = 1113

    Plan hash value: 1107868462

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | | | 13 (100)| |
    |* 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01|
    ----------------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

    1 - SEL$1 / SPM_TEST_TAB@SEL$1

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    IGNORE_OPTIM_EMBEDDED_HINTS
    OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
    DB_VERSION('11.2.0.4')
    ALL_ROWS
    OUTLINE_LEAF(@"SEL$1")
    FULL(@"SEL$1" "SPM_TEST_TAB"@"SEL$1")
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("ID"=1113)

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

    1 - "DESCRIPTION"[VARCHAR2,50]
  6. 在V$SQL中找到加入hint的语句的sql_id和plan_hash_value

    1
    2
    3
    4
    5
    6
    SELECT SQL_ID,PLAN_HASH_VALUE,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE  '%1113%';

    SQL_ID PLAN_HASH_VALUE SQL_FULLTEXT
    ------------- --------------- --------------------------------------------------------------------------------
    fr5dd23pfbjfz 1107868462 SELECT /*+ FULL(SPM_TEST_TAB) */description FROM spm_test_tab WHERE id = 1113
    08vxbgd0qrm8s 903671040 SELECT SQL_ID,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE '%1113%'
  7. 利用SQL_ID和PLAN_HASH_VALUE创建一个新的accepted的plan,并通过SQL_HANDLE将修改过的plan与原SQL联系起来

    1
    2
    3
    4
    5
    6
    7
    8
    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => 'fr5dd23pfbjfz',
    plan_hash_value => 1107868462,
    sql_handle => 'SQL_c1c9aa52fd90f3ae');
    end;
    /
  8. 查询DBA_SQL_PLAN_BASELINES会看到2个plan

    1
    2
    3
    4
    SQL_HANDLE		       		   PLAN_NAME		              ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfb65c37c8 YES YES
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfed3324c0 NO YES
  9. 现在执行原sql查看其执行计划

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37

    SYS@DB41 2017/04/26 15:50:44> SET AUTOTRACE TRACE
    SYS@DB41 2017/04/26 15:50:57> SELECT description FROM spm_test_tab WHERE id = 1113;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1107868462

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 25 | 13 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("ID"=1113)

    Note
    -----
    - SQL plan baseline "SQL_PLAN_c3kdaabyt1wxfb65c37c8" used for this statement

    Statistics
    ----------------------------------------------------------
    7 recursive calls
    0 db block gets
    45 consistent gets
    0 physical reads
    0 redo size
    367 bytes sent via SQL*Net to client
    476 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

可见,执行计划已改变

注: 测试过程中发现,使用SQL Plan Baseline的sql无法再用SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual)获取其上次的执行计划。

从Cursor Cache中加载执行计划到SPM中,是最常用的一种方式,下面将介绍整个加载的步骤。

  1. 确认sql在Cursor Cache中

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
     select s.CHILD_NUMBER,
    s.PLAN_HASH_VALUE,
    s.HASH_VALUE,
    s.PARSING_SCHEMA_NAME,
    s.LAST_ACTIVE_TIME,
    s.OPTIMIZER_COST,
    s.sql_plan_baseline
    from v$sql s
    where sql_id = '1s6a8wn4p6rym'
    order by s.LAST_ACTIVE_TIME desc;

    *注意,不是所有在v$sql中看到的plan都被加载到SPM中,其中有个规律是,如果plan是相同的,即便是不同的schema,在SPM中都只有一条plan,而且这不同的schema执行同样的sql时,会使用同一个执行计划基线(前提是基线的plan能够在各自的schema中重新生成)

  2. 使用DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE加载执行计划

    1
    2
    3
    4
    5
    6
    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => '59t5p5xv2bjjy');
    end;
    /

    该匿名块会将CURSOR_CACHE中所有的plan都加入到SQL Plan Baseline中,并默认为accepted,如需改变默认效果,请参阅Oracle Database PL/SQL Packages and Types Reference

  3. 查看DBA_SQL_PLAN_BASELINES确认load结果

    1
    2
    3
    4
    5
    SELECT *
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='59t5p5xv2bjjy'
    );
  4. 删除掉不想要的Baseline

    如果能确定不想要的plan(当然,也可以在load的时候进行选择),那么可以通过SPM.DROP_SQL_PLAN_BASELINE删除

    先找到sql_handle,再查找该sql_handle下所有的plan,例如:

    1
    2
    3
    4
    5
    SELECT sql_handle
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='59t5p5xv2bjjy'
    );
1
2
3
4
5
6
7
8
9
10
11
12
13
14

select SIGNATURE,
SQL_HANDLE,
SQL_TEXT,
PLAN_NAME,
PARSING_SCHEMA_NAME,
LAST_EXECUTED,
ENABLED,
ACCEPTED,
REPRODUCED,
EXECUTIONS,
optimizer_cost
from dba_sql_plan_baselines;
where sql_handle ='SQL_fca270c30418ee86';
确定不需要的plan有很多方式,如果你正在tuning该条sql,那么你对不好的执行计划的cost肯定印象深刻,那么就可以根据optimizer_cost确定PLAN_NAME,也可以通过*dbms_xplan.display_sql_plan_baseline*来浏览执行计划确定,接下来就可以删除该plan了:

1
2
3
4
5
6
7
8
9
DECLARE
v_dropped_plans number;
BEGIN
v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
sql_handle => 'SQL_971c23013e9cf52a',
plan_name => 'SQL_PLAN_9f71304z9tx9a42ef5257'
);
DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
END;
  1. 演化(evolve)新的plan

    11g默认为已存在Baseline的SQL自动捕捉新的plan,但新的plan为unaccepted的状态,要使用新的plan,需要演化(evolve)为accepted,下面介绍11g的演化方法,12c请参阅MANUALLY EVOLVING SQL PLAN BASELINES IN ORACLE DATABASE 12C RELEASE 2

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    DECLARE  
    l_plans_altered clob;
    BEGIN
    l_plans_altered := dbms_spm.evolve_sql_plan_baseline(
    sql_handle => 'SQL_fca270c30418ee86',
    plan_name => 'SQL_PLAN_gt8mhsc21jvn62e14fda0',
    verify =>'YES',
    commit =>'YES');
    DBMS_OUTPUT.put_line('Plans Altered: ' || l_plans_altered);
    END;
    /

    该匿名块会评估测试执行plan_name为SQL_PLAN_gt8mhsc21jvn62e14fda0的plan,并生成报告,如果它的性能优于现存的accepted的plan,那么它会被自动接受为accepted,否则便不会被接受。

在oracle 12.1之前,是使用EVOLVE_SQL_PLAN_BASELINE函数来演化SQL plan baselines,从12c开始,演化的过程就被基于task的方法代替了,12c的演化过程主要包括以下几个阶段:

  • CREATE_EVOLVE_TASK
  • EXECUTE_EVOLVE_TASK
  • REPORT_EVOLVE_TASK
  • IMPLEMENT_EVOLVE_TASK

下面将演示以下12c执行计划基线演化的过程:

先来创建一个测试表:

1
2
3
4
5
6
7
8
9
10
11
CREATE TABLE spm_test_tab (
id NUMBER,
description VARCHAR2(50)
);

INSERT /*+ APPEND */ INTO spm_test_tab
SELECT level,
'Description for ' || level
FROM dual
CONNECT BY level <= 10000;
COMMIT;

查询一条记录,并查看其执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
SELECT description
FROM spm_test_tab
WHERE id = 99;

SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gat6z1bc6nc2d, child number 0
-------------------------------------
SELECT description FROM spm_test_tab WHERE id = 99

Plan hash value: 1107868462

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 13 (100)| |
|* 1 | TABLE ACCESS FULL| SPM_TEST_TAB | 1 | 40 | 13 (0)| 00:00:01 |
----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

1 - SEL$1 / SPM_TEST_TAB@SEL$1

Outline Data
-------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
DB_VERSION('12.2.0.1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "SPM_TEST_TAB"@"SEL$1")
END_OUTLINE_DATA
*/

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=99)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

1 - "DESCRIPTION"[VARCHAR2,50]

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

查询sql_id

1
2
3
4
5
6
7
8
9

SELECT sql_id
FROM v$sql
WHERE plan_hash_value = 1107868462;

SQL_ID
-------------
gat6z1bc6nc2d

将该sql的执行计划从cursor cache中加载到SQL Plan Baseline

1
2
3
4
5
6
7
8
9
10
11
12
13
SET SERVEROUTPUT ON
DECLARE
l_plans_loaded PLS_INTEGER;
BEGIN
l_plans_loaded := DBMS_SPM.load_plans_from_cursor_cache(
sql_id => 'gat6z1bc6nc2d');

DBMS_OUTPUT.put_line('Plans Loaded: ' || l_plans_loaded);
END;
/
Plans Loaded: 1

PL/SQL procedure successfully completed.

在DBA_SQL_PLAN_BASELINES中查询baseline的信息

1
2
3
4
5
6
7
8
9
COLUMN sql_handle FORMAT A20
COLUMN plan_name FORMAT A30

SELECT sql_handle, plan_name, enabled, accepted
FROM dba_sql_plan_baselines;

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

创建一条索引,再次执行该sql,查看执行计划结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
CREATE INDEX spm_test_tab_idx ON spm_test_tab(id);
begin
dbms_stats.gather_table_stats(ownname => 'SYS',
tabname =>'SPM_TEST_TAB',
estimate_percent =>DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
granularity=>'ALL',
cascade => TRUE,
no_invalidate => false);
end;
/

SELECT description
FROM spm_test_tab
WHERE id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 1107868462

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 13 (0)| 00:00:01|
|* 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=99)

Note
-----
- SQL plan baseline "SQL_PLAN_7qxjk7bch8h5tb65c37c8" used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
3 db block gets
52 consistent gets
0 physical reads
0 redo size
365 bytes sent via SQL*Net to client
484 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

可见,在有索引的情况下执行计划依然走了全表扫描,说明SPM已发挥作用,现在查看一下DBA_SQL_PLAN_BASELINES

1
2
3
4
5
6
7
8
SELECT sql_handle, plan_name, enabled, accepted 
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_7b76323ad90440b9';

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5t3652c362 YES NO
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

现在执行计划基线中已经有了一条未被accepted的计划,我们可以查看一下该计划的内容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
select 
*
from table(
dbms_xplan.display_sql_plan_baseline(
sql_handle=>'SQL_7b76323ad90440b9',
plan_name=>'SQL_PLAN_7qxjk7bch8h5t3652c362',
format=>'basic +NOTE'));

--------------------------------------------------------------------------------
SQL handle: SQL_7b76323ad90440b9
SQL text: SELECT description FROM spm_test_tab WHERE id = 99
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_7qxjk7bch8h5t3652c362 Plan id: 911393634
Enabled: YES Fixed: NO Accepted: NO Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------

Plan hash value: 2338891031

----------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SPM_TEST_TAB |
| 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX |
----------------------------------------------------------------

可见新的plan是走索引的,现在可以等待维护窗口期间由自动维护任务自动evolve,或者手动进行evolve,现在演示手动evolve,创建一个evolve task

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return VARCHAR2(32767);
BEGIN
l_return := DBMS_SPM.create_evolve_task(sql_handle => 'SQL_7b76323ad90440b9');
DBMS_OUTPUT.put_line('Task Name: ' || l_return);
END;
/
Task Name: TASK_1169

PL/SQL procedure successfully completed.

执行此任务

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return VARCHAR2(32767);
BEGIN
l_return := DBMS_SPM.execute_evolve_task(task_name => 'TASK_1169');
DBMS_OUTPUT.put_line('Execution Name: ' || l_return);
END;
/
Execution Name: EXEC_1281

PL/SQL procedure successfully completed.

查看任务报告

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
SET LONG 1000000 PAGESIZE 1000 LONGCHUNKSIZE 100 LINESIZE 100
SELECT DBMS_SPM.report_evolve_task(task_name => 'TASK_1169', execution_name => 'EXEC_1281') AS output
FROM dual;

OUTPUT
----------------------------------------------------------------------------------------------------
GENERAL INFORMATION SECTION
---------------------------------------------------------------------------------------------

Task Information:
---------------------------------------------
Task Name : TASK_1169
Task Owner : SYS
Execution Name : EXEC_1281
Execution Type : SPM EVOLVE
Scope : COMPREHENSIVE
Status : COMPLETED
Started : 04/24/2017 15:04:17
Finished : 04/24/2017 15:04:25
Last Updated : 04/24/2017 15:04:25
Global Time Limit : 2147483646
Per-Plan Time Limit : UNUSED
Number of Errors : 0
---------------------------------------------------------------------------------------------

SUMMARY SECTION
---------------------------------------------------------------------------------------------
Number of plans processed : 1
Number of findings : 1
Number of recommendations : 1
Number of errors : 0
---------------------------------------------------------------------------------------------

DETAILS SECTION
---------------------------------------------------------------------------------------------
Object ID : 2
Test Plan Name : SQL_PLAN_7qxjk7bch8h5t3652c362
Base Plan Name : SQL_PLAN_7qxjk7bch8h5tb65c37c8
SQL Handle : SQL_7b76323ad90440b9
Parsing Schema : SYS
Test Plan Creator : SYS
SQL Text : SELECT description FROM spm_test_tab WHERE id = 99

Execution Statistics:
-----------------------------
Base Plan Test Plan
---------------------------- ----------------------------
Elapsed Time (s): .000037 .000005
CPU Time (s): .000044 0
Buffer Gets: 5 0
Optimizer Cost: 13 2
Disk Reads: 0 0
Direct Writes: 0 0
Rows Processed: 0 0
Executions: 10 10


FINDINGS SECTION
---------------------------------------------------------------------------------------------

Findings (1):
-----------------------------
1. The plan was verified in 0.03200 seconds. It passed the benefit criterion
because its verified performance was 18.01480 times better than that of the
baseline plan.

Recommendation:
-----------------------------
Consider accepting the plan. Execute
dbms_spm.accept_sql_plan_baseline(task_name => 'TASK_1169', object_id => 2,
task_owner => 'SYS');


EXPLAIN PLANS SECTION
---------------------------------------------------------------------------------------------

Baseline Plan
-----------------------------
Plan Id : 1
Plan Hash Value : 3059496904

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 13 | 00:00:01 |
| * 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 | 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("ID"=99)


Test Plan
-----------------------------
Plan Id : 2
Plan Hash Value : 911393634

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 2 | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | SPM_TEST_TAB | 1 | 25 | 2 | 00:00:01 |
| * 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 | 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=99)

---------------------------------------------------------------------------------------------

报告中建议执行dbms_spm.accept_sql_plan_baseline接受新的plan,但我们应该使用IMPLEMENT_EVOLVE_TASK来接受新的plan

1
2
3
4
5
6
7
8
9
10
11
SET SERVEROUTPUT ON
DECLARE
l_return NUMBER;
BEGIN
l_return := DBMS_SPM.implement_evolve_task(task_name => 'TASK_1169');
DBMS_OUTPUT.put_line('Plans Accepted: ' || l_return);
END;
/
Plans Accepted: 1

PL/SQL procedure successfully completed.

查看DBA_SQL_PLAN_BASELINES,新的plan应该已经是accepted了

1
2
3
4
5
6
7
8
SELECT sql_handle, plan_name, enabled, accepted 
FROM dba_sql_plan_baselines
WHERE sql_handle = 'SQL_7b76323ad90440b9';

SQL_HANDLE PLAN_NAME ENA ACC
-------------------- ------------------------------ --- ---
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5t3652c362 YES YES
SQL_7b76323ad90440b9 SQL_PLAN_7qxjk7bch8h5tb65c37c8 YES YES

再次执行该sql,查看执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
SET AUTOTRACE TRACE LINESIZE 130

SELECT description
FROM spm_test_tab
WHERE id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 2338891031

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| SPM_TEST_TAB | 1 | 25 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=99)

Note
-----
- SQL plan baseline "SQL_PLAN_7qxjk7bch8h5t3652c362" used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
372 bytes sent via SQL*Net to client
484 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

由此可见,新的执行计划基线已被使用。

参考资料:Adaptive SQL Plan Management (SPM) in Oracle Database 12c Release 1 (12.1)

日常管理中经常会碰到需要drop表中字段的情况,如果表中数据很少时还可以直接drop,但是如果是一张大表,那么drop一个列是很耗时且耗资源的事,那么我们可以先逻辑上删除此字段,再在空闲的时候彻底drop掉以释放空间。

注:本文摘自官方文档

##1. Marking Columns Unused

If you are concerned about the length of time it could take to drop column data from all of the rows in a large table, you can use the ALTER TABLE...SET UNUSED statement. This statement marks one or more columns as unused, but does not actually remove the target column data or restore the disk space occupied by these columns. However, a column that is marked as unused is not displayed in queries or data dictionary views, and its name is removed so that a new column can reuse that name. All constraints, indexes, and statistics defined on the column are also removed.

To mark the hiredate and mgr columns as unused, execute the following statement:

1
ALTER TABLE hr.admin_emp SET UNUSED (hiredate, mgr);

You can later remove columns that are marked as unused by issuing an ALTER TABLE...DROP UNUSED COLUMNS statement. Unused columns are also removed from the target table whenever an explicit drop of any particular column or columns of the table is issued.

The data dictionary views USER_UNUSED_COL_TABS, ALL_UNUSED_COL_TABS, or DBA_UNUSED_COL_TABS can be used to list all tables containing unused columns. The COUNT field shows the number of unused columns in the table.

1
2
3
4
5
SELECT * FROM DBA_UNUSED_COL_TABS;

OWNER TABLE_NAME COUNT
--------------------------- --------------------------- -----
HR ADMIN_EMP 2

For external tables, the SET UNUSED statement is transparently converted into an ALTER TABLE DROP COLUMN statement. Because external tables consist of metadata only in the database, the DROP COLUMN statement performs equivalently to the SET UNUSED statement.
##2. Removing Unused Columns

The ALTER TABLE...DROP UNUSED COLUMNS statement is the only action allowed on unused columns. It physically removes unused columns from the table and reclaims disk space.

In the ALTER TABLE statement that follows, the optional clause CHECKPOINT is specified. This clause causes a checkpoint to be applied after processing the specified number of rows, in this case 250. Checkpointing cuts down on the amount of undo logs accumulated during the drop column operation to avoid a potential exhaustion of undo space.

1
ALTER TABLE hr.admin_emp DROP UNUSED COLUMNS CHECKPOINT 250;

上篇文章简要介绍了一下当外键无索引时,更新删除主表的数据会造成子表的锁定,如果此时子表上有事务,那么进行更新删除的session变会等待,等待事件就是enq: TM - contention

外键与 TM enqueue lock 的主要问题是 在早期版本中(9i之前) 当 子表child table上 的外键没有索引时 , 若发生 父表 parent table 上记录被delete 或 update时 , 会在child table上加 share lock, 这会 阻塞 child table 上的DML。
但是从 9i以后的当 子表child table上 的外键没有索引时, 父表parent table上的delete 、update 只在 实际这个DML执行的过程中要求share (TM lmode=4) lock,而不会在整个事务中 都要求保持 child table上的 share lock。

还是先了解一下oracle中的锁模式吧,TM锁和TX锁都属于DML锁,这里介绍的是TM的锁模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Value   Name(s)                    Table method (TM lock)
0 No lock n/a

1 Null lock (NL) Used during some parallel DML operations (e.g. update) by
the pX slaves while the QC is holding an exclusive lock.

2 Sub-share (SS) Until 9.2.0.5/6 "select for update"
Row-share (RS) Since 9.2.0.1/2 used at opposite end of RI during DML
Lock table in row share mode
Lock table in share update mode

3 Sub-exclusive(SX) Update (also "select for update" from 9.2.0.5/6)
Row-exclusive(RX) Lock table in row exclusive mode
Since 11.1 used at opposite end of RI during DML

4 Share (S) Lock table in share mode
Can appear during parallel DML with id2 = 1, in the PX slave sessions
Common symptom of "foreign key locking" (missing index) problem

5 share sub exclusive (SSX) Lock table in share row exclusive mode
share row exclusive (SRX) Less common symptom of "foreign key locking" but likely to be more
frequent if the FK constraint is defined with "on delete cascade."

6 Exclusive (X) Lock table in exclusive mode

share lock就是mode为4的S锁

Summary of Locks Obtained by DML Statements

Summary of Locks Obtained by DML Statements

TM 锁在下列场景中被申请:

  • 在OPS(早期的RAC)中LGWR会以ID1=0 & ID2=0去申请该队列锁来检查 DML_LOCKS 在所有实例中是全0还是全非0
  • 当一个单表或分区 需要做不同的表/分区操作时,ORACLE需要协调这些操作,所以需要申请该队列锁。包括:
  • 启用参考约束 referential constraints
  • 修改约束从DIASABLE NOVALIDATE 到DISABLE VALIDATE
  • 重建IOT
  • 创建视图或者修改ALTER视图时可能需要申请该队列锁
  • 分析表统计信息或validate structure时
  • 一些PDML并行DML操作
  • 所有可能调用kkdllk()函数的操作
  • 太多太多了。。。

下面是各种锁之间的兼容性:

lock

好了,开始动手做个试验吧,试验中我会引用KST trace的内容,关于KST,本文不做介绍,只拿来使用

首先,准备环境,本实验均在11.2.0.4环境下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SQL> conn lp/lp
Connected.
SQL> create table prim(a int,b varchar2(10));

Table created.

SQL> alter table prim add constraint PK_PRIM primary key(a);

Table altered.

SQL> create table child (ca int,cb varchar2(10));

Table created.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a);

Table altered.

SQL> insert into prim values(1,'asdasd');

1 row created.

SQL> insert into prim values(2,'asdasd');

1 row created.

SQL> insert into prim values(3,'asdasd');

1 row created.

SQL> commit;

Commit complete.

这里要说一下,在外键是否存在on delete cascade时锁的获取还有区别,所以我们分别来测试,首先是没有索引没有cascade的情况下,各个语句的锁获取情况

无索引,无cascade

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SQL> select distinct sid from v$mystat;

SID
----------
17

SQL> select pid,spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

PID SPID
---------- ------------
36 2761

SQL> alter system set "_trace_events"='10000-10999:255:36';

System altered.

insert 父表:

1
insert into prim values(5,'asdasd');

查看kst信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select kst.event,kst.sid,kst.pid,kst.function,kst.data from x$trace kst where pid=36 and sid=17;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 3 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 3.3.1150
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00030003-0000047e mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 69dd140000000000 0200000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 0de3140000000000 0810c76a177f0000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=173 seq_num=180 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=173 seq_num=180 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=174 seq_num=181 snap_id=1

可见父表上的插入会获取父表和子表mode3TM锁,TM后跟的是object_id的十六进制,一个TX锁,让我们验证一下:

1
2
3
4
5
6
7
SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 IMODE REQUEST CTIME BLOCK
---------------- -------------- ---- -- ---------- ----------- -------- --------- -------- ---------
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87614 0 3 0 8 0
00000000ACE37CD0 00000000ACE37D48 17 Tx 262162 851 6 0 8 0
00007FD97EFE54E8 00007FD97FFE5548 17 TM 87612 0 3 0 8 0

我们来commit一下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
10704 17 36 ksqrcli ksqrcl: release TX-00030003-0000047e mode=X
10813 17 36 ktudnx ktudnx: dec cnt xid:3.3.1150 nax:0 nbx:0
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=SX
10704 17 36 ksqrcli ksqrcl: release TM-0001563c-00000000 mode=SX
10021 17 36 kcrf_commit_force 2ee3140000000000 2fe3140000000000
10005 17 36 kslwtbctx KSL WAIT BEG [log file sync] 7416/0x1cf8 1368878/0x14e32e 0/0x0 wait_id=175 seq_num=182 snap_id=1
10005 17 36 ksliwat KSL FACILITY WAIT fac#=3 time_waited_csecs=1
10005 17 36 ksliwat KSL POST RCVD poster=11 num=76 loc='ksl2.h LINE:2374 ID:kslpsr' id1=138 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1
10005 17 36 kslwtectx KSL WAIT END [log file sync] 7416/0x1cf8 1368878/0x14e32e 0/0x0 wait_id=175 seq_num=182 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=11126, exc=11126, tot=11126
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=176 seq_num=183 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=176 seq_num=183 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=177 seq_num=184 snap_id=1

可见获得的锁全部一一释放

insert子表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
insert into child values(2,'sadsada');

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 7 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 7.17.835
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00070011-00000343 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=186 seq_num=193 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=186 seq_num=193 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=187 seq_num=194 snap_id=1

可见子表上的插入也会获取父表和子表mode3TM锁

update父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
update prim set a=1 where a=1;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 0fe7140000000000 0200000000000000
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 27e9140000000000 b80fb86a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.10.851
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0004000a-00000353 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=204 seq_num=211 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=204 seq_num=211 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=5, exc=5, tot=5
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=205 seq_num=212 snap_id=1

可以很清楚的看到在执行语句期间,注意仅仅是语句的执行期间,会附加一个mode为4的S锁到子表上,很快便释放了

delete父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
delete from prim where a=4

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 2 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 2.0.1117
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00020000-0000045d mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 bae9140000000000
10812 17 36 ktrgcm 0200000000000000 0000000000000000 5d04000000000000
10812 17 36 ktrgcm 9b12c00000000000 4401000000000000 0200000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=227 seq_num=234 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=227 seq_num=234 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=228 seq_num=235 snap_id=1

delete跟update比多了一次S锁的获取和释放,为何呢,是否和删除的行数有关?我们再多删一行试试

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
SQL> delete from prim where a=4 or a=5;

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 6 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 6.15.1290
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0006000f-0000050a mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3301000100000000 0000000000000000 5eea140000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 5eea140000000000 0100000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 5eea140000000000 e00fc76a177f0000
10812 17 36 kturCRBackoutOneChg 0100000000000000 bc06c00000000000 0e01000000000000 2200000000000000
10812 17 36 ktrgcm 3301000100000000 0100000000000000 0100000000000000
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=S flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqrcli ksqrcl: release TM-0001563e-00000000 mode=S
10704 17 36 ksqrcli ksqrcl: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 5fea140000000000
10812 17 36 ktrgcm 0600000000000000 0f00000000000000 0a05000000000000
10812 17 36 ktrgcm bc06c00000000000 0e01000000000000 2400000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=236 seq_num=243 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=236 seq_num=243 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=237 seq_num=244 snap_id=1

可以发现除了语句执行时需要获取一次S锁之外,删多少行就要获取多少次S锁,从之前的锁兼容列表就可发现S锁和SX(RX)锁是不兼容的,而SX(RX)是insert update delete获取的锁模式,可以想象如果此时子表上有事务,或者S锁获得了尚未释放的时候,子表要进行事务获取mode为3的SX(RX)锁时,session都会产生等待

看一下此时session获取的锁,记住这次结果,后面会有对比。

1
2
3
4
5
6
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000ACE37CD0 00000000ACE37D48 17 TX 393231 1290 6 0 195 0
00007FD97FFE6520 00007FD97FFE6580 17 TM 87612 0 3 0 195 0

可见语句执行完,已不持有子表上的任何锁

下面来模拟一下等待,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
sid:31
SQL> insert into child values(2,'12312');

1 row created.

SQL> select distinct sid from v$mystat;

SID
----------
31

sid:1169

SQL> update prim set a=1 where a=1;

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 31 TM 87614 0 3 0 143 1
00000000ABE6BE80 00000000ABE6BEF8 31 TX 327713 1114 6 0 143 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87612 0 3 0 76 0
00007FD97FFE54E8 00007FD97FFE5548 31 TM 87612 0 3 0 143 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87614 0 0 4 76 0

此时查一下等待链

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SQL>  --锁源头查找,带对象和sql以及event
SQL> WITH sessions AS
2 (SELECT /*+materialize*/
3 sid,
4 blocking_session,
5 blocking_instance,
6 row_wait_obj#,
7 sql_id,
8 inst_id,
9 event
10 FROM gv$session)
11 SELECT LPAD(' ', 4 * (level - 1)) || s.inst_id || '.' || sid sid,
12 object_name,
13 substr(sql_text, 1, 40) sql_text,
14 event
15 FROM sessions s
16 LEFT OUTER JOIN dba_objects d
17 ON (object_id = row_wait_obj#)
18 LEFT OUTER JOIN gv$sql q
19 ON (s.sql_id = q.SQL_ID and s.inst_id = q.INST_ID)
20 WHERE sid IN (SELECT blocking_session FROM sessions)
21 OR blocking_session IS NOT NULL
22 CONNECT BY PRIOR sid = blocking_session
23 START WITH blocking_session IS NULL;

SID OBJECT_NAM SQL_TEXT EVENT
---------- ---------- ---------------------------------------- ------------------------------
1.31 SQL*Net message from client
1.1169 CHILD update prim set a=1 where a=1 enq: TM - contention

从上面的分析我们知道,无论插入父表和子表,都会获取两张表上的mode3的锁,而mode3的锁和mode4的锁是不兼容的,也就是说此时父表上连插入都无法进行

再开第三个session

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
sid:1167
SQL> insert into prim values(7,'dasd'); --hang住了

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97D39F2A8 00007FD97D39F308 31 TM 87614 0 3 0 476 1
00007FD97D39F2A8 00007FD97D39F308 1167 TM 87614 0 0 3 85 0
00000000ABE6BE80 00000000ABE6BEF8 31 TX 327713 1114 6 0 476 0
00007FD97D39F2A8 00007FD97D39F308 1169 TM 87612 0 3 0 409 0
00007FD97D39F2A8 00007FD97D39F308 31 TM 87612 0 3 0 476 0
00007FD97D39F2A8 00007FD97D39F308 1169 TM 87614 0 0 4 409 0
00007FD97D39F2A8 00007FD97D39F308 1167 TM 87612 0 3 0 85 0

SID OBJECT_NAME SQL_TEXT EVENT
-------------------- ------------------------------ ---------------------------------------- ----------------------------------------
1.31 SQL*Net message from client
1.1169 CHILD update prim set a=1 where a=1 enq: TM - contention
1.1167 CHILD insert into prim values(7,'dasd') enq: TM - contention

可见1167的session被阻塞了。

无索引,有cascade

1
2
3
4
5
6
7
 SQL> alter table child drop constraint FK_CHILD_CA;

Table altered.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a) on delete cascade;

Table altered.

有cascade的时候,仅在delete语句上有所区别,下面仅列出delete语句

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
SQL> delete from prim where a=2 or a=4;

2 rows deleted.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SSX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.20.852
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00040014-00000354 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SX to=SSX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3301000100000000 0000000000000000 bef2140000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 0000000000000000 0000000000000000 0000000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 bef2140000000000 0100000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 bff2140000000000 e00fc76a177f0000
10812 17 36 kturCRBackoutOneChg 0100000000000000 ef00c00000000000 2801000000000000 0a00000000000000
10812 17 36 ktrgcm 3301000100000000 0100000000000000 0100000000000000
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SX to=SSX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10704 17 36 ksqcnv ksqcnv: convert TM-0001563e-00000000 from=SSX to=SX flags=
10704 17 36 ksqcnv ksqcnv: SUCCESS
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3b01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3c01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3d01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3e01000100000000 0000000000000000 0000000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 c0f2140000000000
10812 17 36 ktrgcm 0400000000000000 1400000000000000 5403000000000000
10812 17 36 ktrgcm ef00c00000000000 2801000000000000 0e00000000000000
10812 17 36 ktrgcm 3f01000100000000 0000000000000000 0000000000000000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=425 seq_num=432 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=425 seq_num=432 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=426 seq_num=433 snap_id=1

此时会申请一个mode为5的SSX锁,随后即转换为mode为3的SX锁,这也是在语句执行期间获取和转换的,并非事务期间,同样删除多少行就涉及到多少次获取转换,看一下此时锁获得情况

1
2
3
4
5
6
7
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE6520 00007FD97FFE6580 17 TM 87614 0 3 0 131 0
00000000ACE37CD0 00000000ACE37D48 17 TX 262164 852 6 0 131 0
00007FD97FFE6520 00007FD97FFE6580 17 TM 87612 0 3 0 131 0

是不是和没有cascade的时候不同了,这次最终会持有子表上的mode为3的锁,我们再深入的思考一点,SSX锁和SX锁是不兼容的,这样是否就意味着后进行的delete会被先进行的delete阻塞(不同session),好,现在就来模拟一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

sid:1169

SQL> delete from prim where a=1;

0 rows deleted.

SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87612 0 3 0 53 0
00007FD97FFE54E8 00007FD97FFE5548 1169 TM 87614 0 3 0 53 0

sid:1167

SQL> delete from prim where a=2; --session hang住了

--查询此刻锁的持有情况
SQL> select * from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE6520 00007FD97FFE6580 1169 TM 87612 0 3 0 85 0
00007FD97FFE6520 00007FD97FFE6580 1167 TM 87612 0 3 0 11 0
00007FD97FFE6520 00007FD97FFE6580 1167 TM 87614 0 0 5 11 0
00007FD97FFE6520 00007FD97FFE6580 1169 TM 87614 0 3 0 85 1

可见1167在请求mode为5的锁,且已被阻塞

查看等待链

1
2
3
4
SID                  OBJECT_NAME                    SQL_TEXT                                 EVENT
-------------------- ------------------------------ ---------------------------------------- ----------------------------------------
1.1169 SQL*Net message from client
1.1167 CHILD delete from prim where a=2 enq: TM - contention

因为delete完毕会持有子表上的SX锁,而SX锁与S锁不兼容,所以delete父表的session也会阻塞update父表的session,因为update会去请求子表的S锁,而此时子表上有SX锁,类似于子表上有事务在进行,这里就不在论述了,徒占篇幅。

有索引,无cascade

我看到有资料说,如果有索引时,对父表的操作,会级联加一个TM RS锁(level 2)到子表上。但我在试验中并未看到,也许是版本差异,我也未去求证,有索引时insert与无索引时在获取锁方面没有区别,这里仅列出update和delete

创建索引:

1
2
3
4
5
6
7
8
9
10
11
SQL> alter table child drop constraint FK_CHILD_CA;

Table altered.

SQL> alter table child add constraint FK_CHILD_CA foreign key (ca) references prim(a);

Table altered.

SQL> create index ind_child_ca on child(ca);

Index created.

update父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SQL> update prim set a=6 where a=6;

1 row updated.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 2b01000100000000 0000000000000000 fadc140000000000 0200000000000000
10811 17 36 ktbgcl1 2b01000100000000 0000000000000000 cdf9140000000000 10dfb76a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 3 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 3.28.1117
10704 17 36 ksqgtlctx ksqgtl: acquire TX-0003001c-0000045d mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=471 seq_num=478 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=471 seq_num=478 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=3, exc=3, tot=3
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=472 seq_num=479 snap_id=1

可见有了索引之后,不再需要在语句级别获取子表上的S锁了

delete父表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SQL> delete from prim where a=9;

1 row deleted.

10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563c-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10704 17 36 ksqgtlctx ksqgtl: acquire TM-0001563e-00000000 mode=SX flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 66f1140000000000 0200000000000000
10811 17 36 ktbgcl1 2c01000100000000 0000000000000000 58fa140000000000 78efb76a177f0000
10813 17 36 ktubnd ktubnd: Bind usn 4 nax 1 nbx 0 lng 0 par 0
10813 17 36 ktubnd ktubnd: Txn Bound xid: 4.18.854
10704 17 36 ksqgtlctx ksqgtl: acquire TX-00040012-00000356 mode=X flags=GLOBAL|XACT why="contention"
10704 17 36 ksqgtlctx ksqgtl: SUCCESS
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 c0f2140000000000 0200000000000000
10811 17 36 ktbgcl1 3301000100000000 0000000000000000 58fa140000000000 90eec66a177f0000
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=476 seq_num=483 snap_id=1
10005 17 36 kslwtectx KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=476 seq_num=483 snap_id=1
10005 17 36 kslwtectx KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
10005 17 36 kslwtbctx KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 wait_id=477 seq_num=484 snap_id=1

与update相同,都持有了子表上的SX锁,而SX与SX是相容的,所以不会再产生锁定问题

1
2
3
4
5
6
7
SQL> select *  from v$lock where type in('TM','TX');

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87614 0 3 0 182 0
00000000ABD4E7C0 00000000ABD4E838 17 TX 262162 854 6 0 182 0
00007FD97FFE54E8 00007FD97FFE5548 17 TM 87612 0 3 0 182 0

有索引,有cascade

表现与无cascade时相同

总结

  • 外键无索引锁无cascade时,update/delete父表,会在语句级别级联一个mode为4的S锁到子表,其中delete多少行就会级联多少次
  • 外键无索引有cascade时,update父表仍会在语句级别级联mode为4的S锁到子表,delete时会先获取mode为5的SSX锁,在将其转换成mode为3的SX锁,而且删除多少行就会涉及到多少次转换
  • 外键有索引无cascade时,update/delete不会在语句级级联锁到子表,最终会持有父表和子表上的mode为3的SX锁(无索引时只有有cascade的delete时最终会持有子表上的SX锁)
  • 外键有索引有cascade时,与无cascade表现相同

近日,开发负责人反映某生产环境业务处理缓慢,主要业务操作就是修改会员信息,登录查询后发现大量的session正在等待enq: TM - contention,且waiting的语句几乎都是update
session的即时信息没有保留,现在附上ash视图的一些统计信息,可以大概了解一下当时争用的场景

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
SQL> @ash_wait_chains.sql username||':'||program2||event2 session_type='FOREGROUND' sysdate-6/24 sysdate-5/24

-- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )

%This SECONDS AAS
------ ---------- ----------
WAIT_CHAIN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
72% 20073 5.6
-> JSCHPROD:(JDBC Thin Client) ON CPU

8% 2293 .6
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

8% 2141 .6
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) log file parallel write

7% 1879 .5
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR-LNS wait on channel

2% 654 .2
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

1% 288 .1
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

1% 149 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) ON CPU

0% 128 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 112 0
-> JSCHPROD:(JDBC Thin Client) log file sync

0% 86 0
-> JSCHPROD:(JDBC Thin Client) db file scattered read

0% 43 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 37 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 25 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait on LNS

0% 13 0
-> JSCHPROD:(plsqldev.exe) ON CPU

0% 11 0
-> SYS:(plsqldev.exe) ON CPU

0% 10 0
-> JSCHPROD:(JDBC Thin Client) SQL*Net more data from client

0% 9 0
-> JSCHPROD:(JDBC Thin Client) db file sequential read

0% 9 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) ON
CPU

0% 6 0
-> JSCHPROD:(JDBC Thin Client) read by other session -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 4 0
-> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention -> JSCHPROD:(JDBC Thin Client) enq: TM - contention

0% 3 0
-> JSCHPROD:(JDBC Thin Client) SQL*Net more data to client

0% 3 0
-> JSCHPROD:(JDBC Thin Client) buffer busy waits [data block]

0% 3 0
-> SYS:(oraagent.bin) Disk file operations I/O

0% 3 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait for redo copy

0% 2 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - row lock contention

0% 2 0
-> JSCHPROD:(JDBC Thin Client) log file sync -> SYS:(LGWR) LGWR wait for redo copy -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 2 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - index contention -> JSCHPROD:(JDBC Thin Client) ON CPU

0% 1 0
-> JSCHPROD:(plsqldev.exe) log file sync -> SYS:(LGWR) log file parallel write

0% 1 0
-> SYS:(plsqldev.exe) Disk file operations I/O

0% 1 0
-> JSCHPROD:(JDBC Thin Client) enq: TX - row lock contention -> JSCHPROD:(JDBC Thin Client) ON CPU


30 rows selected.

可以看到,TM锁的争用很多,再看一份当时awr报告的top10

虽然占DBTIME不多,但本来是很快的操作,短时间内给人的感觉就是业务处理缓慢
查一下当时等待事件的p1,p2,p3的值

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
select ash.SAMPLE_TIME,
ash.EVENT,
ash.SESSION_ID,
ash.BLOCKING_SESSION,
ash.P1TEXT,
ash.P1,
ash.P2TEXT,
ash.p2,
ash.p3text,
ash.p3,
ash.SESSION_STATE,
ash.SQL_OPNAME,
ash.SQL_ID
--ash.*
from v$active_session_history ash
where ash.SAMPLE_TIME >
to_date('20160425 10:00:00', 'yyyymmdd HH24:MI:SS')
and ash.SAMPLE_TIME <
to_date('20160425 12:10:00', 'yyyymmdd HH24:MI:SS')
and ash.WAIT_CLASS <> 'Idle'
and ash.EVENT like 'enq: TM - contention'
order by sample_time desc;

下面是部分结果

1
2
3
4
5
6
7
8
9
10
11
12
13
enq: TM - contention	1828	2401	name|mode	1414332421	object #	110417	table/partition	0	WAITING	UPDATE	ak25v8q8p6fzd
enq: TM - contention 1873 2504 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 2504 1828 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 772 4 name|mode 1414332421 object # 110417 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1781 1828 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 1828 772 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 2401 1828 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 2504 772 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 4 772 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt
enq: TM - contention 148 1781 name|mode 1414332420 object # 110428 table/partition 0 WAITING UPDATE 9gd6xhd0xyhph
enq: TM - contention 772 148 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1828 148 name|mode 1414332421 object # 110415 table/partition 0 WAITING UPDATE ak25v8q8p6fzd
enq: TM - contention 1873 772 name|mode 1414332419 object # 110415 table/partition 0 WAITING INSERT 7w0tma5up32wt

可以看到p2的值为产生TM争用的对象id,经过查证,这些object均是session正在更新的表的子表,而且通过v$sql查看update语句均更改了主表的主键,问题到这里已经很明朗了,由于外键没加索引,导致了主表在更新主表主键或删除主表记录时对子表的锁定,而且这张主表被大量的子表引用,此时子表上也同时进行事务处理,所以造成了更新主表的session 不时hang住。

通过对所有子表的外键加索引,消除了争用,检测未加索引的外键语句:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
SELECT TABLE_NAME,
CONSTRAINT_NAME,
CNAME1 || NVL2(CNAME2, ',' || CNAME2, NULL) ||
NVL2(CNAME3, ',' || CNAME3, NULL) ||
NVL2(CNAME4, ',' || CNAME4, NULL) ||
NVL2(CNAME5, ',' || CNAME5, NULL) ||
NVL2(CNAME6, ',' || CNAME6, NULL) ||
NVL2(CNAME7, ',' || CNAME7, NULL) ||
NVL2(CNAME8, ',' || CNAME8, NULL) COLUMNS
FROM (SELECT B.TABLE_NAME,
B.CONSTRAINT_NAME,
MAX(DECODE(POSITION, 1, COLUMN_NAME, NULL)) CNAME1,
MAX(DECODE(POSITION, 2, COLUMN_NAME, NULL)) CNAME2,
MAX(DECODE(POSITION, 3, COLUMN_NAME, NULL)) CNAME3,
MAX(DECODE(POSITION, 4, COLUMN_NAME, NULL)) CNAME4,
MAX(DECODE(POSITION, 5, COLUMN_NAME, NULL)) CNAME5,
MAX(DECODE(POSITION, 6, COLUMN_NAME, NULL)) CNAME6,
MAX(DECODE(POSITION, 7, COLUMN_NAME, NULL)) CNAME7,
MAX(DECODE(POSITION, 8, COLUMN_NAME, NULL)) CNAME8,
COUNT(*) COL_CNT
FROM (SELECT SUBSTR(TABLE_NAME, 1, 30) TABLE_NAME,
SUBSTR(CONSTRAINT_NAME, 1, 30) CONSTRAINT_NAME,
SUBSTR(COLUMN_NAME, 1, 30) COLUMN_NAME,
POSITION
FROM USER_CONS_COLUMNS) A,
USER_CONSTRAINTS B
WHERE A.CONSTRAINT_NAME = B.CONSTRAINT_NAME
AND B.CONSTRAINT_TYPE = 'R'
GROUP BY B.TABLE_NAME, B.CONSTRAINT_NAME) CONS
WHERE COL_CNT > ALL
(SELECT COUNT(*)
FROM USER_IND_COLUMNS I
WHERE I.TABLE_NAME = CONS.TABLE_NAME
AND I.COLUMN_NAME IN (CNAME1, CNAME2, CNAME3, CNAME4, CNAME5,
CNAME6, CNAME7, CNAME8)
AND I.COLUMN_POSITION <= CONS.COL_CNT
GROUP BY I.INDEX_NAME);

这是摘自TOM大师的语句,外键不加索引也是导致死锁的常见原因之一,因此对于主表经常进行更新删除操作的情况,外键一定要加索引。
至于外键未加索引是如何导致锁定的,以及为何加了索引后争用就消失了? 敬请关注enq: TM - contention解决之道——外键无索引导致锁争用 (下)

一次掉电后,DG主库启动alert中报如下错误

1
2
3
4
5
6
7
8
9
Errors in file /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m002_4099.trc  (incident=16328):
ORA-00600: internal error code, arguments: [13011], [6443], [8463760], [14], [8488694], [0], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/min/min/incident/incdir_16328/min_m002_4099_i16328.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m002_4099.trc (incident=16329):
ORA-00600: internal error code, arguments: [kewrose_1], [600], [ORA-00600: internal error code, arguments: [13011], [6443], [8463760], [14], [8488694], [0], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/min/min/incident/incdir_16329/min_m002_4099_i16329.trc

查看trace文件,部分内容如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

*** 2016-07-20 18:21:04.185
*** SESSION ID:(169.43) 2016-07-20 18:21:04.185
*** CLIENT ID:() 2016-07-20 18:21:04.185
*** SERVICE NAME:(SYS$BACKGROUND) 2016-07-20 18:21:04.185
*** MODULE NAME:(MMON_SLAVE) 2016-07-20 18:21:04.185
*** ACTION NAME:(Auto-Purge Slave Action) 2016-07-20 18:21:04.185

Dump continued from file: /home/oracle/app/oracle/diag/rdbms/min/min/trace/min_m000_2615.trc
ORA-00600: internal error code, arguments: [13011], [6665], [8395075], [14], [8462787], [0], [], [], [], [], [], []

========= Dump for incident 14804 (ORA 600 [13011]) ========

*** 2016-07-20 18:21:04.186
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=c8taax4bfzsjc) -----
delete from WRH$_RSRC_PLAN tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap and dbid = :dbid) and not exists (select 1 from WRM$_BASELINE b wh
ere (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

MOS中关于ORA-600 [13013]描述

1
2
3
4
5
6
7
Format: ORA-600 [13013] [a] [b] {c} [d] [e] [f]
Arg [a] Passcount
Arg [b] Data Object number
Arg {c} Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Decimal RDBA of block being updated (Typically same as {c})
Arg [f] Code

确定object

1
2
3
4
5
SQL> select owner,object_name,object_type  from dba_objects where object_id=6665;

OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ ------------------------------------------------------------ -------------------
SYS WRH$_RSRC_PLAN TABLE

可见与trc文件中的sql中对象名吻合,都是WRH$_RSRC_PLAN
查看trc文件中的PLAN TABLE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
----- Plan Table -----

============
Plan Table
============
-----------------------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------+-----------------------------------+
| 0 | DELETE STATEMENT | | | | 626 | |
| 1 | DELETE | WRH$_RSRC_PLAN | | | | |
| 2 | FILTER | | | | | |
| 3 | INDEX RANGE SCAN | WRH$_RSRC_PLAN_PK| 94 | 1598 | 8 | 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID | WRM$_BASELINE | 1 | 33 | 2 | 00:00:01 |
| 5 | INDEX RANGE SCAN | WRM$_BASELINE_PK | 1 | | 1 | 00:00:01 |
-----------------------------------------------------------+-----------------------------------+

1.The most common cause of this error is an index corruption. The first step is to check the indexes for corruption, i.e. run

1
2
3
SQL> ANALYZE INDEX WRH$_RSRC_PLAN_PK VALIDATE STRUCTURE;

Index analyzed.

2.If the indexes do not report corruption, further test for corruption the base tables referenced in the execution plan or the statement producing the error:

1
2
3
4
5
SQL> ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

查看生成的trace文件内容

1
2
3
4
5
6
7
8
9
10
*** 2016-07-21 01:31:31.840
*** SESSION ID:(52.5) 2016-07-21 01:31:31.840
*** CLIENT ID:() 2016-07-21 01:31:31.840
*** SERVICE NAME:(SYS$USERS) 2016-07-21 01:31:31.840
*** MODULE NAME:(sqlplus@primary (TNS V1-V3)) 2016-07-21 01:31:31.840
*** ACTION NAME:() 2016-07-21 01:31:31.840

Table/Index row count mismatch
table 2901 : index 3203, 296
Index root = tsn: 1 rdba: 0x0080194a

对于文件内容,简单介绍如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
trace文件中包含:

<description>: tsn: <tablespace number> rdba: <relative dba>

description有以下值:

"row not found in index"
"Table/Index row count mismatch"
"row mismatch in index dba"
"Table row count/Bitmap index bit count mismatch"
"kdavls: kdcchk returns %d when checking cluster dba 0x%08lx objn %d\n"

tsn: Tablespace Number表示的是索引存储的表空间编号。
rdba: 是索引段头相对于数据块的存储地址。

运行以下sql查询索引的文件和块号

1
2
3
4
5
6
7
8
9
SELECT dbms_utility.data_block_address_file(
to_number(trim(leading '0' from
replace('&&rdba','0x','')),'XXXXXXXX')
) AS rfile#,
dbms_utility.data_block_address_block(
to_number(trim(leading '0' from
replace('&&rdba','0x','')),'XXXXXXXX')
) AS block#
FROM dual;

结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SQL> SELECT dbms_utility.data_block_address_file(
2 to_number(trim(leading '0' from
3 replace('&&rdba','0x','')),'XXXXXXXX')
4 ) AS rfile#,
5 dbms_utility.data_block_address_block(
6 to_number(trim(leading '0' from
7 replace('&&rdba','0x','')),'XXXXXXXX')
8 ) AS block#
9 FROM dual;
Enter value for rdba: 0x0080194a
old 3: replace('&&rdba','0x','')),'XXXXXXXX')
new 3: replace('0x0080194a','0x','')),'XXXXXXXX')
old 7: replace('&&rdba','0x','')),'XXXXXXXX')
new 7: replace('0x0080194a','0x','')),'XXXXXXXX')

RFILE# BLOCK#
---------- ----------
2 6474

接下来运行如下查询,定位具体的segment

1
2
3
4
select owner, segment_name, segment_type
from dba_segments
where header_file = <rfile#>
and header_block = <block#>

结果如下:

1
2
3
4
5
6
7
SQL> select owner, segment_name, segment_type
2 from dba_segments
3 where header_file = 2
4 and header_block = 6474;
OWNER SEGMENT_NAME SEGMENT_TYPE
------------------------------ --------------------------------------------------------------------------------- ------------------
SYS WRH$_RSRC_PLAN_PK INDEX

看来仍然是上面分析过的索引,这里是因为表和索引的数据不一致,那么重建索引

1
2
3
4
alter table WRH$_RSRC_PLAN drop constraint WRH$_RSRC_PLAN_PK cascade;

alter table WRH$_RSRC_PLAN
add constraint WRH$_RSRC_PLAN_PK primary key (DBID, SNAP_ID, INSTANCE_NUMBER, SEQUENCE#);

再次分析表,发现错误消失了

1
2
3
SQL> ANALYZE TABLE WRH$_RSRC_PLAN VALIDATE STRUCTURE CASCADE;

Table analyzed.

至此,问题解决

掉电仍是数据库的大敌,而此次并非只在这一张表上有讹误,还有其他一些WRH$表,这些表均属于sysaux表空间,这些表都是awr系列表,写入频繁,掉电容易引起数据的不一致性,但是,正因为是awr数据,并不是特别重要,修复起来也相对容易。

记录一下AWR Performance Tables介绍

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The Oracle10g dynamic performance tables constitute the foundation of sophisticated automations such as Automatic Memory Management  (AMM ) as well as intelligent advisory tools such as ADDM and the SQL Tuning   Advisor.

Remember, the AWR is a core feature of the 10g database kernel and automatically collects and stores important run-time performance information for our historical analysis.

The tables that store this information are prefixed with wrh$ and are very similar in function to the STATSPACK tables. This could make STATSPACK appear somewhat obsolete, although it is still available in the $ORACLE_HOME/rdbms/admin directory.

Unlike the more cumbersome STATSPACK utility, which requires knowledge of the table structure and creation of complex query scripts, the 10g Enterprise Manager (OEM) automatically displays and interprets this valuable time-series performance data.

The wrh$ AWR tables store important historical statistical information about the database in the form of periodic snapshots. Each snapshot is a capture of the in–memory x$ fixed view and other control structures at a certain point in time. Each of the AWR table names is prefixed with wrm$ (Metadata tables), wrh$ (History tables), or wri$ (Advisory tables).

1. The wrm$ tables store metadata information for the Workload Repository.

2. The wrh$ tables store historical data or snapshots.

3. The wri$ tables: These 49 tables store data related to advisory functions.

Once you start using the Broker you must always use the Broker to make any changes to your Data Guard configuration. This means that you must use Grid Control or the Broker CLI DGMGRL to change any Data Guard settings. If you use SQL*Plus to make configuration changes, the Broker will put things back the way it sees the world or this will lead to inconsistencies between the Broker configuration parameters and the database.

一旦开始使用Broker,就必须一直使用它修改DG的配置,这意味着必须使用grid control或者dgmgrl修改DG的配置,如果使用sqlplus修改配置,broker会将配置改回原样,或者导致broker配置参数与数据库之间不一致

此例记录修改保护模式,由最高可用降级为最大性能

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
DGMGRL> show configuration;
Configuration - minconf
Protection Mode: MaxAvailability
Databases:
minstd - Primary database
min - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
DGMGRL> EDIT DATABASE 'minstd' SET PROPERTY 'LogXptMode'='ASYNC';
Property "LogXptMode" updated
DGMGRL> EDIT DATABASE 'min' SET PROPERTY 'LogXptMode'='ASYNC';
Error: ORA-16805: change of LogXptMode property violates overall protection mode
Failed.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;
Succeeded.
DGMGRL> show configuration;
Configuration - minconf
Protection Mode: MaxPerformance
Databases:
minstd - Primary database
min - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
````

使用broker修改会使LOG_ARCHIVE_DEST_2参数变得较为复杂,例如

service="minstd", LGWR SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="minstd" net_timeout=30, valid_for=(all_logfiles,primary_role)


而且修改后在sqlplus中show parameter LOG_ARCHIVE_DEST_2仅有主库显示更改之后的状态,而备库还保留原样,但是在show database verbose中是修改好的,仅当在发生切换时,备库变为主库时才会更正参数,可见最好使用broker来管理配置
0%