前几天go1.13发布,modules默认开启,从此modules转正成为golang官方原生的包依赖管理方式;除了modules,go1.13中还增加了新的语法,如二进制、八进制、十六进制字面量表示法,defer性能的增强,新的errors等等,社区已有很多相关特性的论述文章;此文仅简单讨论一下go1.13中modules的一些改变,毕竟包的管理跟我们日常开发是息息相关的,行文仓促,若有不当之处,希望读者斧正。

本文仅介绍modules在1.11/1.12/1.13等版本中的变化,并不介绍modules的使用,如需了解modules的详细使用方法,请参考官方文档或其他社区文章

Go Modules

modules是go1.11推出的特性,官方称是GOPATH的替代品,是一个完整的支持包分发和版本控制的工具。使用modules,工作区不再局限于GOPATH之内,从而使构建更加可靠和可重复。modules在go1.11中仅仅是一个实验性的功能,紧接着在go1.12中得到了增强,在刚刚发布的go1.13中正式转正,GOPATH的作用进一步被弱化,Go Modules开始大规模使用。

两个模式

对于modules这种模式官网有一个称呼是Module-aware,我不知道如何去翻译这个组合词,与之相对的,就是在Module-aware mode之前我们使用的包管理方式称为GOPATH mode,他们的区别如下:

  • GOPATH mode: go command 从 vendor 和 GOPATH 下寻找依赖,依赖会被下载至 GOPATH/src

  • Module-aware mode: go command不再考虑GOPATH,仅仅使用GOPATH/pkg/mod存储下载的依赖,并且是多版本并存

注意:Module-aware开启和关闭的情况下,go get 的使用方式不是完全相同的。在 modules 模式开启的情况下,可以通过在 package 后面添加 @version 来表明要升级(降级)到某个版本。如果没有指明 version 的情况下,则默认先下载打了 tag 的 release 版本,比如 v0.4.5 或者 v1.2.3;如果没有 release 版本,则下载最新的 pre release 版本,比如 v0.0.1-pre1。如果还没有则下载最新的 commit。这个地方给我们的一个启示是如果我们不按规范的方式来命名我们的 package 的 tag,则 modules 是无法管理的。version 的格式为 v(major).(minor).(patch) ,

在 modules 开启的模式下,go get 还支持 version 模糊查询,比如 > v1.0.0 表示大于 v1.0.0 的可使用版本;< v1.12.0 表示小于 v1.12.0 版本下最近可用的版本。version 的比较规则按照 version 的各个字段来展开。
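
下面给出一个简单示例(其中的包名与版本号仅作演示,属于假设):

go get github.com/google/go-cmp@v0.3.1      # 升级(降级)到指定版本
go get github.com/google/go-cmp@'<v0.4.0'   # 模糊查询:小于 v0.4.0 的最近可用版本
go get github.com/google/go-cmp@latest      # 最新的可用版本
go get -u=patch github.com/google/go-cmp    # 仅升级 patch 版本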

除了指定版本,我们还可以使用如下命令使用最近的可用版本:

  • go get -u 使用最新的 minor 或者 patch 版本

  • go get -u=patch 使用最新的 patch 版本

GO111MODULE

在 1.12 及更早版本中,使用 Go modules 需要设置环境变量 GO111MODULE:

  • GO111MODULE=off: 不使用 Module-aware mode。

  • GO111MODULE=on: 使用 Module-aware mode,不会去 GOPATH 下面查找依赖包。

  • GO111MODULE=auto或unset: Golang 自己检测是不是使用Module-aware mode。

go1.11时GO111MODULE=on有一个很不好的体验,就是go command依赖go.mod文件,也就是如果在module文件夹外使用go get等命令会报如下错误:

[root@k8s-node1 ~]# go get github.com/google/go-cmp
go: cannot find main module; see 'go help modules'
[root@k8s-node1 ~]# touch go.mod
[root@k8s-node1 ~]# go get github.com/google/go-cmp
go: cannot determine module path for source directory /root (outside GOPATH, no import comments)

这个情况在go1.12中得到了解决,可以在module directory之外使用go command。

但我个人比较喜欢不去设置GO111MODULE,根据官方描述在不设置GO111MODULE的情况下或者设为auto的时候,如果在当前目录或者父目录中有go.mod文件,那么就使用Module-aware mode, 而go1.12中,如果包位于GOPATH/src下,且GO111MODULE=auto, 即使有go.mod的存在,go仍然使用GOPATH mode:

[root@VM_0_6_centos test]# go get github.com/jlaffaye/ftp        
go get: warning: modules disabled by GO111MODULE=auto in GOPATH/src;
ignoring go.mod;
see 'go help modules'

这个现象在go1.13中又发生了改变.

modules in go1.13

GO111MODULE

The GO111MODULE environment variable continues to default to auto, but the auto setting now activates the module-aware mode of the go command whenever the current working directory contains, or is below a directory containing, a go.mod file — even if the current directory is within GOPATH/src.

Go 1.13 includes support for Go modules. Module-aware mode is active by default whenever a go.mod file is found in, or in a parent of, the current directory.

可见,modules 在 Go 1.13 的版本下是默认开启的,GOPATH的地位进一步被弱化。

GOPROXY

GOPROXY环境变量是伴随着modules而生的,在go1.13中得到了增强,可以设置为逗号分隔的url列表来指定多个代理,其默认值为https://proxy.golang.org,direct,也就是说google为我们维护了一个代理服务器,但是因为墙的存在,这个默认设置对中国的gopher并无卵用,应第一时间修改。

go命令在需要下载库包的时候将按顺序逐个尝试设置中的各个代理,直到发现一个可用的为止。direct表示直连,所有direct之后的proxy都不会被使用,一个设置例子:

GOPROXY=https://proxy.golang.org,https://myproxy.mysite:8888,direct

GOPROXY环境变量可以帮助我们下载墙外的第三方库包,比较知名的中国区代理有goproxy.cn。当然,通过设置https_proxy环境变量也可以达到此目的。但如果一个公司在内部架设自己的goproxy服务器来缓存第三方库包,库包下载速度会更快,可以感觉到module有一点maven的意思了,但是易用性上还有很长的路要走。

GOPRIVATE

使用GOPROXY可以获取公共的包,这些包在获取的时候会去https://sum.golang.org进行校验,这对中国的gopher来说又是一个比较坑的地方,Go为了安全性推出了Go checksum database(sumdb),环境变量为GOSUMDB,go命令将在必要的时候连接此服务来检查下载的第三方依赖包的哈希是否和sumdb的记录相匹配。很遗憾,在中国也被墙了,可以选择设置为一个第三方的校验库,也可更直接点将GOSUMDB设为off关闭哈希校验,当然就不是很安全了。

除了public的包,在现实开发中我们更多的是使用很多private的包,因此就不适合走代理,所以go1.13推出了一个新的环境变量GOPRIVATE,它的值是一个以逗号分隔的列表,支持通配符(匹配语法遵循 Golang 的 path.Match 包,并非正则)。在GOPRIVATE中设置的包不走proxy和checksum database,例如:

GOPRIVATE=*.corp.example.com,rsc.io/private

GONOSUMDB 和 GONOPROXY

这两个环境变量根据字面意思就能明白是设置不进行校验和不走代理的包,设置方法也是以逗号分隔
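
例如(域名仅为示意):

GONOPROXY=*.corp.example.com
GONOSUMDB=*.corp.example.com,git.mycompany.com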

go env -w

可能是go也觉得环境变量有点多了,干脆为go env增加了一个选项-w,来设置全局环境变量,在Linux系统上我们可以这样用:

go env -w GOPROXY=https://goproxy.cn,direct
go env -w GOPRIVATE=*.gitlab.com,*.gitee.com
go env -w GOSUMDB=off

总结

从1.13中包管理的改变来看,有些乏善可陈,go的包管理很难让大多数开发者满意。当我看到越来越多的环境变量时,心里忍不住唾弃:使用这么多环境变量是一个多么蠢的方法。希望未来Go能给大家带来更好的包管理方式吧,就像Java的maven那样。

参考文献:

  1. Go 1.13 Release Notes
  2. Go 1.12 Release Notes
  3. Go 1.11 Release Notes
  4. Go Modules 不完全教程

%0A的问题

在使用Go语言的net/url包进行编码组装url的时候,遇到如下报错:

2019-07-31 16:55:46.850 ERROR   executor/driver_rollback.go:41  encounter an error:bad response code: 404
github.com/glory-cd/agent/executor.(*HttpFileHandler).Get
/home/liupeng/cdp/src/agent/executor/file_http.go:115
github.com/glory-cd/agent/executor.(*Client).Get
/home/liupeng/cdp/src/agent/executor/client.go:66
github.com/glory-cd/agent/executor.Get
/home/liupeng/cdp/src/agent/executor/filehandler.go:33
github.com/glory-cd/agent/executor.(*Roll).getCode
/home/liupeng/cdp/src/agent/executor/driver_rollback.go:77
github.com/glory-cd/agent/executor.(*Roll).Exec
/home/liupeng/cdp/src/agent/executor/driver_rollback.go:52
runtime.goexit
/usr/lib/go/src/runtime/asm_amd64.s:1337
the kv is: {"url":"http://admin:xxxx@192.168.1.75:32749/test/1.0.0/Gateway.zip%0A"}

可见,最终组装成的url末尾多了%0A,从而导致http请求返回404,那么%0A是怎么来的呢? 那就回溯一下url的组装过程吧。

我用来组装url的代码如下:

//创建url.URL
func (hu *HttpFileHandler) newPostUrl() string {
requestURL := new(url.URL)

requestURL.Scheme = "http"

requestURL.User = url.UserPassword(hu.client.User, hu.client.Pass)

requestURL.Host = hu.client.Addr

requestURL.Path += hu.client.RelativePath

return requestURL.String()
}

其中Path部分是使用hu.client.RelativePath进行拼接的,RelativePath的来源如下:

func (d *driver) readServiceVerion() (string, error) {
versionFile := filepath.Join(d.Dir, common.PathFile)
path, err := ioutil.ReadFile(versionFile)
if err != nil {
return "", errors.WithStack(err)
}

return string(path), nil
}

发现RelativePath是从文件中读取的,我这个文件中只有一行内容,那么再结合url encode来看,这个%0A就是一个linefeed,即换行符,那么处理方法就简单了,返回的时候Trim一下就可以解决换行和空格的问题。

return strings.TrimSpace(string(path)), nil
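
下面用一段极简的演示代码(非原项目代码)验证:换行符经过 URL 编码后正是 %0A,Trim 之后则恢复正常:

package main

import (
	"fmt"
	"net/url"
	"strings"
)

func main() {
	raw := "test/1.0.0/Gateway.zip\n" // 模拟从文件读出、末尾带换行的内容
	fmt.Println(url.PathEscape(raw))                    // test%2F1.0.0%2FGateway.zip%0A
	fmt.Println(url.PathEscape(strings.TrimSpace(raw))) // test%2F1.0.0%2FGateway.zip
}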

接下来再简单聊一下url encode

url encode


百分号编码(英语:Percent-encoding),又称URL编码(URL encoding),是特定上下文的统一资源定位符(URL)的编码机制,实际上也适用于统一资源标志符(URI)的编码;也用于为 application/x-www-form-urlencoded MIME 类型准备数据,因为它用于通过HTTP的请求操作(request)提交HTML表单数据。


上面是维基百科对url编码的解释,通常如果一样东西需要编码,说明这样东西并不适合传输。原因多种多样,如Size过大,包含隐私数据,对于Url来说,之所以要进行编码,是因为Url中有些字符会引起歧义。

例如,Url参数字符串中使用key=value键值对这样的形式来传参,键值对之间以&符号分隔,如/s?q=abc&ie=utf-8。如果你的value字符串中包含了=或者&,那么势必会造成接收Url的服务器解析错误,因此必须将引起歧义的&和=符号进行转义,也就是对其进行编码。

又如,Url的编码格式采用的是ASCII码,而不是Unicode,这也就是说你不能在Url中包含任何非ASCII字符,例如中文。否则在客户端和服务端支持的字符集不同的情况下,中文可能会造成问题。

Url编码的原则就是使用安全的字符(没有特殊用途或者特殊意义的可打印字符)去表示那些不安全的字符。

语法构成

URI是统一资源标识的意思,通常我们所说的URL只是URI的一种。典型URL的格式如下所示。下面提到的URL编码,实际上应该指的是URI编码。

rfc3986中解释URI的构成如下:

   The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment.

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty

The following are two example URIs and their component parts:

foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
| _____________________|__
/ \ / \
urn:example:animal:ferret:nose

哪些字符需要编码

RFC3986文档规定,Url中只允许包含英文字母(a-zA-Z)、数字(0-9)、- _ . ~ 这4个特殊字符以及所有保留字符。RFC3986文档对Url的编解码问题做出了详细的建议,指出了哪些字符需要被编码才不会引起Url语义的转变,以及对为什么这些字符需要编码做出了相应的解释。

US-ASCII字符集中没有对应的可打印字符:Url中只允许使用可打印字符。US-ASCII码中的00-1F及7F字节都是控制字符,这些字符不能直接出现在Url中。同时,对于80-FF字节(ISO-8859-1),由于已经超出了US-ASCII定义的字节范围,因此也不可以放在Url中。

保留字符:Url可以划分成若干个组件,协议、主机、路径等。有一些字符(:/?#[]@)是用作分隔不同组件的。例如:冒号用于分隔协议和主机,/用于分隔主机和路径,?用于分隔路径和查询参数,等等。还有一些字符(!$&'()*+,;=)是用于在每个组件中起到分隔作用的,如=用于表示查询参数中的键值对,&符号用于分隔查询的多个键值对。当组件中的普通数据包含这些特殊字符时,需要对其进行编码。

RFC3986中指定了以下字符为保留字符:! * ' ( ) ; : @ & = + $ , / ? # [ ]

不安全字符:还有一些字符,当他们直接放在Url中的时候,可能会引起解析程序的歧义。这些字符被视为不安全字符,原因有很多。

  • 空格:Url在传输的过程,或者用户在排版的过程,或者文本处理程序在处理Url的过程,都有可能引入无关紧要的空格,或者将那些有意义的空格给去掉。
  • 引号以及<>:引号和尖括号通常用于在普通文本中起到分隔Url的作用
  • #:通常用于表示书签或者锚点
  • %:百分号本身用作对不安全字符进行编码时使用的特殊字符,因此本身需要编码
  • {}|^[]`~:某一些网关或者传输代理会篡改这些字符

需要注意的是,对于Url中的合法字符,编码和不编码是等价的,但是对于上面提到的这些字符,如果不经过编码,那么它们有可能会造成Url语义的不同。因此对于Url而言,只有普通英文字符和数字,特殊字符$-_.+!*'()还有保留字符,才能出现在未经编码的Url之中。其他字符均需要经过编码之后才能出现在Url中。


如何对url进行编码

Url编码通常也被称为百分号编码(Url Encoding,also known as percent-encoding),是因为它的编码方式非常简单:使用%百分号加上两位的字符——0123456789ABCDEF——代表一个字节的十六进制形式。Url编码默认使用的字符集是US-ASCII。例如a在US-ASCII码中对应的字节是0x61,那么Url编码之后得到的就是%61,我们在地址栏上输入http://g.cn/search?q=%61%62%63,实际上就等同于在google上搜索abc了。又如@符号在ASCII字符集中对应的字节为0x40,经过Url编码之后得到的是%40。

对于非ASCII字符,需要使用ASCII字符集的超集进行编码得到相应的字节,然后对每个字节执行百分号编码。对于Unicode字符,RFC文档建议使用utf-8对其进行编码得到相应的字节,然后对每个字节执行百分号编码。如"中文"使用UTF-8字符集得到的字节为0xE4 0xB8 0xAD 0xE6 0x96 0x87,经过Url编码之后得到"%E4%B8%AD%E6%96%87"。

如果某个字节对应着ASCII字符集中的某个非保留字符,则此字节无需使用百分号表示。例如"Url编码",使用UTF-8编码得到的字节是0x55 0x72 0x6C 0xE7 0xBC 0x96 0xE7 0xA0 0x81,由于前三个字节对应着ASCII中的非保留字符"Url",因此这三个字节可以用非保留字符"Url"表示。最终的Url编码可以简化成"Url%E7%BC%96%E7%A0%81",当然,如果你用"%55%72%6C%E7%BC%96%E7%A0%81"也是可以的。
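
上述编码结果可以用 Go 的 net/url 包快速验证(演示代码):

package main

import (
	"fmt"
	"net/url"
)

func main() {
	fmt.Println(url.QueryEscape("abc"))     // abc,非保留字符不会被编码
	fmt.Println(url.QueryEscape("中文"))     // %E4%B8%AD%E6%96%87
	fmt.Println(url.QueryEscape("Url编码"))  // Url%E7%BC%96%E7%A0%81
}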

使用vim去除EOL

还可以使用vim去除文件末尾的换行

You can turn off the 'eol' option and turn on the 'binary' option to write
a file without the EOL at the end of the file:

:set binary
:set noeol
:w

什么是inode

理解inode,要从文件储存说起。

文件储存在硬盘上,硬盘的最小存储单位叫做"扇区"(Sector)。每个扇区储存512字节(相当于0.5KB)。

操作系统读取硬盘的时候,不会一个个扇区地读取,这样效率太低,而是一次性连续读取多个扇区,即一次性读取一个"块"(block)。这种由多个扇区组成的"块",是文件存取的最小单位。"块"的大小,最常见的是4KB,即连续八个 sector组成一个 block。

文件数据都储存在"块"中,那么很显然,我们还必须找到一个地方储存文件的元信息,比如文件的创建者、文件的创建日期、文件的大小等等。这种储存文件元信息的区域就叫做inode,中文译名为"索引节点"。

每一个文件都有对应的inode,里面包含了与该文件有关的一些信息。

inode的内容

inode包含文件的元信息,具体来说有以下内容:

  • 文件的字节数
  • 文件拥有者的User ID
  • 文件的Group ID
  • 文件的读、写、执行权限
  • 文件的时间戳
  • 链接数,即有多少文件名指向这个inode
  • 文件数据block的位置

使用stat命令查看某个文件的inode信息

 $ stat proxy.sh 
File: proxy.sh
Size: 89 Blocks: 8 IO Block: 4096 regular file
Device: 802h/2050d Inode: 9714171 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1000/ liupeng) Gid: ( 1000/ liupeng)
Access: 2019-07-29 10:25:57.880429139 +0800
Modify: 2019-07-29 10:25:57.880429139 +0800
Change: 2019-07-29 10:26:05.910683677 +0800
Birth: 2019-07-29 10:25:57.880429139 +0800

inode的大小

inode也会消耗硬盘空间,所以硬盘格式化的时候,操作系统自动将硬盘分成两个区域。一个是数据区,存放文件数据;另一个是inode区(inode table),存放inode所包含的信息。

每个inode节点的大小,一般是128字节或256字节,甚至可以手动指定到2K,inode节点的总数,在格式化时就给定,以ext3/ext4为例:

  • 每个 inode 大小为 256byte,block 大小为 4k byte;
  • 根据 block count 和 inode count,我们也可以算出 16k bytes-per-inode(15728384*4096/3932160)

也就是文件系统在创建的时候每16k空间自动划分一个inode,如果你需要存储的是大量的小文件,那么你应该在格式化分区的时候手动修改bytes-per-inode的值,例如:

mkfs.ext4 -i 8192 /dev/sda1
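
格式化完成后,可以用 tune2fs 粗略验证相关参数(设备名仅为示意):

tune2fs -l /dev/sda1 | grep -iE 'inode count|block count|inode size'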

而在xfs文件系统中-i inode_options中的maxpct=value描述如下:

This specifies the maximum percentage of space in the filesystem that can be allocated to inodes. The default value is 25% for filesystems under 1TB, 5% for filesystems under 50TB and 1% for filesystems over 50TB.

可见默认情况下xfs文件系统要比ext文件系统分配更多的inode

可以使用df -i查看各文件系统的inode总数和使用率:

$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
dev 1015658 414 1015244 1% /dev
run 1017797 680 1017117 1% /run
/dev/sda2 14057472 1003259 13054213 8% /
tmpfs 1017797 129 1017668 1% /dev/shm
tmpfs 1017797 18 1017779 1% /sys/fs/cgroup
tmpfs 1017797 629 1017168 1% /tmp
/dev/sda1 0 0 0 - /boot/efi
tmpfs 1017797 37 1017760 1% /run/user/1000

inode号

每个inode都有一个号码,操作系统用inode号码来识别不同的文件。

这里值得重复一遍,Unix/Linux系统内部不使用文件名,而使用inode号码来识别文件。对于系统来说,文件名只是inode号码便于识别的别称或者绰号。

表面上,用户通过文件名,打开文件。实际上,系统内部这个过程分成三步:首先,系统找到这个文件名对应的inode号码;其次,通过inode号码,获取inode信息;最后,根据inode信息,找到文件数据所在的block,读出数据。

使用ls -i命令,可以看到文件名对应的inode号码:

$ ls -i proxy.sh 
9714171 proxy.sh
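
在 Go 程序中也可以取到 inode 号,下面是一个仅适用于 Linux 的演示代码:

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	fi, err := os.Stat("proxy.sh")
	if err != nil {
		panic(err)
	}
	// Sys() 返回底层的 stat 结构,在 Linux 上是 *syscall.Stat_t
	if st, ok := fi.Sys().(*syscall.Stat_t); ok {
		fmt.Println("inode:", st.Ino)
	}
}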

目录文件

Unix/Linux系统中,目录(directory)也是一种文件。打开目录,实际上就是打开目录文件。

目录文件的结构非常简单,就是一系列目录项(dirent)的列表。每个目录项,由两部分组成:所包含文件的文件名,以及该文件名对应的inode号码。

查看目录的inode

通过介绍我们知道,通常情况下每个文件对应一个inode,那么如果想查找某个目录使用的inode数量,则可以使用如下命令:

clear;echo "Detailed Inode usage: $(pwd)" ; for d in `find -maxdepth 1 -type d |cut -d\/ -f2 |grep -xv . |sort`; do c=$(find $d |wc -l) ; printf "$c\t\t- $d\n" ; done ; printf "Total: \t\t$(find $(pwd) | wc -l)\n"

输出如下:

Detailed Inode usage: /home/liupeng/liupzmin.github.io
312 - .git
8049 - node_modules
294 - public
4 - scaffolds
45 - source
409 - themes
Total: 9120

要清理inode,只要找到包含大量文件的目录删除之即可。

参考文献:

  1. 理解inode
  2. How to find the INODE usage on Linux

Golang提供了几个包可以将文件压缩为不同的类型,这篇博客主要展示一下archive/zip这个包的用法,如何将文件或文件夹压缩为zip格式,以及如何进行解压缩。

Compressing

usage

Zipit("/tmp/documents", "/tmp/backup.zip", "*.log")
Zipit("/tmp/report.txt", "/tmp/report-2015.zip", "*.log")

func Zipit

//压缩为zip格式
//source为要压缩的文件或文件夹, 绝对路径和相对路径都可以
//target是目标文件
//filter是过滤模式(匹配语法参见 Golang 的 path.Match)
func Zipit(source, target, filter string) error {
var err error
if isAbs := filepath.IsAbs(source); !isAbs {
source, err = filepath.Abs(source) // 将传入路径直接转化为绝对路径
if err != nil {
return errors.WithStack(err)
}
}
//创建zip包文件
zipfile, err := os.Create(target)
if err != nil {
return errors.WithStack(err)
}

defer func() {
if err := zipfile.Close(); err != nil{
log.Slogger.Errorf("*File close error: %s, file: %s", err.Error(), zipfile.Name())
}
}()

//创建zip.Writer
zw := zip.NewWriter(zipfile)

defer func() {
if err := zw.Close(); err != nil{
log.Slogger.Errorf("zipwriter close error: %s", err.Error())
}
}()

info, err := os.Stat(source)
if err != nil {
return errors.WithStack(err)
}

var baseDir string
if info.IsDir() {
baseDir = filepath.Base(source)
}

err = filepath.Walk(source, func(path string, info os.FileInfo, err error) error {

if err != nil {
return errors.WithStack(err)
}

//将遍历到的路径与pattern进行匹配
ism, err := filepath.Match(filter, info.Name())

if err != nil {
return errors.WithStack(err)
}
//如果匹配就忽略
if ism {
return nil
}
//创建文件头
header, err := zip.FileInfoHeader(info)
if err != nil {
return errors.WithStack(err)
}

if baseDir != "" {
header.Name = filepath.Join(baseDir, strings.TrimPrefix(path, source))
}

if info.IsDir() {
header.Name += "/"
} else {
header.Method = zip.Deflate
}
//写入文件头信息
writer, err := zw.CreateHeader(header)
if err != nil {
return errors.WithStack(err)
}

if info.IsDir() {
return nil
}
//写入文件内容
file, err := os.Open(path)
if err != nil {
return errors.WithStack(err)
}

defer func() {
if err := file.Close(); err != nil{
log.Slogger.Errorf("*File close error: %s, file: %s", err.Error(), file.Name())
}
}()
_, err = io.Copy(writer, file)

return errors.WithStack(err)
})

if err != nil {
return errors.WithStack(err)
}

return nil
}

Extracting

usage

Unzip("/tmp/report-2015.zip", "/tmp/reports/")

func Unzip

//解压zip
func Unzip(archive, target string) error {
reader, err := zip.OpenReader(archive)
if err != nil {
return errors.WithStack(err)
}

if err := os.MkdirAll(target, 0755); err != nil {
return errors.WithStack(err)
}

for _, file := range reader.File {
unzippath := filepath.Join(target, file.Name)
if file.FileInfo().IsDir() {
err := os.MkdirAll(unzippath, file.Mode())
if err != nil {
return errors.WithStack(err)
}
continue
}

fileReader, err := file.Open()
if err != nil {
return errors.WithStack(err)
}
defer fileReader.Close()

targetFile, err := os.OpenFile(unzippath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
if err != nil {
return errors.WithStack(err)
}
defer targetFile.Close()

if _, err := io.Copy(targetFile, fileReader); err != nil {
return errors.WithStack(err)
}
}

return nil
}

参考文献:

  1. Golang: Working with ZIP archives

Passing arguments to ... parameters

If f is variadic with a final parameter p of type ...T, then within f the type of p is equivalent to type []T. If f is invoked with no actual arguments for p, the value passed to p is nil. Otherwise, the value passed is a new slice of type []T with a new underlying array whose successive elements are the actual arguments, which all must be assignable to T. The length and capacity of the slice is therefore the number of arguments bound to p and may differ for each call site.

Given the function and calls

func Greeting(prefix string, who ...string)
Greeting("nobody")
Greeting("hello:", "Joe", "Anna", "Eileen")

within Greeting, who will have the value nil in the first call, and []string{"Joe", "Anna", "Eileen"} in the second.

If the final argument is assignable to a slice type []T, it may be passed unchanged as the value for a ...T parameter if the argument is followed by .... In this case no new slice is created.

Given the slice s and call

s := []string{"James", "Jasmine"}
Greeting("goodbye:", s...)

within Greeting, who will have the same value as s with the same underlying array.

上面是官方文档中关于...的用法的描述,主要有两种使用形式:

参数列表

如果最后一个函数参数的类型是...T,那么在调用这个函数的时候,我们可以在参数列表的最后使用若干个类型为T的参数。这里,...T在函数内部的类型实际是[]T,是隐式创建的slice:

func Sum(nums ...int) int {
res := 0
for _, n := range nums {
res += n
}
return res
}

Sum(1,2,3)

slice参数

如果传入的最后一个函数参数是slice []T,直接在slice后跟...,那么就不会创建新的slice:

primes := []int{2, 3, 5, 7}
fmt.Println(Sum(primes...)) // 17

因此,当我们想把一个slice追加到另外一个slice时,可以使用如下方式:

s1 := []int{0, 1, 2, 3}
s2 := []int{4, 5, 6, 7}
s1 = append(s1, s2...) // instead of FOR
fmt.Println(s1)

或者,当我们想在slice中删除一个元素时:

s1 := []int{0, 1, 2, 3}
for index := range s1 {
	if index == 2 {
		s1 = append(s1[:index], s1[index+1:]...)
		break // 删除后立即退出,避免继续遍历已被修改的slice
	}
}

标识数组元素个数

默认情况下,数组的每个元素都被初始化为元素类型对应的零值,对于数字类型来说就是0。我们也可以使用数组字面值语法用一组值来初始化数组:

var q [3]int = [3]int{1, 2, 3}
var r [3]int = [3]int{1, 2}
fmt.Println(r[2]) // "0"

在数组字面值中,如果在数组的长度位置出现的是“...”省略号,则表示数组的长度是根据初始化值的个数来计算。因此,上面q数组的定义可以简化为

q := [...]int{1, 2, 3}
fmt.Printf("%T\n", q) // "[3]int"

Go命令行中的通配符

... 也可以在 go 命令行中作为描述包路径的通配符。
在下面的例子中,会对当前目录和所有子目录下的所有包执行单元测试:

go test ./...

参考文献:

  1. golang 三个点(three dots)的用法
  2. Passing arguments to ... parameters

Cannot run program "python3": error=2, 没有那个文件或目录

spark2中默认使用的是python2,可以通过以下三种方式之一使用python3:

  1. PYSPARK_PYTHON=python3 pyspark2
  2. 修改~/.bash_profile,增加 PYSPARK_PYTHON=python3
  3. 修改spark-env.sh增加PYSPARK_PYTHON=/usr/local/bin/python3

如果使用前两种不带绝对路径的变量声明方式,可能会遇到Cannot run program "python3": error=2, 没有那个文件或目录错误。原因是我的spark环境默认运行在yarn上,当执行RDD任务时会在其他节点报错:

[root@hadoop-04 ~]# PYSPARK_PYTHON=python3 pyspark2
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.2.0.cloudera2
/_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line:"python" in line)
>>> pythonlines.count()
[Stage 0:> (0 + 2) / 2]18/03/22 13:25:22 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoop-02, executor 2): java.io.IOException: Cannot run program "python3": error=2, 没有那个文件或目录
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, 没有那个文件或目录
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 14 more
......
>>> 18/03/22 13:25:23 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, hadoop-03, executor 1): TaskKilled (stage cancelled)

hadoop-02这个节点上找不到python3导致任务终止,既然提示在其他节点上找不到,那在本地节点运行会是哪种结果呢?

[root@hadoop-04 ~]# PYSPARK_PYTHON=python3 pyspark2 --master local
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.2.0.cloudera2
/_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line:"python" in line)
>>> pythonlines.count()
0

可见在本地运行是没有问题的,那问题就出在python3的可执行文件少了绝对路径,猜测是spark内部的任务调度执行的时候没有使用操作系统的PATH导致找不到可执行文件,现在把python3的可执行文件路径补全:

[root@hadoop-04 ~]# PYSPARK_PYTHON=/usr/local/bin/python3 pyspark2
Python 3.6.4 (default, Mar 21 2018, 13:55:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/22 14:54:52 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.2.0.cloudera2
/_/

Using Python version 3.6.4 (default, Mar 21 2018 13:55:56)
SparkSession available as 'spark'.
>>> lines = sc.textFile('/afis/flume/auth/2018/03/16/auth.1521129675887.log')
>>> pythonlines = lines.filter(lambda line:"python" in line)
>>> pythonlines.count()
0
>>> pythonlines = lines.filter(lambda line:"SessionTask" in line)
>>> pythonlines.count()
719

可见,要在spark2上使用python3需要设置PYSPARK_PYTHON为可执行文件的绝对路径,优先推荐设置spark-env.sh

[root@hadoop-01 ~]# more /etc/spark2/conf/spark-env.sh 
#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop
export PYSPARK_PYTHON=/usr/local/bin/python3

1. 几种方式

目前要把kafka中的数据传输到elasticsearch集群大概有以下几种方法:

  • logstash

  • flume

  • spark streaming

  • kafka connect

  • 自己开发程序读取kafka写入elastic

其中logstash看到网上说不太稳定,且我目前用过版本2.3 ,确实经常出现crash的情况,所以一开始并未考虑;首先尝试的是通过flume到es,因为目前kafka到HDFS中间用的是flume,想再加一个通道和sink到es,而且flume也有es的sink,但是我的flume是最新版1.8,elasticsearch也是最新版6.2.2,中间碰到了兼容性问题,未能成功;转而去研究kafka connect,按照《kafka权威指南》上的例子研究了一下,同样遇到兼容性问题,在我的版本组合中无法奏效,但我不想去修改已经安装好的flume或者es集群,spark streaming过于复杂,自己开发程序成本过高、且周期较长;最终去尝试logstash,结果配置非常容易,简单奏效,稳定性问题暂时无法看出,留待日后详测,现记录一下配置。

2. logstash配置

2.1 kafka input plugin

input{
kafka{
bootstrap_servers => ["192.168.1.120:9092,192.168.1.121:9092,192.168.1.122:9092"]
client_id => "test"
group_id => "logstash-es"
auto_offset_reset => "latest"
consumer_threads => 5
decorate_events => "true"
topics => ["auth","base","clearing","trademgt"]
type => "kafka-to-elas"
}
}

其中配置项decorate_events 颇为有用,如果只用了单个logstash,订阅了多个主题,你肯定希望在es中为不同主题创建不同的索引,那么decorate_events 就是你想要的,看官方解释:

decorate_events

  • Value type is boolean
  • Default value is false

Option to add Kafka metadata like topic, message size to the event. This will add a field named kafka to the logstash event containing the following attributes: topic: The topic this message is associated with consumer_group: The consumer group used to read in this event partition: The partition this message is associated with offset: The offset from the partition this message is associated with key: A ByteBuffer containing the message key

大意是指定这个选项为true时,会附加kafka的一些信息到logstash event的一个名为kafka的域中,例如topic、消息大小、偏移量、consumer_group等,具体如下:

Metadata fields

The following metadata from Kafka broker are added under the [@metadata] field:

  • [@metadata][kafka][topic]: Original Kafka topic from where the message was consumed.
  • [@metadata][kafka][consumer_group]: Consumer group
  • [@metadata][kafka][partition]: Partition info for this message.
  • [@metadata][kafka][offset]: Original record offset for this message.
  • [@metadata][kafka][key]: Record key, if any.
  • [@metadata][kafka][timestamp]: Timestamp when this message was received by the Kafka broker.

Please note that @metadata fields are not part of any of your events at output time. If you need these information to be inserted into your original event, you’ll have to use the mutate filter to manually copy the required fields into your event.

值得注意的是,在output的时候这些域的元数据信息并不是event的一部分,如果希望这些元数据插入到原生的event中,就需要利用mutate手动copy进event,我们接下来会在filter中利用kafka域的内容构建自定义的域。
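
例如,若希望把 topic 写入事件本身,可以这样做(字段名 kafka_topic 仅为示意):

filter {
  mutate {
    add_field => { "kafka_topic" => "%{[@metadata][kafka][topic]}" }
  }
}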

2.2 构建[@metadata][index]

filter {
if [@metadata][kafka][topic] == "auth" {
mutate {
add_field => {"[@metadata][index]" => "auth-%{+YYYY.MM.dd}"}
}
} else if [@metadata][kafka][topic] == "base" {
mutate {
add_field => {"[@metadata][index]" => "base-%{+YYYY.MM.dd}"}
}

}else if [@metadata][kafka][topic] == "clearing" {

mutate {
add_field => {"[@metadata][index]" => "clearing-%{+YYYY.MM.dd}"}
}
}else{
mutate {
add_field => {"[@metadata][index]" => "trademgt-%{+YYYY.MM.dd}"}
}
}
# remove the field containing the decorations, unless you want them to land into ES
mutate {
remove_field => ["kafka"]
}
}

为每个主题构建对应的[@metadata][index],并在接下来output中引用

2.3 elasticsearch output plugin

output{
elasticsearch{
hosts => ["192.168.1.123:9200","192.168.1.122:9200","192.168.1.121:9200"]
index => "%{[@metadata][index]}"
timeout => 300
}

}

启动logstash就可以在kibana中观察到数据了。

3. 特殊的metadata

The @metadata field

In Logstash 1.5 and later, there is a special field called @metadata. The contents of @metadata will not be part of any of your events at output time, which makes it great to use for conditionals, or extending and building event fields with field reference and sprintf formatting.

The following configuration file will yield events from STDIN. Whatever is typed will become the message field in the event. The mutate events in the filter block will add a few fields, some nested in the @metadata field.

input { stdin { } }

filter {
mutate { add_field => { "show" => "This data will be in the output" } }
mutate { add_field => { "[@metadata][test]" => "Hello" } }
mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}

output {
if [@metadata][test] == "Hello" {
stdout { codec => rubydebug }
}
}

Let’s see what comes out:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
"@timestamp" => 2016-06-30T02:42:51.496Z,
"@version" => "1",
"host" => "example.com",
"show" => "This data will be in the output",
"message" => "asdf"
}

The "asdf" typed in became the message field contents, and the conditional successfully evaluated the contents of the test field nested within the @metadata field. But the output did not show a field called @metadata, or its contents.

The rubydebug codec allows you to reveal the contents of the @metadata field if you add a config flag, metadata => true:

stdout { codec => rubydebug { metadata => true } }

Let’s see what the output looks like with this change:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
"@timestamp" => 2016-06-30T02:46:48.565Z,
"@metadata" => {
"test" => "Hello",
"no_show" => "This data will not be in the output"
},
"@version" => "1",
"host" => "example.com",
"show" => "This data will be in the output",
"message" => "asdf"
}

Now you can see the @metadata field and its sub-fields.

Important

Only the rubydebug codec allows you to show the contents of the @metadata field.

Make use of the @metadata field any time you need a temporary field but do not want it to be in the final output.

Perhaps one of the most common use cases for this new field is with the date filter and having a temporary timestamp.

This configuration file has been simplified, but uses the timestamp format common to Apache and Nginx web servers. In the past, you’d have to delete the timestamp field yourself, after using it to overwrite the @timestamp field. With the @metadata field, this is no longer necessary:

input { stdin { } }

filter {
grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
stdout { codec => rubydebug }
}

Notice that this configuration puts the extracted date into the [@metadata][timestamp] field in the grok filter. Let’s feed this configuration a sample date string and see what comes out:

$ bin/logstash -f ../test.conf
Pipeline main started
02/Mar/2014:15:36:43 +0100
{
"@timestamp" => 2014-03-02T14:36:43.000Z,
"@version" => "1",
"host" => "example.com",
"message" => "02/Mar/2014:15:36:43 +0100"
}

That’s it! No extra fields in the output, and a cleaner config file because you do not have to delete a "timestamp" field after conversion in the date filter.

Another use case is the CouchDB Changes input plugin (See https://github.com/logstash-plugins/logstash-input-couchdb_changes). This plugin automatically captures CouchDB document field metadata into the @metadata field within the input plugin itself. When the events pass through to be indexed by Elasticsearch, the Elasticsearch output plugin allows you to specify the action(delete, update, insert, etc.) and the document_id, like this:

output {
elasticsearch {
action => "%{[@metadata][action]}"
document_id => "%{[@metadata][_id]}"
hosts => ["example.com"]
index => "index_name"
protocol => "http"
}
}

利用DBMS_SCHEDULER在Oracle 11gR2 RAC上执行rman备份

RMAN backup in Oracle 11gR2 RAC is exactly same like RMAN backup in Oracle 11gR2 single node.
The only difference is: Typically, in case of Oracle single node database, we will schedule RMAN scripts with the help of CRON job and it will run according to our convenience, but in case of Oracle RAC if we schedule RMAN script and if unfortunately that RAC node goes down ( where we configured RMAN scripts ), then RMAN backup won’t run obviously.

So, Same strategy will not be work in Oracle RAC node. For RMAN consistent backups use dbms_scheduler & we need to place RMAN scripts in shared directory. ( Or in my case, I have created identical scripts on both cluster node’s )

注意: 需要将脚本放在共享位置或者每个节点的相同位置

看一下rman的备份脚本,此脚本将备份放在ASM中,将日志放在节点本地

################################################################
# rman_backup_rac.sh FOR RAC #
# created by lp #
# 2017/03/22 #
# usage: rman_backup_rac.sh <$BACKUP_LEVEL> #
# BACKUP_LEVEL: #
# F: full backup #
# 0: level 0 #
# 1: level 1 #
################################################################


#!/bin/bash
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export RMAN_BAK_LOG_BASE=/home/oracle/DbBackup
export RMAN_BAK_DATA_BASE=+data/NXRAC/backup
export ORACLE_SID=`ps -ef|grep pmon|grep -v ASM|grep -v grep|awk -F'_' '{print $NF}'`
export TIMESTAMP=`date +%Y%m%d%H%M`;

#the destination of rman backuppiece
export RMAN_DATA=${RMAN_BAK_DATA_BASE}/rman

#the destination of rman backup logs
export RMAN_LOG=${RMAN_BAK_LOG_BASE}/logs


if [[ ! -z $1 ]] && echo $1 |grep -Ew "[01F]" >/dev/null 2>&1
then
export RMAN_LEVEL=${1}

# Check rman level
if [ "$RMAN_LEVEL" == "F" ];
then unset INCR_LVL
BACKUP_TYPE=full
else
INCR_LVL="INCREMENTAL LEVEL ${RMAN_LEVEL}"
BACKUP_TYPE=lev${RMAN_LEVEL}
fi
else
echo "${1} wrong argument!" >${RMAN_LOG}/wrong_argument_${TIMESTAMP}.log
exit 1
fi





#the prefix of rman backuppiece
export RMAN_FILE=${RMAN_DATA}/${BACKUP_TYPE}_${TIMESTAMP}

#the logfile of shell script including the rman logs contents
export SSH_LOG=${RMAN_LOG}/${BACKUP_TYPE}_${TIMESTAMP}.log

#the size of backuppiece
export MAXPIECESIZE=4G

####################################################################
# #
# the name of rman logs excluding the file expanded-name, #
# when the shell is complete,the content of this file will be #
# appended to the $SSH_LOG and the rman logfile will be deleted. #
# #
####################################################################
export RMAN_LOG_FILE=${RMAN_LOG}/${BACKUP_TYPE}_${TIMESTAMP}_1


#Check RMAN Backup Path

if ! test -d ${RMAN_LOG}
then
mkdir -p ${RMAN_LOG}
fi

echo "---------------------------------" >>${SSH_LOG}
echo " " >>${SSH_LOG}
echo "Rman Begin to Working ........." >>${SSH_LOG}
echo "Begin time at:" `date` --`date +%Y%m%d%H%M` >>${SSH_LOG}

#Startup rman to backup

$ORACLE_HOME/bin/rman log=${RMAN_LOG_FILE}.log <<EOF
connect target /
run {
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO BACKUPSET;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '${RMAN_DATA}/control_auto_%F';
ALLOCATE CHANNEL 'ch1' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node1';
ALLOCATE CHANNEL 'ch2' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node1';
ALLOCATE CHANNEL 'ch3' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node2';
ALLOCATE CHANNEL 'ch4' TYPE DISK maxpiecesize=${MAXPIECESIZE} CONNECT 'SYS/passwd@node2';
CROSSCHECK ARCHIVELOG ALL;
DELETE NOPROMPT OBSOLETE;
DELETE NOPROMPT EXPIRED BACKUP;
BACKUP AS COMPRESSED BACKUPSET
${INCR_LVL}
DATABASE FORMAT '${RMAN_FILE}_db_%U' TAG '${BACKUP_TYPE}_${TIMESTAMP}';
SQL 'ALTER SYSTEM ARCHIVE LOG CURRENT';
BACKUP FILESPERSET 20 ARCHIVELOG ALL FORMAT '${RMAN_FILE}_arc_%U' TAG '${ORACLE_SID}_arc_${TIMESTAMP}'
DELETE INPUT;
RELEASE CHANNEL ch1;
RELEASE CHANNEL ch2;
RELEASE CHANNEL ch3;
RELEASE CHANNEL ch4;
ALLOCATE CHANNEL ch00 TYPE DISK;
BACKUP
FORMAT '${RMAN_DATA}/cntrl_%U'
CURRENT CONTROLFILE;
RELEASE CHANNEL ch00;
}
exit;
EOF

RC=$?

cat ${RMAN_LOG_FILE}.log >>${SSH_LOG}
echo "Rman Stop working @ time:"`date` `date +%Y%m%d%H%M` >>${SSH_LOG}

if [ $RC -ne "0" ]; then
echo "------ error ------" >>${SSH_LOG}
else
echo "------ Success during RMAN backup peroid------" >>${SSH_LOG}
rm -rf ${RMAN_LOG_FILE}.log
fi

exit

DBMS_SCHEDULER:

Here we are using DBMS_SCHEDULER instead of DBMS_JOB, because DBMS_SCHEDULER is RAC aware.

Before jump into real DBMS_SCHEDULER configuration, we need to focus on an important thing, That:

Both RAC nodes local time zone must be identical with DBMS_SCHEDULER default time.

On all RAC node, Ensure local time zone and set it accordingly.

[oracle@node2 ]$ cat /etc/sysconfig/clock
ZONE="Asia/Shanghai"

configure default time zone for DBMS_SCHEDULER

SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';

VALUE
--------------------------------------------------------------------------------
PRC

SQL> exec dbms_scheduler.set_scheduler_attribute ('DEFAULT_TIMEZONE', 'Asia/Shanghai');

PL/SQL procedure successfully completed.

SQL> select value from dba_scheduler_global_attribute where attribute_name = 'DEFAULT_TIMEZONE';

VALUE
--------------------------------------------------------------------------------
Asia/Shanghai

Now we need to create credential so that are assigned to DBMS_SCHEDULER jobs so that they can authenticate with a local/remote host operating system or a remote Oracle database.

SQL> exec dbms_scheduler.create_credential(credential_name => 'oracle', username => 'oracle', password => 'oracle');

PL/SQL procedure successfully completed.

Now its time to create DBMS_SCHEDULER job for RMAN incremental level 0 backup, Here in this procedure I am going to create RMAN_INC0_BACKUP job with required attributes.

begin
dbms_scheduler.create_job(
job_name => 'RMAN_INC0_BACKUP',
job_type => 'EXECUTABLE',
job_action => '/bin/sh',
number_of_arguments => 2,
start_date => SYSTIMESTAMP,
credential_name => 'oracle',
auto_drop => FALSE,
enabled => FALSE);
end;
/

Set argument_position & argument_value ( i.e. Path of the RMAN script ) for the same job:

begin
dbms_scheduler.set_job_argument_value(
job_name => 'RMAN_INC0_BACKUP',
argument_position => 1,
argument_value => '/home/oracle/rman.sh');
end;
/

begin
dbms_scheduler.set_job_argument_value(
job_name => 'RMAN_INC0_BACKUP',
argument_position => 2,
argument_value => 0);
end;
/

Set start_date for the same job, In my case RMAN_INC0_BACKUP job will execute every week on sunday @03am, so job start date and its first run timing would according to my convenience.

begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC0_BACKUP',
attribute => 'start_date',
value => trunc(sysdate)+3/24);
end;
/

Test your backup job manually in SQL prompt by instantiating RMAN_INC0_BACKUP job.

SQL> exec dbms_scheduler.run_job('RMAN_INC0_BACKUP');
PL/SQL procedure successfully completed.

Verify running RMAN backup status by issuing following SQL query, It will show you RMAN backup details with start time & end time.

select SESSION_KEY, INPUT_TYPE, STATUS,
to_char(START_TIME,'mm/dd/yy hh24:mi') start_time,
to_char(END_TIME,'mm/dd/yy hh24:mi') end_time,
elapsed_seconds/3600 hrs
from V$RMAN_BACKUP_JOB_DETAILS
order by session_key;

In case of any error while test run, you can make sure details of error by issuing the following query, OR You can also query to dba_scheduler_job_run_details dictionary view for more details.

select JOB_NAME,STATUS,STATE,ERROR#,CREDENTIAL_NAME from dba_scheduler_job_run_details where CREDENTIAL_NAME like 'RMAN%';

After successful completion of the test run, enable and schedule the job by setting the repeat_interval parameter. In my case the RMAN_INC0_BACKUP job will execute every week on Sunday at 03 AM.

begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC0_BACKUP',
attribute => 'repeat_interval',
value => 'freq=daily;byday=sun;byhour=03');
dbms_scheduler.enable( 'RMAN_INC0_BACKUP' );
end;
/

Ensure dbms_scheduler job details by issuing the following query OR you can also query to dba_scheduler_jobs and dba_scheduler_job_args.

SQL> select job_name,enabled,owner, state from dba_scheduler_jobs where job_name in ('RMAN_INC0_BACKUP');

Keep your eye on behavior of dbms_scheduler job by issuing the following query:

SQL> select job_name,RUN_COUNT,LAST_START_DATE,NEXT_RUN_DATE from dba_scheduler_jobs where job_name in ('RMAN_INC0_BACKUP');
SQL> select * from dba_scheduler_job_args where job_name like 'RMAN%';

Create the level 1 backup job RMAN_INC1_BACKUP in accordance with the above method; the only difference is the repeat_interval:

begin
dbms_scheduler.set_attribute(
name => 'RMAN_INC1_BACKUP',
attribute => 'repeat_interval',
value => 'freq=daily;byday=mon,tue,wed,thu,fri,sat;byhour=03');
dbms_scheduler.enable( 'RMAN_INC1_BACKUP' );
end;
/

Important Note:
DBMS_SCHEDULER is smart enough to start backup on the node where the last backup was successfully executed.

参考:

https://dbatricksworld.com/how-to-backup-oracle-rac-11gr2-database-with-rman-backup-utility-with-the-help-of-dbms_scheduler-part-i-rman-full-database-backup/

http://dbatricksworld.com/how-to-backup-oracle-rac-11gr2-database-with-rman-backup-utility-with-the-help-of-dbms_scheduler-part-ii-rman-incremental-database-backup/

本文主要介绍SPM的原理与运行机制,针对12c做一些补充记录,并添加一些11g和12c里面的知识点,以做备忘。

干预SQL执行计划的历史

SQL的执行效率,取决于它的执行计划是否高效。 优化器的算法是一个平衡,需要收集尽量少的信息,用尽量快的速度试图去得到一个最优的执行计划,这也决定了它不是万能的。 所以Oracle提供了一些辅助手段来“修复”优化器可能产生的错误,并不断改进这些方法。

  • Oracle 8: hint
  • Oracle 8i&9i: stored outline
  • Oracle 10g: sql profile
  • Oracle 11g: sql plan management、adaptive cursor sharing
  • Oracle 12c: sql plan management、adaptive cursor sharing、Adaptive Execution Plans

SQL Plan Management

SPM组成

SQL Plan Management(以下简称SPM)是一种优化器自动管理执行计划的预防机制,它确保数据库仅使用已知的、经过验证的执行计划。在Oracle 11g之前,执行计划一直是作为“运行时”生成的对象存在。虽然oracle提供了一些方法去指导它的生成,但Oracle一直没有试图去保存完整的执行计划。 从11g开始,执行计划就可以作为一类资源被保存下来,允许特定SQL语句只能选择“已知”的执行计划。

同其他方法相比,SPM更加的灵活。如我们所熟知的,一条带有绑定变量的SQL语句,最好的执行计划会根据绑定变量的值而不同,11g以前的方法都无法解决这个问题。在11g中,与adaptive cursor sharing配合,SPM允许你同时接受多个执行计划。执行时,根据不同的变量值,SPM会花费很少的运算从中选择一条最合适的。

SPM有以下三个主要组件:

  • Plan capture

    捕捉并存储与SQL执行计划相关的信息

  • Plan selection

    使用SQL plan baselines选择合适的执行计划以避免性能退化

  • Plan evolution

    向SQL plan baselines增加新的执行计划

SQL Management Base

SMB是数据字典的一部分,位于SYSAUX表空间,其中存储着statement logs, plan histories, SQL plan baselines, and SQL profiles等内容。

Plan History是优化器生成的所有执行计划的总称;SQL Plan Baseline是Plan History里那些被标记为“ACCEPTED”的执行计划的总称,称为SQL执行计划基线;SQL Statement Log是一系列的查询签名(signatures),用于在自动捕获执行计划时辨别重复执行的sql;关系如下图所示:

SQL Management Base

Plan capture(执行计划的捕获)

Automatic Initial Plan Capture(自动捕获)

开启自动捕获只需在初始化参数中将OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES设置为TRUE(默认为FALSE),开启之后数据库就会为每个重复执行的sql创建SQL Plan Baseline,甄别是否是重复执行的SQL就是利用的上文提到的SQL Statement Log,基线包括供优化器重新生成该执行计划的所有信息,例如SQL text, outline, bind variable values, and compilation environment,这个初始的执行计划会被自动标记为“accepted”,如果之后又有新的执行计划生成,那么该执行计划会被加入到Plan History但是会被标记为“unaccepted”。
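
例如,可以在系统或会话级别开启自动捕获(示例语句):

ALTER SYSTEM SET optimizer_capture_sql_plan_baselines = TRUE;
-- 或者仅在当前会话开启
ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;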

Oracle Database 12c Release 2 增加了自动捕获时sql过滤的功能,利用DBMS_SPM.CONFIGURE存储过程可以创建automatic capture filter。因此,你可以设置仅仅捕获你想要捕获的SQL,可以从DBA_SQL_MANAGEMENT_CONFIG视图中查询当前的设置:

SPM CONFIG

自动捕获状态下执行计划的匹配算法如下:

  • 如果SQL plan baseline不存在,那么优化器会为该SQL创建SQL plan baseline和Plan History,并将捕获的plan标记为accepted,加入SQL plan baseline

  • 如果SQL plan baseline存在,那么优化器的行为就依赖于解析时生成的基于成本的执行计划

    • 如果该plan与SQL plan baseline中的plan都不匹配,那么优化器将其标记为unaccepted加入Plan History
    • 如果该plan匹配到SQL plan baseline中的plan,那么执行该plan,不对现有的SQL plan baseline与Plan History做任何改动

Manual Plan Capture(手动捕获)

手动加载已存在的执行计划到SPM是最常用的方式。需要注意的是,默认情况下手动load的plan会被自动标记为"accepted",创建新的SQL plan baseline或加入到已有的SQL plan baseline中。oracle提供以下5种方式手动加载Execution Plan:

  • From a SQL Tuning Set(DBMS_SPM.LOAD_PLANS_FROM_SQLSET)

  • From the cursor cache (DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE)

  • From the AWR repository (new to Oracle Database 12c Release 2)(DBMS_SPM.LOAD_PLANS_FROM_AWR)

  • Unpacked from a staging table(DBMS_SPM.UNPACK_STGTAB_BASELINE)

  • From existing stored outlines(DBMS_SPM.MIGRATE_STORED_OUTLINE)

Loading Plans into a SQL Plan Baseline

手动load的结果有2种情况,由baseline是否存在而决定:

  • 如果SQL Plan Baseline不存在,则数据库做如下事情

    1. Creates a plan history and plan baseline for the statement
    2. Marks the initial plan for the statement as accepted
    3. Adds the plan to the new baseline
  • 如果SQL Plan Baseline存在,则数据库做如下事情

    1. Marks the loaded plan as accepted
    2. Adds the plan to the plan baseline for the statement without verifying the plan's performance

手动load的plan会被自动标为accepted,因为优化器会假定任何由管理员手动加载的plan都是性能上可接受的;当然,你也可以在load的时候在DBMS_SPM.LOAD_PLANS_FROM_%函数中将enabled设为NO,从而达到禁用的目的。

具体的load过程,本文暂不做讨论,详情请参阅官方文档 Managing SQL Plan Baselines。这里需要说明的一点是,从AWR导入执行计划是oracle 12.2才开始加入的,12.2以前要想实现从AWR中导入执行计划,需要先创建SQL Tuning Set,并从AWR中将plan load进STS,再使用DBMS_SPM.LOAD_PLANS_FROM_SQLSET将计划加载到SPM中。
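
以从 cursor cache 手动加载为例,一个最简单的调用大致如下(sql_id 为示意值):

DECLARE
  l_plans PLS_INTEGER;
BEGIN
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '1s6a8wn4p6rym');
  DBMS_OUTPUT.PUT_LINE(l_plans || ' plans loaded');
END;
/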

下面再谈一下,对于已存在SQL Plan Baseline的SQL,其后续的新的执行计划的捕获情况:

不论使用哪种方式创建了SQL Plan Baseline,后续新的plan都会被加入到Plan History并被标记为"unaccepted",此行为不依赖于OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES的设置,新加入的plan不会被使用,直到其经过验证比已accepted的plan性能更好,并演化为accepted加入SQL Plan Baseline。

Plan Selection(执行计划的选择)

SPM通过几个标记来实现对执行计划的控制,这些标记可以在视图DBA_SQL_PLAN_BASELINES中查到:

  • Enabled(控制活动)

    • YES(活动的,但不一定会被使用)
    • NO(禁用的,不活动的,肯定不被使用)
  • Accepted(控制使用)

    • YES(只有 “Enabled” 并且 “Accepted” 的计划才会被选择使用)
    • NO(如果是“Enabled” 那么只有被evolve成“Accepted”才有可能被执行)
  • Fixed(控制优先级)

    • YES(如果是“Enabled”并且“Accepted”,会优先选择这个计划,这个计划会被视为不需要改变的、固定的)
    • NO(普通的计划,无需优先)
  • Reproduced(有效性)

    • YES(优化器可以使用这个计划)
    • NO(计划无效,比如索引被删除)
  • ADAPTIVE(12c引入,标记是否是Adaptive Plans)

    • YES(是一个adaptive plan,在evolve的时候会被考虑,evolve时会测试执行,验证后final plan会变为accepted)
    • NO(adaptive plan被evolve后会被标记为NO)

    先来看一下SQL Plan Selection的决策树:

Decision Tree for SQL Plan Selection

当数据库为一条sql执行硬解析的时候,优化器会生成一个best-cost plan,如果初始化参数OPTIMIZER_USE_SQL_PLAN_BASELINES被设置成为TRUE(默认为TRUE),那么在执行best-cost plan之前会先检查是否存在对应的SQL Plan Baseline,匹配的时候使用SQL语句的signature,signature是一个由SQL文本规范化后的SQL标识符(case insensitivity and with whitespaces removed)(由此可见只要一条sql生成了baseline之后,那么无论大小写改变、空格多少都会被认为是一条sql),这个比较是内存操作,因此开销很小。
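
可以先确认该参数的当前值,必要时再做调整(示例,在 SQL*Plus 中执行):

SHOW PARAMETER optimizer_use_sql_plan_baselines
ALTER SESSION SET optimizer_use_sql_plan_baselines = TRUE;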

如果baseline不存在,就按生成的计划执行。如果baseline存在,那么要查看history里是否有这个计划,如果没有,就将这个计划插入,并标记为ENABLED,NON-ACCEPTED。

在baseline中查看是否有FIXED的计划存在,如果存在,执行FIXED的计划,如果存在多个FIXED的计划,根据统计信息重新计算cost,选择cost小的那个。

如果FIXED的计划不存在,就选择ACCEPTED的计划执行。 如果存在多个ACCEPTED的计划,根据统计信息重新计算cost,选择cost小的那个。

如果因为某些系统的改变(例如索引删除)导致已accepted的计划无法reproducible,那么优化器将使用新生成的best-cost plan,并将其加入plan history标记为unaccepted

* 注意,这里每次重新计算cost的代价不大,因为执行计划是已知的,优化器不必遍历所有的可能,只需根据算法计算出已知计划的cost便可。

* 注意,当sql plan baseline中有Fixed的时候,新生成的执行计划是不会被加入到plan history中的。

Plan Evolution(执行计划演化)

当优化器生成一个新的执行计划后,将其加入到plan history中作为一个unaccepted的plan,它需要被verified之后才可以使用,Verification是一个比较unaccepted和accepted(所有accepted中最小cost的)plan的执行性能的过程,verify的过程是实际执行该plan的过程,通过与accepted的plan比较elapse time, CPU time and buffer gets等性能统计信息来判断新plan的性能,如果新的plan的性能优于原plan或者相差无几,那么该plan会被标记为accepted加入SQL Plan Baseline,否则它仍然是一个unaccepted的plan,但是LAST_VERIFIED属性会被更新为当前时刻,12c之后的Automatic plan evolution过程还会考虑自上次被verify之后超过30天的plan,oracle认为如果系统有所改变,也许之前的plan会有更好的性能,当然这个功能可以通过dbms_spm.alter_sql_plan_baseline来禁止。

12c之前没有自动Evolve的机制,从12.1开始automatic plan evolution由SPM Evolve Advisor来完成,11g时代的DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE被废弃,下面将分别针对11g和12c来介绍Plan Evolution的过程:

11g DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE

11g可以使用Oracle Enterprise Manager或者DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE函数来演化SQL Plan,语法如下:

DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE (
sql_handle IN VARCHAR2 := NULL,
plan_name IN VARCHAR2 := NULL,
time_limit IN INTEGER := DBMS_SPM.AUTO_LIMIT,
verify IN VARCHAR2 := 'YES',
commit IN VARCHAR2 := 'YES')
RETURN CLOB;

这里由两个标记控制:

  • Verify
    • YES(验证比较性能)
    • NO(不验证性能)
  • Commit
    • YES(演化)
    • NO(只生成报告,不演化)

这里可以通过不同的排列组合,达到不同的效果:

  • 自动接收所有性能更好的执行计划,并生成report (Verify->YES, Commit->YES)
  • 自动接收所有新的执行计划,不验证性能,生成report (Verify->NO, Commit->YES)
  • 比较性能,生成report,人工确认是否演化 (Verify->YES, Commit->NO),调用示例见下
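
例如,只生成验证报告、由人工决定是否演化的调用大致如下(sql_handle 为示意值,取自下文的示例):

SET SERVEROUTPUT ON
SET LONG 1000000
SELECT DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE(
         sql_handle => 'SQL_b6b0d1c71cd1807b',
         verify     => 'YES',
         commit     => 'NO')
FROM dual;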

12c SPM Evolve Advisor Task & Manually Evolve Task

SPM Evolve Advisor Task

从12.1开始,自动演化任务由SPM Evolve Advisor每天在维护窗口期内自动执行,SPM Evolve Advisor是一个自动任务(SYS_AUTO_SPM_EVOLVE_TASK),它每天做如下操作:

  1. Locates unaccepted plans

  2. Ranks all unaccepted plans

  3. Performs test executions of as many plans as possible during the maintenance window

  4. Selects the lowest-cost plan to compare against each unaccepted plan

  5. Accepts automatically any unaccepted plan that performs sufficiently better, using a cost-based algorithm, than the existing accepted plan

    * 需要注意的是,没有单独针对Automatic SPM Evolve Advisor task的scheduler client,Automatic SQL Tuning Advisor 和 Automatic SPM Evolve Advisor共用一个client,因此它们两个同时启用或同时禁用。

    自动任务属性可以通过DBMS_SPM.SET_EVOLVE_TASK_PARAMETER来配置,下面列举几个重要的属性:

    COL PARAMETER_NAME FORMAT a25
    COL VALUE FORMAT a42
    SELECT PARAMETER_NAME, PARAMETER_VALUE AS "VALUE"
    FROM DBA_ADVISOR_PARAMETERS
    WHERE ( (TASK_NAME = 'SYS_AUTO_SPM_EVOLVE_TASK') AND
    ( (PARAMETER_NAME = 'ACCEPT_PLANS') OR
    (PARAMETER_NAME LIKE '%ALT%') OR
    (PARAMETER_NAME = 'TIME_LIMIT') ) );
12.2 的默认值如下(各个属性的意义详见官方文档):

    PARAMETER_NAME		    VALUE
    ------------------------- ------------------------------------------
    TIME_LIMIT 3600
    ALTERNATE_PLAN_LIMIT 10
    ALTERNATE_PLAN_SOURCE CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY
    ALTERNATE_PLAN_BASELINE EXISTING
    ACCEPT_PLANS TRUE

    修改属性:

      BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
    task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
    , parameter => 'TIME_LIMIT'
    , value => '1200'
    );
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
    task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
    , parameter => 'ACCEPT_PLANS'
    , value => 'true'
    );
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
    task_name => 'SYS_AUTO_SPM_EVOLVE_TASK'
    , parameter => 'ALTERNATE_PLAN_LIMIT'
    , value => '500'
    );
    END;
    /

    查看结果:

     PARAMETER_NAME            VALUE
    ------------------------- ------------------------------------------
    ALTERNATE_PLAN_LIMIT 500
    ALTERNATE_PLAN_SOURCE CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY
    ALTERNATE_PLAN_BASELINE EXISTING
    ACCEPT_PLANS true
    TIME_LIMIT 1200

    自动演化的报告可以通过函数DBMS_SPM.REPORT_AUTO_EVOLVE_TASK来查询:

    SQL> SET LONG 1000000 PAGESIZE 1000 LONGCHUNKSIZE 100 LINESIZE 100
    SQL> SELECT DBMS_SPM.report_auto_evolve_task FROM dual;

    REPORT_AUTO_EVOLVE_TASK
    ----------------------------------------------------------------------------------------------------
    GENERAL INFORMATION SECTION
    ---------------------------------------------------------------------------------------------

    Task Information:
    ---------------------------------------------
    Task Name : SYS_AUTO_SPM_EVOLVE_TASK
    Task Owner : SYS
    Description : Automatic SPM Evolve Task
    Execution Name : EXEC_1173
    Execution Type : SPM EVOLVE
    Scope : COMPREHENSIVE
    Status : COMPLETED
    Started : 04/20/2017 22:00:05
    Finished : 04/20/2017 22:00:09
    Last Updated : 04/20/2017 22:00:09
    Global Time Limit : 3600
    Per-Plan Time Limit : UNUSED
    Number of Errors : 0
    ---------------------------------------------------------------------------------------------

    SUMMARY SECTION
    ---------------------------------------------------------------------------------------------
    Number of plans processed : 0
    Number of findings : 0
    Number of recommendations : 0
    Number of errors : 0
    ---------------------------------------------------------------------------------------------
Manually Evolve Task

手动演化的过程大致如下图:

Evolving SQL Plan Baselines

  1. Create an evolve task

  2. Optionally, set evolve task parameters(在12.2.0.1中仅TIME_LIMIT有效)

  3. Execute the evolve task

  4. Implement the recommendations in the task

  5. Report on the task outcome

    个人认为4、5无绝对先后顺序

    具体操作,本文暂不讨论,请浏览Manually Evolving SQL Plan Baselines in Oracle Database 12c Release 2

Managing the SQL Management Base

利用DBMS_SPM.CONFIGURE存储过程可以配置SMB,视图DBA_SQL_MANAGEMENT_CONFIG显示了各个配置项的状态。

参数有如下几个:

  • SPACE_BUDGET_PERCENT(SYSAUX表空间的最大使用率)

  • PLAN_RETENTION_WEEKS(没被使用的plan的最大保留weeks,默认是53周)

  • AUTO_CAPTURE_PARSING_SCHEMA_NAME(自动捕获过滤schema)

  • AUTO_CAPTURE_MODULE

  • AUTO_CAPTURE_ACTION

  • AUTO_CAPTURE_SQL_TEXT(自动捕获指定的sql)

    修改SMB磁盘使用限额

    1.查询当前配置

      
    SELECT PARAMETER_NAME, PARAMETER_VALUE AS "%_LIMIT",
    ( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
    WHERE TABLESPACE_NAME = 'SYSAUX' ) AS SYSAUX_SIZE_IN_MB,
    PARAMETER_VALUE/100 *
    ( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
    WHERE TABLESPACE_NAME = 'SYSAUX' ) AS "CURRENT_LIMIT_IN_MB"
    FROM DBA_SQL_MANAGEMENT_CONFIG
    WHERE PARAMETER_NAME = 'SPACE_BUDGET_PERCENT';

    PARAMETER_NAME %_LIMIT SYSAUX_SIZE_IN_MB CURRENT_LIMIT_IN_MB
    -------------------- ---------- ----------------- -------------------
    SPACE_BUDGET_PERCENT 10 211.4375 21.14375

    2.修改配额

    EXECUTE DBMS_SPM.CONFIGURE('space_budget_percent',30);

    3.查看结果

      SELECT PARAMETER_NAME, PARAMETER_VALUE AS "%_LIMIT", 
    ( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
    WHERE TABLESPACE_NAME = 'SYSAUX' ) AS SYSAUX_SIZE_IN_MB,
    PARAMETER_VALUE/100 *
    ( SELECT sum(bytes/1024/1024) FROM DBA_DATA_FILES
    WHERE TABLESPACE_NAME = 'SYSAUX' ) AS "CURRENT_LIMIT_IN_MB"
    FROM DBA_SQL_MANAGEMENT_CONFIG
    WHERE PARAMETER_NAME = 'SPACE_BUDGET_PERCENT';

    PARAMETER_NAME %_LIMIT SYSAUX_SIZE_IN_MB CURRENT_LIMIT_IN_MB
    -------------------- ---------- ----------------- -------------------
    SPACE_BUDGET_PERCENT 30 211.4375 63.43125

修改保留策略

1.查看当前配置

SQL> SELECT PARAMETER_NAME, PARAMETER_VALUE
2 FROM DBA_SQL_MANAGEMENT_CONFIG
3 WHERE PARAMETER_NAME = 'PLAN_RETENTION_WEEKS';

PARAMETER_NAME PARAMETER_VALUE
------------------------------ ---------------
PLAN_RETENTION_WEEKS 53

2.修改配置

EXECUTE DBMS_SPM.CONFIGURE('plan_retention_weeks',105);

3.查看结果

SQL> SELECT PARAMETER_NAME, PARAMETER_VALUE
2 FROM DBA_SQL_MANAGEMENT_CONFIG
3 WHERE PARAMETER_NAME = 'PLAN_RETENTION_WEEKS';

PARAMETER_NAME PARAMETER_VALUE
------------------------------ ---------------
PLAN_RETENTION_WEEKS 105

Monitoring SQL plan baselines

视图DBA_SQL_PLAN_BASELINES展示了当前SQL plan baselines的信息:

select SIGNATURE,
SQL_HANDLE,
SQL_TEXT,
PLAN_NAME,
PARSING_SCHEMA_NAME,
LAST_EXECUTED,
ENABLED,
ACCEPTED,
REPRODUCED,
EXECUTIONS
from dba_sql_plan_baselines;

结果如下:

SIGNATURE            SQL_HANDLE           SQL_TEXT                              PLAN_NAME                      PARSING_SC LAST_EXECUTED                 ENA ACC REP EXECUTIONS
-------------------- -------------------- ------------------------------------- ------------------------------ ---------- ----------------------------- --- --- --- ----------
1962643652257108320  SQL_1b3cb600d175d160 select /*liu*/ * from test where id=1 SQL_PLAN_1qg5q038rbnb025a3834b SCOTT      13-APR-17 02.23.12.000000 PM  YES YES YES          0
1962643652257108320  SQL_1b3cb600d175d160 select /*liu*/ * from test where id=1 SQL_PLAN_1qg5q038rbnb097bbe3d0 HR         13-APR-17 02.35.09.000000 PM  YES YES YES          0

需要注意的一点是,该视图中LAST_EXECUTED和EXECUTIONS并不是实时更新的。

函数DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE可以查看baseline中的执行计划,语法如下:

 DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE (
sql_handle IN VARCHAR2 := NULL,
plan_name IN VARCHAR2 := NULL,
format IN VARCHAR2 := 'TYPICAL')
RETURN dbms_xplan_type_table;

例如:

select 
*
from table(
dbms_xplan.display_sql_plan_baseline(
sql_handle=>'SQL_5671036d51fd678f',
format=>'basic'));
--或者
select * from table(dbms_xplan.display_sql_plan_baseline('SQL_17574e83c195631c',null,'BASIC +NOTE'));

也可以通过V$SQL视图查看一条SQL是否使用了SQL Plan Baseline,如果使用了baseline,那么它的sql_plan_baseline列将会显示plan_name,因此可以连接DBA_SQL_PLAN_BASELINES和V$SQL视图:

select s.SQL_ID, s.SQL_TEXT, b.plan_name, b.origin, b.accepted
from dba_sql_plan_baselines b, v$sql s
where s.EXACT_MATCHING_SIGNATURE = b.signature
and s.SQL_PLAN_BASELINE = b.plan_name;

查看v$sql中的sql是否有baseline:

SELECT sql_handle,
plan_name
FROM dba_sql_plan_baselines
WHERE signature IN
( SELECT exact_matching_signature FROM v$sql WHERE sql_id='1s6a8wn4p6rym'
);

Dropping SQL Plan Baselines

1.查询要删除的baseline

SQL> SELECT SQL_HANDLE, SQL_TEXT, PLAN_NAME, ORIGIN,
2 ENABLED, ACCEPTED
3 FROM DBA_SQL_PLAN_BASELINES
4 WHERE SQL_TEXT LIKE 'SELECT /* repeatable_sql%';

SQL_HANDLE           SQL_TEXT                                          PLAN_NAME                      ORIGIN       ENA ACC
-------------------- ------------------------------------------------- ------------------------------ ------------ --- ---
SQL_b6b0d1c71cd1807b SELECT /* repeatable_sql */ count(*) from hr.jobs SQL_PLAN_bdc6jswfd303v2f1e9c20 AUTO-CAPTURE YES YES

2. Drop the SQL plan baseline

DECLARE
v_dropped_plans number;
BEGIN
v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
sql_handle => 'SQL_b6b0d1c71cd1807b'
);
DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
END;
/

3. Confirm the deletion

SELECT SQL_HANDLE, SQL_TEXT, PLAN_NAME, ORIGIN,
ENABLED, ACCEPTED
FROM DBA_SQL_PLAN_BASELINES
WHERE SQL_TEXT LIKE 'SELECT /* repeatable_sql%';

no rows selected
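
Omitting plan_name, as above, removes every plan attached to the given handle. If only a single plan within a baseline should be removed, plan_name can be supplied as well. A minimal sketch reusing the handle and plan name from the earlier query (purely illustrative here, since that baseline has already been dropped):

DECLARE
  v_dropped_plans number;
BEGIN
  -- Drop a single plan; without plan_name, all plans under the handle are dropped.
  v_dropped_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE (
    sql_handle => 'SQL_b6b0d1c71cd1807b',
    plan_name  => 'SQL_PLAN_bdc6jswfd303v2f1e9c20');
  DBMS_OUTPUT.PUT_LINE('dropped ' || v_dropped_plans || ' plans');
END;
/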


One of the overloads of the LOAD_PLANS_FROM_CURSOR_CACHE function has the following signature:

DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (
sql_id IN VARCHAR2,
plan_hash_value IN NUMBER := NULL,
sql_handle IN VARCHAR2, -- note this parameter
fixed IN VARCHAR2 := 'NO',
enabled IN VARCHAR2 := 'YES')
RETURN PLS_INTEGER;

Oracle's documentation describes sql_handle as follows:

SQL handle to use in identifying the SQL plan baseline into which the plans are loaded. The sql_handle must denote an existing SQL plan baseline. The use of handle is crucial when the user tunes a SQL statement by adding hints to its text and then wants to load the resulting plan(s) into the SQL plan baseline of the original SQL statement.

In other words, sql_handle lets you load an execution plan from the cursor cache into an existing SQL plan baseline. This means you can get a SQL statement to run with the execution plan you want without touching the application code: as long as a well-tuned plan (for example, from a hinted copy of the statement) is already sitting in the cursor cache, you can load that plan into the original statement's baseline. The rest of this article walks through this tuning method in detail; a sketch of the overall flow follows.
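
A minimal sketch of the flow, with the sql_id, plan_hash_value and sql_handle values left as substitution-variable placeholders to be filled in from your own environment (the concrete, step-by-step version follows below):

DECLARE
  n PLS_INTEGER;
BEGIN
  -- 1. Capture the original statement's current plan as a baseline.
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '&original_sql_id');

  -- 2. Disable the captured (undesired) plan.
  n := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
         sql_handle      => '&sql_handle_of_original',
         plan_name       => '&plan_name_of_original',
         attribute_name  => 'enabled',
         attribute_value => 'NO');

  -- 3. Load the tuned (hinted) plan into the original statement's baseline.
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
         sql_id          => '&hinted_sql_id',
         plan_hash_value => &hinted_plan_hash_value,
         sql_handle      => '&sql_handle_of_original');
END;
/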

The experiment: take a SQL statement that currently uses an index and, without changing the statement itself, make it use a full table scan.

  1. Create the test environment

    CREATE TABLE spm_test_tab (
    id NUMBER,
    description VARCHAR2(50)
    );

    INSERT /*+ APPEND */ INTO spm_test_tab
    SELECT level,
    'Description for ' || level
    FROM dual
    CONNECT BY level <= 10000;

    COMMIT;
    --create an index
    CREATE INDEX spm_test_tab_idx ON spm_test_tab(id);
    --gather optimizer statistics
    begin
    dbms_stats.gather_table_stats(ownname => 'SYS',
    tabname =>'SPM_TEST_TAB',
    estimate_percent =>DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    granularity=>'ALL',
    cascade => TRUE,
    no_invalidate => false);
    end;
    /
  2. Run the example statement and check its execution plan

    SELECT description FROM spm_test_tab WHERE id = 1113;

    SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

    SQL_ID gsttbra9z0ddw, child number 0
    -------------------------------------
    SELECT description FROM spm_test_tab WHERE id = 1113

    Plan hash value: 3121206333

    ------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | | | 2 (100)| |
    | 1 | TABLE ACCESS BY INDEX ROWID| SPM_TEST_TAB | 1 | 25 | 2 (0)| 00:00:01 |
    |* 2 | INDEX RANGE SCAN | SPM_TEST_TAB_IDX | 1 | | 1 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

    1 - SEL$1 / SPM_TEST_TAB@SEL$1
    2 - SEL$1 / SPM_TEST_TAB@SEL$1

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    IGNORE_OPTIM_EMBEDDED_HINTS
    OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
    DB_VERSION('11.2.0.4')
    ALL_ROWS
    OUTLINE_LEAF(@"SEL$1")
    INDEX_RS_ASC(@"SEL$1" "SPM_TEST_TAB"@"SEL$1" ("SPM_TEST_TAB"."ID"))
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("ID"=1113)

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

    1 - "DESCRIPTION"[VARCHAR2,50]
    2 - "SPM_TEST_TAB".ROWID[ROWID,10]
  3. Find the statement's sql_id in V$SQL and create a SQL plan baseline with DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE

    SYS@DB41 2017/04/26 14:57:55> SELECT SQL_ID,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE  '%1113%';

    SQL_ID SQL_FULLTEXT
    ------------- --------------------------------------------------------------------------------
    gsttbra9z0ddw SELECT description FROM spm_test_tab WHERE id = 1113

    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => 'gsttbra9z0ddw');
    end;
    /

    SELECT SQL_HANDLE,PLAN_NAME,ENABLED,ACCEPTED
    FROM dba_sql_plan_baselines
    WHERE signature IN
    ( SELECT exact_matching_signature FROM v$sql WHERE sql_id='gsttbra9z0ddw'
    );


    SQL_HANDLE PLAN_NAME ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfed3324c0 YES YES
  4. Disable the original plan with ALTER_SQL_PLAN_BASELINE

    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.ALTER_SQL_PLAN_BASELINE(sql_handle => 'SQL_c1c9aa52fd90f3ae',
    plan_name => 'SQL_PLAN_c3kdaabyt1wxfed3324c0',
    attribute_name => 'enabled',
    attribute_value => 'NO');
    end;
    /

    SQL_HANDLE PLAN_NAME ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae SQL_PLAN_c3kdaabyt1wxfed3324c0 NO YES
  5. Add a hint so that the SQL uses a full table scan

    SELECT /*+ FULL(SPM_TEST_TAB) */description FROM  spm_test_tab WHERE  id = 1113;

    SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual);

    SQL_ID fr5dd23pfbjfz, child number 0
    -------------------------------------
    SELECT /*+ FULL(SPM_TEST_TAB) */description FROM spm_test_tab WHERE
    id = 1113

    Plan hash value: 1107868462

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | | | 13 (100)| |
    |* 1 | TABLE ACCESS FULL | SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01|
    ----------------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

    1 - SEL$1 / SPM_TEST_TAB@SEL$1

    Outline Data
    -------------

    /*+
    BEGIN_OUTLINE_DATA
    IGNORE_OPTIM_EMBEDDED_HINTS
    OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
    DB_VERSION('11.2.0.4')
    ALL_ROWS
    OUTLINE_LEAF(@"SEL$1")
    FULL(@"SEL$1" "SPM_TEST_TAB"@"SEL$1")
    END_OUTLINE_DATA
    */

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("ID"=1113)

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

    1 - "DESCRIPTION"[VARCHAR2,50]
  6. Find the hinted statement's sql_id and plan_hash_value in V$SQL

    SELECT SQL_ID,PLAN_HASH_VALUE,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE  '%1113%';

    SQL_ID PLAN_HASH_VALUE SQL_FULLTEXT
    ------------- --------------- --------------------------------------------------------------------------------
    fr5dd23pfbjfz 1107868462 SELECT /*+ FULL(SPM_TEST_TAB) */description FROM spm_test_tab WHERE id = 1113
    08vxbgd0qrm8s 903671040 SELECT SQL_ID,SQL_FULLTEXT FROM V$SQL WHERE SQL_FULLTEXT LIKE '%1113%'
  7. Use that sql_id and plan_hash_value to create a new accepted plan, tying the hinted plan to the original SQL through its sql_handle

    declare
    v_sql_plan_id pls_integer;
    begin
    v_sql_plan_id := dbms_spm.load_plans_from_cursor_cache(sql_id => 'fr5dd23pfbjfz',
    plan_hash_value => 1107868462,
    sql_handle => 'SQL_c1c9aa52fd90f3ae');
    end;
    /
  8. Querying DBA_SQL_PLAN_BASELINES now shows two plans

    SQL_HANDLE                     PLAN_NAME                      ENA ACC
    ------------------------------ ------------------------------ --- ---
    SQL_c1c9aa52fd90f3ae           SQL_PLAN_c3kdaabyt1wxfb65c37c8 YES YES
    SQL_c1c9aa52fd90f3ae           SQL_PLAN_c3kdaabyt1wxfed3324c0 NO  YES
  9. Now run the original SQL again and check its execution plan

    SYS@DB41 2017/04/26 15:50:44> SET AUTOTRACE TRACE
    SYS@DB41 2017/04/26 15:50:57> SELECT description FROM spm_test_tab WHERE id = 1113;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1107868462

    ----------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ----------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 25 | 13 (0)| 00:00:01 |
    |* 1 | TABLE ACCESS FULL| SPM_TEST_TAB | 1 | 25 | 13 (0)| 00:00:01 |
    ----------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("ID"=1113)

    Note
    -----
    - SQL plan baseline "SQL_PLAN_c3kdaabyt1wxfb65c37c8" used for this statement

    Statistics
    ----------------------------------------------------------
    7 recursive calls
    0 db block gets
    45 consistent gets
    0 physical reads
    0 redo size
    367 bytes sent via SQL*Net to client
    476 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

As you can see, the execution plan has changed.

Note: during testing I found that, once a statement is using a SQL plan baseline, SELECT * FROM TABLE (SELECT DBMS_XPLAN.DISPLAY_CURSOR(null,null,'advanced') from dual) can no longer retrieve the plan it last executed with.
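
If you still want to inspect the plan such a statement runs with, one option is to display it from the baseline itself using DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE, introduced earlier; the sql_handle below is the one created in this walkthrough:

-- Show the accepted plan stored in the baseline rather than the cursor's plan.
select *
from table(
  dbms_xplan.display_sql_plan_baseline(
    sql_handle => 'SQL_c1c9aa52fd90f3ae',
    format     => 'basic +note'));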
