一 套Linux上的11.1.0.7系统,操作系统管理人员最近对该服务器上的网络配置文件/etc/nsswitch.conf进行了调整,调整前其主机 名解析选项为”hosts:files dns” ,调整后被修改成了”hosts:files [NOTFOUND=continue] dns”;此后应用人员尝试在该主机上使用
“sqlplus username/password@connect_string”远程登录数据库都会出现多达五分钟的延迟,使用lsnrctl status命令查看监听器状态,发现LISTENER一切正常;初步可以判断是dns解析导致了长时间的延迟。
针对以上问题,首先想到的是设置client端Oracle network trace以了解造成延迟的具体原因,在$ORACLE_HOME/network/admin/sqlnet.ora配置文件中加入以下记录:
之后再次尝试登录就会触发Oracle Network Trace文件在$TRACE_DIRECTORY_CLIENT指定的目录下产生(如果DIAG_ADR_ENABLED未设置为false,那么 11g下TRACE_DIRECTORY_CLIENT并不生效,而会产生在11g特有的diag目录下)。
登录测试产生的trace文件记录:
可以看到以上出现了”snlinGetAddrInfo: getaddrinfo() failed with error -2″的记录,通过关键词”snlinGetAddrInfo”在MOS上搜索可以找到以下Note:
11g中对hosts的解析实在变得有些西斯底里,这个case通过在移除nsswitch.conf中hosts的dns选项最后解决了,对于Oracle使用最简单的文件解析方式似乎仍是最稳妥的办法,不管版本有多新。
“sqlplus username/password@connect_string”远程登录数据库都会出现多达五分钟的延迟,使用lsnrctl status命令查看监听器状态,发现LISTENER一切正常;初步可以判断是dns解析导致了长时间的延迟。
针对以上问题,首先想到的是设置client端Oracle network trace以了解造成延迟的具体原因,在$ORACLE_HOME/network/admin/sqlnet.ora配置文件中加入以下记录:
1 2 3 4 5 | TRACE_LEVEL_CLIENT = 16 TRACE_FILE_CLIENT = client TRACE_DIRECTORY_CLIENT = [ any valid directory path] TRACE_TIMESTAMP_CLIENT = ON DIAG_ADR_ENABLED= off |
登录测试产生的trace文件记录:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 | [02-SEP-2010 07:36:57:719] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=m218279apss2012-vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=MOTOIDP.MOT.COM)(INSTANCE_NAME=MOTOIDP1)(CID=(PROGRAM=sqlplus)(HOST=m218279apss2012.mot.com)( USER =oraoid)))) [02-SEP-2010 07:36:57:719] nttbnd2addr: entry [02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry [02-SEP-2010 07:36:57:719] snlinGetAddrInfo: getaddrinfo() failed with error -2 [02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit [02-SEP-2010 07:36:57:719] nttbnd2addr: looking up IP addr for host: m218279apss2012-vip [02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry [02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit [02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: entry [02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: exit [02-SEP-2010 07:36:57:719] nttbnd2addr: exit .................. [02-SEP-2010 07:36:57:722] nspsend: packet dump [02-SEP-2010 07:36:57:722] nspsend: 01 1E 00 00 01 00 00 00 |........| [02-SEP-2010 07:36:57:722] nspsend: 01 3A 01 2C 0C 41 20 00 |.:.,.A..| [02-SEP-2010 07:36:57:722] nspsend: 7F FF 7F 08 00 00 01 00 |........| [02-SEP-2010 07:36:57:722] nspsend: 00 E4 00 3A 00 00 02 00 |...:....| [02-SEP-2010 07:36:57:722] nspsend: 41 41 00 00 00 00 00 00 |AA......| [02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00 |........| [02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00 |........| [02-SEP-2010 07:36:57:722] nspsend: 00 00 28 44 45 53 43 52 |..(DESCR| [02-SEP-2010 07:36:57:722] nspsend: 49 50 54 49 4F 4E 3D 28 |IPTION=(| [02-SEP-2010 07:36:57:722] nspsend: 41 44 44 52 45 53 53 3D |ADDRESS=| [02-SEP-2010 07:36:57:722] nspsend: 28 50 52 4F 54 4F 43 4F |(PROTOCO| [02-SEP-2010 07:36:57:722] nspsend: 4C 3D 54 43 50 29 28 48 |L=TCP)(H| [02-SEP-2010 07:36:57:722] nspsend: 4F 53 54 3D 6D 32 31 38 |OST=m218| [02-SEP-2010 07:36:57:722] nspsend: 32 37 39 61 70 73 73 32 |279apss2| [02-SEP-2010 07:36:57:722] nspsend: 30 31 32 2D 76 69 70 29 |012-vip)| [02-SEP-2010 07:36:57:722] nspsend: 28 50 4F 52 54 3D 31 35 |(PORT=15| [02-SEP-2010 07:36:57:722] nspsend: 32 31 29 29 28 43 4F 4E |21))(CON| [02-SEP-2010 07:36:57:722] nspsend: 4E 45 43 54 5F 44 41 54 |NECT_DAT| [02-SEP-2010 07:36:57:722] nspsend: 41 3D 28 53 45 52 56 45 |A=(SERVE| [02-SEP-2010 07:36:57:722] nspsend: 52 3D 44 45 44 49 43 41 |R=DEDICA| [02-SEP-2010 07:36:57:722] nspsend: 54 45 44 29 28 53 45 52 |TED)(SER| [02-SEP-2010 07:36:57:722] nspsend: 56 49 43 45 5F 4E 41 4D |VICE_NAM| [02-SEP-2010 07:36:57:722] nspsend: 45 3D 4D 4F 54 4F 49 44 |E=MOTOID| [02-SEP-2010 07:36:57:722] nspsend: 50 2E 4D 4F 54 2E 43 4F |P.MOT.CO| [02-SEP-2010 07:36:57:722] nspsend: 4D 29 28 49 4E 53 54 41 |M)(INSTA| [02-SEP-2010 07:36:57:722] nspsend: 4E 43 45 5F 4E 41 4D 45 |NCE_NAME| [02-SEP-2010 07:36:57:722] nspsend: 3D 4D 4F 54 4F 49 44 50 |=MOTOIDP| [02-SEP-2010 07:36:57:722] nspsend: 31 29 28 43 49 44 3D 28 |1)(CID=(| [02-SEP-2010 07:36:57:722] nspsend: 50 52 4F 47 52 41 4D 3D |PROGRAM=| [02-SEP-2010 07:36:57:722] nspsend: 73 71 6C 70 6C 75 73 29 |sqlplus)| [02-SEP-2010 07:36:57:722] nspsend: 28 48 4F 53 54 3D 6D 32 |(HOST=m2| [02-SEP-2010 07:36:57:722] nspsend: 31 38 32 37 39 61 70 73 |18279aps| [02-SEP-2010 07:36:57:722] nspsend: 73 32 30 31 32 2E 6D 6F |s2012.mo| [02-SEP-2010 07:36:57:722] nspsend: 74 2E 63 6F 6D 29 28 55 |t.com)(U| [02-SEP-2010 07:36:57:722] nspsend: 53 45 52 3D 6F 72 61 6F |SER=orao| [02-SEP-2010 07:36:57:722] nspsend: 69 64 29 29 29 29 |id)))) | [02-SEP-2010 07:36:57:722] nspsend: 286 bytes to transport [02-SEP-2010 07:36:57:722] nspsend: normal exit [02-SEP-2010 07:36:57:722] nscon: exit (0) [02-SEP-2010 07:36:57:722] nsdo: nsctxrnk=0 [02-SEP-2010 07:36:57:722] nsdo: normal exit [02-SEP-2010 07:36:57:722] nsdo: entry [02-SEP-2010 07:36:57:722] nsdo: cid=0, opcode=68, *bl=512, *what=9, uflgs=0x0, cflgs=0x3 [02-SEP-2010 07:36:57:722] nsdo: rank=64, nsctxrnk=0 [02-SEP-2010 07:36:57:722] nsdo: nsctx: state=2, flg=0x4005, mvd=0 [02-SEP-2010 07:36:57:722] nsdo: gtn=10, gtc=10, ptn=10, ptc=8155 [02-SEP-2010 07:36:57:722] nscon: entry [02-SEP-2010 07:36:57:722] nscon: recving a packet [02-SEP-2010 07:36:57:722] nsprecv: entry [02-SEP-2010 07:36:57:722] nsprecv: reading from transport... [02-SEP-2010 07:36:57:722] nttrd: entry [02-SEP-2010 07:41:57:741] nttrd: socket 9 had bytes read =8 [02-SEP-2010 07:41:57:741] nttrd: exit [02-SEP-2010 07:41:57:741] nsprecv: 8 bytes from transport [02-SEP-2010 07:41:57:741] nsprecv: tlen=8, plen=8, type=11 [02-SEP-2010 07:41:57:741] nsprecv: packet dump [02-SEP-2010 07:41:57:741] nsprecv: 00 08 00 00 0B 00 00 00 |........| [02-SEP-2010 07:41:57:741] nsprecv: normal exit |
Remote Connections Take Very Long to Establish这个文档指出snlinGetAddrInfo函数用以主机名到ip地址的映射,并建议使用在/etc/nsswitch.conf网络配置文件中 设置过程为”hosts: files [NOTFOUND=continue] dns”,而我们恰恰正是这样做的!另一个文档[ID 803838.1]指出在其他UNIX平台上,可以使用”ipnodes: files [NOTFOUND=continue] dns”方式避免延迟问题,但Linux平台上并没有ipnodes方式。
Applies to:
Oracle Net Services – Version: 11.1.0.6 to 11.1.0.7 – Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms
When you connect remotely, it takes very long until the connection is established. Once connected, everything works fine.
Local bequeath connections work in a timely manner.
Resolution of hostnames is done via a centralized DNS server.
Cause
Listener traces show the listener is wasting time while calling the “snlinGetAddrInfo” function :
2009-07-08 13:58:35.135311 : nttcnp:exit
2009-07-08 13:58:35.135327 : snlinGetAddrInfo:entry
2009-07-08 13:58:55.135643 : snlinGetAddrInfo:getaddrinfo() failed with error -5
2009-07-08 13:58:55.135703 : snlinGetAddrInfo:exit
Solution
We need to make sure that the DNS server(s) configured are reachable.
The “snlinGetAddrInfo” function is a TCP layer function which Oracle uses (since version 11g) for hostname-to-IP mappings.
When the DNS server is unreachable, the listener will wait for some time until it will time out and fail over to the next method configured for resolution of hostnames (usually the local “hosts” file).
Note : On UNIX systems, the order of methods used for resolution of hostnames can be specified via the /etc/nsswitch.conf and /etc/host.conf files :
on Linux systems, you can specify the order by setting “hosts” to “files” and / or “dns”Example for /etc/nsswitch.conf :
hosts: files [NOTFOUND=continue] dns
on HP-UX and Solaris systems, you need to specify the order using the “ipnodes” keywordExample for /etc/nsswitch.conf :
ipnodes: files [NOTFOUND=continue] dnsNote : Since 11.2, this behaviour has changed and naming lookup is no longer performed at this stage (the changes was addressed through unpublished Bug 9593134).
11g中对hosts的解析实在变得有些西斯底里,这个case通过在移除nsswitch.conf中hosts的dns选项最后解决了,对于Oracle使用最简单的文件解析方式似乎仍是最稳妥的办法,不管版本有多新。
© 2010, www.oracledatabase12g.com. 版权所有.
Applies to:
Oracle Net Services – Version: 11.1.0.6 to 11.1.0.7Generic UNIX
Symptoms
After upgrading to 11g Oracle functionality bypasses the /etc/hosts file when resolving hostnames to IPs and instead query the DNS server. This can introduce some delay in establishing a connection to a remote host compared with 10g.SQL*Plus and tnsping show this behavior but this delay can appear also when opening a dblink or anytime tcpip connections are established by the Oracle Network layer.
This happens even if Name Server Switch configuration (nsswitch) specifies the hosts file prior to dns lookups:
hosts: files [NOTFOUND=continue] dns
or only:
hosts: files
On 10g after reading the nsswitch.conf file, library “libnss_files.so” is loaded then /etc/hosts is read and the socket is opened:
ioctl(5, TCGETA, 0x9fffffffffffaca0) ERR#25 ENOTTY
read(5, “# \n# / e t c / n s s w i t c “.., 8192) = 92
read(5, 0x60000000001e6078, 8192) = 0
close(5) = 0
open(“/usr/lib/hpux64/libnss_files.so.1″, O_RDONLY|0×800, 0) = 5
fstat(5, 0x9fffffffffffa720) = 0
pread(5, “7fE L F 0202010101\0\0\0\0\0\0\0″.., 1024, 0) = 1024
stat(“/usr/lib/hpux64/dpd”, 0x9fffffffffff9cd0) = 0
open(“/usr/lib/hpux64/dpd/libnss_files.so.1.bpd”, O_RDONLY|0×800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
mmap(NULL, 85872, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc0000000008d8000
mmap(NULL, 3159, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 131072) = 0x9fffffffbf63d000
close(5) = 0
getuid() = 305 (305)
getgid() = 303 (303)
open(“/etc/hosts”, O_RDONLY|0×800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffffac30) ERR#25 ENOTTY
read(5, “# / e t c / h o s t s \n# \n# “.., 8192) = 8192
lseek(5, 18446744073709549410, SEEK_CUR) = 5986
close(5) = 0
socket(AF_INET, SOCK_STREAM, 0) = 5
connect(5, 0x60000000001f0680, 16) = 0
getsockname(5, 0x9fffffffffffb750, 0x9fffffffffffb740) = 0
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0
setsockopt(5, 0×6, TCP_NODELAY, 0x9fffffffffffb89c, 4) = 0
But on 11g after reading the nsswitch.conf library “libnss_dns.so” is loaded then /etc/resolv.conf (which specifies available domain name servers) is read and much later a socket for TCP/IP (SOCK_STREAM) is opened. :
ioctl(5, TCGETA, 0x9fffffffffffa360) ERR#25 ENOTTY
read(5, “# \n# / e t c / n s s w i t c “.., 8192) = 92
read(5, 0x60000000001c9058, 8192) = 0
close(5) = 0
open(“/usr/lib/hpux64/libnss_dns.so.1″, O_RDONLY|0×800, 0) = 5
fstat(5, 0x9fffffffffff9de0) = 0
pread(5, “7fE L F 0202010101\0\0\0\0\0\0\0″.., 1024, 0) = 1024
stat(“/usr/lib/hpux64/dpd”, 0x9fffffffffff9390) = 0
open(“/usr/lib/hpux64/dpd/libnss_dns.so.1.bpd”, O_RDONLY|0×800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
mmap(NULL, 49440, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc00000000b054000
mmap(NULL, 800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 65536) = 0x9fffffffbf68d000
close(5) = 0
getuid() = 305 (305)
getgid() = 303 (303)
getuid() = 305 (305)
getgid() = 303 (303)
open(“/test/local/oracle/11.1.0.7/lib/libdl.so.1″, O_RDONLY|0×800, 0) ERR#2 ENOENT
open(“/test/local/oracle/11.1.0.7/lib32/libdl.so.1″, O_RDONLY|0×800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
open(“/usr/lib/hpux64/libdl.so.1″, O_RDONLY|0×800, 0) = 5
fstat(5, 0x9fffffffffff9cf0) = 0
read(5, “7fE L F 0202010101\0\0\0\0\0\0\0″.., 64) = 64
close(5) = 0
socket(AF_INET, SOCK_DGRAM, 0) = 5
ioctl(5, SIOCGIFNUM, 0x9fffffffffff9680) = 0
ioctl(5, SIOCGIFCONF, 0x9fffffffffff9690) = 0
socket(AF_INET6, SOCK_DGRAM, 0) = 6
ioctl(6, SIOCGLIFNUM, 0x9fffffffffff9684) = 0
ioctl(6, SIOCGLIFCONF, 0x9fffffffffff96a0) = 0
ioctl(5, SIOCGIFFLAGS, 0x9fffffffffff96b0) = 0
close(5) = 0
close(6) = 0
gettimeofday(0x9fffffffffff7dd0, NULL) = 0
getpid() = 22968 (22967)
open(“/etc/resolv.conf”, O_RDONLY|0×800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffff7da0) ERR#25 ENOTTY
read(5, “d o m a i n t e s t . c o m \n”.., 8192) = 453
read(5, 0x60000000001dddf8, 8192) = 0
close(5) = 0
…………….
………….
socket(AF_INET, SOCK_STREAM, 0) = 5
connect(5, 0x60000000001eba50, 16) = 0
getsockname(5, 0x9fffffffffff9da0, 0x9fffffffffff94c0) = 0
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0
setsockopt(5, 0×6, TCP_NODELAY, 0x9fffffffffffa00c, 4) = 0
Changes
Nothing was changed in the configuration of the OS, only the upgrade from Oracle 10g to 11g was done.Cause
What was changed between the two is the way Oracle resolves hostnames to IPs, more specifically the system call used to do that.Oracle 11g use now getaddrinfo() while 10g used gethostbyname().
These system functions requires different configuration in /etc/nsswitch.conf.
gethostbyname() require the use of keyword “hosts” while getaddrinfo() the use the keyword “ipnodes”
Notes:
Even though this has only been observed on HP-UX and Solaris, this may be UNIX generic.
With Solaris, ipnodes has a different meaning (specify a file for IPV6 addresses resolution, gethostbyname and getaddrinfo both use hosts or ipnodes file).
Linux on the other hand does not use ipnodes in nsswitch.conf
Solution
Add a line in the /etc/nsswitch.conf file similar to the following:ipnodes: files [NOTFOUND=continue] dns
The line starting with keyword “hosts” must not be deleted.
This way calls made by getaddrinfo() will search first in /etc/hosts then, if the name is not found, will contact the dns server.
Thus there will be no connection delay for any lookup of host names existing in the local /etc/hosts file.
Applies to:
Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 11.2.0.1.0 – Release: 10.1 to 11.2Information in this document applies to any platform.
Symptoms
10 nodes RAC cluster, sqlnet connection via TCP consistently takes 10 seconds before the connection with the server is finally established.Tue Feb 2 17:00:39 CST 2010 SQL*Plus: Release 11.1.0.7.0 – Production on Tue Feb 2 17:00:39 2010
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 – 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Real Application Testing options
TO_CHAR(SYSDATE,’YYYY-MM-DDHH24:MI:SS’)
————————————————————————————————————————
2010-02-02 17:00:49
2010-02-02 17:00:39.432277 : nttcnp:exit
2010-02-02 17:00:39.432308 : snlinGetAddrInfo:entry *** 2010-02-02 17:00:49.440
2010-02-02 17:00:49.440655 : snlinGetAddrInfo:getaddrinfo() failed with error -2
2010-02-02 17:00:49.440736 : snlinGetAddrInfo:exit
2010-02-02 17:00:49.440751 : nttcon:exit
2010-02-02 17:00:49.440767 : nsopen:transport is open
2010-02-02 17:00:49.440785 : nsoptions:entry
2010-02-02 17:00:49.440802 : nsoptions:lcl[0]=0×0, lcl[1]=0×0, gbl[0]=0×0, gbl[1]=0×0, cha=0×0
2010-02-02 17:00:39.418598 : nscon:recving a packet
2010-02-02 17:00:39.418714 : nsprecv:entry
2010-02-02 17:00:39.418833 : nsprecv:reading from transport…
2010-02-02 17:00:39.418949 : nttrd:entry
2010-02-02 17:00:49.444643 : nttrd:socket 12 had bytes read=8
2010-02-02 17:00:49.444832 : nttrd:exit
Changes
This is a new installation.Cause
This issue is caused by invalid nameserver entry in /etc/resolv.conf.Examine system configuration files show:
——–
nameserver 192.168.1.20
nameserver 192.168.4.12 nsswitch.conf
———-
#hosts: db files ldap nis dns
hosts: files dns
Solution
1. Remove incorrect setting for nameserver in /etc/resolv.conf if DNS is not in use.or
2. Consult with the network administrator to provide correct nameserver IP address if DNS is in use.
Interim Patch for Base Bug: 9593134
=========================================================================
Date: Tue Aug 10 18:18:48 2010
————————————————————————-
Platform Patch for : Linux-x86
Product Patched : SQLNET
Product Version # : 11.1.0.7.0
RAC Rolling Installable : YES
Bugs Fixed by this patch:
————————-
9593134: DNS OR NIS MIS-CONFIGURATION CAN CAUSE SLOW DATABASE CONNECTS
Hdr: 8307164 11.1.0.7 NET 11.1.0.7 PRODID-115 PORTID-197
Abstract: TNSPING 11G USING DNS AND NOT HOSTS FILE
*** 03/04/09 05:24 am ***
TAR:
—-
Problem:
——–
Customer has 2 oracle homes on the same HP-UX server,
one 10g and one 11.1.0.7.
He execute tnsping in both versions, and truss them truss -o out
tnsping
In 10g truss shows that the hostname from the connect descriptor is resolved
using /etc/hosts file but in 11g the same hostname is resolved using dns.
The same behavior is observed for sqlplus.
Because of this connections to the database ar slow because of the slow
hostname resolution using the dns.
Versions:
———
Diagnostic Analysis:
——————–
nsswitch.conf
==============
hosts: files [NOTFOUND=continue] dns
Reproducibility:
—————-
on customer site, also I observered same behavior on
internal HP-UX server with 11g database
Test Case:
———-
Workaround:
———–
none