<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Ardent Performance Computing &#187; Technical</title>
	<atom:link href="http://www.ardentperf.com/category/technical/feed" rel="self" type="application/rss+xml" />
	<link>http://www.ardentperf.com</link>
	<description>Jeremy's Oracle Resources and Ramblings</description>
	<pubDate>Sat, 25 Oct 2008 02:52:32 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
	<language>en</language>
			<item>
		<title>Oracle Fully Automated Install and Patch</title>
		<link>http://www.ardentperf.com/2008/10/22/oracle-fully-automated-install-and-patch/</link>
		<comments>http://www.ardentperf.com/2008/10/22/oracle-fully-automated-install-and-patch/#comments</comments>
		<pubDate>Wed, 22 Oct 2008 23:08:30 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/?p=625</guid>
		<description><![CDATA[Before I started consulting, I was an Oracle engineer in a very large software development organization. The company had a number of major products and the one I worked with was used by hospitals and radiology offices world-wide.  (These guys are one of the biggest companies worldwide in the field.)  Our product included [...]]]></description>
			<content:encoded><![CDATA[<p>Before I started consulting, I was an Oracle engineer in a very large software development organization. The company had a number of major products and the one I worked with was used by hospitals and radiology offices world-wide.  (These guys are one of the biggest companies worldwide in the field.)  Our product included the hardware and software; you could order it with Sun or HP hardware (Solaris or Linux). It had an Oracle backend and a web-based middle tier built with lots of C++ and Java code.</p>
<p>Any software engineer who has worked on large projects - industry or community - can tell you the importance of solid change control processes. So&#8230; since everything for the build had to be checked into clearcase&#8230; yes, we checked Oracle 10g into the repo. A 2G tarball. And whenever there was a 1K patch to the oracle install? A brand new 2G tarball.  The clearcase guys loved me.</p>
<p>And that was how I started thinking about an automated build process. I don&#8217;t need to check B24792-01_1of5.zip into the repository because it&#8217;s straight off edelivery. I can skip p7150622_10203_Linux-x86-64.zip since it&#8217;s direct from metalink. The only thing missing was a solid, simple, flexible program for automating the oracle install, patchset, CPUs and oneoffs - taking Oracle&#8217;s official bits as input.</p>
<p>Anyone else out there who could use a program like this? How about for rapid provisioning of servers? (Like all that grid buzz.) Grid Control does some basic stuff (if you can make it work) and there are advanced kits for the big data centers - but there have to be more people than just me who would love to have this program.</p>
<h3>Proposing orainstall</h3>
<p>That&#8217;s why I&#8217;m proposing to write a bit of software called <strong>orainstall</strong>. It would be script-based and cross-platform and intended for community use. I have some ideas for a design but I&#8217;m really interested to hear how you might use something like this and what features might be useful to you. Do you think it&#8217;s a good design? I&#8217;d also like to hear whether you&#8217;d use it or help test it.</p>
<p>The program would be launched by running <span id="more-625"></span>the main executable &#8220;orainstall&#8221;.  The syntax could look something like this:</p>
<pre><code>Command line syntax:
  ./orainstall &lt;stepfile&gt; [-oh &lt;ORACLE_HOME&gt;] [-tmp &lt;ORAINST_TMP&gt;]

Example:
  ./orainstall ora10203.sf -oh /u01/app/oracle/product/10.2.0/db_1 -tmp /u01/app/oracle/tmp

Return Value:
  0 - success
  1 - invalid stepfile
  2 - installation error
  3 - patching error</code></pre>
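<p>To make the syntax above concrete, here is a minimal sketch of the argument handling in Python. Nothing here is written yet: the function names <code>parse_args</code> and <code>resolve</code> are my own placeholders, and the precedence rule (command line, then stepfile, then environment variable, then a default) is the one proposed in the stepfile draft.</p>

```python
import argparse
import os

# Exit codes taken from the proposed "Return Value" list.
EXIT_SUCCESS, EXIT_BAD_STEPFILE, EXIT_INSTALL_ERROR, EXIT_PATCH_ERROR = 0, 1, 2, 3


def parse_args(argv):
    """Parse: orainstall stepfile [-oh ORACLE_HOME] [-tmp ORAINST_TMP]."""
    parser = argparse.ArgumentParser(prog="orainstall")
    parser.add_argument("stepfile")
    parser.add_argument("-oh", dest="oracle_home", default=None)
    parser.add_argument("-tmp", dest="orainst_tmp", default=None)
    return parser.parse_args(argv)


def resolve(cli_value, stepfile_value, env_name, default=None, environ=None):
    """Apply the precedence: command line, stepfile, environment, default."""
    environ = os.environ if environ is None else environ
    for candidate in (cli_value, stepfile_value, environ.get(env_name)):
        if candidate:  # empty strings count as "not set" in this sketch
            return candidate
    return default
```

<p>For the temp directory, the call would be something like <code>resolve(args.orainst_tmp, value_from_stepfile, "ORAINST_TMP", default="/tmp/orainstall")</code>, where <code>value_from_stepfile</code> is whatever the stepfile parser found (or <code>None</code>).</p>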
<h3>The Stepfile</h3>
<p>What I&#8217;m thinking is that the install process is defined in an input file called the <strong>stepfile</strong>.  The stepfile needs to accommodate a large variety of organizational requirements.  Here&#8217;s my first draft for what a stepfile would look like:</p>
<pre><code># This stepfile instructs the oracle auto-installer how to setup the oracle
# home. Each line is executed in order from top to bottom.
#
# There are ten defined directives for this stepfile:
#   ORACLE_HOME - required
#     This will cause a search-replace of each response file to explicitly
#     define the oracle home. It must be defined before any open, install or
#     patch directives and it can only be defined once per stepfile.
#       Order of precedence:
#         1. command line
#         2. stepfile
#         3. environment variable
#   ORAINST_TMP - optional
#     Top-level temp directory for extracting contents of Oracle installers
#     and patches. Each installer or patch will be extracted into a
#     subdirectory of this temp location. It must be defined before any
#     open or patch directives. It can be defined multiple times in a
#     stepfile.
#       Order of precedence:
#         1. command line
#         2. stepfile
#         3. environment variable
#         4. "/tmp/orainstall"
#   OPEN /path/to/zip
#     Path to a zip file with database installer or major patchset which
#     requires use of OUI. The file is unzipped into a temporary location.
#     If several OPEN directives are encountered in a row then they are
#     all unzipped into the same location; if there is another directive in
#     between then the next OPEN is unzipped into a new temporary
#     location.
#   INSTALLPATH Disk1/runInstaller
#   INSTALLRSP install.rsp
#      "runInstaller -waitForCompletion -silent -responseFile &lt;install.rsp&gt;"
#     INSTALLPATH is optional - if not specified then the script will search
#     for it in the most recently created temp location. If there is more
#     than one runInstaller then it will use the first one it finds; search
#     order is undefined.
#     If no path is specified for the response file then orainstall will
#     search for it in this order:
#       1. current working directory (pwd) when orainstall was launched
#       2. same directory as orainstall binary (dirname $0)
#   OPATCH /path/p8000.zip
#     Path to special "OPatch" update - contents are unzipped directly to
#     ORACLE_HOME.
#   PATCH /path/p2345.zip
#     Path to a patch. Contents are unzipped to a temporary location then
#     applied by calling "opatch" from the ORACLE_HOME.
#   NPATCH /path/p1234.zip
#     Path to a patchset with multiple subpatches which need to be applied
#     with "napply" (such as a CPU). Contents are unzipped to temp location
#     then applied properly to the ORACLE_HOME.
#   CLEAN_TMP
#     Deletes all files from the current top-level temp directory.
#   REMOVE /path/to/file
#     Runs "rm -rf $ORACLE_HOME/path/to/file" - use carefully! Some sites
#     may require removal of components for security requirements.
#
# If there are any errors (install or patch) then the process will fail.
#
#oracle_home /u01/app/oracle/product/10.2.0/db_1
orainst_tmp /u01/app/oracle/installers.tmp
open /net/rh5lab12/u10/stage/ora10201/B24792-01_1of5.zip
open /net/rh5lab12/u10/stage/ora10201/B24792-01_2of5.zip
open /net/rh5lab12/u10/stage/ora10201/B24792-01_3of5.zip
open /net/rh5lab12/u10/stage/ora10201/B24792-01_4of5.zip
open /net/rh5lab12/u10/stage/ora10201/B24792-01_5of5.zip
installpath database/runInstaller
installrsp /net/rh5lab12/u10/stage/rsp/database10201.rsp
installpath companion/runInstaller
installrsp /net/rh5lab12/u10/stage/rsp/companion10201.rsp
open /net/rh5lab12/u10/stage/ora10203/p5337014_10203_Linux-x86-64.zip
installrsp /net/rh5lab12/u10/stage/rsp/patchset10203.rsp
clean_tmp
orainst_tmp /u01/app/oracle/patches
opatch /net/rh5lab12/u10/stage/patch10203/p6880880_10203_Linux-x86-64.zip
patch /net/rh5lab12/u10/stage/patch10203/p5892355_10203_Linux-x86-64.zip
patch /net/rh5lab12/u10/stage/patch10203/p6455161_10203_Linux-x86-64.zip
npatch /net/rh5lab12/u10/stage/patch10203/p7150622_10203_Linux-x86-64.zip
remove /htmldb</code></pre>
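<p>A stepfile like the draft above is simple enough that reading it takes only a few lines. The following is a rough sketch in Python, not the real program: <code>parse_stepfile</code> and the <code>DIRECTIVES</code> table are names I made up for illustration, and I&#8217;m assuming directives are case-insensitive because the comments use upper case while the example lines use lower case.</p>

```python
# The ten proposed directives; the value says whether an argument is required.
DIRECTIVES = {
    "oracle_home": True, "orainst_tmp": True, "open": True,
    "installpath": True, "installrsp": True, "opatch": True,
    "patch": True, "npatch": True, "remove": True,
    "clean_tmp": False,
}


def parse_stepfile(text):
    """Return a list of (line_number, directive, argument) tuples."""
    steps = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no instructions
        parts = line.split(None, 1)
        name = parts[0].lower()
        argument = parts[1].strip() if len(parts) == 2 else None
        if name not in DIRECTIVES:
            raise ValueError(f"line {number}: unknown directive {parts[0]!r}")
        if DIRECTIVES[name] and argument is None:
            raise ValueError(f"line {number}: {name} needs an argument")
        if not DIRECTIVES[name] and argument is not None:
            raise ValueError(f"line {number}: {name} takes no argument")
        steps.append((number, name, argument))
    return steps
```

<p>Keeping the line number with each step means an &#8220;invalid stepfile&#8221; error can point at the exact line that caused it.</p>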
<h3>Program Flow</h3>
<p>The program flow would be pretty basic; it would validate the stepfile then execute it one line at a time.  I thought of six things to validate before starting execution:</p>
<ol>
<li>can read stepfile</li>
<li>one oracle_home, defined before open/install/patch, can create/write</li>
<li>orainst_tmp defined before open/install/patch, can create/write each</li>
<li>clean_tmp not defined before orainst_tmp</li>
<li>can read all open/install/patch input files</li>
<li>free space in oracle_home is greater than size of input files</li>
</ol>
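<p>The ordering checks in that list (numbers 2 and 4) need nothing but the parsed stepfile, so they are easy to sketch. This is a minimal illustration in Python with hypothetical names (<code>check_order</code>, <code>WORK</code>); it assumes the stepfile has already been turned into <code>(line_number, directive, argument)</code> tuples, and it leaves out the checks that have to touch the filesystem (readability, write access, free space).</p>

```python
# Directives that do real work and therefore need an Oracle home first.
WORK = {"open", "installpath", "installrsp", "opatch", "patch", "npatch"}


def check_order(steps, home_predefined=False):
    """Return a list of error strings for ordering checks 2 and 4.

    steps is a list of (line_number, directive, argument) tuples.
    home_predefined is True when the Oracle home came from the command
    line or the environment rather than from the stepfile.
    """
    errors = []
    homes_in_file = 0
    home_known = home_predefined
    tmp_seen = False
    for line, name, _argument in steps:
        if name == "oracle_home":
            homes_in_file += 1
            home_known = True
            if homes_in_file != 1:
                errors.append(f"line {line}: oracle_home defined more than once")
        elif name == "orainst_tmp":
            tmp_seen = True
        elif name == "clean_tmp":
            if not tmp_seen:
                errors.append(f"line {line}: clean_tmp before any orainst_tmp")
        elif name in WORK and not home_known:
            errors.append(f"line {line}: {name} before oracle_home is defined")
    return errors
```

<p>Collecting every problem instead of stopping at the first one lets the program report all stepfile errors in a single pass before anything is installed.</p>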
<h3>Feedback</h3>
<p>Maybe there aren&#8217;t as many people as I think who would use this. After all - most DBAs don&#8217;t install the database software very often; they just want to keep it running! But please let me know if it interests you. I&#8217;ll factor in the feedback I receive as I&#8217;m developing it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/10/22/oracle-fully-automated-install-and-patch/feed/</wfw:commentRss>
		</item>
		<item>
		<title>ASMLIB Performance vs Udev</title>
		<link>http://www.ardentperf.com/2008/10/08/asmlib-performance-vs-udev/</link>
		<comments>http://www.ardentperf.com/2008/10/08/asmlib-performance-vs-udev/#comments</comments>
		<pubDate>Wed, 08 Oct 2008 09:28:40 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/?p=615</guid>
		<description><![CDATA[Is asmlib obsolete on a modern Linux system?  I&#8217;m still undecided but starting to lean toward &#8220;yes&#8221;.
Everybody knows that asmlib was very useful when it was first introduced with Oracle 10.1 to simplify a host of issues on Linux: direct async device access without raw devices, file permissions &#038; ownership without custom code, and [...]]]></description>
			<content:encoded><![CDATA[<p>Is asmlib obsolete on a modern Linux system?  I&#8217;m still undecided but starting to lean toward &#8220;yes&#8221;.</p>
<p>Everybody knows that asmlib was very useful when it was first introduced with Oracle 10.1 to simplify a host of issues on Linux: direct async device access without raw devices, file permissions &#038; ownership without custom code, and persistent device naming without devlabel.</p>
<p>But I&#8217;m now involved in setting some standards to be used across a large organization for Oracle 10.2 on RHEL5 and I&#8217;m wondering if there&#8217;s still a case for using asmlib.  So I did a little trolling for info - which was surprisingly sparse.  Had a hard time finding much, but after a lot of digging I think I&#8217;ve compiled a useful bit of information about benefits and drawbacks.</p>
<h4>Benefits</h4>
<p>It seems to me that the ASMLIB API was originally introduced to do more than just simplify file permissions - sounds like it was an alternative I/O API to the standard unix one, allowing ASM to access the underlying storage more efficiently and completely.  I don&#8217;t think it&#8217;s just an &#8220;extra layer&#8221; - it&#8217;s an alternative code path to the std unix I/O libs.  Like an ODM for block devices - and the idea was that there could be additional vendor implementations.  And Oracle released an initial generic implementation on Linux under the GPL.<br />
<span id="more-615"></span></p>
<p>Some theoretical benefits of ASMLIB API:</p>
<ul>
<li>
always uses direct, async i/o
</li>
<li>
solves persistent device naming, even if underlying device moves across reboots
</li>
<li>
solves file permissions and ownership
</li>
<li>
reduced user mode to kernel mode context switches during I/O, possibly reducing CPU usage
</li>
<li>
reduced file handle usage
</li>
<li>
pass metadata such as I/O prioritization to storage device (don&#8217;t think this is implemented in the Linux version)
</li>
</ul>
<p><br/></p>
<h4>Drawbacks</h4>
<p>However from what I can tell, the ASMLIB API didn&#8217;t quite catch on.  Wim recommends it (as of Apr &#8216;07), while Closson doesn&#8217;t really talk about it on his blog apart from one disparaging off-hand reference about pitying people who use it.  There have been no vendor implementations that I could find.  (But I wonder what&#8217;s going on now with exadata and &#8220;iDB&#8221; which has I/O prioritization and predicate pre-filtering - could they have used the API here?)  Wim&#8217;s post on Oracle Forums says that there isn&#8217;t &#8220;much of an io performance benefit&#8221; added by the linux ASMLIB implementation. And the OTN page hasn&#8217;t been updated for almost two years.</p>
<p>A RedHat consultant I&#8217;m working with recommended against using it and one of his reasons was that it introduces an unnecessary additional layer and dependency in the kernel.  There have been a small number of past bugs in asmlib and it doesn&#8217;t seem to be frequently updated (once this year according to the changelogs)… but then it&#8217;s very simple code (mostly in a single small file).</p>
<h4>Conclusions</h4>
<p>After all of this, I&#8217;m not convinced that it&#8217;s worth installing ASMLIB if you&#8217;re comfortable with udev.  On Linux, udev is definitely a better solution to handle persistent device naming and permissions and I don&#8217;t see enough benefits in asmlib to outweigh the very slight additional overhead.</p>
<p>So right now my preference would be to only install asmlib on old systems - Oracle 10.1 or Linux 2.4 - but to use udev and block devices for 10.2 and newer on 2.6 kernels.  But the case isn&#8217;t closed and I&#8217;d love to hear from anyone who agrees or disagrees&#8230; thoughts?</p>
<h4>References:</h4>
<ul>
<li>
Wim&#8217;s brief remarks on Oracle Forums - <a href="http://forums.oracle.com/forums/thread.jspa?threadID=498215">http://forums.oracle.com/forums/thread.jspa?threadID=498215</a>
</li>
<li>
In-depth information from ASM reference book - <a href="http://books.google.co.uk/books?id=HB453L86Q6AC&#038;pg=RA1-PA142&#038;lpg=RA1-PA142&#038;dq=asmlib+api&#038;source=web&#038;ots=bEP4epFbp5&#038;sig=47aLUWW9o73Snr1tcnvz_1h3prU&#038;hl=en&#038;sa=X&#038;oi=book_result&#038;resnum=1&#038;ct=result#PRA1-PA142,M1 ">http://books.google.co.uk/books?id=HB453L86Q6AC&#038;pg=RA1-PA142&#038;lpg=RA1-PA142&#038;dq=asmlib+api&#038;source=web&#038;ots=bEP4epFbp5&#038;sig=47aLUWW9o73Snr1tcnvz_1h3prU&#038;hl=en&#038;sa=X&#038;oi=book_result&#038;resnum=1&#038;ct=result#PRA1-PA142,M1</a>
</li>
<li>
Uses submit_bio() call - source code - <a href="http://oss.oracle.com/viewvc/oracleasm/trunk/kernel/oracleasm.c?view=markup">http://oss.oracle.com/viewvc/oracleasm/trunk/kernel/oracleasm.c?view=markup</a>
</li>
<li>
cdos newsgroup thread with speculation about performance - <a href="http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/905296d0aba00b84?pli=1">http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/905296d0aba00b84?pli=1</a>
</li>
<li>
tim hall was going to investigate though he didn&#8217;t publish results - <a href="http://www.oracle-base.com/blog/2006/05/01/asm-with-asmlib-or-raw-devices/">http://www.oracle-base.com/blog/2006/05/01/asm-with-asmlib-or-raw-devices/</a>
</li>
<li>
kevin closson doesn&#8217;t say much but makes this off-hand remark - <a href="http://kevinclosson.wordpress.com/2006/11/02/dbwr-efficiency-aio-io-libraries-with-asm/">http://kevinclosson.wordpress.com/2006/11/02/dbwr-efficiency-aio-io-libraries-with-asm/</a>
</li>
<li>
short anonymous comment possibly by closson - <a href="http://archives.devshed.com/forums/databases-124/asmlib-1157660.html">http://archives.devshed.com/forums/databases-124/asmlib-1157660.html</a>
</li>
<li>
OTN official site for ASMLIB - <a href="http://www.oracle.com/technology/tech/linux/asmlib/index.html">http://www.oracle.com/technology/tech/linux/asmlib/index.html</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/10/08/asmlib-performance-vs-udev/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Collaborate Wrapup, RAC 11g/VMware Lab</title>
		<link>http://www.ardentperf.com/2008/04/24/collaborate-wrapup-rac-11gvmware-lab/</link>
		<comments>http://www.ardentperf.com/2008/04/24/collaborate-wrapup-rac-11gvmware-lab/#comments</comments>
		<pubDate>Thu, 24 Apr 2008 13:50:43 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2008/04/24/collaborate-wrapup-rac-11gvmware-lab/</guid>
		<description><![CDATA[So Collaborate is over and I&#8217;m back in Chicago&#8230; home sweet home.  I thoroughly enjoyed the week in Denver, in spite of the snow!  Thursday, the last day, was especially fun.
First was a panel debate &#8220;To RAC or Not To RAC: What’s Best for HA.&#8221;  Dan Norris invited me to participate in [...]]]></description>
			<content:encoded><![CDATA[<p>So Collaborate is over and I&#8217;m back in Chicago&#8230; <a href="http://www.youtube.com/watch?v=xuRhaDrnlWo">home sweet home</a>.  I thoroughly enjoyed the week in Denver, in spite of the snow!  Thursday, the last day, was especially fun.</p>
<p>First was a panel debate &#8220;To RAC or Not To RAC: What’s Best for HA.&#8221;  <a href="http://www.dannorris.com">Dan Norris</a> invited me to participate in  this panel along with <a href="http://www.pythian.com/blogs/author/alex">Alex Gorbachev</a> (Pythian), <a href="http://www.predictive-technologies.com/">Neil Greene</a> (Predictive Technologies) and <a href="http://www.gridapp.com/about/m_zito.php">Matt Zito</a> (GridApp) - I certainly felt privileged to take part.  Here are a few of the things that I took away from the debate&#8230; feel free to discuss:<br />
<span id="more-589"></span></p>
<ul>
<li>
<p>There are no silver bullets.  And if there were, RAC wouldn&#8217;t be one.  :)  Many RAC implementation problems can be traced to poorly set expectations; knowing your requirements and RAC&#8217;s capabilities will go a long way towards making any project successful.</p>
</li>
<li>
<p>RAC really shines when solving scalability problems.  The simple act of moving an application from single-instance to RAC is more likely to improve throughput than service time.  (Of course, improvement isn&#8217;t guaranteed: as Dan pointed out during our conversation it depends on where the bottlenecks are.)</p>
</li>
<li>
<p>There are two major application design areas that need to be addressed when moving to RAC: (1) scalability and (2) connectivity.  (There are other issues too but these are probably the biggest.)  Regarding scalability, small inefficiencies will become major bottlenecks when you move your app to a cluster.  Look out for any points of contention or serialization whether they&#8217;re logical (like an ID field that must be sequential) or physical (like free space management).  On the connectivity front, there is both initial connection and load balancing to worry about.</p>
</li>
<li>
<p>When it comes to HA, I personally think that failover clusters should get more serious consideration than they do.  I think that there are essentially two differences between a failover cluster and RAC: (1) a few minutes of downtime each year, and (2) the prices for licenses - probably tens or hundreds of thousands of dollars minimum.  You can even still use TAF with a failover cluster - it&#8217;s not much different than using it with Data Guard.  The few apps that need extremely fast failover often can&#8217;t even wait a few minutes - they commonly run parallel systems in tandem to avoid failover altogether.  Most other apps can tolerate an extra 60 seconds during failover, if we&#8217;re honest.</p>
</li>
</ul>
<p>At any rate, I thought that the debate was informative and that Alex, Neil and Matt were each very well spoken.  Alex also has <a href="http://www.pythian.com/blogs/951/alex-gorbachev-at-collaborate-08">a good write-up on his Pythian blog</a>.</p>
<div class=""><a href="http://www.ardentperf.com/wp-photos/20080418-155143-1.jpg" onclick="window.open('http://www.ardentperf.com/wp-photos/20080418-155143-1.jpg','full_size_image','toolbar=0,scrollbars=0,location=0,status=0,menubar=0,resizable=1,height=660,width=500');return false;"><img src="http://www.ardentperf.com/wp-photos/thumb.20080418-155143-1.jpg" alt="0417081104.jpg" title="0417081104.jpg" style="" class="postie-image" /></a></div>
<p>After the debate panel I slipped over to hear Alex&#8217;s 11g New Features presentation (which was actually <a href="http://www.pythian.com/blogs/author/kutrovsky/">Christo</a>&#8217;s presentation but Alex delivered it since Christo wasn&#8217;t able to fly over from Dubai).  The presentation went great even though Alex hadn&#8217;t written it himself and I snapped this pic with my cell phone while I was there.</p>
<p>Alex&#8217;s session was the last one of the conference, so afterwards we grabbed a bite to eat and spent some time catching up.  Dan joined us a bit later and then we made our way to the airport to have dinner and catch our respective flights.  Thanks for the great company guys!</p>
<div style="clear:left"></div>
<h4>RAC 11g/VMware Lab</h4>
<p>One last thing: on Wednesday we had a lab session where we went through the step-by-step process of setting up 11g RAC in a VMware Server virtual environment with OEL5 and ASM.  I have uploaded the class to the <a href="/publications">publications section</a> of this website.</p>
<p>If you have a 2GHz processor, 2GB of memory and 20GB of disk space, then you can download all of the software for free and try out <a href="/publications">this lab</a> for yourself:</p>
<ul>
<li>
VMware Server - <a href="http://www.vmware.com/download/server/">http://www.vmware.com/download/server/</a>
</li>
<li>
Oracle Enterprise Linux (OEL) 5 - <a href="http://edelivery.oracle.com/linux">http://edelivery.oracle.com/linux</a>
</li>
<li>
Oracle Database Enterprise Edition 11g - <a href="http://www.oracle.com/technology/software/products/database/index.html">http://www.oracle.com/technology/software/products/database/index.html</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/04/24/collaborate-wrapup-rac-11gvmware-lab/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Oracle Services on RAC at Collaborate</title>
		<link>http://www.ardentperf.com/2008/04/16/oracle-services-on-rac-at-collaborate/</link>
		<comments>http://www.ardentperf.com/2008/04/16/oracle-services-on-rac-at-collaborate/#comments</comments>
		<pubDate>Wed, 16 Apr 2008 23:56:49 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2008/04/16/oracle-services-on-rac-at-collaborate/</guid>
		<description><![CDATA[Just a quick post to say that I&#8217;ve uploaded the slides from my services presentation at Collaborate and you can find them over on the publications page.  Thanks to everyone who attended!!  Great questions and comments throughout the session.  Next time I&#8217;ll try to get through everything faster so that there&#8217;s more [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quick post to say that I&#8217;ve uploaded the slides from my services presentation at Collaborate and you can find them over on the <a href="/publications">publications page</a>.  Thanks to everyone who attended!!  Great questions and comments throughout the session.  Next time I&#8217;ll try to get through everything faster so that there&#8217;s more time for Q&#038;A!</p>
<p>Next week sometime I expect to upload the instructions from today&#8217;s Hands-on Lab (11g RAC on VMware with ASM and OEL5).  I want to clean it up a bit first.  For anyone who didn&#8217;t hear the story, I heard on Sunday evening that they had a room with about 50 computers which was going to sit empty during a few timeslots.  And to me that seemed like a tragedy - how often do I wish I could have a chance to try something new with a bit of guidance and without worrying about hosing my laptop?!  So I decided to write a hands-on lab for 11gRAC/VMware.  I pretty much spent all day yesterday putting it together but it came out great!</p>
<p>Also I&#8217;m thinking about repeating that RAC/VMware lab around Chicago sometime&#8230; is there anyone around the Chicago area who might be interested in something like this?</p>
<p>Guess I should also mention that I&#8217;ve been rather enjoying myself at Collaborate so far too!  (Even though I spent pretty much all day yesterday making that Hands-on Lab&#8230;)  Got a chance to meet <a href="http://www.rittmanmead.com/author/peter-scott/">Peter</a> <a href="http://pjsrandom.wordpress.com/">Scott</a> Monday night - that was fun!  Somehow he spotted my name badge and then I got to finally put a face to someone I&#8217;d only known as a blogger.  :)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/04/16/oracle-services-on-rac-at-collaborate/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Oracle IOPS and HBA Queue Depth</title>
		<link>http://www.ardentperf.com/2008/03/13/oracle-iops-and-hba-queue-depth/</link>
		<comments>http://www.ardentperf.com/2008/03/13/oracle-iops-and-hba-queue-depth/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 16:07:24 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2008/03/13/oracle-iops-and-hba-queue-depth/</guid>
		<description><![CDATA[About a month ago I wrote an overview of Linux Caching and I/O Queues as they pertain to Oracle.  I was working on a project to architect, install and configure the beginnings of an 8-node cluster consisting of either one or two RAC databases.  During the project, while I was waiting for the [...]]]></description>
			<content:encoded><![CDATA[<p>About a month ago I wrote an overview of <a href="http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/">Linux Caching and I/O Queues as they pertain to Oracle</a>.  I was working on a project to architect, install and configure the beginnings of an 8-node cluster consisting of either one or two RAC databases.  During the project, while I was waiting for the OS guys to resolve some networking issues, I ran a bunch of benchmarks on the storage subsystem.  Specifically, I experimented with the size of the HBA Queue Depth to see if it would make a difference in performance.</p>
<p>But before getting into the results, a quick overview of our configuration: it was 11g RAC on Red Hat Enterprise Linux 5; Dell servers with four dual-core Opteron chips each.  The RAC cluster initially had four nodes but will grow to at least eight as the data is migrated.  The system has 4G QLogic cards, a McData switch and a 3Par SAN (which is blazing fast).  ASM (no CFS) and dedicated Oracle Homes. The first spec had an InfiniBand interconnect but after a teleconference with <a href="http://www.pythian.com/blogs/author/alex">Alex from Pythian</a> discussing the project&#8217;s specific requirements, the spec was updated to use redundant Gigabit Ethernet.</p>
<p>Picking up where I left off: the default limit set by the Linux qla2xxx driver for concurrent I/O requests on QLogic cards (32 per LUN) is conservative. So can I increase performance by increasing this limit? The best way to answer a question like this is simply to try it.<br />
<span id="more-572"></span></p>
<h4>Tweaking the HBA Queue Depth</h4>
<p>With QLogic HBAs on Linux the queue depth is configured through the <em>ql2xmaxqdepth</em> module option.  I want to run an experiment where I vary this parameter and measure the I/O performance.  To start out, I&#8217;d like to compare the default queue size of 32 with an increased setting of 64.  But how can I effectively measure I/O performance?  I&#8217;m specifically interested in how Oracle&#8217;s RDBMS will perform, so I think that the best tool is <a href="http://www.oracle.com/technology/software/tech/orion/index.html">Oracle&#8217;s Orion benchmarking tool</a> which is designed to simulate database I/O patterns and measure the result.</p>
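<p>For anyone who wants to script the change to <em>ql2xmaxqdepth</em>: module options on RHEL5 normally live in <code>/etc/modprobe.conf</code> as an <code>options qla2xxx ...</code> line. Below is a small sketch in Python that rewrites the text of such a file. The function name is made up, it only transforms a string (writing the file and rebuilding the initrd so the option takes effect at boot are left to you), and you should check your own driver documentation before relying on it.</p>

```python
def set_qla2xxx_queue_depth(conf_text, depth):
    """Return modprobe.conf text with ql2xmaxqdepth set to depth."""
    setting = f"ql2xmaxqdepth={depth}"
    out = []
    found = False
    for line in conf_text.splitlines():
        words = line.split()
        if words[:2] == ["options", "qla2xxx"]:
            # Keep any other qla2xxx options, drop an old queue depth setting.
            kept = [w for w in words[2:] if not w.startswith("ql2xmaxqdepth=")]
            line = " ".join(["options", "qla2xxx"] + kept + [setting])
            found = True
        out.append(line)
    if not found:
        out.append(f"options qla2xxx {setting}")
    return "\n".join(out) + "\n"
```

<p>Calling <code>set_qla2xxx_queue_depth(text, 64)</code> on a file without any qla2xxx options simply appends <code>options qla2xxx ql2xmaxqdepth=64</code>.</p>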
<p>Now there are two ways to measure I/O performance: IOPS and MBPS.</p>
<ol>
<li>
<p>When you measure <strong>IOPS</strong> you&#8217;re usually investigating small I/O operations and putting stress on the overhead associated with a single read or write.  <em>Throughput is not the main concern when you measure IOPS.</em>  This is most relevant on transactional systems.</p>
</li>
<li>
<p>When you measure <strong>MBPS</strong> you&#8217;re usually investigating large I/O operations and putting stress on the overall throughput.  <em>Latency is not the main concern when you measure MBPS.</em>  This is most relevant on warehouse and analytical systems.</p>
</li>
</ol>
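<p>The two measurements are tied together by the I/O size: throughput is just the I/O rate multiplied by the size of each operation. A quick sketch of that arithmetic in Python, with made-up numbers chosen only for illustration (they are not measurements from this project):</p>

```python
def mbps(iops, io_size_kb):
    """Throughput in MB/s for a given I/O rate and I/O size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024.0


# Hypothetical OLTP-style load: many small single-block reads.
print(mbps(10000, 8))    # 78.125

# Hypothetical warehouse-style load: few large multi-block reads.
print(mbps(200, 1024))   # 200.0
```

<p>The small-block case does fifty times as many operations yet moves less than half the data, which is why a transactional system should be judged on IOPS and latency rather than on MBPS.</p>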
<p>For more reading, <a href="http://www.ittoolbox.com/profiles/jkoopmann">James Koopmann</a> just wrote <a href="http://www.databasejournal.com/article.php/2205281">a few good articles</a> over at Database Journal about <a href="http://www.dbasupport.com/oracle/ora10g/disk_IO.shtml">getting IOPS/MBPS measurements from existing databases</a> and <a href="http://www.dbasupport.com/oracle/ora10g/disk_IO_02.shtml">the relationship between IOPS/MBPS and vendor-supplied disk specs</a>.</p>
<p>Now this particular project&#8217;s database will back several high-volume websites and the traffic is probably only 10-20% writes and 80-90% reads.  However much of the data changes somewhat frequently and it is definitely an OLTP workload - almost entirely index-based reads.  On the AWR report from a current production database, scattered reads almost didn&#8217;t even register while sequential reads were by far the most significant event.  This means that we need to optimize this storage system - and our benchmark - for IOPS rather than raw throughput.</p>
<h4>Orion Results</h4>
<p>The basic idea behind Orion is pretty simple.  Orion is designed to simulate a mixed workload between single-block reads and multi-block reads.  You give it a bunch of parameters that describe your environment - and then it runs its test over and over again, varying the balance between single-block and multi-block reads while measuring the IOPS, latency and MBPS.  One important point: <em>Orion varies the balance by changing the number of threads that are concurrently doing reads/writes.  Unlike swingbench or hammerora, there is no think time - each thread constantly tries to do I/O.</em></p>
<p>Since the database for this project is nearly 100% single-block reads, I decided to just run a &#8220;basic&#8221; matrix - which doesn&#8217;t test mixed workloads.  It runs about 45 tests with varying levels of concurrency between 1 thread and 500 threads for single-block reads and the same for multi-block reads.  (I just ignored the multi-block results.)  I instructed Orion to do 15% write operations and 85% read operations.</p>
<pre><code>#
# ./orion_linux_em64t -run advanced -testname simple -num_disks 100 -write=15 -matrix=basic -verbose
#
# </code></pre>
<p>I ran this benchmark four times in a row.  Before each run I changed the queue depth and rebooted the server.  I alternated between 32 and 64, using each queue depth twice.  The result was a consistent 7% improvement in IOPS and latency by doubling the queue depth.</p>
<p><a href='http://www.ardentperf.com/wp-content/uploads/2008/03/iops-queuedepth.GIF' title='IOPS for different queue depths'><img src='http://www.ardentperf.com/wp-content/uploads/2008/03/iops-queuedepth.thumbnail.GIF' alt='IOPS for different queue depths' /></a> <a href='http://www.ardentperf.com/wp-content/uploads/2008/03/lat-queuedepth.GIF' title='Latencies for different queue depths'><img src='http://www.ardentperf.com/wp-content/uploads/2008/03/lat-queuedepth.thumbnail.GIF' alt='Latencies for different queue depths' /></a></p>
<div style="clear:both"></div>
<h4>What&#8217;s the Best Setting for Queue Depth?</h4>
<p>That depends on how many clients are accessing the storage device.  Do not max out the queue depth on all your servers based only on this article.  Remember from <a href="http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/">the first article</a> that the <em>Storage Device&#8217;s FC Port</em> can concurrently process a limited number of requests.  If you have a large number of devices accessing the same storage array and you increase the queue depth on all of them, then you will start seeing the dreaded &#8220;QUEUE FULL SCSI&#8221; errors!  However, if there are only a handful of clients and you know that you won&#8217;t be adding more, then you can certainly increase this parameter and get the associated performance boost at peak workload.</p>
<p>To get the optimal value you need to consult the manuals or support channels for your storage system to find out <em>its</em> queue depth.  Factor in the number of clients you have accessing this array plus some buffer for safety and then you can determine optimal values for each server.</p>
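<p>The arithmetic behind that last step can be sketched in a few lines of Python.  This is only a rough illustration under stated assumptions: the function name, the 20% safety margin and the example figures (a 2000-slot port, 4 hosts, 6 LUNs per host) are all hypothetical, and it assumes the HBA queue depth applies per LUN, as with the QLogic driver from the first article.  Always take the real port limit from your storage vendor.</p>
<pre><code># Sketch: split a storage port's queue slots across the hosts sharing it.
# Every figure here is an illustrative assumption, not a vendor specification.

def per_lun_queue_depth(port_slots, hosts, luns_per_host, safety_percent=20):
    """Return the largest per-LUN queue depth that keeps the worst case
    (every host filling every LUN queue at once) inside the port limit."""
    # Hold back a safety margin, using integer math to avoid rounding surprises.
    usable_slots = port_slots * (100 - safety_percent) // 100
    total_lun_queues = hosts * luns_per_host
    return usable_slots // total_lun_queues


# Hypothetical example: 2000 port slots, 4 hosts, 6 LUNs per host.
print(per_lun_queue_depth(port_slots=2000, hosts=4, luns_per_host=6))
</code></pre>
<p>If the result comes out below the current setting, that suggests the port is already oversubscribed in the worst case and raising the queue depth further is risky.</p>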
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/03/13/oracle-iops-and-hba-queue-depth/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Oracle Clusterware on RHEL5/OEL5 with udev and multipath</title>
		<link>http://www.ardentperf.com/2008/02/13/oracle-clusterware-on-rhel5oel5-with-udev-and-multipath/</link>
		<comments>http://www.ardentperf.com/2008/02/13/oracle-clusterware-on-rhel5oel5-with-udev-and-multipath/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 05:38:39 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2008/02/13/oracle-clusterware-on-rhel5oel5-with-udev-and-multipath/</guid>
		<description><![CDATA[The trouble with Linux? No&#8230; the trouble with computers in general - is that they keep changing!  Solaris 10 comes out, Oracle 11g, Red Hat 5&#8230;  and everything works different!!  It&#8217;s a full-time job just trying to keep up with everything.
Almost exactly one year ago I wrote about using udev on 2.6 [...]]]></description>
			<content:encoded><![CDATA[<p>The trouble with Linux? No&#8230; the trouble with computers in general - is that they keep changing!  Solaris 10 comes out, Oracle 11g, Red Hat 5&#8230;  and everything works different!!  It&#8217;s a full-time job just trying to keep up with everything.</p>
<p>Almost exactly one year ago I wrote about <a href="http://www.ardentperf.com/2007/02/26/udev-for-security-conscious-rac-sysadmins/">using udev on 2.6 kernels</a> to set the <a href="http://www.ardentperf.com/2007/03/22/rac-file-permissions-quick-reference/">proper permissions for Oracle RAC</a>.  Two weeks after that post (March 14) <a href="http://www.redhat.com/docs/manuals/enterprise/">Red Hat Enterprise Linux 5</a> was released and changed everything.</p>
<p>In my original post, I demonstrated how to create a PERMISSIONS file that udev would use when creating the device nodes.  This worked on RHEL4 and SLES9.  However this week I&#8217;ve been helping a client deploy 11g RAC on a RHEL5-based cluster - and I remembered that the PERMISSIONS facility was removed from udev in RH5.  Seems like I remember reading something about having a single source of configuration for udev, which makes sense&#8230; so maybe they picked the RULES.  (You&#8217;ll remember from <a href="http://www.ardentperf.com/2007/02/26/udev-for-security-conscious-rac-sysadmins/">my previous post</a> that RULES are processed right before PERMISSIONS.)  This is just as well since RULES are actually quite a bit more powerful than PERMISSIONS.</p>
<p>So on RHEL5 and OEL5 - in order to conform to Linux Best Practices - we now have to set correct RAC file permissions using udev RULES.  To get started, we need to review how RULES work.  The <a href="http://linuxcommand.org/man_pages/udev8.html">udev manual page</a> gives a good overview of rules processing.  But of course there are <a href="http://www.redhat.com/magazine/002dec04/features/udev/">plenty</a> of great <a href="http://www.reactivated.net/writing_udev_rules.html">tutorials </a>that go deeper if you&#8217;re looking for more.<br />
<span id="more-573"></span></p>
<h4>Block Devices and Raw Devices</h4>
<p>Now of course I&#8217;m not the first person to notice that there&#8217;s no permissions.d directory on RHEL5 and OEL5.  Last September, Grégory Guillou from Pythian <a href="http://www.pythian.com/blogs/613/running-rac-and-asm-on-linux">blogged about installing on Red Hat 5</a> and referenced a post on his French blog that shows <a href="http://arkzoyd.blogspot.com/2007/09/oracle11g-sur-linux-la-fin-des-raw.html">how to set up a RULES file for SCSI block devices</a>.  I don&#8217;t speak French but I was able to copy the text of his RULES files:</p>
<pre><code># Oracle Configuration Registry
KERNEL=="sdb[8-9]", OWNER="root", GROUP="oinstall", MODE="640"
# Voting Disks
KERNEL=="sdb1[0-2]", OWNER="oracle", GROUP="oinstall", MODE="640"
# ASM Devices
KERNEL=="sdb[5-7]", OWNER="oracle", GROUP="dba", MODE="660"</code></pre>
<p>Many people are still configuring raw devices for their voting disks and cluster registries even though this is not necessary.  <a href="http://forums.oracle.com/forums/thread.jspa?threadID=605282&#038;tstart=45">This Oracle Forums thread</a> gives a sample of using a RULES file to set permissions for raw devices.  It says to create a file called <em>/etc/udev/rules.d/65-raw-permissions.rules</em> with these contents:</p>
<pre><code># Set permissions of raw bindings to Oracle Clusterware devices
KERNEL=="raw1", OWNER="root", GROUP="oinstall", MODE="640"
KERNEL=="raw2", OWNER="oracle", GROUP="oinstall", MODE="640"</code></pre>
<p>In fact there&#8217;s <a href="http://expobadge.com/dldev/dc/file/Linux_device-mapper-udev-CRS-ASM_final9.pdf">a great Oracle Whitepaper on udev and multipathing</a> that was published all the way back in June of 2007.  It gives another sample configuration for raw devices on Red Hat 5.</p>
<h4>Linux Multipath (Device Mapper) Devices</h4>
<p>That&#8217;s all great.  However&#8230; I&#8217;m doing an implementation on RHEL5 right now and we&#8217;re using the device mapper to multipath connections to the SAN.  And the Oracle white paper - which goes into marvelous depth about multipath, udev and RH5 - never tells us anything about configuring multipath, udev and RH5 <em>all together</em>!  So I have to figure this one out on my own.  No harm done; it provided an interesting challenge for the day.  :)</p>
<p>Multipath devices are a bit tricky.  Usually if you&#8217;re using multipath then you don&#8217;t want to assume that devices will always be discovered in the same order.  (This is the purpose of assigning friendly names by WWN in the multipath.conf file with the <em>alias</em> directive.)  In fact, on our cluster the devices were assigned in different orders on different nodes - the ocr was dm-9 on one node and dm-10 on another.  How do you write a RULES entry if you don&#8217;t know what the name of the device is?</p>
<p>I finally did get the multipath rules file to give the proper permissions to my OCR and Voting Disks based on alias.  However I was only able to change the /dev/dm-* files and not the /dev/mapper/* nodes.  (Those nodes are not created by udev.)  Therefore I had to use the aliases in /dev/mpath - not the aliases in /dev/mapper - when running the Oracle installer.  How does it work? The symlinks in /dev/mpath are created by udev RULES - so all I had to do was piggyback on the udev config that created them and modify the permissions of the underlying devices.</p>
<p>This client used the aliases vote1, vote2, vote3, ocr1 and ocr2.  Here&#8217;s the config file <em>/etc/udev/rules.d/40-multipath.rules</em> with my changes in bold:</p>
<pre><code># multipath wants the devmaps presented as meaninglful device names
# so name them after their devmap name
SUBSYSTEM!="block", GOTO="end_mpath"
KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep \
    ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"
KERNEL!="dm-[0-9]*", GOTO="end_mpath"
PROGRAM!="/sbin/mpath_wait %M %m", GOTO="end_mpath"
ACTION=="add", RUN+="/sbin/dmsetup ls --target multipath --exec '/sbin/kpartx -a -p p' \
    -j %M -m %m"
PROGRAM=="/sbin/dmsetup ls --target multipath --exec /bin/basename -j %M -m %m", \
    RESULT=="?*", NAME="%k", SYMLINK="mpath/%c", <strong>GOTO="check_cluster_devs"</strong>
PROGRAM!="/bin/bash -c '/sbin/dmsetup info -c --noheadings -j %M -m %m | /bin/grep \
    -q .*:.*:.*:.*:.*:.*:.*:part[0-9]*-mpath-'", GOTO="end_mpath"
PROGRAM=="/sbin/dmsetup ls --target linear --exec /bin/basename -j %M -m %m", NAME="%k", \
    RESULT=="?*", SYMLINK="mpath/%c", OPTIONS="last_rule"
<strong>GOTO="end_mpath"
LABEL="check_cluster_devs"
RESULT=="ocr*", GROUP="dba", MODE="640"
RESULT=="vote*", OWNER="oracle", GROUP="dba", MODE="640"
OPTIONS="last_rule"</strong>
LABEL="end_mpath"</code></pre>
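<p>Before pointing the Oracle installer at the /dev/mpath aliases, it is worth confirming that the rules really changed the underlying nodes.  Here is a small Python sketch of such a check; it is not part of the original setup, the helper names are made up for illustration, and it assumes the aliases used by this client.  It follows each symlink to its real device node and reports the owner, group and mode.</p>
<pre><code>import grp
import os
import pwd
import stat


def _name_or_id(lookup, numeric_id, attribute):
    """Translate a numeric uid/gid to a name, falling back to the number."""
    try:
        return getattr(lookup(numeric_id), attribute)
    except KeyError:
        return str(numeric_id)


def describe_device(link_path):
    """Resolve a symlink such as /dev/mpath/ocr1 and report the ownership
    and permission bits of the node it finally points to."""
    target = os.path.realpath(link_path)
    info = os.stat(target)
    return {
        "target": target,
        "owner": _name_or_id(pwd.getpwuid, info.st_uid, "pw_name"),
        "group": _name_or_id(grp.getgrgid, info.st_gid, "gr_name"),
        "mode": format(stat.S_IMODE(info.st_mode), "o"),
    }


if __name__ == "__main__":
    # Aliases from the client example above; adjust for your own cluster.
    for alias in ("ocr1", "ocr2", "vote1", "vote2", "vote3"):
        path = os.path.join("/dev/mpath", alias)
        if os.path.exists(path):
            print(alias, describe_device(path))
</code></pre>
<p>Running this on every node is a quick way to catch the case where the same alias maps to a different dm-* device on each node but one of them missed the permission change.</p>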
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/02/13/oracle-clusterware-on-rhel5oel5-with-udev-and-multipath/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Oracle I/O and Operating System Caching</title>
		<link>http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/</link>
		<comments>http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/#comments</comments>
		<pubDate>Fri, 01 Feb 2008 05:05:19 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Linux]]></category>

		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/</guid>
		<description><![CDATA[Well it&#8217;s been awhile since I&#8217;ve written anything for the blog - during the past four months I went on a trip to Asia, celebrated Thanksgiving and Christmas with both my family and my girlfriend&#8217;s family and then in January I got engaged!  I&#8217;ve also been working on some continuing educational goals - so [...]]]></description>
			<content:encoded><![CDATA[<p>Well it&#8217;s been awhile since I&#8217;ve written anything for the blog - during the past four months I went on a trip to Asia, celebrated Thanksgiving and Christmas with both my family and my girlfriend&#8217;s family and then in January I got engaged!  I&#8217;ve also been working on some continuing educational goals - so needless to say I&#8217;ve been keeping busy.  And I will continue to be very busy over the next few months of wedding planning so I probably won&#8217;t be writing too much.</p>
<p>This post is just another interesting case study from a customer I&#8217;m working with right now.  We were looking at various queues in the I/O stream and wondering how we might be able to tweak them.  I was mainly investigating the host HBAs - but in the process of digging into this I also learned about a few other queues and Linux internals in general as they relate to Oracle.<br />
<span id="more-569"></span></p>
<h4>Direct Async I/O</h4>
<p><a href='http://www.usenix.org/events/usenix01/full_papers/kroeger/kroeger_html/node8.html' title='linuxcaches.gif'><img src='http://www.ardentperf.com/wp-content/uploads/2008/01/linuxcaches.thumbnail.gif' alt='linuxcaches.gif' /></a>It&#8217;s pretty well documented that the most efficient configuration for Oracle is direct asynchronous I/O.</p>
<p><em>Direct I/O</em> means to avoid unnecessarily copying bits from one memory location to another between the hardware and the SGA.  Every flavor of Unix implements some form of caching for block devices and filesystems; in Linux the <a href="http://tldp.org/LDP/tlk/fs/filesystem.html#tth_sEc9.2">buffer cache</a> (for block devices) or <a href="http://www.linux-security.cn/ebooks/ulk3-html/0596005652/understandlk-CHP-15-SECT-2.html#understandlk-CHP-15-FIG-2">page cache</a> (for filesystem data) can cache database blocks.  Most applications benefit from this caching - however Oracle does not.  There are two primary reasons this caching is bad for Oracle:</p>
<ol>
<li>
<p>Oracle caches this data itself in the SGA and it does a far better job of predicting what blocks will be best retained in memory.  When the OS caches data you essentially end up with two copies of every data block - one that&#8217;s really unused.  You waste half the memory in your server with almost no performance benefit.</p>
</li>
<li>
<p>The data has to be copied extra times before arriving in the SGA - and these extra &#8220;copy&#8221; operations slow down your read operations.</p>
</li>
</ol>
<p><em>Asynchronous I/O</em> means to multitask your read and write operations.  Remember how in MS-DOS you could only run one program at a time?  Believe it or not, this is how most Unix filesystems do I/O by default - one thing at a time.  Using asynchronous I/O is equivalent to upgrading to an Operating System that allows you to run more than one program at a time.  It allows Oracle to issue multiple read and write operations concurrently - this is obviously more efficient.</p>
<p>There are different ways of implementing Direct I/O and Asynchronous I/O in different environments (CIO, QIO, ODM, etc) - but it&#8217;s always best to enable them and then shift memory from the OS caches to the SGA.</p>
<p><img src='http://www.ardentperf.com/wp-content/uploads/2008/01/queues.gif' alt='queues.gif' style='float:right;margin:0px 0px 20px 20px;'/></p>
<h4>The Linux I/O Path</h4>
<p>I should start out by saying that I&#8217;m not a kernel expert.  After some digging through source code and mailing lists I think that I&#8217;ve got a handle on the main concepts here - but I&#8217;m open to correction since this is complicated stuff and I easily could miss some details of how it&#8217;s all implemented!  Let me know if you have anything to add!</p>
<p>On Linux, asynchronous I/O requests are submitted through the <a href="http://www.linuxmanpages.com/man2/io_submit.2.php">io_submit()</a> call.  (Kevin Closson once <a href="http://kevinclosson.wordpress.com/2006/12/05/analyzing-oracle-database-10g-writer-io-activity-on-linux/">wrote about tracing this call</a>.)  This function essentially puts the I/O request into a queue which is managed by the Linux kernel.  I think that this <em>workqueue</em> is a generic kernel object which you can&#8217;t and shouldn&#8217;t need to tweak.  The I/O request is subsequently picked up by one of several kernel AIO background processes and serviced.</p>
<p>What the kernel thread does depends on the underlying device.  FC cards support asynchronous requests to the target (it&#8217;s part of the SCSI spec) - so in this case the kernel thread will issue a number of SCSI read or write requests in parallel.  This is where a <em>second</em> queue comes in!  And this queue, unlike the kernel workqueues, can be tweaked.</p>
<p>The target device (Symmetrix, DSxxxx, 3PAR, etc) also has an FC port which has its own queue for I/O requests that are being serviced.  On a 3PAR device these ports only have between 500 and 2000 slots for concurrent I/O requests depending on the model.  This means that all the servers accessing any LUNs over that port cannot issue more than 500-2000 concurrent I/O requests - or you will start to see &#8220;QUEUE FULL SCSI&#8221; errors.</p>
<p>My best guess: this means that the default limit set by the Linux qla2xxx driver for concurrent I/O requests on QLogic cards (32 per LUN) is conservative.  The driver sets this limit by <em>limiting the size of the queue</em>.  So naturally&#8230;  my next question was if I could increase performance by increasing this limit&#8230;</p>
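<p>To put rough numbers on that guess, here is a small Python sketch.  The host and LUN counts are invented for illustration and the function names are mine; only the default of 32 per LUN and the 500-2000 slot range come from the discussion above.  It computes the worst case, where every LUN queue on every host is full at the same moment, as a percentage of the port&#8217;s capacity.</p>
<pre><code># Sketch: worst-case outstanding I/O requests against one storage port.
# Host and LUN counts below are hypothetical examples.

def worst_case_outstanding(hosts, luns_per_host, queue_depth):
    """Requests in flight if every LUN queue on every host is full."""
    return hosts * luns_per_host * queue_depth


def percent_of_port(outstanding, port_slots):
    """Worst-case demand as a whole-number percentage of the port's slots."""
    return outstanding * 100 // port_slots


demand = worst_case_outstanding(hosts=4, luns_per_host=6, queue_depth=32)
print(demand)                         # worst-case requests in flight
print(percent_of_port(demand, 2000))  # share of a 2000-slot port
print(percent_of_port(demand, 500))   # share of a 500-slot port
</code></pre>
<p>With these made-up figures the default leaves plenty of headroom on a large port but already oversubscribes a small one, which is why the answer depends so heavily on how many clients share the port.</p>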
<h4>Tweaking the HBA Queue Depth</h4>
<p>I did run a few tests - but I think I&#8217;ll save the results for a second post since this one has already gotten pretty long.  Any guesses about what the results were?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2008/01/31/oracle-io-and-operating-system-caching/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Two days until Asia! Trip Update.</title>
		<link>http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update-2/</link>
		<comments>http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update-2/#comments</comments>
		<pubDate>Thu, 27 Sep 2007 21:39:43 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update-2/</guid>
		<description><![CDATA[Two days until I leave for Asia.  (You have no idea how busy I&#8217;ve been for the past month!)  If anyone&#8217;s curious about trip details I&#8217;ve posted about it over on the non-technical side of my blog.
http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update/
]]></description>
			<content:encoded><![CDATA[<p>Two days until I leave for Asia.  (You have no idea how busy I&#8217;ve been for the past month!)  If anyone&#8217;s curious about trip details I&#8217;ve posted about it over on the non-technical side of my blog.</p>
<p><a href="http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update/">http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2007/09/27/two-days-until-asia-trip-update-2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cache Buffers Chains and Latch Spelunking</title>
		<link>http://www.ardentperf.com/2007/09/13/cache-buffers-chains-and-latch-spelunking/</link>
		<comments>http://www.ardentperf.com/2007/09/13/cache-buffers-chains-and-latch-spelunking/#comments</comments>
		<pubDate>Fri, 14 Sep 2007 00:44:44 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2007/09/13/cache-buffers-chains-and-latch-spelunking/</guid>
		<description><![CDATA[Last night I posted a case study where I used the AWR (a blessed new feature) to investigate &#8220;gc buffer busy&#8221; wait events in a RAC environment.  I concluded the write-up by theorizing that the single freelist was pointing all nodes of the cluster to the same small group of blocks for inserts and [...]]]></description>
			<content:encoded><![CDATA[<p>Last night I posted a case study where I used the AWR (a blessed new feature) to investigate &#8220;gc buffer busy&#8221; wait events in a RAC environment.  I concluded the write-up by theorizing that the single freelist was pointing all nodes of the cluster to the same small group of blocks for inserts and thereby causing the blocks on the freelist to always be subject to unearthly contention across the cluster.</p>
<p>One common piece of advice for gc buffer busy waits is to <a href="http://www.freelists.org/archives/oracle-l/03-2007/msg00436.html">treat them like regular buffer busy waits</a>.  Because essentially that&#8217;s what they are - a buffer busy wait on a remote instance.  So another avenue of investigation is to look at what might be causing buffer busy waits across the cluster.</p>
<p>Some people may remember that back in the days before YAPP and the wait interface, latches were usually where the purported &#8220;experts&#8221; looked when you had performance problems.  Particularly those two infamous latches <em>cache buffers chains</em> and <em>library cache</em>.  And of course today these are still an important part of any in-depth investigation and V$LATCH even includes wait time so you can take a time-based approach to analysis.  I spent some time yesterday having a look at the latching in this RAC system and it yielded some results that I thought might be interesting to post.  So here goes&#8230;<br />
<span id="more-564"></span></p>
<h4>Cache Buffers Chains Latch</h4>
<p><a href='http://www.ardentperf.com/wp-content/uploads/2007/09/latches.GIF' title='latches.GIF'><img src='http://www.ardentperf.com/wp-content/uploads/2007/09/latches.thumbnail.GIF' alt='latches.GIF' /></a>Now it was apparent to me pretty quickly that the <em>cache buffers chains</em> latch was the busiest on the system.  And of course it&#8217;s normal for there to be some contention on this latch.  You&#8217;ll notice from the screenshot the enormous difference in gets, spin_gets and sleeps.  So that gave me a bit of a head start - but you could certainly get the same information from the AWR (that blessed new feature) as well.  In fact you can do all of this from the AWR - although look out since latch information is <em>not</em> gathered by default!</p>
<div style="clear:both"></div>
<p>So these guys weren&#8217;t gathering latch info - and so I couldn&#8217;t use the AWR.  Instead I just set up my own temporary table to hold the statistics.</p>
<pre><code>create table jschneider.waits as
select 1 snap, systimestamp timestamp, inst_id
,      CHILD#
,      ADDR
,      GETS
,      MISSES
,      SLEEPS
from gv$latch_children
where name = 'cache buffers chains';

Table created.</code></pre>
<p>At this point I worked on something else for a while.  After an hour or so I took a second snapshot.</p>
<pre><code>insert into jschneider.waits
select 2, systimestamp, inst_id
,      CHILD#
,      ADDR
,      GETS
,      MISSES
,      SLEEPS
from gv$latch_children
where name = 'cache buffers chains';

393216 rows created.

commit;

Commit complete.</code></pre>
<p>I guess I should briefly explain why I grabbed that information.  The cache buffers chains latch is actually made up of a large number of child latches.  When Oracle needs to access the buffer cache it hashes some of the block&#8217;s information to discover which child it needs to use.  That way each child latch only has a short list of blocks that it is managing.  What I aim to do is find out which child latches are the busiest and then see which segments have blocks protected by those latches.  Not only that but we&#8217;ll be able to see which blocks are the busiest (in real time).</p>
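<p>As a toy illustration of that hashing idea, here is a short Python sketch.  The hash function is made up - Oracle&#8217;s real one is internal and certainly different - and the tiny child count is chosen only so that collisions are easy to see.  The block addresses are borrowed from the query output later in this post purely as sample input.</p>
<pre><code>from collections import defaultdict


def child_latch_for(file_no, block_no, child_count):
    """Toy stand-in for the real hash: map a block address to a child latch.
    This formula is an assumption for illustration, not Oracle's algorithm."""
    return (file_no * 1000003 + block_no) % child_count


def group_by_child(blocks, child_count):
    """Build the list of blocks that each child latch would protect."""
    chains = defaultdict(list)
    for file_no, block_no in blocks:
        child = child_latch_for(file_no, block_no, child_count)
        chains[child].append((file_no, block_no))
    return dict(chains)


# Sample (file#, block#) pairs; a real system has far more child latches.
blocks = [(214, 110918), (1209, 16393), (994, 121460), (1156, 33703)]
for child, protected in sorted(group_by_child(blocks, child_count=4).items()):
    print(child, protected)
</code></pre>
<p>The point of the sketch is that a busy child latch does not identify a single hot block by itself: unrelated blocks can hash to the same child, so the next step has to look at what each child is actually protecting.</p>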
<p>So the next step is to find the child latches suffering from the most contention.  In order to do this we&#8217;ll look for latches that frequently cause processes to sleep (relinquish the processor) while waiting.</p>
<pre><code>with subq as (
select t2.inst_id i, t2.child#, t2.addr,
  t2.gets-t1.gets gets,
  t2.misses-t1.misses misses,
  t2.sleeps-t1.sleeps sleeps
from jschneider.waits t1, jschneider.waits t2
where t1.child#=t2.child#
  and t1.inst_id=t2.inst_id
  and t1.snap=1 and t2.snap=2
order by sleeps desc
)
select * from subq
where rownum &lt; 40;

         I     CHILD# ADDR                   GETS     MISSES     SLEEPS
---------- ---------- ---------------- ---------- ---------- ----------
         3      32664 00000005F9EA6780     349526       8642        168
         2      32664 00000005F9EA6780     352351       9850        167
         3      21135 00000005F7D9FC80     109896       1284        146
         3      39340 00000005FBA3AD10      58127       1100        142
         5      13536 00000005A8F41F40    2305516      63418        126
         3      57464 0000000608DD4200      45361        909        112
         3      59297 000000060799DF68      31056        690         94
         3      14549 00000005F9C30B08      46444        624         90
         2      53157 00000006078C8808       4727       1129         83
         3      27008 00000005F6E27820      64685       1148         82
         3      62193 00000005FAD17808      28596        550         78
         3      24183 00000005F6DC54B8      32420        505         77
         3      35290 00000005F7F8BE38      65142       1115         75
         3      46376 0000000606C9CA30      29474        653         74
         3       8428 00000005F6BA1900      67146        303         73
         3      15740 00000005F7CE4388      94209        338         72
         3      30375 00000005F7EE1040       4962        663         69
         2      57464 0000000608DD4200      59033        887         68
         3      21180 00000005F7DA1588      65043        743         68
         3      38233 00000005F6FADC08      31475        516         66
         3      25742 00000005F8DE1520      57828        928         65
         3      46198 00000005FAAEB6D0      61915        751         65
         3       6648 00000005F6B63AE0      32022        462         62
         5      18908 00000005A2E51A30    1480518      39246         62
         3      59340 00000005FBCF2210      77557        307         61
         6      63812 00000005A0CD70E8    1071392      63118         61
         2       2526 00000005F5B16B90     339617        303         60
         1      32664 00000005F9EA6780     234869       5319         57
         2      14849 00000005F9C3B1E8      30541        248         57
         3      11539 00000005F9BC80B8     141915        220         57
         2      16944 00000005F9C83F40       3931        961         56
         4      18475 00000005F7D434E0      87239        477         56
         3      33009 00000005F9EB2768      66883        793         56
         2      59438 00000005FACB7B90      27733        742         52
         2      63600 00000005FBD863B0      10316        188         51
         3      44988 00000005FAAC15C0      28087        208         51
         2      52350 00000005FBBFF1E0     570855       3252         50
         3       1388 00000005F6AACD00     203545        163         50
         3       8972 00000005F8B9A4D0      28815        205         49</code></pre>
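<p>The reason for subtracting one snapshot from the other is that the counters in v$latch_children are cumulative since instance startup, so only the difference tells you what happened during the interval.  The same logic can be sketched in Python; the function name and the sample numbers below are invented for illustration and are not taken from this system.</p>
<pre><code>def latch_deltas(snap1, snap2, top_n=3):
    """Each snapshot maps (inst_id, child_no) to (gets, misses, sleeps).
    Return the children with the most sleeps during the interval."""
    deltas = []
    for key, after in snap2.items():
        before = snap1.get(key)
        if before is None:
            continue  # latch missing from the first snapshot, so no baseline
        gets, misses, sleeps = (a - b for a, b in zip(after, before))
        deltas.append((key, gets, misses, sleeps))
    # Same ordering as the SQL: most sleeps first.
    deltas.sort(key=lambda row: row[3], reverse=True)
    return deltas[:top_n]


# Invented sample data: cumulative counters at two points in time.
first = {(3, 32664): (1000, 10, 1), (5, 13536): (500, 5, 0)}
second = {(3, 32664): (1500, 40, 9), (5, 13536): (900, 9, 20)}
for row in latch_deltas(first, second):
    print(row)
</code></pre>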
<h4>Exploring the Buffer Cache</h4>
<p>Notice that I&#8217;ve also queried all latches across the entire cluster.  So this is giving me a system-wide picture of cache buffers chains latches.  Interestingly, the top child is the same one on two different instances (2 and 3).  So next let&#8217;s log in locally to instance 3 and see what that latch child is protecting!  Also, I&#8217;m going to grab the current SCN - but only the base - from dbms_flashback.</p>
<pre><code>select mod(dbms_flashback.get_system_change_number,power(2,32)) cur_scn_base from dual;

CUR_SCN_BASE
------------
   279813816

col object format a55
col state format a5
select /*+rule*/ *
from (
select o.owner||'.'||o.object_name||decode(o.subobject_name,NULL,'','.')||
  o.subobject_name||' ['||o.object_type||']' object, sq.*
from (
select
  x.obj,
  x.file#,x.dbablk,
  x.tch,
  decode(x.state,0,'FREE',1,'XCUR',2,'SCUR',3,'CR',4,'READ',
    5,'MREC',6,'IREC',7,'WRITE',8,'PI',9,'MEMORY',10,'MWRITE',
    11,'DONATED',x.state) state,
  decode(x.state,3,cr_scn_bas,NULL) scn_bas
from
  sys.v$latch_children  l,
  sys.x$bh  x
where
  x.hladdr = l.addr and
  x.obj &lt; power(2,22) and
  x.hladdr  = '00000005F9EA6780'
) sq, dba_objects o
where
  o.data_object_id=sq.obj
  order by sq.tch desc, file#, dbablk, scn_bas
) where rownum&lt;40;

OBJECT                                                         OBJ      FILE#     DBABLK        TCH STATE    SCN_BAS
------------------------------------------------------- ---------- ---------- ---------- ---------- ----- ----------
JSCHDER.SPOT_ACTIVITY [TABLE]                               903892        214     110918          3 SCUR
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          2 CR     279622172
JSCHDER.BIGTABLE_ORDERS_MF_AND_PARTS [INDEX]                  3136        994     121460          1 SCUR
JSCHDER.BIGTABLE.P_PARTS_APPROVED [TABLE PARTITION]        3064309       1156      33703          1 SCUR
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279619002
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279619202
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279621063
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 CR     279618998
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 PI
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]           3208618       1209      16393          1 PI
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTI    3208620       1542       4537          1 SCUR
TION]

JSCHDER.PERSISTENT_QUERY [TABLE]                           1455110         10      99668          0 SCUR
JSCHDER.GENERATED_REPORT [TABLE]                           1455107         13      22427          0 SCUR
JSCHDER.SPOT_ACTIVITY_URL_DETAIL [TABLE]                   2684382        287      51469          0 SCUR
JSCHDER.GENERATED_REPORT [TABLE]                           1455107        476     103892          0 SCUR
JSCHDER.BIGTABLE_TOTALS [TABLE]                            1746552        524     187199          0 SCUR
JSCHDER.PERSISTENT_QUERY [TABLE]                           1455110       1476      21915          0 SCUR

17 rows selected.</code></pre>
<p>Now this query has done more than show me the blocks protected by this latch child.  The &#8220;TCH&#8221; field is telling me the <em>Touch Count</em> for the buffer - an indication of how hot the block is.  This counter is incremented when Oracle accesses the block - but if I remember right at most once every 3 seconds or so, so a burst of accesses only counts once.  So this will show me - in real time - what blocks are being accessed.</p>
<p>First of all I noticed that there are only 17 blocks on this latch - which should be great for concurrency!  These blocks must have been hammered to have so many sleeps!  Secondly, I noticed that there&#8217;s one block that has <em>seven different versions in this instance alone</em>.  I was immediately suspicious about that block and decided to check something&#8230;</p>
<pre><code>select header_file, header_block from dba_segments
where owner='JSCHDER' and segment_name='BIGTABLE_LOG' and PARTITION_NAME='P_2007_09';

HEADER_FILE HEADER_BLOCK
----------- ------------
       1209        16393</code></pre>
<p>Looks like that&#8217;s the header block!!  Interesting&#8230;  so I wonder how many copies of this block are spread around the cluster?  That&#8217;s pretty easy to check too.</p>
<pre><code>select inst_id, status, dirty, stale from gv$bh
where file#=1209 and block#=16393 order by 1, 2;

   INST_ID STATUS  D S
---------- ------- - -
         1 cr      N Y
         1 cr      N N
         1 cr      N N
         1 cr      N N
         1 cr      N N
         1 pi      Y N
         2 cr      N N
         2 cr      N N
         2 cr      N Y
         2 cr      N N
         2 pi      Y N
         3 cr      N N
         3 cr      N N
         3 cr      N Y
         3 cr      N N
         3 cr      N N
         3 pi      Y N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 cr      N N
         4 pi      Y N
         4 xcur    N N</code></pre>
<p>Quite a few.  In fact I discovered that if I repeat that query even a few moments apart, the &#8220;xcur&#8221; buffer (the current copy, which is the one being updated) keeps moving between instances.  Normal operation for RAC&#8230;  but interesting to watch nonetheless!</p>
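<p>If you just want a tally rather than one row per buffer, the same view can be grouped.  This is only a sketch reusing the file# and block# from above; run it a few times in a row and you should see the xcur row hop between instances.</p>
<pre><code>-- Count the cached copies of one block per instance and buffer status.
-- file#/block# are the header block found above; substitute your own.
select inst_id, status, count(*) copies
from gv$bh
where file#=1209 and block#=16393
group by inst_id, status
order by inst_id, status;</code></pre>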
<h4>What Does it All Mean?</h4>
<p>So what&#8217;s the point?  Well we&#8217;ve shored up the theory that BIGTABLE_LOG is the main culprit in this system.  The header block is where the master freelist is and that would be bounced around a lot without freelist groups.  After switching to ASSM or adding freelist groups I would expect to see fewer copies of the header block in the global cache.  Hopefully they&#8217;ll let me know what happens so I can post an update!  (Sounds like they&#8217;re planning to move the partition to ASSM during the downtime window this weekend!)</p>
<h4>Appendix: Quickly Mapping file#/block# to an Object</h4>
<p>I&#8217;m sure that everyone, at some point, has had a block number and wanted to know the object.  And if you&#8217;ve been in this situation with a large database then you realize that there&#8217;s just no good way to do it.  You have to query the dreaded DBA_EXTENTS view&#8230;  which can take years to finish.</p>
<p>If you didn&#8217;t notice, I actually just demonstrated a potential alternative - maybe.  If the block you&#8217;re checking is in the buffer cache (on any instance in RAC) then you can just read from v$bh instead of waiting around for DBA_EXTENTS!  Just use a query something like this:</p>
<pre><code>col object_name format a20
col owner format a10
col subobject_name format a20
select distinct o.owner, o.object_name, o.subobject_name, o.object_type
from dba_objects o, gv$bh b
where o.data_object_id=b.objd and b.objd &lt; power(2,22) and status != 'free'
  and b.file#=1209
  and b.block#=16393
;

OWNER      OBJECT_NAME          SUBOBJECT_NAME       OBJECT_TYPE
---------- -------------------- -------------------- -------------------
JSCHDER    BIGTABLE_LOG         P_2007_09            TABLE PARTITION</code></pre>
<p>Very handy shortcut in my opinion.  If you&#8217;re not on RAC then this will go even faster.</p>
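<p>For comparison, here is roughly what the slow fallback looks like when the block is not cached anywhere.  Take it as a sketch: it uses the same file# and block# as above and assumes that the file# you have matches FILE_ID in DBA_EXTENTS.</p>
<pre><code>-- Fallback when the block is not in any buffer cache: find the extent
-- that covers the block.  Expect this to be slow on a large database.
select owner, segment_name, partition_name, segment_type
from dba_extents
where file_id=1209
  and 16393 between block_id and block_id + blocks - 1;</code></pre>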
<p>Happy hacking!</p>
<p><em>Update 9/14/07 - updated queries against dba_objects/v$bh to eliminate rollback and temp segs.  also fixed last query in the post to use data_object_id.</em></p>
<h4>More Resources</h4>
<p><a href="http://oracletoday.blogspot.com/">Yas</a> left a comment pointing me to a post from Lewis - and that led me to a few other useful posts that seem worth linking here.</p>
<table>
<tr>
<td><a href="http://jonathanlewis.wordpress.com/2006/11/02/but-its-in-the-manual/">But it’s in the manual!</a></td>
<td>Jonathan Lewis&#8217; post discussing the join between v$bh and dba_objects</td>
</tr>
<tr>
<td><a href="http://www.pythian.com/blogs/282/oracle-rac-cache-fusion-efficiency-a-buffer-cache-analysis-for-rac">Oracle RAC Cache Fusion Efficiency: A Buffer Cache Analysis for RAC</a></td>
<td>Script from Christo Kutrovsky - runs a little slow but gives a great overview of your buffer cache across the entire cluster.</td>
</tr>
<tr>
<td><a href="http://www.pythian.com/blogs/275/using-the-oracle-wait-interface-to-troubleshoot-io-issues">Using the Oracle Wait Interface to Troubleshoot I/O Issues</a></td>
<td>Post from Shakir Sadikali demonstrating a Steve Adams script to show a snapshot of current system-wide wait events.</td>
</tr>
</table>
<p>PS - if you haven&#8217;t checked out my <a href="/links">Links</a> page in a while then have another look - I&#8217;ve added quite a few links over the past few weeks!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2007/09/13/cache-buffers-chains-and-latch-spelunking/feed/</wfw:commentRss>
		</item>
		<item>
		<title>GC Buffer Busy Waits in RAC: Finding Hot Blocks</title>
		<link>http://www.ardentperf.com/2007/09/12/gc-buffer-busy-waits-in-rac-finding-hot-blocks/</link>
		<comments>http://www.ardentperf.com/2007/09/12/gc-buffer-busy-waits-in-rac-finding-hot-blocks/#comments</comments>
		<pubDate>Thu, 13 Sep 2007 02:54:31 +0000</pubDate>
		<dc:creator>Jeremy</dc:creator>
		
		<category><![CDATA[Oracle]]></category>

		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.ardentperf.com/2007/09/12/gc-buffer-busy-waits-in-rac-finding-hot-blocks/</guid>
		<description><![CDATA[Well I don&#8217;t have a lot of time to write anything up&#8230;  sheesh - it&#8217;s like 10pm and I&#8217;m still messing with this.  I should be in bed.  But before I quit for the night I thought I&#8217;d just do a quick post with some queries that might be useful for anyone [...]]]></description>
			<content:encoded><![CDATA[<p>Well I don&#8217;t have a lot of time to write anything up&#8230;  sheesh - it&#8217;s like 10pm and I&#8217;m still messing with this.  I should be in bed.  But before I quit for the night I thought I&#8217;d just do a quick post with some queries that might be useful for anyone working on a RAC system who sees a lot of the event &#8220;gc buffer busy&#8221;.</p>
<p>Now you&#8217;ll recall that this event simply means that we&#8217;re waiting for <em>another</em> instance that has the block.  But generally if you see lots of these then it&#8217;s an indication of contention across the cluster.  So here&#8217;s how I got to the bottom of a problem on a pretty active 6-node cluster here in NYC.</p>
<h4>Using the ASH</h4>
<p>I&#8217;ll show two different ways here to arrive at the same conclusion.  First, we&#8217;ll look at the ASH to see what the sampled sessions today were waiting on.  Second, we&#8217;ll look at the segment statistics captured by the AWR.</p>
<p>First of all some setup.  I already knew what the wait events looked like from looking at dbconsole but here&#8217;s a quick snapshot using the ASH data from today:<br />
<span id="more-563"></span></p>
<pre><code>select min(begin_interval_time) min, max(end_interval_time) max
from dba_hist_snapshot
where snap_id between 12831 and 12838;

MIN                            MAX
------------------------------ ------------------------------
12-SEP-07 09.00.17.451 AM      12-SEP-07 05.00.03.683 PM</code></pre>
<p>This is the window I&#8217;m going to use; 9am to 5pm today.</p>
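<p>In case you need to find the snap_id range for your own window, something like this sketch should do it.  The timestamps are just my 9am-5pm window written out as literals, padded by a few minutes because snapshots never land exactly on the hour.</p>
<pre><code>-- List the AWR snapshots that fall inside a time window.
-- On RAC each snap_id appears once per instance.
select snap_id, instance_number, begin_interval_time, end_interval_time
from dba_hist_snapshot
where begin_interval_time &gt;= timestamp '2007-09-12 08:55:00'
  and end_interval_time &lt;= timestamp '2007-09-12 17:05:00'
order by snap_id, instance_number;</code></pre>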
<pre><code>select wait_class_id, wait_class, count(*) cnt
from dba_hist_active_sess_history
where snap_id between 12831 and 12838
group by wait_class_id, wait_class
order by 3;

WAIT_CLASS_ID WAIT_CLASS                            CNT
------------- ------------------------------ ----------
   3290255840 Configuration                         169
   2000153315 Network                               934
   4108307767 System I/O                           7199
   3386400367 Commit                               7809
   4217450380 Application                         12248
   3875070507 Concurrency                         14754
   1893977003 Other                               35499
                                                  97762
   3871361733 Cluster                            104810
   1740759767 User I/O                           121999</code></pre>
<p>You can see that there were a very large number of cluster events recorded in the ASH.  Let&#8217;s look a little closer.</p>
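<p>Two things make that output easier to read: a percentage column, and the session state, which explains the row with the blank wait class.  This is a sketch over the same snapshot range; I&#8217;d expect the blank row to turn out to be sessions sampled ON CPU, but check that on your own system.</p>
<pre><code>-- Same breakdown with each row's share of all samples.
-- session_state is either WAITING or ON CPU.
select session_state, wait_class, count(*) cnt,
  round(100 * ratio_to_report(count(*)) over (), 1) pct
from dba_hist_active_sess_history
where snap_id between 12831 and 12838
group by session_state, wait_class
order by cnt;</code></pre>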
<pre><code>select event_id, event, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838 and wait_class_id=3871361733
group by event_id, event
order by 3;

  EVENT_ID EVENT                                           CNT
---------- ---------------------------------------- ----------
3905407295 gc current request                                4
3785617759 gc current block congested                       10
2705335821 gc cr block congested                            15
 512320954 gc cr request                                    16
3794703642 gc cr grant congested                            17
3897775868 gc current multi block request                   17
1742950045 gc current retry                                 18
1445598276 gc cr disk read                                 148
1457266432 gc current split                                229
2685450749 gc current grant 2-way                          290
 957917679 gc current block lost                           579
 737661873 gc cr block 2-way                               699
2277737081 gc current grant busy                           991
3570184881 gc current block 3-way                         1190
3151901526 gc cr block lost                               1951
 111015833 gc current block 2-way                         2078
3046984244 gc cr block 3-way                              2107
 661121159 gc cr multi block request                      4092
3201690383 gc cr grant 2-way                              4129
1520064534 gc cr block busy                               4576
2701629120 gc current block busy                         14379
1478861578 gc buffer busy                                67275</code></pre>
<p>Notice the <em>huge gap</em> between the number of buffer busy waits and everything else.  Other statistics I checked also confirmed that this wait event was the most significant on the cluster.  So now we&#8217;ve got an event and we know that it accounts for 67,275 sampled session snapshots in the ASH between 9am and 5pm today.  Let&#8217;s see what SQL these sessions were executing when they got snapped.  In fact let&#8217;s even include the &#8220;gc current block busy&#8221; events since there was a bit of a gap for them too.</p>
<pre><code>select sql_id, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838
and event_id in (2701629120, 1478861578)
group by sql_id
having count(*)&gt;1000
order by 2;

SQL_ID               CNT
------------- ----------
6kk6ydpp3u8xw       1011
2hvs3mpab5j0w       1022
292jxfuggtsqh       1168
3mcxaqffnzgfw       1226
a36pf34c87x7s       1328
4vs8wgvpfm87w       1390
22ggtj4z9ak3a       1574
gsqhbt5a6d4uv       1744
cyt90uk11a22c       2240
39dtqqpr7ygcw       4251
8v3b2m405atgy      42292</code></pre>
<p>Wow - another big leap - 4,000 to 42,000!  Clearly there&#8217;s one SQL statement which is the primary culprit.  What&#8217;s the statement?</p>
<pre><code>select sql_text from dba_hist_sqltext where sql_id='8v3b2m405atgy';

SQL_TEXT
---------------------------------------------------------------------------
insert into bigtable(id, version, client, cl_business_id, cl_order_id, desc</code></pre>
<p>I&#8217;ve changed the table and field names so you can&#8217;t guess who my client might be.  :)   But it gets the idea across - an insert statement.  Hmmm.  Any guesses yet about what the problem might be?  Well an insert statement could access a whole host of objects (partitions and indexes)&#8230;  and even more in this case since there are a good number of triggers on this table.  Conveniently, the ASH in 10g records what object is being waited on so we can drill down even to that level.</p>
<pre><code>select count(distinct(current_obj#)) from dba_hist_active_sess_history
where snap_id between 12831 and 12838
and event_id=1478861578 and sql_id='8v3b2m405atgy';

COUNT(DISTINCT(CURRENT_OBJ#))
-----------------------------
                           14

select current_obj#, count(*) cnt from dba_hist_active_sess_history
where snap_id between 12831 and 12838
and event_id=1478861578 and sql_id='8v3b2m405atgy'
group by current_obj#
order by 2;

CURRENT_OBJ#        CNT
------------ ----------
     3122841          1
     3122868          3
     3173166          4
     3324924          5
     3325122          8
     3064307          8
          -1         10
     3064369        331
           0        511
     3122795        617
     3064433        880
     3208619       3913
     3208620       5411
     3208618      22215</code></pre>
<p>Well a trend is emerging.  Another very clear outlier - less than a thousand sessions waiting on most objects but the last one is over twenty-two thousand.  Let&#8217;s have a look at all three of the biggest ones.</p>
<pre><code>select object_id, owner, object_name, subobject_name, object_type from dba_objects
where object_id in (3208618, 3208619, 3208620);

 OBJECT_ID OWNER      OBJECT_NAME                    SUBOBJECT_NAME                 OBJECT_TYPE
---------- ---------- ------------------------------ ------------------------------ -------------------
   3208618 JSCHDER    BIGTABLE_LOG                   P_2007_09                      TABLE PARTITION
   3208619 JSCHDER    BIGTABL_LG_X_ID                P_2007_09                      INDEX PARTITION
   3208620 JSCHDER    BIGTABL_LG_X_CHANGE_DATE       P_2007_09                      INDEX PARTITION</code></pre>
<p>Now wait just a moment&#8230;  this isn&#8217;t even the object we&#8217;re updating!!  Well I&#8217;ll spare you the details but one of the triggers logs every change to BIGTABLE with about 7 inserts into this one.  It&#8217;s all PL/SQL so we get bind variables and everything - it&#8217;s just the sheer number of accesses that is causing all the contention.</p>
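<p>The last two steps can also be done in one pass by joining the ASH rows straight to DBA_OBJECTS.  A sketch, using the same object_id lookup as above (so the 0 and -1 values and any dropped objects simply fall out of the result):</p>
<pre><code>-- Name the objects behind the gc buffer busy samples in one query.
-- The having clause keeps only the heavy hitters.
select o.owner, o.object_name, o.subobject_name, o.object_type, count(*) cnt
from dba_hist_active_sess_history h, dba_objects o
where h.snap_id between 12831 and 12838
  and h.event_id=1478861578 and h.sql_id='8v3b2m405atgy'
  and o.object_id=h.current_obj#
group by o.owner, o.object_name, o.subobject_name, o.object_type
having count(*)&gt;1000
order by cnt;</code></pre>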
<p>One further thing we can do is actually see which blocks are getting most contended for - the ASH records this too.  (Isn&#8217;t the ASH great?)</p>
<pre><code>select current_file#, current_block#, count(*) cnt
from dba_hist_active_sess_history
where snap_id between 12831 and 12838
and event_id=1478861578 and sql_id='8v3b2m405atgy'
and current_obj# in (3208618, 3208619, 3208620)
group by current_file#, current_block#
having count(*)&gt;50
order by 3;

CURRENT_FILE# CURRENT_BLOCK#        CNT
------------- -------------- ----------
         1330         238073         51
         1542          22645         55
         1487         237914         56
         1330         238724         61
         1330         244129         76
         1487         233206        120</code></pre>
<p>One thing that I immediately noticed is that there does <strong>not</strong> seem to be a single hot block!!!  (What?)  Out of the roughly 31,500 samples against these three objects, no more than 120 ever pointed at the same block.  Let&#8217;s quickly check if any of these are header blocks on the segments.</p>
<pre><code>select segment_name, header_file, header_block
from dba_segments where owner='JSCHDER' and partition_name='P_2007_09'
and segment_name in ('BIGTABLE_LOG','BIGTABL_LG_X_ID',
  'BIGTABL_LG_X_CHANGE_DATE');

SEGMENT_NAME                   HEADER_FILE HEADER_BLOCK
------------------------------ ----------- ------------
BIGTABL_LG_X_CHANGE_DATE              1207       204809
BIGTABL_LG_X_ID                       1207       196617
BIGTABLE_LOG                          1209        16393</code></pre>
<p>No - all seem to be data blocks.  Why so much contention?  Maybe the RAC and OPS experts out there already have some guesses&#8230;  but first let&#8217;s explore one alternative method to check the same thing and see if the numbers line up.</p>
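<p>As an aside for anyone following along: the &#8220;no single hot block&#8221; observation can be backed up with a summary over the same ASH rows.  This is a sketch rather than output from the real system; it reports how many distinct blocks were involved, the total samples, and the worst single block.</p>
<pre><code>-- How spread out are the waits?  One row: number of distinct blocks,
-- total samples, and the highest sample count for any one block.
select count(*) blocks, sum(cnt) samples, max(cnt) max_per_block
from (
  select current_file#, current_block#, count(*) cnt
  from dba_hist_active_sess_history
  where snap_id between 12831 and 12838
  and event_id=1478861578 and sql_id='8v3b2m405atgy'
  and current_obj# in (3208618, 3208619, 3208620)
  group by current_file#, current_block#
);</code></pre>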
<h4>AWR Segment Statistics</h4>
<p>Here&#8217;s a handy little query I made up the other day to quickly digest any of the segment statistics from the AWR and grab the top objects for the cluster, reporting on each instance.  I&#8217;m not going to explain the whole thing but I&#8217;ll just copy it verbatim - feel free to use it but you&#8217;ll have to figure it out yourself.  :)</p>
<pre><code>
col object format a60
col i format 99
select * from (
select o.owner||'.'||o.object_name||decode(o.subobject_name,NULL,'','.')||
  o.subobject_name||' ['||o.object_type||']' object,
  instance_number i, stat
from (
  select obj#||'.'||dataobj# obj#, instance_number, sum(
GC_BUFFER_BUSY_DELTA
) stat
  from dba_hist_seg_stat
  where (snap_id between 12831 and 12838)
  and (instance_number between 1 and 6)
  group by rollup(obj#||'.'||dataobj#, instance_number)
  having obj#||'.'||dataobj# is not null
) s, dba_hist_seg_stat_obj o
where o.obj#||'.'||o.dataobj#=s.obj#
order by max(stat) over (partition by s.obj#) desc,
  o.owner||o.object_name||o.subobject_name, nvl(instance_number,0)
) where rownum&lt;=40;

OBJECT                                                         I       STAT
------------------------------------------------------------ --- ----------
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]                    2529540
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               1     228292
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               2     309684
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               3     289147
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               4     224155
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               5    1136822
JSCHDER.BIGTABLE_LOG.P_2007_09 [TABLE PARTITION]               6     341440
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]        2270221
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   1     220094
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   2     313038
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   3     299509
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   4     217489
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   5     940827
JSCHDER.BIGTABL_LG_X_CHANGE_DATE.P_2007_09 [INDEX PARTITION]   6     279264
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                      1793931
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 1     427482
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 2     352305
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 3     398699
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 4     268045
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 5     269230
JSCHDER.BIGTABLE.P_WAREHOUSE [TABLE PARTITION]                 6      78170
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]                  771060
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            1     162296
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            2     231141
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            3     220573
JSCHDER.BIGTABL_LG_X_ID.P_2007_09 [INDEX PARTITION]            4     157050
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                        393663
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  1      66277
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  2      10364
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  3       6930
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  4       3484
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  5     266722
JSCHDER.BIGTABLE.P_DEACTIVE [TABLE PARTITION]                  6      39886
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]                 276637
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           1      13750
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           2      12207
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           3      23522
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           4      28336
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           5      99704
JSCHDER.BIGTABLE.P_ACTIVE_APPROVED [TABLE PARTITION]           6      99118

40 rows selected.</code></pre>
<p>As an aside, there is a line in the middle that says &#8220;GC_BUFFER_BUSY_DELTA&#8221;.  You can replace that line with any of these values to see the top objects for the corresponding statistic during the reporting period:</p>
<pre><code>LOGICAL_READS_DELTA
BUFFER_BUSY_WAITS_DELTA
DB_BLOCK_CHANGES_DELTA
PHYSICAL_READS_DELTA
PHYSICAL_WRITES_DELTA
PHYSICAL_READS_DIRECT_DELTA
PHYSICAL_WRITES_DIRECT_DELTA
ITL_WAITS_DELTA
ROW_LOCK_WAITS_DELTA
GC_CR_BLOCKS_SERVED_DELTA
GC_CU_BLOCKS_SERVED_DELTA
GC_BUFFER_BUSY_DELTA
GC_CR_BLOCKS_RECEIVED_DELTA
GC_CU_BLOCKS_RECEIVED_DELTA
SPACE_USED_DELTA
SPACE_ALLOCATED_DELTA
TABLE_SCANS_DELTA</code></pre>
<p>Now as you can see, these statistics confirm what we observed from the ASH: the top waits in the system are for the BIGTABLE_LOG table.  However this also reveals something the ASH didn&#8217;t - that the date-based index on the same table is a close second.</p>
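<p>Instance 5 stands out for the table partition in the numbers above, so it might be worth checking whether that is steady all day or a spike.  Here is a sketch that breaks the same statistic out per snapshot for the one object, reusing the obj# from the ASH section:</p>
<pre><code>-- gc buffer busy per snapshot and instance for a single segment.
col end_time format a8
select sn.snap_id, to_char(sn.end_interval_time,'HH24:MI') end_time,
  s.instance_number i, sum(s.gc_buffer_busy_delta) stat
from dba_hist_seg_stat s, dba_hist_snapshot sn
where s.snap_id=sn.snap_id and s.dbid=sn.dbid
  and s.instance_number=sn.instance_number
  and s.snap_id between 12831 and 12838
  and s.obj#=3208618
group by sn.snap_id, to_char(sn.end_interval_time,'HH24:MI'), s.instance_number
order by sn.snap_id, s.instance_number;</code></pre>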
<h4>The Real Culprit</h4>
<p>Any time you see heavy concurrency problems during <em>inserts on table data blocks</em> there should always be one first place to look: space management.  Since ancient versions of OPS it has been a well-known fact that freelists are the enemy of concurrency.  In this case, that was exactly the culprit.</p>
<pre><code>select distinct tablespace_name from dba_tab_partitions
where table_name='BIGTABLE_LOG';

TABLESPACE_NAME
------------------------------
BIGTABLE_LOG_DATA

select extent_management, allocation_type, segment_space_management
from dba_tablespaces where tablespace_name='BIGTABLE_LOG_DATA';

EXTENT_MAN ALLOCATIO SEGMEN
---------- --------- ------
LOCAL      USER      MANUAL

select distinct freelists, freelist_groups from dba_tab_partitions
where table_name='BIGTABLE_LOG';

 FREELISTS FREELIST_GROUPS
---------- ---------------
         1               1</code></pre>
<p>And there you have it.  The busiest table on their 6-node OLTP RAC system is using MSSM with a single freelist and a single freelist group.  I&#8217;m pretty sure this could cause contention problems!  But in this case it wasn&#8217;t quite what I expected.  It looks to me like the single freelist itself wasn&#8217;t the point of contention - but it was pointing all of the nodes to the same small number of blocks for inserts and these data blocks were getting fought over.  But they were probably filling up quickly and so no single block had a large number of waits reported in the ASH.  I haven&#8217;t proven that but it&#8217;s my current theory.  :)  If anyone has another idea then leave a comment and let me know!</p>
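<p>For what it&#8217;s worth, one way out is to rebuild the partition in an ASSM tablespace, and that could look something like the sketch below.  The tablespace name, datafile path and size are made up for illustration, and this is untested against the real system.  The move leaves the index partitions unusable until they are rebuilt (I&#8217;m assuming the two indexes are local), so it needs a downtime window.</p>
<pre><code>-- Hypothetical ASSM tablespace (name, path and size are placeholders).
create tablespace bigtable_log_assm
  datafile '/u01/oradata/bigtable_log_assm01.dbf' size 10g
  extent management local
  segment space management auto;

-- Rebuild the hot partition in the new tablespace.
alter table jschder.bigtable_log
  move partition p_2007_09 tablespace bigtable_log_assm;

-- The move marks the matching local index partitions UNUSABLE.
alter index jschder.bigtabl_lg_x_id rebuild partition p_2007_09;
alter index jschder.bigtabl_lg_x_change_date rebuild partition p_2007_09;</code></pre>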
]]></content:encoded>
			<wfw:commentRss>http://www.ardentperf.com/2007/09/12/gc-buffer-busy-waits-in-rac-finding-hot-blocks/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
