So Collaborate is over and I’m back in Chicago… home sweet home. I thoroughly enjoyed the week in Denver, in spite of the snow! Thursday, the last day, was especially fun.

First was a panel debate “To RAC or Not To RAC: What’s Best for HA.” Dan Norris invited me to participate in this panel along with Alex Gorbachev (Pythian), Neil Greene (Predictive Technologies) and Matt Zito (GridApp) - I certainly felt privileged to take part. Here are a few of the things that I took away from the debate… feel free to discuss:

Just a quick post to say that I’ve uploaded the slides from my services presentation at Collaborate and you can find them over on the publications page. Thanks to everyone who attended!! Great questions and comments throughout the session. Next time I’ll try to get through everything faster so that there’s more time for Q&A!

Next week sometime I expect to upload the instructions from today’s Hands-on Lab (11g RAC on VMware with ASM and OEL5). I want to clean it up a bit first. For anyone who didn’t hear the story, I heard on Sunday evening that they had a room with about 50 computers which was going to sit empty during a few timeslots. And to me that seemed like a tragedy - how often do I wish I could have a chance to try something new with a bit of guidance and without worrying about hosing my laptop?! So I decided to write a hands-on lab for 11gRAC/VMware. I pretty much spent all day yesterday putting it together but it came out great!

Also I’m thinking about repeating that RAC/VMware lab around Chicago sometime… is there anyone around the Chicago area who might be interested in something like this?

Guess I should also mention that I’ve been rather enjoying myself at Collaborate so far too! (Even though I spent pretty much all day yesterday making that Hands-on Lab…) Got a chance to meet Peter Scott Monday night - that was fun! Somehow he spotted my name badge and then I got to finally put a face to someone I’d only known as a blogger. :)

About a month ago I wrote an overview of Linux Caching and I/O Queues as they pertain to Oracle. I was working on a project to architect, install and configure the beginnings of an 8-node cluster consisting of either one or two RAC databases. During the project, while I was waiting for the OS guys to resolve some networking issues, I ran a bunch of benchmarks on the storage subsystem. Specifically, I experimented with the size of the HBA Queue Depth to see if it would make a difference in performance.

But before getting into the results, a quick overview of our configuration: it was 11g RAC on Red Hat Enterprise Linux 5; Dell servers with four dual-core Opteron chips each. The RAC cluster initially had four nodes but will grow to at least eight as the data is migrated. The system has 4G QLogic cards, a McData switch and a 3Par SAN (which is blazing fast). ASM (no CFS) and dedicated Oracle Homes. The first spec had an InfiniBand interconnect but after a teleconference with Alex from Pythian discussing the project’s specific requirements, the spec was updated to use redundant Gigabit Ethernet.

Picking up where I left off: the default limit set by the Linux qla2xxx driver for concurrent I/O requests on QLogic cards (32 per LUN) is conservative. So can I increase performance by increasing this limit? The best way to answer a question like this is simply to try it.
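
Before changing anything it helps to see what the system is using right now. Here is a minimal sketch (not the benchmark procedure from this post) that assumes the usual sysfs layout, where each SCSI disk reports its depth in /sys/block/sdX/device/queue_depth. The function name and the optional root argument are made up for illustration. As far as I know the qla2xxx limit is set through the driver's ql2xmaxqdepth module parameter, but check `modinfo qla2xxx` on your own driver version before relying on that.

```shell
#!/bin/sh
# Print "<device> <queue depth>" for every SCSI disk found under sysfs.
# $1 is an optional sysfs root (defaults to /sys) so the function can be
# pointed at a fake directory tree for testing.
show_queue_depths() {
    root="${1:-/sys}"
    for f in "$root"/block/sd*/device/queue_depth; do
        [ -r "$f" ] || continue          # unmatched glob or unreadable file
        dev=${f#"$root"/block/}          # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# Assumption: on RHEL5 a larger limit would be requested with a line like
# the following in /etc/modprobe.conf (64 is only an illustrative value),
# typically followed by rebuilding the initrd and rebooting:
#   options qla2xxx ql2xmaxqdepth=64
```

Running `show_queue_depths` with no argument on a live host lists one line per sd device, which makes it easy to confirm whether a new setting actually took effect.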

The trouble with Linux? No… the trouble with computers in general - is that they keep changing! Solaris 10 comes out, Oracle 11g, Red Hat 5… and everything works differently!! It’s a full-time job just trying to keep up with everything.

Almost exactly one year ago I wrote about using udev on 2.6 kernels to set the proper permissions for Oracle RAC. Two weeks after that post (March 14) Red Hat Enterprise Linux 5 was released and changed everything.

In my original post, I demonstrated how to create a PERMISSIONS file that udev would use when creating the device nodes. This worked on RHEL4 and SLES9. However this week I’ve been helping a client deploy 11g RAC on a RHEL5-based cluster - and I remembered that the PERMISSIONS facility was removed from udev in RHEL5. Seems like I remember reading something about having a single source of configuration for udev, which makes sense… so maybe they picked the RULES. (You’ll remember from my previous post that RULES are processed right before PERMISSIONS.) This is just as well since RULES are actually quite a bit more powerful than PERMISSIONS.

So on RHEL5 and OEL5 - in order to conform to Linux Best Practices - we now have to set correct RAC file permissions using udev RULES. To get started, we need to review how RULES work. The udev manual page gives a good overview of rules processing. But of course there are plenty of great tutorials that go deeper if you’re looking for more.
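
To give a flavor of what a RULES-based replacement for a PERMISSIONS entry looks like, here is a minimal sketch. It assumes the KERNEL==, OWNER=, GROUP= and MODE= keys described in the udev manual page. The helper name, the device pattern, the oracle/oinstall names and the rules file name are all examples invented for illustration, not values from the client system.

```shell
#!/bin/sh
# Emit one udev rule line that sets ownership and mode on matching nodes.
# $1 = kernel device name pattern, $2 = owner, $3 = group, $4 = octal mode
udev_perm_rule() {
    printf 'KERNEL=="%s", OWNER="%s", GROUP="%s", MODE="%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical usage (file name and values are assumptions):
#   udev_perm_rule 'raw[1-9]*' oracle oinstall 0660 \
#       >> /etc/udev/rules.d/99-oracle.rules
```

The match key uses `==` while the assignments use `=`; mixing those up is an easy way to end up with a rule that silently never applies.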

Well it’s been a while since I’ve written anything for the blog - during the past four months I went on a trip to Asia, celebrated Thanksgiving and Christmas with both my family and my girlfriend’s family and then in January I got engaged! I’ve also been working on some continuing educational goals - so needless to say I’ve been keeping busy. And I will continue to be very busy over the next few months of wedding planning so I probably won’t be writing too much.

This post is just another interesting case study from a customer I’m working with right now. We were looking at various queues in the I/O stream and wondering how we might be able to tweak them. I was mainly investigating the host HBAs - but in the process of digging into this I also learned about a few other queues and Linux internals in general as they relate to Oracle.

So I’ve really been digging Kevin Closson’s blog lately. Back at the beginning of this month he had another post that caught my attention about running Oracle on Opteron in which he made the point that these boxes should always be run in NUMA mode (not SUMA). This caught my eye because I’ve been delving a bit deeper than usual into CPU issues recently. In particular, on both of my past two tuning engagements, we’ve looked pretty closely at CPU utilization. At the first we wanted to see if Oracle was effectively utilizing Hyper-Threading. At the second we were investigating high CPU wait events from the database. (Which turned out not to be CPU-related!) I worked up some quick scripts to help analyze the CPU patterns in both of these situations. But before I get into that - let me go on a quick tangent about what originally got me interested in this. :)

Wow - the last three weeks have been crazy! During the last week of May I was wrapping up the services paper and a few submissions for the UKOUG. And for the first two weeks of June I’ve been working on some performance problems for one of our clients in the Phoenix area. Nice weather but lots of work!

Turns out that the system I’m working on is running Oracle on NFS - one of Kevin Closson’s favorite causes. In fact he recently wrote a blog post on the topic of monitoring tools for this exact environment. He must have listed more than 50 tools and yet said “if you are using Oracle over NFS, there are a few network monitoring tools out there - I don’t like any of them.”

After last week I couldn’t agree more. Perhaps one of the most useful tools on Linux for working with I/O is the iostat OS utility - but it’s entirely useless for NFS devices. However I really wanted to see exactly what the I/O patterns looked like - from the physical device perspective.

To make things more complicated, this server only has a single NIC which is being shared for everything. While I sympathize with Kevin’s argument that NFS keeps things simple, I’m not sure I could recommend this configuration… it can make things a little tricky to troubleshoot. Kevin has a script in his blog post that displays activity on an Ethernet port similar to the way iostat monitors block devices - however in this case I need to look only at traffic to one particular IP address. So what’s a boy to do?
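
For reference, the per-port idea can be sketched by reading the interface byte counters that Linux keeps in /proc/net/dev. This is not Kevin’s script, and it does not filter by IP address (which is the harder problem here); the function name and the optional file argument are made up for illustration.

```shell
#!/bin/sh
# Print "rx_bytes tx_bytes" for one interface from /proc/net/dev.
# $1 = interface name (e.g. eth0)
# $2 = counters file, overridable so the parser can be tested on a sample
net_bytes() {
    # Some kernels print "eth0:12345" with no space after the colon,
    # so turn the first colon into a space before splitting fields.
    sed 's/:/ /' "${2:-/proc/net/dev}" |
        awk -v ifc="$1" '$1 == ifc { print $2, $10 }'
}

# Usage sketch: sample twice, five seconds apart, and print the rates.
#   set -- $(net_bytes eth0); sleep 5; set -- "$@" $(net_bytes eth0)
#   echo "rx $(( ($3 - $1) / 5 )) B/s  tx $(( ($4 - $2) / 5 )) B/s"
```

Because these counters are per interface, on a box with a single shared NIC they lump NFS traffic together with everything else.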

From the occasionally-useful-scripts library…

It’s like fuser but shows the name of each process (args[0]). Needs lsof installed. I’ve used it on Linux and Solaris.

nap01:~$ cat jduser
#!/bin/sh
[ -n "$1" ] && [ -d "$1" ] || { echo "Usage: $0 [dir]"; exit 1; }
AWK=awk; [ "`uname`" = "SunOS" ] && AWK=nawk;
lsof +d "$1"|
   tail +2|
   sort -k9|
   $AWK '{ a=$9;
           if(lasta!=a){lasta=a;printf "\n"a":\n        "};
           system("printf \"["$2"]`ps -p "$2" -o args|"\
                "tail +2|cut -f1 -d\\\" \\\"` \"") }
         END{print "\n"}'
nap01:~$ ./jduser
Usage: ./jduser [dir]
nap01:~$ ./jduser /u04/oracle/oradata/jt10g

/u04/oracle/oradata/jt10g/control01.ctl:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7132]ora_ckpt_jt10g
/u04/oracle/oradata/jt10g/control02.ctl:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7132]ora_ckpt_jt10g
/u04/oracle/oradata/jt10g/control03.ctl:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7132]ora_ckpt_jt10g
/u04/oracle/oradata/jt10g/example01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7134]ora_smon_jt10g
/u04/oracle/oradata/jt10g/redo01.log:
        [7130]ora_lgwr_jt10g
/u04/oracle/oradata/jt10g/redo02.log:
        [7130]ora_lgwr_jt10g
/u04/oracle/oradata/jt10g/redo03.log:
        [7130]ora_lgwr_jt10g
/u04/oracle/oradata/jt10g/sysaux01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7134]ora_smon_jt10g
        [7140]ora_mmon_jt10g
/u04/oracle/oradata/jt10g/system01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7132]ora_ckpt_jt10g
        [7134]ora_smon_jt10g [7136]ora_reco_jt10g [7138]ora_cjq0_jt10g
        [7140]ora_mmon_jt10g [12062]oraclejt10g
/u04/oracle/oradata/jt10g/temp01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7134]ora_smon_jt10g
/u04/oracle/oradata/jt10g/undotbs01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7134]ora_smon_jt10g
        [7140]ora_mmon_jt10g
/u04/oracle/oradata/jt10g/users01.dbf:
        [7128]ora_dbw0_jt10g [7130]ora_lgwr_jt10g [7134]ora_smon_jt10g
        [7140]ora_mmon_jt10g

And yes, all that escaping does make my head hurt.
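
One way to sidestep most of that escaping is to let lsof produce machine-readable output with its -F option and do the grouping in a separate step. The following is only a sketch of that alternative, not a drop-in replacement: it assumes `lsof -F pcn` emits one p (PID), c (command) and n (file name) record per line, and it shows lsof’s short command name rather than args[0] from ps, so long process names may come out truncated. The function name is made up.

```shell
#!/bin/sh
# Group lsof field output by file name.
# Reads "lsof -F pcn" records on stdin and prints each file followed by
# the [pid]command entries of the processes that have it open.
group_by_file() {
    awk '
        /^p/ { pid = substr($0, 2) }                    # process ID record
        /^c/ { cmd = substr($0, 2) }                    # command name record
        /^n/ { print substr($0, 2) "\t[" pid "]" cmd }  # one line per open file
    ' | sort | awk -F"\t" '
        $1 != last { if (NR > 1) printf "\n"
                     printf "%s:\n       ", $1; last = $1 }
        { printf " %s", $2 }
        END { if (NR > 0) printf "\n" }
    '
}

# Hypothetical usage:
#   lsof -F pcn +d /u04/oracle/oradata/jt10g | group_by_file
```

The tab used as a separator between the two awk stages means file names containing tabs would confuse it, which seems an acceptable trade for an occasionally-useful script.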

On an unrelated note Padraig just pointed out this useful utility called OraSRP to process extended SQL trace files. Cool!

Well I’ve been incognito for the past two weeks or so because I’ve been finishing up a fairly detailed paper about Oracle Services. Finally finished up the first draft yesterday… it’s 16 pages in the IEEE Computer Society article LaTeX class - which doesn’t leave much whitespace! It’s a comprehensive review of pretty much every aspect of services in Oracle databases.

Two Kinds of Configuration Data Stored in the OCR

Anyway tonight I was doing a little digging with the clusterware API and I thought I’d post my discoveries so far. The whole reason for this was that I’m trying to investigate the contents of the OCR. Even Julian Dyke and Steve Shaw’s book Pro Oracle Database 10g RAC on Linux seems to skip over the fact that there are two different sets of data stored in the OCR. The misleadingly named OCRDUMP utility only dumps out half the contents. It seems that CRS_STAT -F will dump the other half.

Was just perusing Sergio Leunissen’s blog this morning and a couple of his recent posts caught my attention:

First off, as someone who frequently installs Oracle on Linux, his post last month about the recently released RPM oracle-validated was great - can’t believe I’d missed that. Basically it’s an RPM that makes sure you have all the required OS packages for the Oracle RDBMS and even sets kernel parameters and creates an oracle user and dba/oinstall groups. Sergio has a nice demo in his post; I’m going to have to start using this!

Secondly, Sergio linked to a great post from Wim back at the end of February about the differences between Red Hat and OEL. The most important point: OEL is not a fork. In fact I didn’t realize this was available but there’s even a short PDF which lists every single package that’s different - and what the differences are. Have a look; it’s pretty much just logo and branding changes. In short, OEL is like CentOS or White Box with real Oracle Corp support.

If you’re comparing OEL and Red Hat there are really only two things to compare: (1) availability of the OS for “proof of concept” or development systems - OEL lets you easily download and run as many copies as you want for free, like CentOS and White Box, while Red Hat doesn’t; and (2) support organizations - do you think that Red Hat or Oracle will do a better job of supporting your Linux operating system when you do decide to go into production and purchase support for it? Honestly, that seems to be the bigger question in my mind.

Another thing to keep in mind is that Oracle will also support Red Hat installations; you can even update your Red Hat system to point to Oracle’s servers for new updates rather than Red Hat’s servers. (Then for example you can automatically pick up packages like oracle-validated and ocfs.)
