At the company where I’m working right now, I’m part of an architecture effort to come up with our standard design for RAC on Linux across the firm. There will be dozens or possibly hundreds of deployments globally using the design we settle on.

We’re internally debating whether or not we should include OCFS2 in this design right now, and I’m curious if anyone has arguments one way or the other to share. Our standard design on Solaris does utilize a cluster filesystem and we would welcome a similar design, but there are some concerns about the readiness, stability and future of OCFS2.

OCFS2 is being considered for these four use cases:

  • database binaries (vs local files or NFS)
  • diag top (11g) or admin tree (10g) (vs local files or NFS)
  • archived logs
  • backups


Other files will be stored in ASM.

I have seen mention in blogs such as http://bigdaveroberts.wordpress.com/ of something called ASMFS in 11gR2 and I’m wondering – will this feature (if included) have any impact on Oracle’s commitment to OCFS2 development? Could Oracle conceivably develop a whole new cluster filesystem and put their full weight behind it as they did for ASM storage, leaving OCFS2 as a lower priority for new features and improvements? Has Oracle demonstrated significant commitment to OCFS2 development and support in the past, and is this a mature enough technology for wide-scale deployment?

Just looking for opinions. :)

This article isn’t directly database-related, but I think it’s a great software engineering topic so it seemed worth writing about. Right now I’m involved in a project that involves releasing software packages of a few different flavors. Some of them are other people’s software that we’re re-packaging (like Oracle database binaries) and some are code that was written and is maintained entirely in-house. And as we’ve been working through the software development process this question has recently come to mind: how should we assign version numbers?

Actually it’s part of a bigger question of how we develop elegant software – other related design issues in this project are clear end-to-end traceability (requirement -> source code version -> released binary package), solid change control, integrated testing framework and readiness for automation (build and test). But the one I’m interested in right now is this: how can I develop a robust version numbering strategy?

First of all, there really isn’t a one-size-fits-all “correct” version numbering scheme. The best way to choose version numbers is influenced by your software development philosophy (and resultant development lifecycle), project management approach and development team. (Globally distributed informal volunteer network or local engineers on payroll?) So the best solution for my current project might not be the best solution for my next project.

But let’s take a closer look at this project. First, a little background on the art of version numbering.

Happy Thanksgiving to everyone in the US! And happy belated thanksgiving to everyone in Canada since you celebrated back at the beginning of October. :)

I’ve been doing a lot of scripting work lately. Although I can’t write about everything I’m doing, I would like to post a pattern that I thought could be useful to other folks. I needed to update some properties in the LISTENER.ORA file from a shell script. The problem is that basically you have to clobber it or properly parse it! Even though I initialized this listener file myself, there is always the possibility that a DBA could manually edit it – so I didn’t want to clobber the contents. That means that I had to parse it.

So how do you parse a listener file from a shell script? I came up with a nifty way to tokenize and parse the file through a simple awk script combined with sed – and it seemed rather useful so I refactored it a few times to apply to more general use cases. I think someone else could use something like this too so I’m posting it – do me a favor and keep the reference to ardentperf in there if you copy the code.

There are two functions that follow basically the same pattern: the first reads an attribute and the second sets one. I’ve included some examples at the bottom of the post. I’ve tried to make the awk code portable but so far I’ve only tested it with gawk and mawk on a few flavors of Linux.
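
The functions in the post are awk and sed. As a rough illustration of the same tokenize-and-walk idea, here is a minimal Python sketch of the read side only. It is not the code from the post: the names `read_attribute`, `TOKEN` and `SAMPLE`, the slash-separated path convention and the sample file contents are all assumptions made for this example, and quoted values are not handled.

```python
import re

# Tokens are the three punctuation marks or any run of other non-space characters.
TOKEN = re.compile(r"[()=]|[^\s()=]+")

# Hypothetical listener.ora contents, used only for illustration.
SAMPLE = """
# sample listener file
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = db01)(PORT = 1521))
    )
  )
ADR_BASE_LISTENER = /u01/app/oracle
"""


def read_attribute(text, path):
    """Return the first scalar value found at a slash-separated path, or None."""
    want = [part.upper() for part in path.split("/")]
    # Drop whole-line comments before tokenizing.
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    tokens = TOKEN.findall(body)
    stack, depth = [], 0  # stack holds the names of the enclosing parameters
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if stack:
                stack.pop()        # close the parameter opened by this paren
            if depth == 0:
                stack.clear()      # the top-level entry is finished
        elif tok == "=":
            continue
        elif nxt == "=":
            stack.append(tok.upper())  # a parameter name
        else:
            if stack == want:      # a scalar value at the requested path
                return tok
            if depth == 0:
                stack.clear()      # top-level scalar entry is finished
    return None


if __name__ == "__main__":
    print(read_attribute(SAMPLE, "LISTENER/DESCRIPTION_LIST/DESCRIPTION/ADDRESS/PORT"))
```

Walking the token stream with a stack of parameter names means the file is never rewritten just to be read, which is the point of parsing rather than clobbering. A matching setter would follow the same walk and replace the token in place.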

Before I started consulting, I was an Oracle engineer in a very large software development organization. The company had a number of major products and the one I worked with was used by hospitals and radiology offices world-wide. (These guys are one of the biggest companies worldwide in the field.) Our product included the hardware and software; you could order it with Sun or HP hardware (Solaris or Linux). It had an Oracle backend and a web-based middle tier built with lots of C++ and Java code.

Any software engineer who has worked on large projects – industry or community – can tell you the importance of solid change control processes. So… since everything for the build had to be checked into ClearCase… yes, we checked Oracle 10g into the repo. A 2G tarball. And whenever there was a 1K patch to the Oracle install? A brand new 2G tarball. The ClearCase guys loved me.

And that was how I started thinking about an automated build process. I don’t need to check B24792-01_1of5.zip into the repository because it’s straight off eDelivery. I can skip p7150622_10203_Linux-x86-64.zip since it’s direct from MetaLink. The only thing missing was a solid, simple, flexible program for automating the Oracle install, patchset, CPUs and one-offs – taking Oracle’s official bits as input.

Anyone else out there who could use a program like this? How about for rapid provisioning of servers? (Like all that grid buzz.) Grid Control does some basic stuff (if you can make it work) and there are advanced kits for the big data centers – but there have to be more people than just me who would love to have this program.

Proposing orainstall

That’s why I’m proposing to write a bit of software called orainstall. It would be script-based, cross-platform and intended for community use. I have some ideas for a design but I’m really interested to hear how you might use something like this and what features might be useful to you. Do you think it’s a good design? I’d also like to hear whether you’d be interested in using it or helping to test it.

The program would be launched by running …

Is asmlib obsolete on a modern Linux system? I’m still undecided but starting to lean toward “yes”.

Everybody knows that asmlib was very useful when it was first introduced with Oracle 10.1 to simplify a host of issues on Linux: direct async device access without raw devices, file permissions & ownership without custom code, and persistent device naming without devlabel.

But I’m now involved in setting some standards to be used across a large organization for Oracle 10.2 on RHEL5 and I’m wondering if there’s still a case for using asmlib. So I did a little trolling for info – which was surprisingly sparse. I had a hard time finding much, but after a lot of digging I think I’ve compiled a useful bit of information about benefits and drawbacks.

Benefits

It seems to me that the ASMLIB API was originally introduced to do more than just simplify file permissions – sounds like it was an alternative I/O API to the standard unix one, allowing ASM to access the underlying storage more efficiently and completely. I don’t think it’s just an “extra layer” – it’s an alternative code path to the std unix I/O libs. Like an ODM for block devices – and the idea was that there could be additional vendor implementations. And Oracle released an initial generic implementation on Linux under the GPL.

So Collaborate is over and I’m back in Chicago… home sweet home. I thoroughly enjoyed the week in Denver, in spite of the snow! Thursday, the last day, was especially fun.

First was a panel debate “To RAC or Not To RAC: What’s Best for HA.” Dan Norris invited me to participate in this panel along with Alex Gorbachev (Pythian), Neil Greene (Predictive Technologies) and Matt Zito (GridApp) – I certainly felt privileged to take part. Here are a few of the things that I took away from the debate… feel free to discuss:

Just a quick post to say that I’ve uploaded the slides from my services presentation at Collaborate and you can find them over on the publications page. Thanks to everyone who attended!! Great questions and comments throughout the session. Next time I’ll try to get through everything faster so that there’s more time for Q&A!

Next week sometime I expect to upload the instructions from today’s Hands-on Lab (11g RAC on VMware with ASM and OEL5). I want to clean it up a bit first. For anyone who didn’t hear the story, I heard on Sunday evening that they had a room with about 50 computers which was going to sit empty during a few timeslots. And to me that seemed like a tragedy – how often do I wish I could have a chance to try something new with a bit of guidance and without worrying about hosing my laptop?! So I decided to write a hands-on lab for 11gRAC/VMware. I pretty much spent all day yesterday putting it together but it came out great!

Also I’m thinking about repeating that RAC/VMware lab around Chicago sometime… is there anyone around the Chicago area who might be interested in something like this?

Guess I should also mention that I’ve been rather enjoying myself at Collaborate so far too! (Even though I spent pretty much all day yesterday making that Hands-on Lab…) Got a chance to meet Peter Scott Monday night – that was fun! Somehow he spotted my name badge and then I got to finally put a face to someone I’d only known as a blogger. :)

About a month ago I wrote an overview of Linux Caching and I/O Queues as they pertain to Oracle. I was working on a project to architect, install and configure the beginnings of an 8-node cluster consisting of either one or two RAC databases. During the project, while I was waiting for the OS guys to resolve some networking issues, I ran a bunch of benchmarks on the storage subsystem. Specifically, I experimented with the size of the HBA Queue Depth to see if it would make a difference in performance.

But before getting into the results, a quick overview of our configuration: it was 11g RAC on Red Hat Enterprise Linux 5; Dell servers with four dual-core Opteron chips each. The RAC cluster initially had four nodes but will grow to at least eight as the data is migrated. The system has 4G QLogic cards, a McData switch and a 3Par SAN (which is blazing fast). ASM (no CFS) and dedicated Oracle Homes. The first spec had an InfiniBand interconnect but after a teleconference with Alex from Pythian discussing the project’s specific requirements, the spec was updated to use redundant Gigabit Ethernet.

Picking up where I left off: the default limit set by the Linux qla2xxx driver for concurrent I/O requests on QLogic cards (32 per LUN) is conservative. So can I increase performance by increasing this limit? The best way to answer a question like this is simply to try it.
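
Before changing anything it helps to see what the kernel currently reports. The following is a minimal Python sketch, assuming the usual 2.6 sysfs layout where each SCSI disk exposes its limit in `/sys/block/<dev>/device/queue_depth`; the function name `read_queue_depths` is made up for this example. How the limit is actually raised depends on the driver version (on many qla2xxx builds it is the `ql2xmaxqdepth` module option, but treat that as an assumption to verify against your driver's documentation).

```python
from pathlib import Path


def read_queue_depths(sys_block="/sys/block"):
    """Map each sd* block device to the queue depth sysfs reports for it."""
    root = Path(sys_block)
    depths = {}
    if not root.is_dir():
        return depths  # not a Linux host, or sysfs is not mounted
    for dev in sorted(root.glob("sd*")):
        qd_file = dev / "device" / "queue_depth"
        try:
            depths[dev.name] = int(qd_file.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable for this device
    return depths


if __name__ == "__main__":
    for name, depth in read_queue_depths().items():
        print(f"{name}: queue_depth={depth}")
```

Recording these values before and after each benchmark run makes it easy to confirm that a driver setting really took effect on every LUN.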

The trouble with Linux? No… the trouble with computers in general – is that they keep changing! Solaris 10 comes out, Oracle 11g, Red Hat 5… and everything works differently!! It’s a full-time job just trying to keep up with everything.

Almost exactly one year ago I wrote about using udev on 2.6 kernels to set the proper permissions for Oracle RAC. Two weeks after that post (March 14) Red Hat Enterprise Linux 5 was released and changed everything.

In my original post, I demonstrated how to create a PERMISSIONS file that udev would use when creating the device nodes. This worked on RHEL4 and SLES9. However this week I’ve been helping a client deploy 11g RAC on a RHEL5-based cluster – and I remembered that the PERMISSIONS facility was removed from udev in RHEL5. Seems like I remember reading something about having a single source of configuration for udev, which makes sense… so maybe they picked the RULES. (You’ll remember from my previous post that RULES are processed right before PERMISSIONS.) This is just as well since RULES are actually quite a bit more powerful than PERMISSIONS.

So on RHEL5 and OEL5 – in order to conform to Linux Best Practices – we now have to set correct RAC file permissions using udev RULES. To get started, we need to review how RULES work. The udev manual page gives a good overview of rules processing. But of course there are plenty of great tutorials that go deeper if you’re looking for more.
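
As a rough idea of what such a rule looks like: a rule is one line of comma-separated key/value pairs, where match keys (written with `==`) select a device and assignment keys (written with `=`) set properties on its node. The small Python sketch below just formats lines of that shape. The helper name `udev_permission_rule`, the device names and the `oracle`/`dba` owner and group are assumptions for illustration, and the `==` match syntax assumes the newer udev shipped with RHEL5.

```python
def udev_permission_rule(kernel_name, owner="oracle", group="dba", mode="0660"):
    """Format one udev rule line that sets owner, group and mode on a device node."""
    return (
        f'KERNEL=="{kernel_name}", OWNER="{owner}", '
        f'GROUP="{group}", MODE="{mode}"'
    )


if __name__ == "__main__":
    # Hypothetical partitions holding RAC files; adjust to the real devices.
    for dev in ("sdb1", "sdc1"):
        print(udev_permission_rule(dev))
```

The printed lines would go into a file under `/etc/udev/rules.d/`. Kernel names such as `sdb1` are not guaranteed to be stable across reboots, so a real deployment would probably match on something more persistent.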

Well, it’s been a while since I’ve written anything for the blog – during the past four months I went on a trip to Asia, celebrated Thanksgiving and Christmas with both my family and my girlfriend’s family and then in January I got engaged! I’ve also been working on some continuing educational goals – so needless to say I’ve been keeping busy. And I will continue to be very busy over the next few months of wedding planning so I probably won’t be writing too much.

This post is just another interesting case study from a customer I’m working with right now. We were looking at various queues in the I/O stream and wondering how we might be able to tweak them. I was mainly investigating the host HBAs – but in the process of digging into this I also learned about a few other queues and Linux internals in general as they relate to Oracle.
