Nagios plugin check_remote_oracle

New version 0.3 to fetch real undo usage (“UNEXPIRED SEGMENTS”) !!!

We picked up the nagios plugin check_oracle and made some upgrades so it can monitor remote databases with no need of installing a nagios agent on the Oracle Database Server or the Oracle client on the nagios server.

The only requisite is having an ssh daemon running on the oracle server and copy the nagios user ssh key to that oracle server so nagios can connect through ssh with no password.

The first thing we change is the way to get the ORACLE_HOME environment variable from the oratab file, just because… IT DIDN’T WORKED!

When an oratab contained commented lines (“#”) the logic of the script produced a messed out output, so, firstly, we make the code to ignore the lines starting by a comment “^#” from the oratab file, and then it works (locally).

Then, we implemented the solution to ssh the server and get the ORACLE_HOME remotelly.

Here’s the original code from the original check_oracle plugin:

# Hunt down a reasonable ORACLE_HOME
if [ -z "$ORACLE_HOME" ] ; then
        # Adjust to taste
        for oratab in /var/opt/oracle/oratab /etc/oratab
        do
        [ ! -f $oratab ] && continue
        ORACLE_HOME=`IFS=:
                while read SID ORACLE_HOME junk;
                do
                        if [ "$SID" = "$2" -o "$SID" = "*" ] ; then
                                echo $ORACLE_HOME;
                                exit;
                        fi;
                done < $oratab`
        [ -n "$ORACLE_HOME" ] && break
        done
fi
# Last resort
[ -z "$ORACLE_HOME" -a -d $PROGPATH/oracle ] && ORACLE_HOME=$PROGPATH/oracle

if [ "$cmd" != "--db" ]; then
        if [ -z "$ORACLE_HOME" -o ! -d "$ORACLE_HOME" ] ; then
                echo "Cannot determine ORACLE_HOME for sid $2"
                exit $STATE_UNKNOWN
        fi
fi

And here the code modified to ignore commented lines and parse a remote oratab with ssh:

# Hunt down a reasonable ORACLE_HOME
GET_ORACLE_HOME='if [ -z "$ORACLE_HOME" ] ; then
        # Adjust to taste
        for oratab in /var/opt/oracle/oratab /etc/oratab
        do
        [ ! -f $oratab ] && continue
        ORACLE_HOME=`IFS=:
                while read SID ORACLE_HOME junk;
                do
                        if [ "$SID" = "$ORACLE_SID" -o "$SID" = "*" ] ; then
                                echo $ORACLE_HOME;
                                exit;
                        fi;
                done << EOF
$(grep -v ^# $oratab)
EOF`
        [ -n "$ORACLE_HOME" ] && echo $ORACLE_HOME ; break
        echo $ORACLE_HOME
        done
fi'
ORACLE_HOME=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_SID=${2} ; ${GET_ORACLE_HOME}"`

if [ "$cmd" != "--db" ]; then
        if [ -z "$ORACLE_HOME" -o ! `ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ls -d ${ORACLE_HOME}"` ] ; then
                echo "Cannot determine ORACLE_HOME for sid $2"
                exit $STATE_UNKNOWN
        fi
fi

Now, we have to get these parameters we've introduced on the code, for instance, $ORACLE_OSUSER and $REMOTE_SERVER:

REMOTE_SERVER=$2 && shift 2
if [ -z REMOTE_SERVER ]
then
        echo "No remote server specified!!!"
        exit $STATE_UNKNOWN
fi

# Parametres: Si $3="-U" llavors $4 es el OracleOSUser, si no, no.

if [ "${1}" = "-U" ]
then
        ORACLE_OSUSER=$2 && shift 2
else
        ORACLE_OSUSER="oracle"
fi

Now we've got all the parameters we need to connect through ssh to a remote Oracle database without having to install the Oracle client locally.

So we just keep introducing the ssh command to the checks passed to the database.

This is the listener check command:

case "$cmd" in
--tns)
    tnschk=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;export ORACLE_HOME PATH LD_LIBRARY_PATH; tnsping $2"`
    tnschk2=` echo  $tnschk | grep -c OK`
    if [ ${tnschk2} -eq 1 ] ; then
        tnschk3=` echo $tnschk | sed -e 's/.*(//' -e 's/).*//'`
        echo "OK - reply time ${tnschk3} from $2"
        exit $STATE_OK
    else
        echo "No TNS Listener on $2"
        exit $STATE_CRITICAL
    fi
    ;;

This one, is the pmon database check command:

--db)
    pmonchk=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ps -ef | grep -v grep | grep -c ora_pmon_${2}"`
    if [ ${pmonchk} -ge 1 ] ; then
        echo "${2} OK - ${pmonchk} PMON process(es) running"
        exit $STATE_OK
    else
        echo "${2} Database is DOWN"
        exit $STATE_CRITICAL
    fi
    ;;

Next, for the rest of the options, that involved querying the database, we had to develop a function "connect_to_server" in order to be able to escape the "$" sign from the V$_ views on the ssh command:

    connect_to_server(){
        ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESU
LT"
        exit
}

So, the DB caches check ends up like this:

--cache)
    if [ "$#" -ne 4 ] ; then 
        echo "Wrong Parameter Count!!!"
        exit
    fi
ORACLE_SID=${2}

    if [ ${3} -gt ${4} ] ; then
        echo "UNKNOWN - Warning level is less than Critical"
        exit $STATE_UNKNOWN
    fi

    connect_to_server(){
        ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESU
LT"
        exit
}

    GET_RESULT="sqlplus -s / as sysdba << EOF
set pagesize 0
set numf '9999999.99'
select (1-(pr.value/(dbg.value+cg.value)))*100
from v\\\$sysstat pr, v\\\$sysstat dbg, v\\\$sysstat cg
where pr.name='physical reads'
and dbg.name='db block gets'
and cg.name='consistent gets'
/
EOF"
    result=`connect_to_server`;

    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi

    buf_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'` 
    buf_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'` 
    GET_RESULT="sqlplus -s / as sysdba << EOF
set pagesize 0
set numf '9999999.99'
select sum(lc.pins)/(sum(lc.pins)+sum(lc.reloads))*100
from v\\\$librarycache lc;
EOF"
        
    result=`connect_to_server`;

    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi

    lib_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'`
    lib_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'`

    if [ $buf_hr -le ${3} -o $lib_hr -le ${3} ] ; then
        echo "${2} CRITICAL - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100"
        exit $STATE_CRITICAL
    fi
    if [ $buf_hr -le ${4} -o $lib_hr -le ${4} ] ; then
        echo "${2} WARNING  - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100"
        exit $STATE_WARNING
    fi
    echo "${2} OK - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100"

    exit $STATE_OK
    ;;

Finally, here the tablespaces check:

--tablespace)
    if [ "$#" -ne 5 ] ; then 
        echo "Wrong Parameter Count!!!"
        exit
    fi
    if [ ${4} -lt ${5} ] ; then
        echo "UNKNOWN - Warning level is more than Critical"
        exit $STATE_UNKNOWN
    fi

    ORACLE_SID=$2

    connect_to_server(){
        ssh $ORACLE_OSUSER@$REMOTE_SERVER -x " ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESULT"
        exit
}

    GET_RESULT="sqlplus -s / as sysdba << EOF
set pagesize 0
set numf '9999999.99'
select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc
from (
select tablespace_name,sum(bytes)/1024/1024 total
from dba_data_files group by tablespace_name) A
LEFT OUTER JOIN
( select tablespace_name,sum(bytes)/1024/1024 free
from dba_free_space group by tablespace_name) B
ON a.tablespace_name=b.tablespace_name WHERE a.tablespace_name='${3}';
EOF"

    result=`connect_to_server`;

    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi

    ts_free=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($1)}'` 
    ts_total=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($2)}'` 
    ts_pct=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($3)}'` 
    ts_pctx=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print $3}'` 
    if [ "$ts_free" -eq 0 -a "$ts_total" -eq 0 -a "$ts_pct" -eq 0 ] ; then
        echo "No data returned by Oracle - tablespace $5 not found?"
        exit $STATE_UNKNOWN
    fi
    if [ "$ts_pct" -ge ${4} ] ; then
        echo "${2} : ${3} CRITICAL - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100"
        exit $STATE_CRITICAL
    fi
    if [ "$ts_pct" -ge ${5} ] ; then
        echo "${2} : ${3} WARNING  - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100"
        exit $STATE_WARNING
    fi
    echo "${2} : ${3} OK - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100"
    exit $STATE_OK
    ;;

And here the help of the plugin:

su - nagios -c "/usr/local/nagios/libexec/check_remote_oracle --help"
check_remote_oracle v0.1 (nagios-plugins 1.4.15)
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

Usage:
  check_remote_oracle -H  [-U Oracle OS User] --tns 
  check_remote_oracle -H  [-U Oracle OS User] --db 
  check_remote_oracle -H  [-U Oracle OS User] --login 
  check_remote_oracle -H  [-U Oracle OS User] --cache   
  check_remote_oracle -H  [-U Oracle OS User] --tablespace    
  check_remote_oracle --help
  check_remote_oracle --version

Check Oracle status

--tns SID/IP Address
   Check remote TNS server
--db SID
   Check local database (search /bin/ps for PMON process)
--login SID
   Attempt a dummy login and alert if not ORA-01017: invalid username/password
--cache
   Check local database for library and buffer cache hit ratios
--tablespace
   Check local database for tablespace capacity in ORACLE_SID
--help
   Print this help screen
--version
   Print version and license information

If the plugin doesn't work, check that the ORACLE_HOME environment
variable is set, that ORACLE_HOME/bin is in your PATH, and the
tnsnames.ora file is locatable and is properly configured on your Oracle server.

If you want to use a default Oracle home, add in your oratab file:
*:/opt/app/oracle/product/7.3.4:N

Send email to nagios-users@lists.sourceforge.net if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net.
Please include version information with all correspondence (when possible,
use output from the --version option of the plugin itself).

Enjoy it!

You can download the plugin, here.
(New version 0.2 with active sessions count, ASM diskgroups and hard/soft parsing percentage ratios check)

Leave a Reply

You must be logged in to post a comment.