Nagios plugin check_remote_oracle
New version 0.3 to fetch real undo usage (“UNEXPIRED SEGMENTS”) !!!
We picked up the nagios plugin check_oracle and made some upgrades so it can monitor remote databases with no need of installing a nagios agent on the Oracle Database Server or the Oracle client on the nagios server.
The only requisite is having an ssh daemon running on the oracle server and copy the nagios user ssh key to that oracle server so nagios can connect through ssh with no password.
The first thing we change is the way to get the ORACLE_HOME environment variable from the oratab file, just because… IT DIDN’T WORKED!
When an oratab contained commented lines (“#”) the logic of the script produced a messed out output, so, firstly, we make the code to ignore the lines starting by a comment “^#” from the oratab file, and then it works (locally).
Then, we implemented the solution to ssh the server and get the ORACLE_HOME remotelly.
Here’s the original code from the original check_oracle plugin:
# Hunt down a reasonable ORACLE_HOME if [ -z "$ORACLE_HOME" ] ; then # Adjust to taste for oratab in /var/opt/oracle/oratab /etc/oratab do [ ! -f $oratab ] && continue ORACLE_HOME=`IFS=: while read SID ORACLE_HOME junk; do if [ "$SID" = "$2" -o "$SID" = "*" ] ; then echo $ORACLE_HOME; exit; fi; done < $oratab` [ -n "$ORACLE_HOME" ] && break done fi # Last resort [ -z "$ORACLE_HOME" -a -d $PROGPATH/oracle ] && ORACLE_HOME=$PROGPATH/oracle if [ "$cmd" != "--db" ]; then if [ -z "$ORACLE_HOME" -o ! -d "$ORACLE_HOME" ] ; then echo "Cannot determine ORACLE_HOME for sid $2" exit $STATE_UNKNOWN fi fi
And here the code modified to ignore commented lines and parse a remote oratab with ssh:
# Hunt down a reasonable ORACLE_HOME GET_ORACLE_HOME='if [ -z "$ORACLE_HOME" ] ; then # Adjust to taste for oratab in /var/opt/oracle/oratab /etc/oratab do [ ! -f $oratab ] && continue ORACLE_HOME=`IFS=: while read SID ORACLE_HOME junk; do if [ "$SID" = "$ORACLE_SID" -o "$SID" = "*" ] ; then echo $ORACLE_HOME; exit; fi; done << EOF $(grep -v ^# $oratab) EOF` [ -n "$ORACLE_HOME" ] && echo $ORACLE_HOME ; break echo $ORACLE_HOME done fi' ORACLE_HOME=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_SID=${2} ; ${GET_ORACLE_HOME}"` if [ "$cmd" != "--db" ]; then if [ -z "$ORACLE_HOME" -o ! `ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ls -d ${ORACLE_HOME}"` ] ; then echo "Cannot determine ORACLE_HOME for sid $2" exit $STATE_UNKNOWN fi fi
Now, we have to get these parameters we've introduced on the code, for instance, $ORACLE_OSUSER and $REMOTE_SERVER:
REMOTE_SERVER=$2 && shift 2 if [ -z REMOTE_SERVER ] then echo "No remote server specified!!!" exit $STATE_UNKNOWN fi # Parametres: Si $3="-U" llavors $4 es el OracleOSUser, si no, no. if [ "${1}" = "-U" ] then ORACLE_OSUSER=$2 && shift 2 else ORACLE_OSUSER="oracle" fi
Now we've got all the parameters we need to connect through ssh to a remote Oracle database without having to install the Oracle client locally.
So we just keep introducing the ssh command to the checks passed to the database.
This is the listener check command:
case "$cmd" in --tns) tnschk=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;export ORACLE_HOME PATH LD_LIBRARY_PATH; tnsping $2"` tnschk2=` echo $tnschk | grep -c OK` if [ ${tnschk2} -eq 1 ] ; then tnschk3=` echo $tnschk | sed -e 's/.*(//' -e 's/).*//'` echo "OK - reply time ${tnschk3} from $2" exit $STATE_OK else echo "No TNS Listener on $2" exit $STATE_CRITICAL fi ;;
This one, is the pmon database check command:
--db) pmonchk=`ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ps -ef | grep -v grep | grep -c ora_pmon_${2}"` if [ ${pmonchk} -ge 1 ] ; then echo "${2} OK - ${pmonchk} PMON process(es) running" exit $STATE_OK else echo "${2} Database is DOWN" exit $STATE_CRITICAL fi ;;
Next, for the rest of the options, that involved querying the database, we had to develop a function "connect_to_server" in order to be able to escape the "$" sign from the V$_ views on the ssh command:
connect_to_server(){ ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESU LT" exit }
So, the DB caches check ends up like this:
--cache) if [ "$#" -ne 4 ] ; then echo "Wrong Parameter Count!!!" exit fi ORACLE_SID=${2} if [ ${3} -gt ${4} ] ; then echo "UNKNOWN - Warning level is less than Critical" exit $STATE_UNKNOWN fi connect_to_server(){ ssh $ORACLE_OSUSER@$REMOTE_SERVER -x "ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESU LT" exit } GET_RESULT="sqlplus -s / as sysdba << EOF set pagesize 0 set numf '9999999.99' select (1-(pr.value/(dbg.value+cg.value)))*100 from v\\\$sysstat pr, v\\\$sysstat dbg, v\\\$sysstat cg where pr.name='physical reads' and dbg.name='db block gets' and cg.name='consistent gets' / EOF" result=`connect_to_server`; if [ -n "`echo $result | grep ORA-`" ] ; then error=` echo "$result" | grep "ORA-" | head -1` echo "CRITICAL - $error" exit $STATE_CRITICAL fi buf_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'` buf_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'` GET_RESULT="sqlplus -s / as sysdba << EOF set pagesize 0 set numf '9999999.99' select sum(lc.pins)/(sum(lc.pins)+sum(lc.reloads))*100 from v\\\$librarycache lc; EOF" result=`connect_to_server`; if [ -n "`echo $result | grep ORA-`" ] ; then error=` echo "$result" | grep "ORA-" | head -1` echo "CRITICAL - $error" exit $STATE_CRITICAL fi lib_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'` lib_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'` if [ $buf_hr -le ${3} -o $lib_hr -le ${3} ] ; then echo "${2} CRITICAL - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100" exit $STATE_CRITICAL fi if [ $buf_hr -le ${4} -o $lib_hr -le ${4} ] ; then echo "${2} WARNING - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100" exit $STATE_WARNING fi echo "${2} OK - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${4};${3};0;100 buffer=$buf_hrx%;${4};${3};0;100" exit $STATE_OK ;;
Finally, here the tablespaces check:
--tablespace) if [ "$#" -ne 5 ] ; then echo "Wrong Parameter Count!!!" exit fi if [ ${4} -lt ${5} ] ; then echo "UNKNOWN - Warning level is more than Critical" exit $STATE_UNKNOWN fi ORACLE_SID=$2 connect_to_server(){ ssh $ORACLE_OSUSER@$REMOTE_SERVER -x " ORACLE_HOME=${ORACLE_HOME};PATH=$PATH:$ORACLE_HOME/bin;LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib;ORACLE_SID=${ORACLE_SID} ;export ORACLE_HOME ORACLE_SID PATH LD_LIBRARY_PATH;$GET_RESULT" exit } GET_RESULT="sqlplus -s / as sysdba << EOF set pagesize 0 set numf '9999999.99' select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc from ( select tablespace_name,sum(bytes)/1024/1024 total from dba_data_files group by tablespace_name) A LEFT OUTER JOIN ( select tablespace_name,sum(bytes)/1024/1024 free from dba_free_space group by tablespace_name) B ON a.tablespace_name=b.tablespace_name WHERE a.tablespace_name='${3}'; EOF" result=`connect_to_server`; if [ -n "`echo $result | grep ORA-`" ] ; then error=` echo "$result" | grep "ORA-" | head -1` echo "CRITICAL - $error" exit $STATE_CRITICAL fi ts_free=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($1)}'` ts_total=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($2)}'` ts_pct=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($3)}'` ts_pctx=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print $3}'` if [ "$ts_free" -eq 0 -a "$ts_total" -eq 0 -a "$ts_pct" -eq 0 ] ; then echo "No data returned by Oracle - tablespace $5 not found?" exit $STATE_UNKNOWN fi if [ "$ts_pct" -ge ${4} ] ; then echo "${2} : ${3} CRITICAL - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100" exit $STATE_CRITICAL fi if [ "$ts_pct" -ge ${5} ] ; then echo "${2} : ${3} WARNING - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100" exit $STATE_WARNING fi echo "${2} : ${3} OK - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${3}=$ts_pctx%;${5};${4};0;100" exit $STATE_OK ;;
And here the help of the plugin:
su - nagios -c "/usr/local/nagios/libexec/check_remote_oracle --help" check_remote_oracle v0.1 (nagios-plugins 1.4.15) The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute copies of the plugins under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING. Usage: check_remote_oracle -H[-U Oracle OS User] --tns check_remote_oracle -H [-U Oracle OS User] --db check_remote_oracle -H [-U Oracle OS User] --login check_remote_oracle -H [-U Oracle OS User] --cache check_remote_oracle -H [-U Oracle OS User] --tablespace check_remote_oracle --help check_remote_oracle --version Check Oracle status --tns SID/IP Address Check remote TNS server --db SID Check local database (search /bin/ps for PMON process) --login SID Attempt a dummy login and alert if not ORA-01017: invalid username/password --cache Check local database for library and buffer cache hit ratios --tablespace Check local database for tablespace capacity in ORACLE_SID --help Print this help screen --version Print version and license information If the plugin doesn't work, check that the ORACLE_HOME environment variable is set, that ORACLE_HOME/bin is in your PATH, and the tnsnames.ora file is locatable and is properly configured on your Oracle server. If you want to use a default Oracle home, add in your oratab file: *:/opt/app/oracle/product/7.3.4:N Send email to nagios-users@lists.sourceforge.net if you have questions regarding use of this software. To submit patches or suggest improvements, send email to nagiosplug-devel@lists.sourceforge.net. Please include version information with all correspondence (when possible, use output from the --version option of the plugin itself).
Enjoy it!
You can download the plugin, here.
(New version 0.2 with active sessions count, ASM diskgroups and hard/soft parsing percentage ratios check)