{"id":870,"date":"2012-06-09T10:55:57","date_gmt":"2012-06-09T17:55:57","guid":{"rendered":"http:\/\/chriscarey.com\/wordpress\/?p=870"},"modified":"2015-09-27T14:48:56","modified_gmt":"2015-09-27T22:48:56","slug":"how-to-prevent-multple-check_nrpe-socket-timeout-after-10-seconds-alerts","status":"publish","type":"post","link":"https:\/\/chriscarey.com\/blog\/2012\/06\/09\/how-to-prevent-multple-check_nrpe-socket-timeout-after-10-seconds-alerts\/","title":{"rendered":"How to prevent multiple &#8220;CHECK_NRPE: Socket timeout after 10 seconds&#8221; alerts"},"content":{"rendered":"<p><img decoding=\"async\" src=\"\/\/chriscarey.com\/projects\/ajax-monitor-for-nagios\/ajax-monitor-1.png\" alt=\"Nagios Monitor\" style=\"width:250px;float:right\" \/><\/p>\n<p>In server monitoring with Nagios, nobody likes to get paged more than necessary. This article shows how to prevent multiple &#8220;<strong>CHECK_NRPE: Socket timeout after 10 seconds<\/strong>&#8221; alerts every time a host goes down.<\/p>\n<p>In this case, I&#8217;m not trying to get NRPE working. I&#8217;m trying to <strong>shut it up<\/strong> when there is an outage.<\/p>\n<p><!--more--><\/p>\n<p>Every time a host goes down due to network issues, Nagios sends a &#8220;host down&#8221; alert, which pages my phone. Nagios also sends one &#8220;CHECK_NRPE: Socket timeout after 10 seconds&#8221; alert for each remote service check on the box (hard drive space, processes, zombies, apt, load, etc.).<\/p>\n<p>If I&#8217;m watching 10 services on a single box, <strong>that is a total of 11 pages on my phone<\/strong> when the box is down. When you monitor a lot of systems, this gets out of control very fast.<\/p>\n<p>It would be better if, when a host goes down, only the ping host check alerted me that the host is down, and the other NRPE checks on that box stayed quiet. 
I know the box is down, so I don&#8217;t expect those other checks to succeed anyway.<\/p>\n<p><strong>The solution<\/strong><\/p>\n<p>In <strong>generic-service.cfg<\/strong>, where the service templates are configured, create a new service template called nrpe-service (copying generic-service as a starting point if you like).<\/p>\n<p>In the new nrpe-service template, remove the <strong>u<\/strong> from <strong>notification_options<\/strong> so that Nagios does not send notifications for &#8220;UNKNOWN&#8221; states:<\/p>\n<div class=\"codecolorer-container text blackboard\" style=\"overflow:auto;white-space:nowrap;width:565px;\"><div class=\"text codecolorer\">notification_options &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;w,c,r<\/div><\/div>\n<p>Change each service that uses NRPE to &#8220;use nrpe-service&#8221;:<\/p>\n<div class=\"codecolorer-container text blackboard\" style=\"overflow:auto;white-space:nowrap;width:565px;\"><div class=\"text codecolorer\">define service {<br \/>\n&nbsp; host_name &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ipv4.securespot.net<br \/>\n&nbsp; check_command &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; check_nrpe_1arg!check_disk<br \/>\n&nbsp; use &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nrpe-service<br \/>\n&nbsp; service_description &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Check Disk<br \/>\n}<\/div><\/div>\n<p>Next, modify the check_nrpe command definition to add the -u option. 
This is from the check_nrpe help text:<br \/>\n-u         = Make socket timeouts return an UNKNOWN state instead of CRITICAL<\/p>\n<p>On my box, the command definitions are in <strong>\/etc\/nagios-plugins\/config\/check_nrpe.cfg<\/strong>. If that file also defines check_nrpe_1arg (used by the Check Disk service above), add -u there as well:<\/p>\n<div class=\"codecolorer-container text blackboard\" style=\"overflow:auto;white-space:nowrap;width:565px;\"><div class=\"text codecolorer\">define command {<br \/>\n&nbsp; command_name &nbsp;check_nrpe<br \/>\n&nbsp; command_line &nbsp;\/usr\/lib\/nagios\/plugins\/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$<br \/>\n}<br \/>\ndefine command {<br \/>\n&nbsp; command_name &nbsp;check_nrpe_1arg<br \/>\n&nbsp; command_line &nbsp;\/usr\/lib\/nagios\/plugins\/check_nrpe -u -H $HOSTADDRESS$ -c $ARG1$<br \/>\n}<\/div><\/div>\n<p>So we have told Nagios not to notify on UNKNOWN states for these services, and we have told check_nrpe to make socket timeouts return UNKNOWN instead of CRITICAL.<\/p>\n<p>Here is the reference for those Nagios options:<br \/>\n<a href=\"http:\/\/nagios.sourceforge.net\/docs\/3_0\/objectdefinitions.html\" title=\"Nagios Object Definitions\" target=\"_blank\">http:\/\/nagios.sourceforge.net\/docs\/3_0\/objectdefinitions.html<\/a><\/p>\n<p>Now when a host goes down, I get only one alert: the one from the ping host check!<\/p>\n<p>Good luck and happy monitoring!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In server monitoring with Nagios, nobody likes to get paged any more than necessary. This article will show you how to prevent multiple &#8220;CHECK_NRPE: Socket timeout after 10 seconds&#8221; alerts every time a host goes down. In this circumstance, I&#8217;m not trying to get NRPE working. 
I&#8217;m trying to shut it up when there is [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[41,28],"tags":[57,42],"class_list":["post-870","post","type-post","status-publish","format-standard","hentry","category-nagios","category-networking","tag-nagios","tag-nrpe"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/prpYG-e2","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/posts\/870","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/comments?post=870"}],"version-history":[{"count":40,"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/posts\/870\/revisions"}],"predecessor-version":[{"id":1613,"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/posts\/870\/revisions\/1613"}],"wp:attachment":[{"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/media?parent=870"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/chriscarey.
com\/blog\/wp-json\/wp\/v2\/categories?post=870"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/chriscarey.com\/blog\/wp-json\/wp\/v2\/tags?post=870"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}