
Offline Servers Monitor - XOR type case with heartbeat and lastcontact



This is hard to explain, but I'll do my best! I want an Internal Monitor that checks LastContact for servers and alerts if LastContact is more than 10 minutes old.

That part is easy, but I *only* want the alert to fire if Heartbeat is still checking in OK.

i.e. Heartbeat = OK but LastContact = BAD.

I'm really struggling with the logic and the SQL part and hoped someone might be able to help. We have a Fortigate web-filtering issue at a few clients. It is completely random and can go months without occurring, but when it does happen (always on a weekend!), Heartbeat stays OK and no offline alerts fire, yet LastContact stops updating. We end up finding out from the client. I guess it's UDP vs TCP or something, and we're working on a solution there too. In the short term, until we get a proper fix, this would give us an alert when the issue happens that is distinct from a normal offline server alert.

 


Thanks - I'm not that far ahead! I've attached a monitor that does what I want, but I need to add a condition that says "ONLY if LastHeartbeatTime is within the past 5 minutes" or similar, i.e. "Alert if LastContact is more than 10 minutes old, but ONLY if Heartbeat has also checked in within the past few minutes".

Heartbeat is in another SQL table though and seems to be calculated by an update interval.

I think under Additional Condition I need something like heartbeatcomputers.LastHeartbeatTime SOMETHING.
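The condition described above (LastContact stale, Heartbeat fresh) can be sketched as a join between the two tables. This is only an illustration, not a tested monitor: it uses an in-memory SQLite database as a stand-in for the Automate database, the `ComputerID` join key and the sample rows are assumptions, and only `LastContact` and `heartbeatcomputers.LastHeartbeatTime` come from the thread. Automate itself runs on MySQL, where the same two comparisons would probably be written with `NOW() - INTERVAL 10 MINUTE` and `NOW() - INTERVAL 5 MINUTE`; it is also worth checking that both columns are stored in the same time zone.

```python
import sqlite3

# Hypothetical stand-in for the Automate database: only the columns the
# thread mentions, plus an assumed ComputerID key linking the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE computers (ComputerID INTEGER PRIMARY KEY, Name TEXT, LastContact TEXT);
CREATE TABLE heartbeatcomputers (ComputerID INTEGER PRIMARY KEY, LastHeartbeatTime TEXT);
""")

NOW = "2024-01-06 12:00:00"  # fixed "current time" so the example is repeatable

# Made-up sample data: one healthy server, one with the filtering problem,
# and one that is genuinely offline.
conn.executemany("INSERT INTO computers VALUES (?, ?, ?)", [
    (1, "healthy",  "2024-01-06 11:58:00"),
    (2, "filtered", "2024-01-06 11:30:00"),
    (3, "offline",  "2024-01-06 11:30:00"),
])
conn.executemany("INSERT INTO heartbeatcomputers VALUES (?, ?)", [
    (1, "2024-01-06 11:59:30"),
    (2, "2024-01-06 11:59:30"),
    (3, "2024-01-06 11:29:00"),
])

# LastContact older than 10 minutes AND heartbeat seen in the last 5 minutes.
QUERY = """
SELECT c.Name
FROM computers c
JOIN heartbeatcomputers h ON h.ComputerID = c.ComputerID
WHERE c.LastContact       < datetime(:now, '-10 minutes')
  AND h.LastHeartbeatTime > datetime(:now, '-5 minutes')
ORDER BY c.Name
"""

def stalled_but_alive(connection, now):
    """Return names of machines whose heartbeat is fresh but LastContact is stale."""
    return [row[0] for row in connection.execute(QUERY, {"now": now})]

flagged = stalled_but_alive(conn, NOW)
print(flagged)
```

With this sample data only the "filtered" machine is returned: the healthy one has a recent LastContact, and the offline one fails the heartbeat test, so it would be left to the normal offline alert.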

Sorry it's a brain bender to explain!

lastcontact.PNG


The opposite can be done as a search:

[Image: screenshot of the search]

For some reason I think it couldn't be a monitor; I don't recall why now. We use that search to drive a script that restarts the ltsvcmon service.

Edit: the * in the search allows raw SQL.

Edited by SteveYates

Thanks Steve. Could I just use a "run script" action from the alerts section of the Internal Monitor to do that? i.e. on FAILURE, run the "restart ltsvcmon" script.

Good to see that Automate has x2 services for redundancy and that it still needs something else done to actually fix it!

 
