timwiser

New tool for monitoring Dell hardware - Feedback required :)


After struggling with various approaches for monitoring Dell server hardware, I've ended up creating a tool that I think will go a long way to making things somewhat easier. It leverages the OMREPORT.EXE utility that comes when you install Dell OpenManage and allows direct querying of the status of your server.

 

Running the EXE on the command line with no parameters will hopefully show something like this:

 

C:\Windows\LTSvc>DellStatus.exe
No issues with virtual disk(s)
No issues with physical disk(s)
No issues with chassis
No issues with batteries
Result code 0

 

If you just want information on disks, you can specify /query:disks. The same applies to /query:batteries and /query:chassis.

If you don't care about controller firmware alerts, add /ignorecontroller:yes; if you're not bothered about non-genuine drives, add /ignorenongenuine:yes.

 

The nice thing about the chassis monitoring is that it covers fans, intrusion detection, memory, power supplies, processors, temperatures, voltages, hardware log and batteries, so you get a lot of monitoring in one hit.

 

The disk monitoring looks for any disk that has a problem (isn't "Ready" or "Online") and/or has predictive failure flagged. It also highlights controllers that are in a degraded state. Any disk that is marked with predictive failure is listed with a * against its name in the output. The associated controller is also listed against a defective disk so you know where it's plugged in.
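As a rough illustration of the disk checks described above (this is not the tool's actual code), the Python sketch below flags any disk whose state is not "Ready" or "Online", or that has predictive failure set, and marks predictive failures with a *. The `Disk` record, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass

# States the post treats as healthy; anything else counts as a problem.
HEALTHY_STATES = {"ready", "online"}


@dataclass
class Disk:
    # Hypothetical record; the field names are illustrative only.
    controller: int
    name: str
    state: str
    predictive_failure: bool = False


def problem_disks(disks):
    """Return one report line per disk that needs attention."""
    lines = []
    for disk in disks:
        unhealthy = disk.state.strip().lower() not in HEALTHY_STATES
        if unhealthy or disk.predictive_failure:
            # A trailing * marks a disk with predictive failure flagged.
            marker = "*" if disk.predictive_failure else ""
            lines.append(
                f"Controller {disk.controller}, physical disk {disk.name}{marker}"
            )
    return lines


if __name__ == "__main__":
    sample = [
        Disk(0, "0:1:0", "Online"),
        Disk(0, "0:1:1", "Online", predictive_failure=True),
        Disk(1, "0:0:2", "Failed"),
    ]
    for line in problem_disks(sample):
        print(line)
```

Keeping the healthy states in one set makes it easy to see what happens when a disk reports a state the check has never heard of: it is treated as a problem.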

 

To get this into Labtech, I've created three remote monitors against physical Dell servers, each of which runs the EXE with a query parameter. If the EXE finds any problems, the result code will be 2. If there are any warnings (at the moment this means either OpenManage is not installed or a controller's firmware is out of date), the result code is 1. If everything is working and there are no alerts, the result code will be 0. My monitor is therefore a state-based one that looks for "Result code 0" as success, and so forth.
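To make the state mapping concrete, here is a minimal Python sketch that pulls the "Result code N" line out of the tool's output and turns it into a state. The codes 0, 1 and 2 come from the description above; the state names and the "unknown" fallback are my own assumptions, not something Labtech defines.

```python
import re

# Codes 0/1/2 are from the post; the state names are illustrative.
STATES = {0: "success", 1: "warning", 2: "failed"}

RESULT_RE = re.compile(r"Result code (\d+)")


def monitor_state(output: str) -> str:
    """Map the tool's output text to a monitor state."""
    match = RESULT_RE.search(output)
    if match is None:
        return "unknown"  # the tool did not print a result line
    return STATES.get(int(match.group(1)), "unknown")
```

For example, `monitor_state("No issues with chassis\nResult code 0")` gives `"success"`, while output with no result line at all gives `"unknown"` rather than being mistaken for a pass.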

 

If OpenManage is installed somewhere other than the default location (e.g. on a D drive), the tool handles this automatically.
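If you want to do a similar lookup in your own scripts, one possible approach is to probe each drive letter for OMREPORT.EXE. The sketch below is only a guess at how this could be done, not how the tool does it; the relative path in `OMREPORT_RELATIVE` is an assumed default install path and should be checked against your own servers.

```python
import os
import string

# Assumed install path relative to the drive root; verify on your servers.
OMREPORT_RELATIVE = r"Program Files\Dell\SysMgt\oma\bin\omreport.exe"


def find_omreport(drives=string.ascii_uppercase[2:], exists=os.path.isfile):
    """Return the first omreport.exe found on drives C: to Z:, or None."""
    for letter in drives:
        candidate = f"{letter}:\\{OMREPORT_RELATIVE}"
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_omreport() or "omreport.exe not found")
```

The `exists` parameter is there so the lookup can be tried out with a fake file list before it is pointed at a real server.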

 

If you're interested in trying this out then I'd love your feedback. We have precious few Dell servers that have hardware problems (typical!) so PLEASE try this out on a server if you know it has problems, and send me your feedback so any improvements or gotchas can be nipped in the bud.

 

You can download it from the links below.

 

Pop me an email at tim@air-it.co.uk with your comments - it'd really be appreciated.


Would just like to say that we've been testing this this morning (if you're all not in the Slack you should be) and it works fantastically well.

 

You've pulled a blinder with this one mate! Thanks a lot.


OK, so some more work has been done on this utility to make it more stable and configurable. It now detects non-genuine disks as well.

 

Download links are:

 

32 bit, https://1drv.ms/u/s!AppFBGO93g5zltM3f2Wxc7UKSebwLQ

64 bit, https://1drv.ms/u/s!AppFBGO93g5zltM4gMwlTVPuo6QUCQ

 

If you run it either interactively or via a remote monitor (which is what I do) you can pass the following parameters:

 

/query which tells it what to return information on. You can specify disks, chassis or batteries, or a comma-separated combination. The default is all.

/ignorecontrollers:yes which tells it to ignore controller degradation

/ignorenongenuine:yes which tells it to ignore the fact that a server has non-genuine disks installed

/resultonly:yes which tells it to just output a single string containing a result code (eg. "Result code 1")

 

Result codes are:

 

0 = No issues detected

1 = Warning issues detected (non-genuine disk(s), controller errors, OpenManage not installed)

2 = Critical issues detected

 

 

So some example command lines would be:

 

DellMonitoring64.exe /query:disks /ignorenongenuine:yes /ignorecontrollers:yes

DellMonitoring64.exe /query:chassis,batteries
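If you generate these command lines from a script, a small helper keeps the switches consistent. This Python sketch only uses the switches listed above; the function name `build_command` and its keyword arguments are made up for illustration.

```python
VALID_QUERIES = {"disks", "chassis", "batteries"}


def build_command(exe, query=None, ignore_controllers=False,
                  ignore_nongenuine=False, result_only=False):
    """Assemble an argument list from the switches listed above."""
    args = [exe]
    if query:  # leave /query out entirely to get the default (all)
        unknown = set(query) - VALID_QUERIES
        if unknown:
            raise ValueError(f"unsupported query value(s): {sorted(unknown)}")
        args.append("/query:" + ",".join(query))
    if ignore_nongenuine:
        args.append("/ignorenongenuine:yes")
    if ignore_controllers:
        args.append("/ignorecontrollers:yes")
    if result_only:
        args.append("/resultonly:yes")
    return args


if __name__ == "__main__":
    print(" ".join(build_command("DellMonitoring64.exe", ["chassis", "batteries"])))
```

Rejecting unknown query values up front means a typo fails in the script rather than producing an unexpected result on the server.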

 

This utility could be fired off either via a script that downloads the appropriate version for the architecture of the server it's running on and then runs it with Shell Enhanced to capture the output, or via one or more state-based remote monitors. We have three: one for disks, one for chassis and one for batteries. The three states are listed above.

 

The GREAT thing about this utility is that it does not require the Windows SNMP service to be configured, or even present. You just need OpenManage to be installed on the server.

 

Please let me know if you find this useful - I'd love to hear from you.


Fantastic tool - thank you so very much. We've been railing against the lack of OOB management/alerting for servers (DELL, HP, IBM - that's it, Labtech...) for a long time, and finding the time to set up SNMP, MIBs, OIDs etc. has been difficult. This provides a really great solution until I get around to working out all of the rest. Likely, we'll end up using this regardless, as it's so very easy to implement.

 

Excellent work. Again, thank you.


Hey Tim, I run a hosted LabTech server. What would be the best way to deploy this to my agents, and what would a script look like to call this and run it on a scheduled basis?


 

You would want to have a group that only contains Dell servers. You can create an autojoin search that sniffs those out for you and autojoins those agents to the group, e.g.:

AND
      [Computer.Hardware.Manufacturer] Is like %dell%
      [Computer.OS.IsServer] Is true
      [Computer.OS.Name] is like %windows%
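To show what that search is meant to match, here is the same logic as a small Python predicate. It is only a sketch: the dictionary keys are hypothetical stand-ins for the search fields, and treating "is like %dell%" as a case-insensitive substring match is an assumption.

```python
def matches_dell_windows_server(computer: dict) -> bool:
    """Mirror the three ANDed search criteria above."""
    manufacturer = computer.get("manufacturer", "").lower()
    os_name = computer.get("os_name", "").lower()
    return (
        "dell" in manufacturer                 # Manufacturer is like %dell%
        and computer.get("is_server") is True  # IsServer is true
        and "windows" in os_name               # OS name is like %windows%
    )


if __name__ == "__main__":
    candidate = {
        "manufacturer": "Dell Inc.",
        "is_server": True,
        "os_name": "Microsoft Windows Server 2012 R2",
    }
    print(matches_dell_windows_server(candidate))
```

All three conditions have to hold, so a Dell workstation or a non-Windows Dell host stays out of the group.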

 

Once you create the group and set up the search, you'll want a script that runs periodically to make sure machines in that group have the exe file so that your remote monitors can use it. Since there are two versions (32-bit and 64-bit), add some logic to determine what kind of server it is and download the correct version. Schedule the script to run against the group.
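The architecture check can be very small. The Python sketch below shows the idea only; in LabTech you would do the equivalent inside the script itself. DellMonitoring64.exe is the name used earlier in the thread, while the 32-bit file name here is a guess, and any machine value other than a 64-bit x86 one falls back to the 32-bit build.

```python
import platform

# Only DellMonitoring64.exe is named in the thread; the 32-bit name is a guess.
EXE_BY_BITS = {64: "DellMonitoring64.exe", 32: "DellMonitoring32.exe"}


def pick_executable(machine: str) -> str:
    """Choose the build that matches the reported machine type."""
    is_64bit = machine.lower() in {"amd64", "x86_64"}
    return EXE_BY_BITS[64 if is_64bit else 32]


if __name__ == "__main__":
    # platform.machine() reports the machine type of the current host.
    print(pick_executable(platform.machine()))
```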


 

I would also recommend setting up an internal monitor on that group to make sure Dell OpenManage is installed on the server.

 

Once that's set, you'll need to create remote monitors as part of the group to run the application and parse the results. Go into the group > computers > remote monitors and add one, then go through the wizard to 'monitor the results of an executable'. Make one for each query type and have it run something like:

 

"C:\Windows\LTSvc\scripts\DellMonitoring.exe" /query:batteries /resultonly:yes

 

Have it check that the result contains 0, or else create a ticket. Repeat for each /query type.
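If you want to try the same pass/fail check outside LabTech first, the Python sketch below runs a command and looks for the full string "Result code 0" in its output. It is an illustration only, and the function name and timeout are my own choices. Matching the whole string rather than a bare 0 is deliberate, since other output (a disk ID such as 0:1:0, for instance) can also contain a 0.

```python
import subprocess
import sys


def check_passes(command, timeout=120):
    """Run the monitor command and report whether it printed 'Result code 0'."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return "Result code 0" in completed.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on a server you would
    # pass the real EXE path and switches instead.
    fake_tool = [sys.executable, "-c", "print('Result code 0')"]
    print("OK" if check_passes(fake_tool) else "Create a ticket")
```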


Thanks - will check that out. FWIW we took the Nagios plugin written in Perl and compiled it, and it's mostly quite good. On rare occasions, on clunky servers, it will spit out an error due to a compilation problem, but it provides a good overview with options to include / exclude things as required.


Fantastic tool, I really appreciate you creating this. How can I see a listing of possible filters I can run?

 

I would like to have chassis monitoring but ignore intrusion detection. I tried to play around with something like this but haven't found the magic combination yet.

 

DellMonitoring /query:chassis /ignoreintrusion:yes


Hi. There is not yet a way to ignore intrusion alerts. I would think that you could resolve this by clearing the logs or tweaking something in OpenManage though?

 

Tim.


I'm a bit of a newbie when it comes to scripting and custom monitors. Would anybody be so kind as to provide a guide for me to follow to set this up with scripts, monitors, and alerts?

 

Levi


Hello, I have been using your tool on a few servers with a lot of success. I am having one issue on a Dell T630 running Server 2012 R2.

 

DellMonitoring64.exe /query:disks /ignorenongenuine:yes /ignorecontrollers:yes /resultonly:yes

Controller 0, physical disk 0:1:0

Result code 2

 

I am getting a Result code of 2 after running the above command. However, when I look in OpenManage I do not see any errors on the disk or any errors in the logs. Can you please advise?


Hmm, it might be the 'NON RAID' status throwing it. I will check the code on Monday and get back to you, poss with a new version. Thanks for the feedback!!


 

Thank you very much!


A couple days ago I successfully implemented your Dell Monitoring tool within Automate/Labtech. Seeing results from a Custom Remote Monitor and Ticket Creation based on the results (assuming there was a reason to create a ticket) is a wonderfully accomplished feeling for an Automate/Labtech novice.

 

Not only is what you put together greatly useful, the implementation of it was a great exercise in the world of Automate/Labtech.

The information you posted about the tool, combined with dcomitini's post, led to me completing the following:

 

Creating script to download utility from ltserver

Creating search for Dell Hardware running Windows Server OS

Creating autojoin group with created search

Scheduling script on autojoin group

Creating internal monitor

Targeting internal monitor to autojoin group and creating tickets for machines missing OMSA (although I discovered later that with your utility, and the alert template I created, tickets were created for missing OMSA by the RemMon anyway). I would also like to take this a step further by having a script tied to this that downloads and installs OMSA for me.

Creating remote monitors on a test machine

Copying remote monitors to autojoin group

Creating a State Based Alert template

Reviewing Ticket Data

 

Seeing this actually produce results is a great feeling and helps you learn how things are tied together. Thanks!


Thanks man. Good to hear you have nailed the implementation. Hopefully your description will help the poster above in the thread.


@dpltadmin

 

Here is the script I have pieced together that transfers Tim's tool over and also downloads/installs Dell OpenManage.

 

I have two Dell Groups - both with auto join searches.

One group looks for Dell servers that do not have OpenManage (the script I posted runs on that group every hour).

My other Dell group looks for servers that do have OpenManage, and I have Tim's .exe applied to that group as a remote monitor.

 

MX - Dell Monitor Install.zip


Anyone that is using this, how are you determining if a server does not have OpenManage installed? The name of OpenManage appears to be different across versions and operating systems. Are you searching based on a service, or something else?
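One possible way around the naming differences is to match loosely on the product name rather than on an exact string. The Python sketch below is an illustration under that assumption: it treats any installed-software entry containing "OpenManage" or "Open Manage", in any case, as a hit. The function name and the input list are hypothetical, and a loose pattern like this could also match related products.

```python
import re

# Loose match: "OpenManage", "Open Manage", "openmanage", and so on.
OPENMANAGE_RE = re.compile(r"open\s*manage", re.IGNORECASE)


def has_openmanage(installed_software) -> bool:
    """Return True if any installed-software name looks like OpenManage."""
    return any(OPENMANAGE_RE.search(name) for name in installed_software)
```

Servers where `has_openmanage(...)` comes back false would be the ones to flag or target with an install script.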


Awesome thank you.

Do you know if there are any limitations with what version of OpenManage is installed on the server?

I have a hodgepodge of versions across the servers we manage and need to see if I need to move them all to a consistent version.


Also, could anyone chime in on how they are setting up an internal monitor to check whether Dell OpenManage is installed?

I have a search that checks if a server is Dell, running Windows Server, and is missing Dell OpenManage but I am unsure how to convert that into an internal monitor.

Any help with this is appreciated.

Thank you.


I have created a Dell Servers group with autojoin search.

We require OpenManage on all of our Dell servers.  I created a group of remote monitors solely for this group that monitor for specific Windows System Log Event IDs generated by OpenManage.  These open our tickets especially for drive failure.  I plan on creating some internal monitors for less sensitive conditions we wish to track.

My one thought is how to ensure the same results for VMware ESXi hosts. We don't have very many, but it's not as easy there. I also don't think LabTech has support for ESXCLI commands to query the arrays and report back.

