
Monday, August 11, 2014

How To Stress Test CPU and Memory Using a Linux (Red Hat/CentOS) System

This can be useful for:

  • Troubleshooting hardware/overheating issues
  • Testing new hardware changes, etc.

In this case, I wanted to stress test my Dell PE 2950 after modding it to reduce the fan noise - link


Details


Model - Dell PE 2950
OS - ESXi 5.5

I can't get much information from the ESXi health monitor, and I can't run any packages from the ESXi shell.

I needed an effective way to test whether decreasing the fan speeds has any negative effects on the system under heavy load.

I used CentOS 6.5 installed onto an old 8 GB USB thumb drive (not a live Linux image; I keep it plugged into a USB port on the server, so it's just a matter of changing the boot priority in the BIOS to get it up and running).


Sadly, there aren't a lot of options on Linux, so I ended up running Prime95 along with a few monitoring tools.

 
Packages and programs used


  • Prime 95 for Linux (Mprime) - link
  • lm_sensors - link
  • OpenIPMI-tools - link

Installing and using the relevant packages

lm_sensors - link

Reads sensor data such as CPU core and RAM module temperatures.
sudo yum install lm_sensors

It is recommended to run the following command after installation
sudo sensors-detect #Carefully follow the prompts to configure the package

To check the temperatures:
sensors
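On a dual-socket box it gets tedious to scan every core in the `sensors` output by eye. Here is a minimal bash sketch that pulls out the hottest core. It assumes the coretemp driver's `Core N:  +45.0°C  (high = ..., crit = ...)` line layout, and the sample lines are made up for illustration, not captured from this server:

```shell
# Print the highest "Core N:" reading from `sensors` output read on stdin.
# Assumes the coretemp line layout "Core 0:  +45.0°C  (high = +80.0°C, crit = +100.0°C)".
max_core_temp() {
  awk '/^Core [0-9]+:/ {
         t = $3                      # current reading, e.g. "+45.0°C"
         gsub(/[^0-9.]/, "", t)      # drop "+", the degree sign and "C"
         if (!found || t + 0 > max) { max = t + 0; found = 1 }
       }
       END { if (found) print max }'
}

# Illustrative input (made-up values)
sample='coretemp-isa-0000
Core 0:       +45.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +52.5°C  (high = +80.0°C, crit = +100.0°C)'

printf '%s\n' "$sample" | max_core_temp
```

On the server itself, pipe the real output through it with `sensors | max_core_temp`.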


OpenIPMI-tools - link
yum install OpenIPMI OpenIPMI-tools
chkconfig ipmi on
service ipmi start

# Usage examples

# Check the BMC firmware version
ipmitool mc info

# Show sensor output
ipmitool sdr list
ipmitool sdr type list
ipmitool sdr type Temperature
ipmitool sdr type Fan
ipmitool sdr type 'Power Supply'

Prime 95 for Linux (Mprime) - link

This maxes out all the cores on both sockets and exercises a lot of RAM. It's one of my absolute favorite tools when overclocking my gaming rig.
# download the package
wget http://mersenneforum.org/gimps/mprime2511.tar.gz

# Extract
tar zxvf mprime2511.tar.gz

# run the program 
./mprime

Follow the on-screen prompts to start the test. I used the “Blend” test.


Test environment

Room ambient temperature: 78°F (AC was switched off)

Windows open for good airflow


TEST



I ran the Prime95 Blend test for 60 minutes. It maxed out all 8 cores and pushed the system to its limits. If the system gets through this without a hiccup, it should hold up under normal use with no problems.


I used the following commands in multiple SSH sessions:


Monitor CPU usage
top

Process running time
watch ps -p "pid" -o etime=

# replace "pid" with the PID of the mprime process, taken from top
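To check the run against the 60-minute target without doing the math in your head, the `etime` value (format `[[dd-]hh:]mm:ss`) can be converted to seconds. A small bash sketch; the function name is my own, and `pgrep -o mprime` is just one way to get the PID instead of reading it from `top`:

```shell
# Convert a ps "etime" string ([[dd-]hh:]mm:ss) into seconds.
etime_to_seconds() {
  local t=${1//[[:space:]]/} days=0 h=0 m=0 s=0 a b c
  if [[ $t == *-* ]]; then          # optional "dd-" day prefix
    days=${t%%-*}
    t=${t#*-}
  fi
  IFS=: read -r a b c <<< "$t"
  if [[ -n $c ]]; then h=$a; m=$b; s=$c; else m=$a; s=$b; fi
  # 10# forces base 10 so fields like "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

etime_to_seconds "59:59"        # 3599
etime_to_seconds "01:00:00"     # 3600
etime_to_seconds "2-00:00:10"   # 172810
```

Usage on the test box: `etime_to_seconds "$(ps -p "$(pgrep -o mprime)" -o etime=)"`, then compare the result with 3600.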

Monitor the temperatures continuously
watch sensors
(Screenshot: lm_sensors output)

Results

I didn't collect any logs; instead, I watched the SSH sessions for issues. I'm happy to say the CPUs and RAM modules held up well under heavy stress with the fan mod in place.

The maximum temperature recorded was 87°C, and it came down after the fans spooled up.
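Since I didn't keep logs this time, here is a bash sketch for next time. It runs any command at a fixed interval and prints one timestamped line per sample, which you can redirect to a file. The function name and the `grep -m1 '^Core 0:'` example are my own choices, not part of the original test:

```shell
# Run a command every $1 seconds, $2 times, printing "timestamp,output" per sample.
# Usage: log_samples INTERVAL COUNT COMMAND [ARGS...]
log_samples() {
  local interval=$1 count=$2 i
  shift 2
  for ((i = 1; i <= count; i++)); do
    printf '%s,%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$("$@")"
    if (( i < count )); then sleep "$interval"; fi
  done
}

# Example: sample the first "Core 0" line every 10 s for one hour (360 samples)
# log_samples 10 360 sh -c "sensors | grep -m1 '^Core 0:'" >> temps.log
```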

So there you go. Leave a comment and let us know if you can add or correct anything, and I will update the post.


Friday, August 8, 2014

Reducing Dell PowerEdge (PE) 2950/2900/2800 II/III fan noise - Fan mod + BMC firmware mod (Noob friendly guide)



The Dell 2950 III is one of the best bang-for-the-buck servers you can find on eBay, but there is one problem: it runs very loud by design.

Example (video credit: David Lohle)

I have my lab setup in my room so I had to do something about this.

After wandering around in OMSA, the DRAC, and the BIOS with no luck, I turned to almighty Google for help.

It turns out Dell decided not to expose the BMC’s fan controller settings to users; they're baked into the firmware.

Reducing the noise involves two mods, one hardware and one firmware:

  1. Fan MOD - Lower the Fan speeds to reduce the noise
  2. Firmware mod - Lowering the BMC fan rpm thresholds  

Update: 

I stress tested the server after the mod; see here for details - Dell PE 2950 Stress test

01. Fan MOD - Lower the Fan speeds to reduce the noise

I stumbled upon this post on the “Blind Caveman’s blog”: http://blindcaveman.wordpress.com/2013/08/23/problem-dell-poweredge-2950-iii-jet-engine-fan-noise/

He had success adding a 47-ohm resistor in line with each of the 4 intake fans, and he has a very comprehensive guide on the mod.

I’m just going to summarize what I did. (Props to Caveman for coming up with this solution.)


Items you need

  • 4 x 47-ohm ½-watt resistors (RadioShack, $1.49)
  • Heat shrink tubing (RadioShack, $4.59)
  • Soldering iron
Note: You can use a lower resistor value to increase the fan voltage:

10 V = 12 ohms
9 V = 20 ohms
8 V = 30 ohms
7 V = 42 ohms
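The table works because the resistor and the fan are in series. Whatever voltage the resistor drops is taken away from the fan, and the same current flows through both. Here is a small bash/awk sketch of that arithmetic. It assumes a 12 V fan supply, which the table itself does not state. It also shows the power the resistor has to dissipate, which is worth comparing against its wattage rating:

```shell
# Series resistor arithmetic: $1 = supply volts, $2 = fan volts, $3 = resistor ohms
resistor_stats() {
  awk -v vs="$1" -v vf="$2" -v r="$3" 'BEGIN {
    vr = vs - vf        # voltage dropped across the resistor
    i  = vr / r         # current through resistor and fan (series circuit)
    p  = vr * i         # power the resistor dissipates, in watts
    printf "%.3f A, %.3f W\n", i, p
  }'
}

# Rows from the table above, assuming a 12 V supply
while read -r volts ohms; do
  printf '%s V / %s ohm: %s\n' "$volts" "$ohms" "$(resistor_stats 12 "$volts" "$ohms")"
done <<'EOF'
10 12
9 20
8 30
7 42
EOF
```

If the voltages in the table are accurate, the 8 V and 7 V rows work out to slightly more than ½ W in the resistor (about 0.53 W and 0.60 W). A higher-rated resistor may be the safer pick for those values.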

Fan Mod - Steps

01. Remove the cover.

02. Remove the fan by pulling the orange tabs and gently lifting up.

03. Remove the wire clip, cut the red wire, and solder the resistor in line with it.
    (Photo: red wire)

04. Reseat the fans in the server. (Be careful not to let the modified wiring touch the heat sink right next to it.)

    (Photo: watch out for the heat sink)

Note:
I only modded the intake fans. The original post suggests modding the PSU fans too, but I don’t think you need to mess with the power supply fans, for 3 reasons:
        • It’s not going to make a huge difference (my PE runs below 52 dB with just the intake fans modded).
        • A PSU is expensive to replace (around $100 on eBay, while four Dell 2950 fans cost less than $10).
        • I believe the PSUs should run as cool and efficiently as possible.
      ---------------------------------------------------------------------------------------------------------------------------

      After the mod, I booted up the server and it was running significantly quieter. BUT… yes, there’s a huge but...

      Issue 01 - OMSA errors and fan speed issues

      The fan speeds were ramping up and down every few minutes. 
      When I monitored the fan speeds via the DRAC, it showed the fans as failing, because the idle RPM is lower than the minimum RPM threshold.



      What is happening


      The BMC lowers the fan RPM after the initial boot. With the resistor in place, the lowest RPM is around 1800, but the default minimum RPM error threshold is 2250 RPM, so the BMC panics, spins the fans back up to 100%, and then lowers them again once the error clears. And so on, in a never-ending cycle of annoyingness.
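The loop is easier to see as a toy model. This bash sketch is hypothetical and is not how the BMC firmware is actually written. It only encodes the behavior described above: drop to idle, compare against the lower threshold, go to 100% on a fault, then drop again once the fault clears. The 1350 RPM figure in the second call is the threshold used later in this guide:

```shell
# Toy model of the ramp-up/ramp-down cycle: $1 = idle RPM, $2 = lower threshold, $3 = steps
simulate_bmc() {
  local idle=$1 threshold=$2 steps=$3 state=idle i
  for ((i = 1; i <= steps; i++)); do
    if [[ $state == idle ]]; then
      if (( idle < threshold )); then
        echo "step $i: $idle RPM < $threshold -> fault, fans to 100%"
        state=full
      else
        echo "step $i: $idle RPM >= $threshold -> ok, fans stay at idle"
      fi
    else
      echo "step $i: fault cleared -> fans ramp back down"
      state=idle
    fi
  done
}

simulate_bmc 1800 2250 4   # resistor mod with the stock threshold: endless cycle
simulate_bmc 1800 1350 4   # after lowering the threshold: fans stay at idle
```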


      After some more Google-fu, I found a post by a German “artificial intelligence researcher” who faced the same issue after swapping the Dell fans for lower-RPM ones. Since Dell refused to help him, he engineered his own fix by modifying the BMC firmware to lower the minimum RPM threshold (how cool is that?).

      His name is Arnuschky - Link | Post link

      His post is well written and to the point (kudos to you, sir), but it's not very noob-friendly. :(
      So I’m going to write a step-by-step guide, using his post as a reference with a few additions, for anyone who is new to open source and to messing with Dell firmware.


      02. Firmware mod - Lowering the BMC fan rpm thresholds


      The solution explained

      Arnuschky figured out the exact settings in the BMC firmware (including the checksums) needed to modify the fan RPM thresholds, and wrote a very nifty script that modifies those values in a downloaded firmware file.

      What is the BMC (baseboard management controller)?

      • Among many other things, the BMC controls the fans, and the fan curve and all related values are baked into its firmware.

      • By design, the BMC ramps up the fan RPM every time you add hardware to the system, such as add-on cards, RAM, or HDDs.

      What is IPMI?

      • IPMI (Intelligent Platform Management Interface): this tool set can be easily installed on any Linux distribution, and after you enable IPMI in the BIOS (DRAC interface), you can query sensor data from the BMC and configure its parameters.


      Procedure

      Things you should know:
      • This worked for many people, including me. Neither I nor anyone else involved will be held responsible for any damage caused by proceeding with the firmware mod.

      • You cannot perform this mod from ESXi, but if you are running an OS installed directly on the server, like Red Hat/CentOS/Ubuntu, you should be good to go.

      • You cannot flash the firmware from a VM (if you know a way, please let us know).

      • You have to be on a Linux system to modify the firmware, although you can technically flash the modified firmware from Windows Server. I will add the details later in the post.


      Packages required

      • BMC Firmware file – Dell Drivers and support
      • IPMI tools
      • glibc.i686 (If you are on a 64bit OS)

      I have ESXi 5.5 installed on the Dell server, so I used a CentOS 6.4 installation running off a USB stick for the modifications and flashing.



      Enable IPMI on the DRAC interface
      • You can do this by logging in to the DRAC web interface or through the BIOS screen
      • Press Ctrl+E during POST to access the DRAC configuration screen and enable IPMI

      Setting up IPMI Tools

      yum install OpenIPMI OpenIPMI-tools

      Start and enable the service

      chkconfig ipmi on
      service ipmi start

      Run the following commands to see if IPMI is working:
      ipmitool sdr type Temperature
      Temp             | 01h | ok  |  3.1 | -48 degrees C
      Temp             | 02h | ok  |  3.2 | -42 degrees C
      Temp             | 05h | ok  | 10.1 | 40 degrees C
      Temp             | 06h | ok  | 10.2 | 40 degrees C
      Ambient Temp     | 08h | ok  |  7.1 | 27 degrees C
      CPU Temp Interf  | 76h | ns  |  7.1 | Disabled


      ipmitool sdr type Fan
      FAN 1 RPM        | 30h | ok  |  7.1 | 4200 RPM
      FAN 2 RPM        | 31h | ok  |  7.1 | 4200 RPM
      FAN 3 RPM        | 32h | ok  |  7.1 | 4200 RPM
      FAN 4 RPM        | 33h | ok  |  7.1 | 4200 RPM
      FAN 5 RPM        | 34h | ok  |  7.1 | 4200 RPM
      FAN 6 RPM        | 35h | ok  |  7.1 | 4200 RPM
      Fan Redundancy   | 75h | ok  |  7.1 | Fully Redundant
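With IPMI working, the same output can be checked against a threshold from a script. Here is a bash/awk sketch that parses the `ipmitool sdr type Fan` layout shown above: pipe-separated, with the reading in the last column. The function name and the 2025 RPM cutoff in the example are my own illustration, and the sample lowers FAN 1 to 1800 RPM to show a hit:

```shell
# Print fans whose reading is below $1 RPM, from `ipmitool sdr type Fan` output on stdin.
fans_below() {
  awk -F'|' -v min="$1" '
    $5 ~ /RPM/ {
      name = $1
      gsub(/^ +| +$/, "", name)   # trim padding around the sensor name
      split($5, reading, " ")     # reading[1] is the numeric RPM value
      if (reading[1] + 0 < min) print name ": " reading[1] " RPM"
    }'
}

# Sample in the same format as above; FAN 1 lowered to 1800 RPM for illustration
sample='FAN 1 RPM        | 30h | ok  |  7.1 | 1800 RPM
FAN 2 RPM        | 31h | ok  |  7.1 | 4200 RPM
Fan Redundancy   | 75h | ok  |  7.1 | Fully Redundant'

printf '%s\n' "$sample" | fans_below 2025
# On the server itself: ipmitool sdr type Fan | fans_below 2025
```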

      Install glibc.i686

      yum install glibc.i686
      Note:
      The firmware flash program is 32-bit, and without this package it fails on a 64-bit OS with the following error:

      /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
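If you're not sure whether a binary needs the 32-bit loader, check the fifth byte of its ELF header (EI_CLASS). It is 1 for 32-bit and 2 for 64-bit. Here is a bash sketch using `od`. If the `file` utility is installed, `file bmcfl32l` gives the same answer:

```shell
# Report whether an ELF binary is 32-bit or 64-bit from its EI_CLASS byte (offset 4).
elf_class() {
  local cls
  cls=$(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' ')
  case $cls in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown" ;;
  esac
}

# Once the firmware is extracted, from inside bmc_firmware:
#   elf_class bmcfl32l          # "32-bit" means the flasher needs glibc.i686 on a 64-bit OS
#   ls -l /lib/ld-linux.so.2    # the 32-bit loader that glibc.i686 provides
```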


      Download the relevant firmware file

      • Visit - http://www.dell.com/support/

      • Enter your service tag 

      • Select a Linux OS version, such as Red Hat (this lets you download the .bin file containing the firmware, which is the file we need to modify)
      To save you time, here’s the link to the Dell PE 2950 II BMC firmware v2.50 - direct link

      mkdir bmcfwmod                               # create the project directory
      cd bmcfwmod
      wget "http://downloads.dell.com/FOLDER00928606M/1/2950_ESM_Firmware_4NNNG_LN32_2.50_A00.BIN"


      Set permissions and extract the firmware .bin file

      chmod 755 2950_ESM_Firmware_4NNNG_LN32_2.50_A00.BIN                       # make executable (use the name of the .bin file you downloaded)
      sudo mkdir bmc_firmware                                                  # create dir as root
      sudo ./2950_ESM_Firmware_4NNNG_LN32_2.50_A00.BIN --extract bmc_firmware   # yes, you have to do this as root! :(
      cd bmc_firmware

      Note: You have to extract the .bin file in order to proceed.
      The commands above extract the firmware .bin file into the bmc_firmware folder.
      Check inside the folder for a file called payload/bmcflsh.dat.
      If it isn't there, your system is not compatible with this mod. If it is, continue.
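The compatibility check can be scripted so it fails loudly instead of relying on a quick look. Here is a bash sketch; the function name is hypothetical, and the path is the one named above:

```shell
# Succeed only if the extracted firmware contains a non-empty payload/bmcflsh.dat.
check_payload() {
  local dir=${1:-.}
  if [ -s "$dir/payload/bmcflsh.dat" ]; then
    echo "found $dir/payload/bmcflsh.dat - OK to continue"
  else
    echo "no $dir/payload/bmcflsh.dat - this system is not compatible with the mod" >&2
    return 1
  fi
}

# From the bmcfwmod directory:
#   check_payload bmc_firmware
```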

      Patching the firmware file


      Note:
      You should be in the bmc_firmware directory created above

      Download and run the script

      The --no-check-certificate switch gets around the certificate error caused by the GitHub domain name mismatch.
      wget --no-check-certificate "https://raw.github.com/arnuschky/dell-bmc-firmware/master/adjust-fan-thresholds/dell-adjust-fan-thresholds.py"
      chmod 755 dell-adjust-fan-thresholds.py    # set permissions
      ./dell-adjust-fan-thresholds.py payload/bmcflsh.dat  #execute the py script on the bmcflsh.dat file

      The script will prompt you with the following screen


      Select your server model. In this case, I selected Dell PowerEdge 2950 (number 3).

      It will then prompt you to select the fans and adjust the thresholds.
      On the DRAC interface the intake fans show up as fans 1-4,
      so I edited the values for fans 1 through 4 (only the intake fans will be affected).

      Setting the value

      When you select a fan number, it will ask you to enter the new threshold value.
      The value is entered in units of 75 RPM. For example, the default threshold is 2025 RPM, which is 27 x 75, so the default value is 27.
      To reduce the threshold, enter something lower than 27.
      I chose 18, which drops my threshold to 1350 RPM (18 x 75 = 1350).
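The conversion is simple enough to do by hand. As a sketch, here it is in bash integer arithmetic; the function names are my own:

```shell
# The script takes thresholds in units of 75 RPM.
units_to_rpm() { echo $(( $1 * 75 )); }
rpm_to_units() { echo $(( $1 / 75 )); }   # integer division rounds down

units_to_rpm 27     # 2025 (the default)
units_to_rpm 18     # 1350 (the value I used)
rpm_to_units 1800   # 24 (roughly the idle speed after the resistor mod)
```

Rounding down keeps the threshold at or below the RPM you start from. With idle speeds around 1800 RPM after the resistor mod, 24 units would sit right at idle, so a value with some headroom below that, like 18, seems the safer choice.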

      Saving the changes

      After editing the appropriate values, enter “W” when prompted to write the changes.
      This updates bmcflsh.dat with the modified values.
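The script edits bmcflsh.dat in place, so it can be reassuring to copy the file first and then count how many bytes changed. For example, run `cp payload/bmcflsh.dat bmcflsh.dat.orig` before the script. Here is a bash sketch using `cmp -l`, which prints one line per differing byte. I'd expect only a handful (the threshold bytes plus whatever checksums the script updates), though I haven't verified an exact count:

```shell
# Count the bytes that differ between two same-length files (cmp -l lists one per line).
count_changed_bytes() {
  cmp -l "$1" "$2" | wc -l | tr -d ' '
}

# From inside bmc_firmware, after writing the changes:
#   count_changed_bytes bmcflsh.dat.orig payload/bmcflsh.dat
```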

      Flashing the modified firmware

      If you are on a 64bit OS make sure you have the glibc.i686 package installed


      LD_LIBRARY_PATH=./hapi/opt/dell/dup/lib:$LD_LIBRARY_PATH ./bmcfl32l -i=payload/bmcflsh.dat -f

      This adds the bundled shared libraries to the library search path and runs bmcfl32l to flash the modified firmware file.
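A failed or half-started flash is the last thing you want, so it can help to first confirm that the pieces this command relies on are present. Here is a bash sketch; the function name is hypothetical, and the paths are the ones the command above uses:

```shell
# Check that the flasher and its bundled libraries exist before flashing; $1 = bmc_firmware dir.
flash_preflight() {
  local dir=${1:-.} rc=0
  if [ ! -x "$dir/bmcfl32l" ]; then
    echo "missing or not executable: $dir/bmcfl32l" >&2; rc=1
  fi
  if [ ! -d "$dir/hapi/opt/dell/dup/lib" ]; then
    echo "missing directory: $dir/hapi/opt/dell/dup/lib" >&2; rc=1
  fi
  return $rc
}

# From inside bmc_firmware:
#   flash_preflight && LD_LIBRARY_PATH=./hapi/opt/dell/dup/lib:$LD_LIBRARY_PATH ./bmcfl32l -i=payload/bmcflsh.dat -f
```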

      The fans will rev up and then stop for a brief moment during the update. Don’t worry, they will spool up again in a second.
      You do not need to reboot to see the changes, but reboot anyway, just in case.
      So there you go, your Dell 2950 should be purring away on the shelf silently.

      Note:
      You should disable IPMI on the DRAC when you're done, since leaving it enabled is a big security risk.


      Tested for more than 24 hours

      Update: Dell PE 2950 Stress test after the mod


      • No noticeable temperature difference with the components 
      • No POST errors 
      • No OMSA or DRAC errors 


      Noise Level comparison

      Before the mod


      After the mod




      It's a very long post and it's almost morning, so forgive me for any grammar, spelling, or formatting mistakes.

      Until next time.......

      Thursday, August 7, 2014

      Setting up Dell OMSA (OpenManage Server Administrator) on VMware ESXi 5.x (Guide)

      Hello internetz

      I recently bought a Dell 2950 for my home lab and decided to slap ESXi on it. This is a guide on how to install OMSA 7.4 on ESXi 5.x.

      Note:

      At the time of writing, the latest installable version of ESXi is 5.5 Update 1. This version had issues with the OMSA agent due to a VMware bug; please update to the latest patch to get it up and running.



      http://en.community.dell.com/support-forums/servers/f/177/t/19581502.aspx


      Procedure – Summary

      • Download the relevant packages - Link
      • Put the ESXi host into maintenance mode and enable the ESXi Shell and SSH
      • Install the OMSA agent on the ESXi host
      • Reboot
      • Exit maintenance mode

      On your workstation or server:

      • Install Dell OpenManage Server Administrator on your client PC or a server
      • Connect using the ESXi host IP and host login credentials

      Download the relevant packages

      OMSA for various operating systems can be downloaded from the Dell website:

      Link


      In this case, we are using:

      Dell OpenManage 7.4 for ESXi 5.5 VIB (ESXI Agent)

      Dell OpenManage 7.4 for Microsoft Windows x64 (Management Console)



      Put the ESXi host into maintenance mode

      Using SSH

      SSH into the host server and run the following command:

      esxcli system maintenanceMode set --enable true

      Using the vSphere Client

      Installing the OMSA agent on the ESXi host

      Transfer the zip file to the host server using WinSCP.

      I used the following location:

       /var/log/vmware

      You can use a datastore, but I find this much easier.

      Install the VIB file

      esxcli software vib install -d "/var/log/vmware/OM-SrvAdmin-Dell-Web-7.4.0-876.VIB-ESX55i_A00.zip"


      If you transferred the file to a datastore:

      esxcli software vib install -d "/vmfs/volumes/Datastore/DirectoryName/PatchName.zip" 

      Reboot the server

      reboot -f


      Make sure the services are up and running

      /etc/init.d/wsman status
      It should return: openwsmand is running

      /etc/init.d/sfcbd status
      It should return: sfcbd is running

      /etc/init.d/hostd status
      It should return: hostd is running
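The three checks can be rolled into one loop. This sketch sticks to POSIX sh, so it should also run in the ESXi (BusyBox) shell, though I haven't tested it there. Based on the expected outputs above, it treats a status line containing "is running" as healthy. The function name and the optional directory argument are my own, added so it can be tried outside ESXi:

```shell
# Check the three OMSA-related services; $1 = init script directory (default /etc/init.d).
check_services() {
  dir=${1:-/etc/init.d}
  rc=0
  for svc in wsman sfcbd hostd; do
    out=$("$dir/$svc" status 2>&1)
    case $out in
      *"is running"*) echo "OK:   $svc ($out)" ;;
      *)              echo "FAIL: $svc ($out)"; rc=1 ;;
    esac
  done
  return $rc
}

# On the ESXi host:
#   check_services && echo "all services up"
```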


      Exit maintenance mode

      Using the shell:

      esxcli system maintenanceMode set --enable false

      Installing the management console

      Download the appropriate package for your workstation/server and install it.
      I installed Dell OpenManage 7.4 for Microsoft Windows x64 on my workstation running Windows 8.1.
      Open the application; it will take you to a web interface.

      Log in using the host server IP and credentials.

      Important:

      Remember to select “ignore certificate issues”; otherwise, the login will time out with the error:
      Login failed....
      Connection timeout

       

      Saturday, May 3, 2014

      Powershell: simple script for port monitoring (SMTP, HTTP, FTP, etc) using "system.net.sockets.tcpclient" class




      Recently we had a requirement to check SMTP on two different servers and run a script if both servers failed. I googled around for a tool but ended up putting together this script.

      It's not the prettiest, but it works, and I'm sure you guys will make something much better out of it.


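      # Define the host names of the servers that need to be monitored
      $servers = "relay1.host.com","relay2.host.com"
      # Define the port number
      $tcp_port = 25
      # Count how many servers fail the check
      $offline = 0
      
      # Loop through each host to get an individual result.
      ForEach ($srv in $servers) {
      
          $tcpClient = New-Object System.Net.Sockets.TcpClient
          try {
              # Connect() throws an exception if the port is closed or the host can't be reached
              $tcpClient.Connect($srv, $tcp_port)
              Write-Host "$srv is online"
          }
          catch {
              Write-Host "$srv is offline"
              $offline++
          }
          finally {
              # Close() releases the socket whether or not the connection succeeded
              $tcpClient.Close()
          }
      
      }
      
      # Run your follow-up action here if every server failed
      If ($offline -eq $servers.Count) {
          Write-Host "All servers are offline"
      }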

      If something is wrong, or if you think there is a better way, please feel free to comment and let everyone know. It's all about community, after all.

      Update 4/18/2016 -

      Updated the script with the one provided by Donald Gray - Thanks a lot : )

      Wednesday, March 5, 2014

      SmarterMail SMTP Log "Data transfer failed" error

      We ran into this error a few weeks back with our backup email relay and spooling system.

      We have a WatchGuard (WG) firewall with an "SMTP proxy" filtering spam on all inbound traffic to the relay servers.

      We noticed a lot of log entries where email delivery failed, with the following entries in the SMTP log:

      [2014.03.05] 09:45:46 [89.249.234.3][18261257] cmd: DATA

      [2014.03.05] 09:45:51 [89.249.234.3][18261257] rsp: 354 Start mail input; end with <CRLF>.<CRLF>

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] rsp: 421 Command timeout, closing transmission channel

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] data transfer failed.

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] disconnected at 05/03/2014 09:47:49

      Cause:


      The emails dropped with the "data transfer failed." error are emails that were quarantined or blocked by the spam filtering on the WG firewall.

      Details:



      What's interesting is that we were not able to reproduce the issue; it seemed to happen to random emails. Then we checked the firewall logs to see whether the WG firewall was doing something to those emails.

      Firewall log for the affected email:


      ProxyMatch

      ProxyQuarantine: SMTP Classified as confirmed SPAM

      pri=6

      disp=Allow

      policy=VPS072-SMRelay1_SMTP_Proxy_IN-00

      protocol=smtp/tcp

      src_ip=89.249.234.3

      src_port=56029

      dst_ip=185.17.174.55

      dst_port=25

      src_intf=1-WAN1

      dst_intf=104-INBAY_SYSTEMS

      rc=600

      proxy_act=SMTP incoming SM-Relay Proxy
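To line a firewall entry up with the SmarterMail log, the useful fields are `src_ip` and `src_port`. Here is a bash sketch that pulls one field out of WatchGuard-style `key=value` text, whether the pairs are on one line or one per line as above. It splits on whitespace, so it only suits values without spaces (it would truncate `proxy_act`). The function name is my own:

```shell
# Print the value of key $1 from WatchGuard-style key=value text on stdin.
fw_field() {
  tr -s ' \t' '\n\n' |
    awk -v k="$1=" '!found && index($0, k) == 1 { print substr($0, length(k) + 1); found = 1 }'
}

# Excerpt of the entry above
entry='ProxyQuarantine: SMTP Classified as confirmed SPAM
pri=6
disp=Allow
src_ip=89.249.234.3
src_port=56029
dst_ip=185.17.174.55
dst_port=25'

printf '%s\n' "$entry" | fw_field src_ip
```

The IP it prints can then be searched for in the SmarterMail log to find the matching session.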



      SmarterMail log


      Initial handshake, where the SmarterMail server creates the connection
      -----------------------------------------------------------------------------------------------------
      [2014.03.05] 09:44:37 [89.249.234.3][18261257] rsp: 220 relay1.inbaysystems.com

      [2014.03.05] 09:44:37 [89.249.234.3][18261257] connected at 05/03/2014 09:44:37

      [2014.03.05] 09:44:41 [89.249.234.3][18261257] cmd: EHLO [89.249.234.3]

      [2014.03.05] 09:44:41 [89.249.234.3][18261257] rsp: 250-relay1.inbaysystems.com Hello [89.249.234.3]250-SIZE 52428800250-AUTH LOGIN CRAM-MD5250-STARTTLS250 OK

      [2014.03.05] 09:44:44 [89.249.234.3][18261257] cmd: MAIL FROM:<[email protected]>

      [2014.03.05] 09:44:49 [89.249.234.3][18261257] rsp: 250 OK <[email protected]> Sender ok

      [2014.03.05] 09:45:20 [89.249.234.3][18261257] cmd: RCPT TO:<[email protected]>

      [2014.03.05] 09:45:20 [89.249.234.3][18261257] rsp: 250 OK <[email protected]> Recipient ok


      The email data transfer starts, and during this time the email is scanned and quarantined by the WG firewall,
      causing the "data transfer failed." error
      ----------------------------------------------------------------------------------------------------------------------------
      [2014.03.05] 09:45:46 [89.249.234.3][18261257] cmd: DATA

      [2014.03.05] 09:45:51 [89.249.234.3][18261257] rsp: 354 Start mail input; end with <CRLF>.<CRLF>

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] rsp: 421 Command timeout, closing transmission channel

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] data transfer failed.

      [2014.03.05] 09:47:49 [89.249.234.3][18261257] disconnected at 05/03/2014 09:47:49
       ----------------------------------------------------------------------------------------------------------------------------
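To find every affected session rather than spotting them one by one, you can pull out the `data transfer failed.` lines together with their client IP and session ID. Here is a bash/awk sketch based on the line format shown above (`[date] time [ip][session] message`). `smtp.log` is a placeholder file name:

```shell
# Print "ip session" for each SmarterMail SMTP log line that reports "data transfer failed."
failed_sessions() {
  awk '/data transfer failed\./ {
    n = split($3, part, "]")   # "[89.249.234.3][18261257]" -> "[89.249.234.3", "[18261257", ""
    if (n >= 2) print substr(part[1], 2), substr(part[2], 2)
  }'
}

log='[2014.03.05] 09:47:49 [89.249.234.3][18261257] rsp: 421 Command timeout, closing transmission channel
[2014.03.05] 09:47:49 [89.249.234.3][18261257] data transfer failed.'

printf '%s\n' "$log" | failed_sessions
# Count failures per client IP in a full log:
#   failed_sessions < smtp.log | awk '{print $1}' | sort | uniq -c | sort -rn
```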

       
      I did a lot of research online looking for an explanation and stumbled upon a few other things that may cause similar issues:

      MTU value issues with ISPs and the local network


      http://www.tweakservers.com/421-command-timeout/

      Hope this saves you guys some time, since there are no proper explanations or fixes for this issue on the internet.