Friday, February 7, 2020

How to run Exachk report?

Oracle Exadata Database Machine Exachk or HealthCheck

MOS Document:
Oracle Exadata Database Machine Exachk or HealthCheck (Oracle Support Document 1070954.1)

Patch 18622611: The latest exachk can be downloaded using patch number 18622611.

Create a new directory, place the latest exachk download in it, and run exachk as the root user.
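
The preparation described here can be sketched as a small shell function. This is only an illustration under assumptions: the function name stage_exachk and both arguments are hypothetical (the actual zip file name depends on what is downloaded for patch 18622611), and it assumes the unzip utility is installed.

```shell
#!/bin/sh
# Hypothetical helper: unpack a downloaded exachk zip into a fresh directory.
#   $1 = path to the downloaded zip (name depends on the download)
#   $2 = new working directory to create
stage_exachk() {
    zip_file=$1
    work_dir=$2

    # Stop early if the download is not where we expect it.
    if [ ! -f "$zip_file" ]; then
        echo "zip file not found: $zip_file" >&2
        return 1
    fi

    mkdir -p "$work_dir" || return 1
    # -o overwrites files from an earlier extraction, -d selects the target.
    unzip -o -q "$zip_file" -d "$work_dir" || return 1
    echo "exachk unpacked in $work_dir"
}
```

It could be called as stage_exachk /tmp/exachk.zip /home/orauat/Exacheck (the zip path is a placeholder), followed by changing into that directory as root.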

NOTE: To avoid skipped checks, set the options below before running the report:

export RAT_TIMEOUT=1200
export RAT_ROOT_TIMEOUT=6000
export RAT_PASSWORDCHECK_TIMEOUT=100
export RAT_NOCLEAN_DIR=1
export RAT_IBSWITCH_USER=root
export RAT_COPY_EM_XML_FILES=0
./exachk -a

Note: Before running exachk, check that you have downloaded the latest version (using ./exachk -v).
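
The export lines above can also be kept in one small wrapper so the same settings are applied on every run. This is a minimal sketch, not something shipped with exachk: the function name set_exachk_env is hypothetical, the values are the ones that end up in effect from the list above (a later export of the same variable replaces an earlier one), and RAT_ROOT_TIMEOUT is given as a plain number of seconds on the assumption that a numeric value is expected.

```shell
#!/bin/sh
# Hypothetical helper: apply the exachk settings from this post in one place.
set_exachk_env() {
    export RAT_TIMEOUT=1200
    export RAT_ROOT_TIMEOUT=6000            # assumed to be plain seconds
    export RAT_PASSWORDCHECK_TIMEOUT=100
    export RAT_NOCLEAN_DIR=1
    export RAT_IBSWITCH_USER=root
    export RAT_COPY_EM_XML_FILES=0
}

set_exachk_env
# Show the RAT_ settings that exachk will inherit before starting the run.
env | grep '^RAT_' | sort
```

Running ./exachk -a from the same shell session afterwards picks up these exported values.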

Steps
1. Create a new directory

2. Download the patch and place it in the new directory

3. Unzip the downloaded exachk patch

4. Set the environment variables as mentioned above
export RAT_TIMEOUT=1200
export RAT_ROOT_TIMEOUT=6000
export RAT_PASSWORDCHECK_TIMEOUT=100
export RAT_NOCLEAN_DIR=1
export RAT_IBSWITCH_USER=root
export RAT_COPY_EM_XML_FILES=0

5. Check the exachk version.
./exachk -v

6. Run the exachk report
./exachk -a

7. A sample exachk output log is shown below.

npexdbadm01:(root)-/home/orauat/Exacheck
>./exachk -a


Checking ssh user equivalency settings on all nodes in cluster for root

Node npexdbadm02 is configured for ssh user equivalency for root user



Searching for running databases . . . . .

.  .  .  .  .  .  .  .
List of running databases registered in OCR

1. EBSXDEV
2. OBIXUAT
3. OBIPUAT
4. EBSXUAT
5. All of above
6. None of above

Select databases from list for checking best practices. For multiple databases, select 5 for All or comma separated number like 1,2 etc [1-6][5]. 6

Searching out ORACLE_HOME for selected databases.

.  .  .  .  .

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

.
.  .  .  . . . .  .  .  .  .  . . . .  .  .  .
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status
-------------------------------------------------------------------------------------------------------
  Host Name        CRS Installed   RDBMS Installed     CRS UP     ASM UP   RDBMS UP     DB Instance Name
-------------------------------------------------------------------------------------------------------
npexdbadm01                 Yes           Yes           Yes       Yes       Yes
npexdbadm02                 Yes           Yes           Yes       Yes       Yes
-------------------------------------------------------------------------------------------------------


Copying plug-ins

. .
Node npexcel01 is configured for ssh user equivalency for root user


Node npexcel02 is configured for ssh user equivalency for root user


Node npexcel03 is configured for ssh user equivalency for root user


.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
npexsw-ibb01 is configured for ssh user equivalency for root user
.
npexsw-iba01 is configured for ssh user equivalency for root user



npexsw-iba01 is configured for ssh user equivalency for root user


*** Checking Best Practice Recommendations ( PASS / WARNING / FAIL ) ***

.

Collections and audit checks log file is
/home/orauat/Exacheck/exachk_npexdbadm01_022718_111411/log/exachk.log

Starting to run exachk in background on npexdbadm02



============================================================
            Node name - npexdbadm01
============================================================

Collecting - ASM Disk Group for Infrastructure Software and Configuration
Collecting - ASM Diskgroup Attributes
Collecting - ASM diskgroup usable free space
Collecting - ASM initialization parameters
Collecting - CPU Information
Collecting - Compute node PCI bus slot speed for infiniband HCAs
Collecting - Exadata Critical Issue EX38
Collecting - Kernel parameters
Collecting - Maximum number of semaphore sets on system
Collecting - Maximum number of semaphores on system
Collecting - OS Packages
Collecting - Patches for Grid Infrastructure
Collecting - Patches for RDBMS Home
Collecting - RDBMS patch inventory
Collecting - Switch Version Information
Collecting - number of semaphore operations per semop system call

Collecting - CRS user limits configuration
Collecting - CRS user time zone check
Collecting - Check alerthistory for non-test open stateless alerts [Database Server]
Collecting - Check alerthistory for stateful alerts not cleared [Database Server]
Collecting - Check alerthistory for test open stateless alerts [Database Server]
Collecting - Clusterware patch inventory
Collecting - Discover switch type(spine or leaf)
Collecting - Exadata Critical Issue DB09
Collecting - Exadata Critical Issue EX30
Collecting - Exadata Critical Issue EX36
Collecting - Exadata software version on database server
Collecting - Exadata system model number
Collecting - Exadata version on database server
Collecting - HCA firmware version on database server
Collecting - HCA transfer rate on database server
Collecting - Infrastructure Software and Configuration for compute
Collecting - MaxStartups setting in sshd_config
Collecting - OFED Software version on database server
Collecting - Obtain hardware information
Collecting - Operating system and Kernel version on database server
Collecting - Oracle monitoring agent and/or OS settings on ADR diagnostic directories
Collecting - Raid controller bus link speed
Collecting - System Event Log
Collecting - Validate key sysctl.conf parameters on database servers
Collecting - Verify Data Network is Separate from Management Network
Collecting - Verify Database Server Disk Controller Configuration
Collecting - Verify Database Server Physical Drive Configuration
Collecting - Verify Database Server Virtual Drive Configuration
Collecting - Verify Disk Cache Policy on database server
Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Database Server]
Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON
Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE
Collecting - Verify IP routing configuration on database servers
Collecting - Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
Collecting - Verify InfiniBand Fabric Topology (verify-topology)
Collecting - Verify InfiniBand subnet manager is not running on database server
Collecting - Verify InfiniBand subnet manager is running on an InfiniBand switch
Collecting - Verify Master (Rack) Serial Number is Set [Database Server]
Collecting - Verify NTP configuration on database servers
Collecting - Verify Quorum disks configuration
Collecting - Verify RAID Controller Battery Temperature [Database Server]
Collecting - Verify RAID disk controller CacheVault capacitor condition [Database Server]
Collecting - Verify TCP Segmentation Offload (TSO) is set to off
Collecting - Verify basic Logical Volume(LVM) system devices configuration
Collecting - Verify database server disk controllers use writeback cache
Collecting - Verify database server file systems have Check interval = 0
Collecting - Verify database server file systems have Maximum mount count = -1
Collecting - Verify imageinfo on database server
Collecting - Verify imageinfo on database server to compare systemwide
Collecting - Verify installed rpm(s) kernel type match the active kernel version
Collecting - Verify key InfiniBand fabric error counters are not present
Collecting - Verify service exachkcfg autostart status on database server
Collecting - Verify that the SDP over IB option sdp_apm_enable is set to 0
Collecting - Verify the localhost alias is pingable [Database Server]
Collecting - Verify the Name Service Cache Daemon (NSCD) configuration
Collecting - Verify the storage servers in use configuration matches across the cluster
Collecting - Verify the vm.min_free_kbytes configuration
Collecting - Verify there are no files present that impact normal firmware update procedures [Database Server]
Collecting - root time zone check
Collecting - verify asr exadata configuration check via ASREXACHECK on database server
Starting to run root privileged commands in background on STORAGE SERVER npexcel01 (192.168.10.6)

Starting to run root privileged commands in background on STORAGE SERVER npexcel02 (192.168.10.8)

Starting to run root privileged commands in background on STORAGE SERVER npexcel03 (192.168.10.10)

Starting to run root privileged commands in background on INFINIBAND SWITCH (npexsw-ibb01)

Starting to run root privileged commands in background on INFINIBAND SWITCH (npexsw-iba01)


Collections from STORAGE SERVER:
------------------------------------------------------------
Collecting - Exadata Critical Issue EX10
Collecting - Exadata Critical Issue EX11
Collecting - Exadata Critical Issue EX22
Collecting - Exadata Critical Issue EX28
Collecting - Exadata Critical Issue EX31
Collecting - Exadata Critical Issue EX36
Collecting - Exadata critical issue EX14
Collecting - Exadata critical issue EX16
Collecting - Exadata critical issue EX17
Collecting - Exadata critical issue EX17
Collecting - Exadata software version on storage server
Collecting - Exadata software version on storage servers
Collecting - Exadata storage server system model number
Collecting - Infrastructure Software and Configuration for storage
Collecting - RAID controller version on storage servers
Collecting - Verify Disk Cache Policy on storage servers
Collecting - Verify Exadata Smart Flash Cache is created
Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Storage Server]
Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON on storage servers
Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE on storage servers
Collecting - Verify InfiniBand subnet manager is not running on storage server
Collecting - Verify Master (Rack) Serial Number is Set [Storage Server]
Collecting - Verify NTP configuration on storage servers
Collecting - Verify OSSCONF/cellinit.ora consistency across storage servers
Collecting - Verify RAID Controller Battery Temperature [Storage Server]
Collecting - Verify RAID disk controller CacheVault capacitor condition [Storage Server]
Collecting - Verify Storage Server user CELLDIAG exists
Collecting - Verify active system values match those defined in configuration file cell.conf  [Storage Server]
Collecting - Verify data (non-system) disks on Exadata Storage Servers have no partitions
Collecting - Verify imageinfo on storage server
Collecting - Verify imageinfo on storage server to compare systemwide
Collecting - Verify release tracking bug on storage servers
Collecting - Verify service exachkcfg autostart status on storage server
Collecting - Verify storage server disk controllers use writeback cache
Collecting - Verify that griddisks are distributed as expected across celldisks
Collecting - Verify the localhost alias is pingable [Storage Server]
Collecting - Verify there are no files present that impact normal firmware update procedures [Storage Server]
Collecting - verify asr exadata configuration check via ASREXACHECK on storage servers
Collecting - Check alerthistory for non-test open stateless alerts [Storage Server]
Collecting - Check alerthistory for stateful alerts not cleared [Storage Server]
Collecting - Check alerthistory for test open stateless alerts [Storage Server]
Collecting - Configure Storage Server alerts to be sent via email
Collecting - Determine storage server type(All Flash/High Capacity)
Collecting - Exadata Celldisk predictive failures
Collecting - Exadata storage server root filesystem free space
Collecting - HCA firmware version on storage server
Collecting - OFED Software version on storage server
Collecting - Operating system and Kernel version on storage server
Collecting - Storage server flash cache mode
Collecting - Storage server make and model
Collecting - Verify Data Network is Separate from Management Network on storage server
Collecting - Verify Datafiles are Placed on Diskgroups consisting of griddisks with correct attributes
Collecting - Verify Ethernet Cable Connection Quality on storage servers
Collecting - Verify ExaWatcher is executing [Storage Server]
Collecting - Verify Exadata Smart Flash Cache is actually in use
Collecting - Verify Exadata Smart Flash Cache status is normal
Collecting - Verify Exadata Smart Flash Log is Created
Collecting - Verify InfiniBand Cable Connection Quality on storage servers
Collecting - Verify average ping times to DNS nameserver [Storage Server]
Collecting - Verify celldisk configuration on disk drives
Collecting - Verify celldisk configuration on flash memory devices
Collecting - Verify griddisk ASM status
Collecting - Verify griddisk count matches across all storage servers where a given prefix name exists
Collecting - Verify storage server metric CD_IO_ST_RQ
Collecting - Verify the percent of available celldisk space used by the griddisks
Collecting - Verify there are no griddisks configured on flash memory devices
Collecting - mpt_cmd_retry_count from /etc/modprobe.conf on Storage Servers


Collections from INFINIBAND SWITCH:
------------------------------------------------------------
Collecting - Exadata Critical Issue IB5
Collecting - Exadata Critical Issue IB6
Collecting - Hostname in /etc/hosts
Collecting - Infiniband Switch NTP configuration
Collecting - Infiniband subnet manager status
Collecting - Infiniband switch HCA status
Collecting - Infiniband switch HOSTNAME configuration
Collecting - Infiniband switch firmware version
Collecting - Infiniband switch health
Collecting - Infiniband switch localtime configuration
Collecting - Infiniband switch module configuration
Collecting - Infiniband switch subnet manager configuration
Collecting - Infiniband switch type(Spine or leaf)
Collecting - Infrastructure Software and Configuration for switch
Collecting - Verify average ping times to DNS nameserver [IB Switch]
Collecting - Verify no IB switch ports disabled due to excessive symbol errors
Collecting - Verify the localhost alias is pingable [IB Switch]
Collecting - sm_priority configuration on Infiniband switch


Data collections completed. Checking best practices on npexdbadm01.
------------------------------------------------------------



 FAIL =>     One or more database servers have stateful alerts that have not been cleared
 FAIL =>     Oracle monitoring agent and Operating systems settings on Automatic diagnostic  repository directories are not correct or not all targets have been scanned or not all diagnostic directories found
 FAIL =>     The "oradism" file is not correctly configured for /u01/app/oracle/product/11.2.0.4/OBIPUAT
 FAIL =>     The "oradism" file is not correctly configured for /u01/app/oracle/product/11.2.0.4/OBIXUAT
 WARNING =>  Oracle database software owner hard nofile shell limit is not configured according to recommendation
 WARNING =>  Oracle database software owner soft nproc shell limit is not configured according to recommendation
 FAIL =>     Storage Server alerts are not configured to be sent via email
 CRITICAL => Oracle database(s) should be using RDS protocol over InfiniBand Network for /u01/app/oracle/product/11.2.0.4/EBSXDEV
 WARNING =>  Key InfiniBand fabric error counters should not be present
 CRITICAL => One or more Ethernet network cables are not connected.
 WARNING =>  Average ping times to DNS nameserver may be negatively impacting SSH operations. on infiniband switch npexsw-ibb01
 WARNING =>  Average ping times to DNS nameserver may be negatively impacting SSH operations. on infiniband switch npexsw-iba01
 WARNING =>  Average ping times to DNS nameserver may be negatively impacting SSH operations.
 INFO =>     Verify the percent of available celldisk space used by the griddisks
 INFO =>     Exadata Critical Issues (Doc ID 1270094.1):- DB1-DB4,DB6,DB9-DB40, EX1-EX26,EX29-EX39 and IB1-IB3,IB5-IB6
 FAIL =>     The ASM failure group configuration is not as recommended
 CRITICAL => There should be enough freespace in all diskgroups to reestablish redundancy after a single disk failure
Collecting patch inventory on CRS HOME /u01/app/12.2.0.1/grid
Collecting patch inventory on ASM HOME /u01/app/12.2.0.1/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/EBSXDEV
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/OBIPUAT
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/OBIXUAT



Copying results from npexdbadm02 and generating report. This might take a while. Be patient.


Collecting - CRS user limits configuration
Collecting - CRS user time zone check
Collecting - Check alerthistory for non-test open stateless alerts [Database Server]
Collecting - Check alerthistory for stateful alerts not cleared [Database Server]
Collecting - Check alerthistory for test open stateless alerts [Database Server]
Collecting - Clusterware patch inventory
Collecting - Exadata Critical Issue DB09
Collecting - Exadata Critical Issue EX30
Collecting - Exadata Critical Issue EX36
Collecting - Exadata software version on database server
Collecting - Exadata system model number
Collecting - Exadata version on database server
Collecting - HCA firmware version on database server
Collecting - HCA transfer rate on database server
Collecting - Infrastructure Software and Configuration for compute
Collecting - MaxStartups setting in sshd_config
Collecting - OFED Software version on database server
Collecting - Obtain hardware information
Collecting - Operating system and Kernel version on database server
Collecting - Oracle monitoring agent and/or OS settings on ADR diagnostic directories
Collecting - Raid controller bus link speed
Collecting - System Event Log
Collecting - Validate key sysctl.conf parameters on database servers
Collecting - Verify Data Network is Separate from Management Network
Collecting - Verify Database Server Disk Controller Configuration
Collecting - Verify Database Server Physical Drive Configuration
Collecting - Verify Database Server Virtual Drive Configuration
Collecting - Verify Disk Cache Policy on database server
Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Database Server]
Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON
Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE
Collecting - Verify IP routing configuration on database servers
Collecting - Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
Collecting - Verify InfiniBand subnet manager is not running on database server
Collecting - Verify Master (Rack) Serial Number is Set [Database Server]
Collecting - Verify NTP configuration on database servers
Collecting - Verify Quorum disks configuration
Collecting - Verify RAID Controller Battery Temperature [Database Server]
Collecting - Verify RAID disk controller CacheVault capacitor condition [Database Server]
Collecting - Verify TCP Segmentation Offload (TSO) is set to off
Collecting - Verify basic Logical Volume(LVM) system devices configuration
Collecting - Verify database server disk controllers use writeback cache
Collecting - Verify database server file systems have Check interval = 0
Collecting - Verify database server file systems have Maximum mount count = -1
Collecting - Verify imageinfo on database server
Collecting - Verify imageinfo on database server to compare systemwide
Collecting - Verify installed rpm(s) kernel type match the active kernel version
Collecting - Verify no database server kernel out of memory errors
Collecting - Verify service exachkcfg autostart status on database server
Collecting - Verify that the SDP over IB option sdp_apm_enable is set to 0
Collecting - Verify the localhost alias is pingable [Database Server]
Collecting - Verify the Name Service Cache Daemon (NSCD) configuration
Collecting - Verify the storage servers in use configuration matches across the cluster
Collecting - Verify the vm.min_free_kbytes configuration
Collecting - Verify there are no files present that impact normal firmware update procedures [Database Server]
Collecting - root time zone check
Collecting - verify asr exadata configuration check via ASREXACHECK on database server

============================================================
            Node name - npexdbadm02
============================================================

Collecting - CPU Information
Collecting - Compute node PCI bus slot speed for infiniband HCAs
Collecting - Exadata Critical Issue EX38
Collecting - Kernel parameters
Collecting - Maximum number of semaphore sets on system
Collecting - Maximum number of semaphores on system
Collecting - OS Packages
Collecting - Patches for Grid Infrastructure
Collecting - Patches for RDBMS Home
Collecting - RDBMS patch inventory
Collecting - number of semaphore operations per semop system call


Data collections completed. Checking best practices on npexdbadm02.
------------------------------------------------------------



 INFO =>     Oracle GoldenGate failure prevention best practices
 FAIL =>     Oracle monitoring agent and Operating systems settings on Automatic diagnostic  repository directories are not correct or not all targets have been scanned or not all diagnostic directories found
 FAIL =>     The "oradism" file is not correctly configured for /u01/app/oracle/product/11.2.0.4/OBIPUAT
 FAIL =>     The "oradism" file is not correctly configured for /u01/app/oracle/product/11.2.0.4/OBIXUAT
 WARNING =>  Oracle database software owner hard nofile shell limit is not configured according to recommendation
 WARNING =>  Oracle database software owner soft nproc shell limit is not configured according to recommendation
 CRITICAL => Oracle database(s) should be using RDS protocol over InfiniBand Network for /u01/app/oracle/product/11.2.0.4/EBSXDEV
 CRITICAL => One or more Ethernet network cables are not connected.
Collecting patch inventory on CRS HOME /u01/app/12.2.0.1/grid
Collecting patch inventory on ASM HOME /u01/app/12.2.0.1/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/EBSXDEV
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/OBIPUAT
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/11.2.0.4/OBIXUAT




------------------------------------------------------------
                      CLUSTERWIDE CHECKS
------------------------------------------------------------

 CRITICAL => All Database and Storage Servers should be synchronized with the same NTP server
------------------------------------------------------------
Detailed report (html) -  /home/orauat/Exacheck/exachk_npexdbadm01_022718_111411/exachk_npexdbadm01_022718_111411.html


UPLOAD [if required] - /home/orauat/Exacheck/exachk_npexdbadm01_022718_111411.zip



npexdbadm01:(root)-/home/orauat/Exacheck
>
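
The console output above is long, so it can help to reduce a saved copy of it to a count per severity. The sketch below assumes the output was captured to a file; the file name exachk_console.log and the function names summarize_findings and report_path are hypothetical. It relies only on the "FAIL =>", "WARNING =>", "CRITICAL =>" and "INFO =>" prefixes and the "Detailed report (html) -" line visible in the sample run.

```shell
#!/bin/sh
# Count findings per severity in a saved copy of the exachk console output.
#   $1 = file holding the captured output
summarize_findings() {
    awk '
        # Match lines such as " FAIL =>     The ASM failure group ..."
        /^[ \t]*(CRITICAL|FAIL|WARNING|INFO)[ \t]*=>/ {
            count[$1]++
        }
        END {
            # Fixed order so the summary looks the same on every run.
            n = split("CRITICAL FAIL WARNING INFO", sev, " ")
            for (i = 1; i <= n; i++)
                printf "%-8s %d\n", sev[i], count[sev[i]]
        }
    ' "$1"
}

# Print the HTML report path taken from the "Detailed report (html) -" line.
#   $1 = file holding the captured output
report_path() {
    sed -n 's/^Detailed report (html) - *//p' "$1"
}
```

Called as summarize_findings exachk_console.log, it prints one line per severity with its count, and report_path exachk_console.log prints the location of the HTML report to open next.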


Regards,
Mallik
