Autonomous Health Framework (Part 1)

Autonomous Health Framework: How to Use Your Database "Swiss Army Knife" (Without Poking an Eye Out)

Sean Scott
25+ years working with Oracle technology
UTOUG Board ⁘ RAC SIG Board
Oracle OpenWorld ⁘ Collaborate/IOUG ⁘ Regional UG
RAC/MAA ⁘ DR/HA ⁘ TFA/AHF ⁘ Exadata/ODA
Upgrades ⁘ Migration ⁘ Cloud ⁘ Automation DevOps ⁘ Infrastructure as Code
Containers ⁘ Virtualization

Why you need AHF

Why you need AHF
• AHF diagnostic collections are required by MOS for some SRs
• Diagnostic collections accelerate SR resolution
• Cluster-aware ADR log inspection and management
• Advanced system and log monitoring
• Incident control and notification
• Connect to MOS
• SMTP, REST APIs

Why you need AHF
• Built-in Oracle tools:
  • ORAchk/EXAchk
  • OS Watcher
  • Cluster Verification Utility (CVU)
  • Hang Manager
  • Diagnostic Assistant

Why you need AHF
• Integrated with:
  • Database
  • ASM and Clusterware
  • Automatic Diagnostic Repository (ADR)
  • Grid Infrastructure Management Repository (GIMR)
  • Cluster Health Advisor (CHA) & Cluster Health Monitor (CHM)
  • Enterprise Manager

Why you need AHF
• Cluster aware:
  • Run commands on all or selected nodes
  • Cross-node configuration and file inspection
  • Central management for ADR
  • Consolidated diagnostic collection

Why you need AHF
• Over 800 health checks
  • 400 identified as critical/failures
• Severe problem check daily: 2AM
• All known problem check weekly: 3AM Sunday
• Auto-generates a collection when problems are detected
  • Everything required to diagnose & resolve
• Results delivered to the notification email

AHF is FREE!

Download AHF

Download AHF
• AHF Parent Page: Doc ID 2550798.1
• AHF On-Premises: Doc ID 2832630.1 (New)
  • Linux, ZLinux
  • Solaris x86/SPARC64
  • HPUX
  • AIX 6/7
  • Win 64-bit
• AHF Gen-2 Cloud: Doc ID 2832594.1 (New)

Download AHF
• Major release each quarter
  • Typically follows the DBRU schedule
• Naming convention is year, quarter, release: YY.Q.R
  • 21.4.0, 21.4.1
• Intermediate releases are common!
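Because the YY.Q.R scheme is purely numeric, two versions can be compared field by field. The sketch below is POSIX sh plus awk; the function names `ahf_version_parts` and `ahf_version_older` are hypothetical helpers, not part of AHF. It assumes two-digit years in the 2000s and ignores trailing fields such as the `.0.0` in `21.4.1.0.0`.

```shell
# Split an AHF version string (YY.Q.R, e.g. 21.4.1) into its parts.
# Assumes 20YY years; extra trailing fields such as "21.4.1.0.0" are ignored.
ahf_version_parts() {
  echo "$1" | awk -F. '{ printf "year=20%s quarter=%s release=%s\n", $1, $2, $3 }'
}

# Succeeds (exit 0) if version $1 is older than version $2.
# Compares year, quarter and release numerically, in that order.
ahf_version_older() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions are not "older"
  }'
}

# Example:
#   ahf_version_parts 21.4.1
#   ahf_version_older 21.4.0 21.4.1 && echo "21.4.1 is newer"
```

Comparing numerically rather than as strings matters once a field reaches two digits (an intermediate release such as R=10 would otherwise sort before 9).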

Install AHF

Types of installs: Daemon or root
• Recommended method
• Cluster awareness
• Full AHF capabilities
  • Includes compliance checks
  • Enables notifications
  • Automatic diagnostic collection when issues are detected
• May conflict with existing AHF/TFA installations

Types of installs: Local or non-root
• Reduced feature set
  • No automatic or remote diagnostics or collections
  • Limited file visibility (files must be readable by the Oracle home owner), e.g.:
    • /var/log/messages
    • Some Grid Infrastructure logs
• May co-exist with Daemon installations
• No special pre-install considerations

Don’t take shortcuts
Don’t follow Oracle’s installation instructions

Install AHF
• Oracle’s instructions work when things are perfect
• Systems are rarely perfect!
• AHF and TFA are known for certain… ahem, peculiarities

A brief history lesson…
• There are two flavors of TFA:
  • A version downloaded from MOS
  • A version included in Grid Infrastructure installs & patches
• The GI version is not fully featured
• GI and MOS versions can interfere or conflict with each other

Recommendation: Remove existing AHF/TFA before install

TFA pre-installation checks
# Uninstall TFA (as root)
tfactl uninstall
# Check for existing AHF/TFA installs
which tfactl
which ahfctl

TFA pre-installation checks
# Locate leftover setup configuration files
find / -name tfa_setup.txt
# Verify files are removed
find / -name tfactl
find / -name startOSWbb.sh

TFA pre-installation checks
# Ensure ALL AHF/TFA processes are stopped/inactive prior to uninstall
# PERFORM THIS STEP ON ALL NODES
# Remove legacy/existing AHF/TFA installations
for d in $(find / -name uninstalltfa)
do
  cd "$(dirname "$d")"
  ./tfactl uninstall
  # cd .. && rm -fr .
done
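Before uninstalling anything, it can help to see everything the checks above would find in a single pass. This is a minimal, report-only sketch assuming a POSIX shell run as root on each node; `ahf_preinstall_report` is a hypothetical helper, and the file names it looks for are the ones used on these slides plus `ahfctl`. It deletes nothing.

```shell
# Report-only pre-install check: lists AHF/TFA remnants, deletes nothing.
# Usage: ahf_preinstall_report [search_root]    (default: /)
# Run as root on every node; an empty report suggests a clean node.
ahf_preinstall_report() {
  root=${1:-/}

  # tfactl/ahfctl already resolvable on the PATH?
  for cmd in tfactl ahfctl; do
    p=$(command -v "$cmd" 2>/dev/null) && echo "PATH $cmd $p"
  done

  # Setup files, binaries, uninstallers and OSWatcher scripts anywhere under $root.
  # A tfactl inside the Grid Infrastructure home is probably the GI-bundled TFA.
  for name in tfa_setup.txt tfactl ahfctl uninstalltfa startOSWbb.sh; do
    find "$root" -name "$name" 2>/dev/null | sed "s|^|FILE $name |"
  done
}
```

For example, `ahf_preinstall_report > /tmp/ahf_remnants.txt` on each node (the output path is arbitrary); review the list before running the uninstall loop.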

Installation: unzip
[root@vna1 ahf]# ls -l
total 407412
-rw-r--r--. 1 oracle dba 417185977 Feb 1 23:17 AHF-LINUX_v21.4.1.zip
[root@vna1 ahf]# unzip AHF-LINUX_v21.4.1.zip
Archive:  AHF-LINUX_v21.4.1.zip
  inflating: README.txt
  inflating: ahf_setup
 extracting: ahf_setup.dat
  inflating: oracle-tfa.pub

Installation

Post-installation sanity checks

Command line tools: ahfctl and tfactl
$ tfactl <command> <options>
  - or -
$ tfactl
tfactl> <command> <options>

$ tfactl help
$ tfactl <command> help

$ ahfctl <command> <options>
  - or -
$ ahfctl
ahfctl> <command> <options>

$ ahfctl help
$ ahfctl <command> help

Post-install checks
ahfctl version
tfactl status
ahfctl statusahf
tfactl toolstatus
tfactl print hosts
tfactl print components
tfactl print protocols
tfactl print config -node all
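Running these checks in a fixed order and capturing the output makes it easier to compare nodes or attach the results to an SR. A minimal sketch, assuming a POSIX shell and that `ahfctl` and `tfactl` are on root's PATH; `ahf_postinstall_checks` is a hypothetical wrapper that only runs the commands listed above.

```shell
# Run the post-install checks in a fixed order, labelling each section.
# Assumes ahfctl and tfactl are on root's PATH.
ahf_postinstall_checks() {
  while IFS= read -r check; do
    [ -n "$check" ] || continue
    echo "===== $check ====="
    # $check is left unquoted on purpose so it splits into command + arguments;
    # stdin is redirected so a command cannot consume the list below.
    $check </dev/null || echo "FAILED (exit $?): $check"
  done <<'EOF'
ahfctl version
tfactl status
ahfctl statusahf
tfactl toolstatus
tfactl print hosts
tfactl print components
tfactl print protocols
tfactl print config -node all
EOF
}
```

Usage: `ahf_postinstall_checks 2>&1 | tee "/tmp/ahf_postinstall_$(hostname).log"` (the log path is only an example).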

status vs statusahf
[root@node1 ~]# tfactl status
.---------------------------------------------------------------------------------------------.
| Host  | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+-------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 28883 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
| node2 | RUNNING       | 30339 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
'-------+---------------+-------+------+------------+----------------------+------------------'
[root@node1 ~]#
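In a cluster, what matters in this table is any node that is not RUNNING, has an incomplete inventory, or reports a different version from its peers. The sketch below parses the layout shown above with awk; the column positions are an assumption based on this output and may change between AHF releases, and `tfa_status_problems` is a hypothetical name.

```shell
# Read "tfactl status" output on stdin and print one line per problem found.
# Exit status 0 means every node is RUNNING, COMPLETE and on the same version.
# Column positions are assumed from the table layout shown above.
tfa_status_problems() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|/ {
      host = trim($2); status = trim($3); version = trim($6); inv = trim($8)
      if (host == "Host") next                  # header row
      n++
      if (status != "RUNNING") { print host ": TFA status " status; bad = 1 }
      if (inv != "COMPLETE")   { print host ": inventory " inv; bad = 1 }
      if (n == 1) ref = version
      else if (version != ref) { print host ": version " version " differs from " ref; bad = 1 }
    }
    END { exit bad }'
}
```

Usage: `tfactl status | tfa_status_problems || echo "check the nodes listed above"`.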

status vs statusahf
[root@node1 ~]# tfactl statusahf
.---------------------------------------------------------------------------------------------.
| Host  | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+-------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 28883 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
| node2 | RUNNING       | 30339 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
'-------+---------------+-------+------+------------+----------------------+------------------'
------------------------------------------------------------
Master node = node1
orachk daemon version = 214100
Install location = /opt/oracle.ahf/orachk
Started at = Wed Feb 02 20:50:12 GMT 2022
Scheduler type = TFA Scheduler
Scheduler PID: 28883
...

status vs statusahf
------------------------------------------------------------
ID: orachk.autostart_client_oratier1
------------------------------------------------------------
AUTORUN_FLAGS = -usediscovery -profile oratier1 -dball -showpass -tag autostart_client_oratier1 -readenvconfig
COLLECTION_RETENTION = 7
AUTORUN_SCHEDULE = 3 2 * * 1,2,3,4,5,6
------------------------------------------------------------
------------------------------------------------------------
ID: orachk.autostart_client
------------------------------------------------------------
AUTORUN_FLAGS = -usediscovery -tag autostart_client -readenvconfig
COLLECTION_RETENTION = 14
AUTORUN_SCHEDULE = 3 3 * * 0
------------------------------------------------------------
Next auto run starts on Feb 03, 2022 02:03:00 ID:orachk.AUTOSTART_CLIENT_ORATIER1

statusahf option in tfactl is deprecated and will be removed in AHF 22.1.0. Please start using ahfctl for statusahf, Example: ahfctl statusahf
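The `AUTORUN_SCHEDULE` values read as five-field cron expressions (minute, hour, day of month, month, day of week): `3 2 * * 1,2,3,4,5,6` is 02:03 Monday through Saturday and `3 3 * * 0` is 03:03 on Sunday, which appears to correspond to the 2AM daily and 3AM Sunday checks described earlier. A sketch that decodes them from `ahfctl statusahf` output, assuming each `ID:` line precedes its schedule as above and that day-of-week is `*` or a comma list (ranges such as `1-5` are not handled); `orachk_schedules` is hypothetical.

```shell
# Summarise ORAchk autostart schedules from "ahfctl statusahf" output (stdin).
# Assumes each "ID: ..." line precedes its "AUTORUN_SCHEDULE = m h dom mon dow"
# line, and that day-of-week is "*" or a comma list (ranges are not handled).
orachk_schedules() {
  awk '
    BEGIN { split("Sun Mon Tue Wed Thu Fri Sat", dayname, " ") }
    /^[ \t]*ID:[ \t]/ { id = $2 }
    /^[ \t]*AUTORUN_SCHEDULE[ \t]*=/ {
      # Fields: $1=AUTORUN_SCHEDULE $2="=" $3=minute $4=hour $5=dom $6=month $7=dow
      if ($7 == "*") days = "every day"
      else {
        n = split($7, d, ","); days = ""
        for (i = 1; i <= n; i++)
          days = days (i > 1 ? "," : "") dayname[(d[i] % 7) + 1]
      }
      printf "%s runs at %02d:%02d on %s\n", id, $4, $3, days
    }'
}
```

Usage: `ahfctl statusahf | orachk_schedules`.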

Failed installs and upgrades
Common issues

Warning remains after a successful upgrade
[root@node1 ahf]# ahfctl statusahf

WARNING - AHF Software is older than 180 days. Please consider upgrading AHF to the latest version using ahfctl upgrade.

.---------------------------------------------------------------------------------------------.
| Host  | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+-------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 28883 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
| node2 | RUNNING       | 24554 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
'-------+---------------+-------+------+------------+----------------------+------------------'
• Run ahfctl syncpatch
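If the 180-day warning persists, it can help to confirm how old the installed build actually is. The sketch below assumes, based only on the values shown on these slides, that a Build ID is the six-digit version (`214100`, which matches the orachk daemon version) followed by a `YYYYMMDDhhmmss` timestamp; that format is an inference, and how AHF itself computes the warning is not known here. It uses GNU `date`, so it is Linux-specific, and the function names are hypothetical.

```shell
# ASSUMPTION (inferred from the output above, not documented here): a Build ID
# is the 6-digit version (214100) followed by a YYYYMMDDhhmmss timestamp.
# Prints the build date as YYYY-MM-DD, or nothing if the ID doesn't fit.
ahf_build_date() {
  echo "$1" |
    sed -n 's/^[0-9]\{6\}\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)[0-9]\{6\}$/\1-\2-\3/p'
}

# Age of the build in whole days. Uses GNU date (Linux).
ahf_build_age_days() {
  built=$(ahf_build_date "$1")
  [ -n "$built" ] || return 1
  now_s=$(date +%s)
  built_s=$(date -d "$built" +%s) || return 1
  echo $(( (now_s - built_s) / 86400 ))
}

# Example: compare with the 180-day threshold from the warning.
#   age=$(ahf_build_age_days 21410020220111213353) && [ "$age" -gt 180 ] && echo "build is $age days old"
```

If the build is recent but the warning remains, that points back to `ahfctl syncpatch` rather than to a missing upgrade.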

Not all nodes appear after upgrade
[root@node1 ahf]# tfactl syncnodes
Current Node List in TFA :
1. node1
2. node2

Node List in Cluster :
1. node1
2. node2

Node List to sync TFA Certificates :
1 node2

Do you want to update this node list? Y|[N]:

Syncing TFA Certificates on node2 :

TFA_HOME on node2 : /opt/oracle.ahf/tfa
...

Not all nodes appear after upgrade (cont)
...
TFA_HOME on node2 : /opt/oracle.ahf/tfa
DATA_DIR on node2 : /opt/oracle.ahf/data/node2/tfa

Shutting down TFA on node2...
Copying TFA Certificates to node2...
Copying SSL Properties to node2...
Sleeping for 5 seconds...
Starting TFA on node2...

.---------------------------------------------------------------------------------------------.
| Host  | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+-------+---------------+-------+------+------------+----------------------+------------------+
| node1 | RUNNING       | 28883 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
| node2 | RUNNING       | 30339 | 5000 | 21.4.1.0.0 | 21410020220111213353 | COMPLETE         |
'-------+---------------+-------+------+------------+----------------------+------------------'
[root@node1 ahf]#

Installation and upgrade issues
• Post-installation troubleshooting:
  • ahfctl stopahf; ahfctl startahf
  • tfactl stop; tfactl start
  • tfactl status
  • ahfctl statusahf
  • tfactl toolstatus
  • tfactl syncnodes
  • ahfctl syncpatch

Installation and upgrade issues
• Post-installation troubleshooting:
  • tfactl diagnosetfa
  • Create an SR and upload the result to MOS

Installation and upgrade issues

Recommendation: Post-installation configurations

Move the repository to shared storage!

RAC: Move the repository to shared storage
[Diagram: Local Repository, Local Repository; Files needed by MOS]

RAC: Move the repository to shared storage
[Diagram: Shared Repository; Files needed by MOS]
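Pointing the repository at shared storage uses the `repositorydir=<dir>` and `reposizeMB=<n>` settings listed in the `tfactl set` help later in this deck. This is a cautious sketch assuming a POSIX shell run as root and a shared mount you supply (any path you pass is your own, not a recommendation); `tfa_move_repo` is hypothetical. It checks the directory and free space first and can print the commands instead of running them. Whether the setting propagates to all nodes is not covered here, so verify with `tfactl print config -node all` afterwards.

```shell
# Point the TFA diagnostic repository at a shared directory (run as root).
# Usage: tfa_move_repo <shared_dir> [size_MB]   (size default 10240, the AHF default)
# <shared_dir> is yours to supply (e.g. an ACFS or NFS mount visible to all nodes).
# DRY_RUN=1 prints the tfactl commands instead of running them.
tfa_move_repo() {
  dir=$1
  size=${2:-10240}
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "not a writable directory: $dir" >&2
    return 1
  fi

  # Free space on the target filesystem in MB (POSIX df output, 4th column).
  free=$(df -Pm "$dir" | awk 'NR == 2 { print $4 }')
  if [ -z "$free" ] || [ "$free" -lt "$size" ]; then
    echo "need ${size}MB free in $dir, found ${free:-unknown}MB" >&2
    return 1
  fi

  for setting in "repositorydir=$dir" "reposizeMB=$size"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "tfactl set $setting"
    else
      tfactl set "$setting" || return 1
    fi
  done
}
```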

Configure email notification

Set email notifications
[root@node1 ~]# tfactl set notificationAddress=sean.scott@viscosityna.com
Successfully set notificationAddress=sean.scott@viscosityna.com
.---------------------------------------------------------------------------.
|                                   node1                                   |
+----------------------------------------------+----------------------------+
| Configuration Parameter                      | Value                      |
+----------------------------------------------+----------------------------+
| Notification Address ( notificationAddress ) | sean.scott@viscosityna.com |
'----------------------------------------------+----------------------------'

Set email notifications
Send test email:
tfactl sendmail user@company.com
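The two commands can be combined so the address is sanity-checked once and tested immediately. A minimal sketch; `ahf_set_notification` is hypothetical, the address check is deliberately loose (it also rejects dotless hosts such as `user@localhost`), and outgoing mail may additionally require SMTP settings (`tfactl set smtp` appears in the `tfactl set` help later on).

```shell
# Set the AHF notification address and send a test message to it.
# Usage: ahf_set_notification <email>
ahf_set_notification() {
  addr=$1
  # Loose sanity check only: one "@", a dot after it, no spaces.
  # (Also rejects dotless hosts such as user@localhost.)
  case $addr in
    *" "* | *@*@*) echo "invalid address: $addr" >&2; return 1 ;;
    ?*@?*.?*) ;;
    *) echo "invalid address: $addr" >&2; return 1 ;;
  esac
  tfactl set notificationAddress="$addr" || return 1
  tfactl sendmail "$addr"
}
```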

Recommended configurations

Recommended configurations
# Repository settings
tfactl set autodiagcollect=ON                # default
tfactl set trimfiles=ON                      # default
tfactl set reposizeMB=<n>                    # default=10240
tfactl set rtscan=ON                         # default
tfactl set redact=mask                       # default=none
# Disk space monitoring
tfactl set diskUsageMon=ON                   # default=OFF
tfactl set diskUsageMonInterval=240          # Depends on activity. default=60 minutes
# Log purge
tfactl set autopurge=ON                      # If space is slim. default=OFF
tfactl set manageLogsAutoPurge=ON            # default=OFF
tfactl set manageLogsAutoPurgeInterval=720   # Set to 12 hours. default=60 minutes
tfactl set manageLogsAutoPurgePolicyAge=30d  # default=30 days
tfactl set minfileagetopurge=48              # Hours. default=12
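Applying the list from a here-document keeps the settings reviewable and repeatable across nodes. A sketch assuming a POSIX shell; `tfa_apply_settings` is hypothetical and `DRY_RUN=1` only prints the commands. The values in the example come from the slide above; `reposizeMB` is omitted because the slide leaves its value open.

```shell
# Apply "tfactl set" settings read from stdin, one per line.
# Blank lines and "#" comments (whole-line or trailing) are ignored.
# DRY_RUN=1 prints the commands instead of running them.
tfa_apply_settings() {
  while IFS= read -r line; do
    setting=${line%%#*}                              # drop comments
    setting=$(printf '%s' "$setting" | tr -d ' \t')  # drop blanks
    [ -n "$setting" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "tfactl set $setting"
    else
      tfactl set "$setting" </dev/null || echo "FAILED: $setting" >&2
    fi
  done
}

# Values from the slide; remove DRY_RUN=1 to apply them.
DRY_RUN=1 tfa_apply_settings <<'EOF'
# Repository settings
redact=mask                      # default=none
# Disk space monitoring
diskUsageMon=ON                  # default=OFF
diskUsageMonInterval=240         # minutes, default=60
# Log purge
autopurge=ON                     # if space is slim, default=OFF
manageLogsAutoPurge=ON           # default=OFF
manageLogsAutoPurgeInterval=720  # 12 hours, default=60 minutes
manageLogsAutoPurgePolicyAge=30d # default=30 days
minfileagetopurge=48             # hours, default=12
EOF
```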

Recommended configurations
[root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort | grep -i manage
 Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge )
 Managelogs Auto Purge ( manageLogsAutoPurge )  OFF
[root@node1 ~]# tfactl set manageLogsAutoPurge=ON
Successfully set manageLogsAutoPurge=ON
.-------------------------------------------------------.
|                         node1                         |
+-----------------------------------------------+-------+
| Configuration Parameter                       | Value |
+-----------------------------------------------+-------+
| Managelogs Auto Purge ( manageLogsAutoPurge ) | ON    |
'-----------------------------------------------+-------'
[root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort | grep -i manage
 Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge )
 Managelogs Auto Purge ( manageLogsAutoPurge )  ON
[root@node1 ~]# tfactl set manageLogsAutoPurgePolicyAge=30d
Successfully set manageLogsAutoPurgePolicyAge=30d
.----------------------------------------------------------------------------------------------------------------.
|                                                     node1                                                      |
+--------------------------------------------------------------------------------------------------------+-------+
| Configuration Parameter                                                                                | Value |
+--------------------------------------------------------------------------------------------------------+-------+
| Logs older than the time period will be auto purged(days[d]|hours[h]) ( manageLogsAutoPurgePolicyAge ) | 30d   |
'--------------------------------------------------------------------------------------------------------+-------'

Additional configuration options

View configurations
• Default configuration list is… unsorted :(
• Some configurations are listed as parameters, others as descriptions :(
tfactl print config
tfactl print config | grep "^|.*|.*|$" | sort
tfactl print config | egrep "^\|.*\|.*\|$" | sort
tfactl print config | egrep "^\|.*\|.*\|$" | \
    awk -F'|' '{print $2, $3}' | sort
tfactl get <configuration>

View configurations
[root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort
actionrestartlimit 30
Age of Purging Collections (Hours) ( minFileAgeToPurge ) 12
AlertLogLevel ALL
Alert Log Scan ( rtscan ) ON
Allowed Sqlticker Delay in Minutes ( sqltickerdelay ) 3
analyze OFF
arc.backupmissing 1
arc.backupmissing.samples 2
arc.backup.samples 3
arc.backupstatus 1
Archive Backup Delay Minutes ( archbackupdelaymins ) 40
Auto Diagcollection ( autodiagcollect ) ON
Automatic Purging ( autoPurge ) ON
Automatic Purging Frequency ( purgeFrequency ) 4
Auto Sync Certificates ( autosynccertificates ) ON
BaseLogPath ERROR
cdb.backupmissing 1
cdb.backupmissing.samples 2
cdb.backup.samples 1
cdb.backupstatus 1
...
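The `awk -F'|'` pipeline splits on every `|`, and at least one description contains one (`days[d]|hours[h]`), which is why the `manageLogsAutoPurgePolicyAge` line in the earlier output prints without a value. The sketch below takes the value from the last column instead and matches either the bare parameter name or the name in parentheses. It assumes the two-column table layout shown on these slides; `tfa_config_value` is hypothetical, and `tfactl get <configuration>` remains the simpler option where it works.

```shell
# Print the value of one setting from "tfactl print config" output (stdin).
# Matches the bare parameter name ("AlertLogLevel") or the name in parentheses
# ("Auto Diagcollection ( autodiagcollect )"), case-insensitively.
# The value is taken from the last column, so descriptions containing "|" are safe.
tfa_config_value() {
  awk -F'|' -v want="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|.*\|.*\|[ \t]*$/ && NF >= 4 {
      val = trim($(NF - 1))
      key = $2
      for (i = 3; i <= NF - 2; i++) key = key "|" $i   # re-join a split description
      key = trim(key)
      name = key
      if (match(key, /\( *[^ ()]+ *\)$/))              # "( parameterName )" suffix
        name = trim(substr(key, RSTART + 1, RLENGTH - 2))
      if (tolower(name) == tolower(want)) { print val; found = 1 }
    }
    END { exit !found }'
}
```

Usage: `tfactl print config | tfa_config_value manageLogsAutoPurgePolicyAge`. In a cluster, one value per node table may be printed.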

Not all configurations can be set
[root@node1 ~]# tfactl set
...
autodiagcollect            allow for automatic diagnostic collection when an event is observed (default ON)
trimfiles                  allow trimming of files during diagcollection (default ON)
tracelevel                 control the trace level of log files in /opt/oracle.ahf/data/node1/diag/tfa (default INFO for all facilities)
reposizeMB=<n>             set the maximum size of diagcollection repository to <n>MB
repositorydir=<dir>        set the diagcollection repository to <dir>
logsize=<n>                set the maximum size of each TFA log to <n>MB (default 50 MB)
logcount=<n>               set the maximum number of TFA logs to <n> (default 10)
port=<n>                   set TFA Port to <n>
maxcorefilesize=<n>        set the maximum size of Core File to <n>MB (default 20 MB )
maxcompliancesize=<n>      set the maximum size of Compliance Index directory <n>MB (default 150 MB )
maxcomplianceruns=<n>      set the maximum number of Compliance Runs <n> to be stored (default 30)
maxcorecollectionsize=<n>  set the maximum collection size of Core Files to <n>MB (default 200 MB )
maxfilecollectionsize=<n>  set the maximum file collection size to <n>MB (default 5 GB )
autopurge                  allow automatic purging of collections when less space is observed in repository (default OFF)
autosynccertificates       Manage TFA Auto Sync Certificates
publicip                   allow TFA to run on public network

Not all configurations can be set
[root@node1 ~]# tfactl set
...
redact                              setting for ACR redaction
smtp                                Update SMTP Configuration
minSpaceForRTScan=<n>               Minimun space required to run RT Scanning(default 500)
rtscan                              allow Alert Log Scanning
diskUsageMon                        allow Disk Usage Monitoring
diskUsageMonInterval=<n>            Time interval between consecutive Disk Usage Snapshot(default 60 minutes)
manageLogsAutoPurge                 allow Manage Log Auto Purging
manageLogsAutoPurgeInterval=<n>     Time interval between consecutive Managelogs Auto Purge(default 60 minutes)
manageLogsAutoPurgePolicyAge=<d|h>  Logs older than the time period will be auto purged(default 30 days)
minfileagetopurge                   set the age in hours for collections to be skipped by AutoPurge (default 12 Hours)
tfaIpsPoolSize                      set the TFA IPS pool size
tfaDbUtlPurgeAge                    set the TFA ISA Purge Age (in seconds)
tfaDbUtlPurgeMode                   set the TFA ISA Purge Mode (simple/resource)
tfaDbUtlPurgeThreadDelay            set the TFA ISA Purge Thread Delay (in minutes)
tfaDbUtlCrsProfileDelay             set the TFA ISA Crs Profile Delay
indexRecoveryMode                   set the Lucene index recovery mode (recreate/restore)
rediscoveryInterval                 set the time interval for running lite rediscovery

Annoyances

Annoyances
• Documentation isn’t always current
  • Commands, options, and syntax may not match the docs
  • Run tfactl <command> -h or tfactl <command> help
• Some commands are user-specific (root, oracle, grid)
• Regressions (usually minor)
• Don’t build complex automation on new features
• Don’t (always) rush to upgrade to the latest version
• Example: GI can’t always see/manage DB & vice versa

Annoyances
• The transition from tfactl to ahfctl is incomplete
• Commands may be:
  • …available in both
  • …deprecated in tfactl
  • …new and unavailable in tfactl
  • …not ported to ahfctl (yet)

Annoyances
• Date format options in commands are inconsistent
  • Some require quotes, some don’t, some work either way
  • Some take double quotes, others take single quotes
  • YYYY/MM/DD or YYYY-MM-DD or YYYYMMDD or …
  • Some take dates and times separately
    • Sometimes there are -d and -t flags
  • Some take timestamps
  • Some work with either, others are specific

However… many commands (including complex ones) have an -examples option
[root@node1 ~]# tfactl diagcollect -examples
Examples:
  /opt/oracle.ahf/tfa/bin/tfactl diagcollect
      Trim and Zip all files updated in the last 1 hours as well as chmos/osw data from across the cluster and collect at the initiating node
      Note: This collection could be larger than required but is there as the simplest way to capture diagnostics if an issue has recently occurred.

  /opt/oracle.ahf/tfa/bin/tfactl diagcollect -last 8h
      Trim and Zip all files updated in the last 8 hours as well as chmos/osw data from across the cluster and collect at the initiating node

  /opt/oracle.ahf/tfa/bin/tfactl diagcollect -database hrdb,fdb -last 1d -z foo
      Trim and Zip all files from databases hrdb & fdb in the last 1 day and collect at the initiating node
...

However… many commands (including complex ones) have an -examples option
[oracle@node1 ~]$ tfactl analyze -examples
Examples:
  /opt/oracle.ahf/tfa/bin/tfactl analyze -since 5h
      Show summary of events from alert logs, system messages in last 5 hours.

  /opt/oracle.ahf/tfa/bin/tfactl analyze -comp os -since 1d
      Show summary of events from system messages in last 1 day.

  /opt/oracle.ahf/tfa/bin/tfactl analyze -search "ORA-" -since 2d
      Search string ORA- in alert and system logs in past 2 days.

  /opt/oracle.ahf/tfa/bin/tfactl analyze -search "/Starting/c" -since 2d
      Search case sensitive string "Starting" in past 2 days.

  /opt/oracle.ahf/tfa/bin/tfactl analyze -comp osw -since 6h
      Show OSWatcher Top summary in last 6 hours.
...

Managing ADR logs

Managing ADR logs
• Report space use for database and GI logs
• Report space variations over time
# Reporting
tfactl managelogs -show usage               # Show all space use in ADR
tfactl managelogs -show usage -gi           # Show GI space use
tfactl managelogs -show usage -database     # Show DB space use
tfactl managelogs -show usage -saveusage    # Save use for variation reports
# Report space use variation
tfactl managelogs -show variation -since 1d
tfactl managelogs -show variation -since 1d -gi
tfactl managelogs -show variation -since 1d -database

Managing ADR logs
• Purge logs in ADR across cluster nodes
  • ALERT, INCIDENT, TRACE, CDUMP, HM, UTSCDMP, LOG
• All diagnostic subdirectories must be owned by dba/grid
# Purge ADR files
tfactl managelogs -purge -older 30d -dryrun              # Estimated space saving
tfactl managelogs -purge -older 30d                      # Purge logs > 30 days old
tfactl managelogs -purge -older 30d -gi                  # GI only
tfactl managelogs -purge -older 30d -database            # Database only
tfactl managelogs -purge -older 30d -database all        # All databases
tfactl managelogs -purge -older 30d -database SID1,SID3
tfactl managelogs -purge -older 30d -node all            # All nodes
tfactl managelogs -purge -older 30d -node local          # Local node
tfactl managelogs -purge -older 30d -node NODE1,NODE3

Managing ADR logs - Things to know
• First-time purge can take a long time for:
  • Large directories
  • Many files
  • NOTE: Purge operation loops over files
• Strategies for first-time purge:
  • Delete in batches by age: 365 days, 180 days, 90 days, etc.
  • Delete database and GI homes separately
  • Delete for individual SIDs, nodes
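The batching strategy can be scripted so each age band is estimated with `-dryrun` before anything is deleted. A sketch assuming the `tfactl managelogs` options shown above; the function name `adr_batch_purge`, the `APPLY=1` switch and the exact list of ages are illustrative choices. Pass-through arguments (for example `-gi` or `-database SID1`) are appended after the age options, which is an assumption about argument order.

```shell
# First-time ADR purge in age bands, oldest first.
# Every band is estimated with -dryrun; set APPLY=1 to purge as well.
# Extra arguments (e.g. -gi, -database SID1, -node local) are passed through
# after the age options.
adr_batch_purge() {
  for age in 365d 180d 90d 30d; do
    echo "=== older than $age ==="
    tfactl managelogs -purge -older "$age" -dryrun "$@" </dev/null || return 1
    if [ "${APPLY:-0}" = 1 ]; then
      tfactl managelogs -purge -older "$age" "$@" </dev/null || return 1
    fi
  done
}
```

Following the strategy above, run GI and database homes separately, for example `adr_batch_purge -gi`, review the estimates, then `APPLY=1 adr_batch_purge -gi`, and repeat with `-database all`.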

Managing ADR logs - File ownership
• Files cannot be deleted if subdirectories under ADR_HOME are not owned by grid/oracle or oinstall/dba
• One mis-owned subdirectory:
  • No files under that ADR_HOME will be purged
  • Not even in subdirectories with correct ownership!
• Depending on version:
  • grid may not be able to delete files in database ADR_HOMEs
  • oracle may not be able to delete files in GI ADR_HOMEs
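Because one mis-owned directory blocks the whole ADR_HOME, it is worth finding such directories before purging. A sketch assuming GNU `find` and `stat` (Linux); `adr_bad_owners` is hypothetical, and the default user and group lists are taken from the slide. It flags a directory when either its owner or its group is outside the lists, which is a stricter reading of the rule above; adjust the lists to your site.

```shell
# List directories under an ADR_HOME whose owner or group is not in the allowed
# lists (one such directory can block purging for the whole ADR_HOME).
# Usage: adr_bad_owners <adr_home> ["user1 user2"] ["group1 group2"]
# Defaults follow the slide; uses GNU find/stat (Linux).
adr_bad_owners() {
  home=$1
  users=${2:-"grid oracle"}
  groups=${3:-"oinstall dba"}
  find "$home" -mindepth 1 -type d -exec stat -c '%U %G %n' {} + 2>/dev/null |
    while read -r u g path; do
      case " $users " in *" $u "*) uok=1 ;; *) uok=0 ;; esac
      case " $groups " in *" $g "*) gok=1 ;; *) gok=0 ;; esac
      # Stricter reading of the rule: flag if either owner or group is unexpected.
      if [ "$uok" = 0 ] || [ "$gok" = 0 ]; then
        echo "$u:$g $path"
      fi
    done
}
```

Usage, with the database ADR_HOME from the tail example later in this deck: `adr_bad_owners /u01/app/oracle/diag/rdbms/db193h1/DB193H11`.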

Managing ADR logs - Files not deleted when
• ADR_HOME:
  • …schema version is mismatched
  • …library version is mismatched
  • …schema version is obsolete
  • …is not registered
  • …is for an orphaned CRS event or user
  • …is for an inactive listener

Managing ADR logs - Files not deleted when
• ORACLE_SID or ORACLE_HOME is not present in oratab
• Duplicate ORACLE_SIDs are present in oratab
• Database unique name does not match its directory name
  • Can occur during cloning operations
• ADR_BASE is not set properly
• $ORACLE_HOME/log/diag directory is missing
• $ORACLE_HOME/log/diag/adrci_dir.mif is missing
• $ORACLE_HOME/log/diag/adrci_dir.mif doesn’t list ADR_BASE
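Two of these conditions can be checked directly from oratab. A minimal sketch, assuming the usual `SID:ORACLE_HOME:Y|N` oratab layout and the `$ORACLE_HOME/log/diag/adrci_dir.mif` path from the slide; `oratab_adr_check` is hypothetical and does not check whether the .mif file lists ADR_BASE.

```shell
# Check an oratab for two conditions that stop ADR purging:
# duplicate ORACLE_SIDs, and ORACLE_HOMEs without log/diag/adrci_dir.mif.
# Usage: oratab_adr_check [oratab]   (default /etc/oratab)
oratab_adr_check() {
  tab=${1:-/etc/oratab}
  [ -r "$tab" ] || { echo "cannot read $tab" >&2; return 1; }

  # Entries are SID:ORACLE_HOME:Y|N; skip comments and blank lines.
  entries=$(grep -v '^[[:space:]]*#' "$tab" | grep ':')

  # SIDs listed more than once.
  echo "$entries" | cut -d: -f1 | sort | uniq -d | sed 's/^/DUPLICATE_SID /'

  # Homes missing the ADR directory file.
  echo "$entries" | cut -d: -f2 | sort -u | while read -r home; do
    [ -n "$home" ] || continue
    [ -f "$home/log/diag/adrci_dir.mif" ] || echo "MISSING_MIF $home"
  done
}
```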

The best commands in AHF: analyze, changes, events

analyze
# Perform system analysis of DB, ASM, GI, system, OS Watcher logs/output
tfactl analyze
# Options:
-search "pattern"              # Search in DB and CRS alert logs
                               # Sets the search period to -last 1h
                               # Override with -last xh|xd
-verbose timeline file1 file2  # Shows timeline for specified files

analyze
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes... Please wait...
INFO: analyzing host: node1

Report title: Analysis of Alert,System Logs
Report date range: last ~1 day(s)
Report (default) time zone: GMT - Greenwich Mean Time
Analysis started at: 03-Feb-2022 06:27:46 PM GMT
Elapsed analysis time: 0 second(s).
Configuration file: /opt/oracle.ahf/tfa/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 963, from 02-Feb-2022 08:01:39 PM GMT to 03-Feb-2022 04:23:43 PM GMT
Messages matching last ~1 day(s): 963, from 02-Feb-2022 08:01:39 PM GMT to 03-Feb-2022 04:23:43 PM GMT
last ~1 day(s) error count: 4, from 02-Feb-2022 08:03:31 PM GMT to 02-Feb-2022 08:11:12 PM GMT
last ~1 day(s) ignored error count: 0
last ~1 day(s) unique error count: 3

Message types for last ~1 day(s)
Occurrences percent  server name          type
----------- -------  -------------------- -----
        952   98.9%  node1                generic
          7    0.7%  node1                WARNING
          4    0.4%  node1                ERROR
----------- -------
        963  100.0%

analyze
...
Unique error messages for last ~1 day(s)
Occurrences percent  server name error
----------- -------  ----------- -----
          2   50.0%  node1       [OCSSD(30863)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node1 .
          1   25.0%  node1       [OCSSD(2654)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node1 node2 .
          1   25.0%  node1       [OCSSD(2654)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node1 .
----------- -------
          4  100.0%

changes
# Find changes made on the system
tfactl changes
# Times and ranges
-for "YYYY-MM-DD"
-from "YYYY-MM-DD" -to "YYYY-MM-DD"
-from "YYYY-MM-DD HH24:MI:SS" -to "YYYY-MM-DD HH24:MI:SS"
-last 6h
-last 1d

changes
[root@node1 ~]# tfactl changes -last 2d
Output from host : node2
------------------------------
[Feb/02/2022 20:11:16.438]: Package: cvuqdisk-1.0.10-1.x86_64
Output from host : node1
------------------------------
[Feb/02/2022 19:57:16.438]: Package: cvuqdisk-1.0.10-1.x86_64
[Feb/02/2022 20:11:16.438]: Package: cvuqdisk-1.0.10-1.x86_64

events
[root@node1 ~]# tfactl events -last 1d
Output from host : node2
------------------------------
Event Summary:
INFO    :3
ERROR   :2
WARNING :0

Event Timeline:
[Feb/02/2022 20:10:46.649 GMT]: [crs]: 2022-02-02 20:10:46.649 [ORAROOTAGENT(27881)]CRS-5822: Agent '/u01/app/19.3.0.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:3} in /u01/app/grid/diag/crs/node2/crs/trace/ohasd_orarootagent_root.trc.
[Feb/02/2022 20:11:12.856 GMT]: [crs]: 2022-02-02 20:11:12.856 [OCSSD(28472)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node1 node2 .
[Feb/02/2022 20:11:57.000 GMT]: [asm.+ASM2]: Reconfiguration started (old inc 0, new inc 4)
[Feb/02/2022 20:28:31.000 GMT]: [db.db193h1.DB193H12]: Starting ORACLE instance (normal) (OS id: 24897)
[Feb/02/2022 20:28:42.000 GMT]: [db.db193h1.DB193H12]: Reconfiguration started (old inc 0, new inc 4)

The best utilities in AHF: alertsummary, grep, tail

alertsummary
# Summarize events in database and ASM alert logs
tfactl alertsummary

[root@node1 ~]# tfactl alertsummary
Output from host : node1
------------------------------
Reading /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
------------------------------------------------------------------------
02 02 2022 20:04:57
Database started
------------------------------------------------------------------------
02 02 2022 20:07:41
Database started

Summary: Ora-600=0, Ora-7445=0, Ora-700=0
~~~~~~~
Warning: Only FATAL errors reported
Warning: These errors were seen and NOT reported
Ora-15173
Ora-15032
Ora-15017
Ora-15013
Ora-15326

grep
# Find patterns in multiple files
tfactl grep "ERROR" alert
tfactl grep -i "error" alert,trace

[root@node1 ~]# tfactl grep -i "error" alert
Output from host : node1
------------------------------
Searching 'error' in alert
Searching /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
28:  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
375:Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_32035.trc:
378:Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_32049.trc:
446:ERROR: /* ASMCMD */ALTER DISKGROUP ALL MOUNT
543:  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
1034:Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_28105.trc:
...

tail
# Tail logs by name or pattern
tfactl tail alert_                   # Tail all logs matching alert_
tfactl tail alert_ORCL1.log -exact   # Tail for an exact match
tfactl tail -f alert_                # Follow logs (local node only)

[root@node1 ~]# tfactl tail -f alert_
Output from host : node1
------------------------------
==> /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log <==
NOTE: cleaning up empty system-created directory '+DATA/vgtol7-rac-c/OCRBACKUP/backup00.ocr.274.1095654191'
2022-02-03T12:23:35.194335+00:00
NOTE: cleaning up empty system-created directory '+DATA/vgtol7-rac-c/OCRBACKUP/backup01.ocr.274.1095654191'
2022-02-03T16:23:43.602629+00:00
NOTE: cleaning up empty system-created directory '+DATA/vgtol7-rac-c/OCRBACKUP/backup01.ocr.275.1095668599'

==> /u01/app/oracle/diag/rdbms/db193h1/DB193H11/trace/alert_DB193H11.log <==
TABLE SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY: ADDED INTERVAL PARTITION SYS_P301 (44594) VALUES LESS THAN (TO_DATE(‘...
SYS.WRI$_OPTSTAT_HISTGRM_HISTORY: ADDED INTERVAL PARTITION SYS_P304 (44594) VALUES LESS THAN (TO_DATE(‘...
2022-02-03T06:00:16.143988+00:00
Thread 1 advanced to log sequence 22 (LGWR switch)
  Current log# 2 seq# 22 mem# 0: +DATA/DB193H1/ONLINELOG/group_2.265.1095625353

Questions
