MOS for some SR • Diagnostic collections accelerate SR resolution • Cluster-aware ADR log inspection and management • Advanced system and log monitoring • Incident control and notification • Connect to MOS • SMTP, REST APIs
400 identified as critical/failures • Severe problem check daily: 2AM • All known problem check weekly: 3AM Sunday • Auto-generates a collection when problems detected • Everything required to diagnose & resolve • Results delivered to the notification email
Cluster awareness • Full AHF capabilities • Includes compliance checks • Enables notifications • Automatic diagnostic collection when issues are detected • May conflict with existing AHF/TFA installations
• No automatic or remote diagnostics, collections • Limited file visibility (must be readable by Oracle home owner) • /var/log/messages • Some Grid Infrastructure logs • May co-exist with Daemon installations • No special pre-install considerations
TFA • A version downloaded from MOS • A version included in Grid Infrastructure install & patches • GI version is not fully featured • GI and MOS versions can interfere, conflict
in $(find / -name uninstalltfa) do cd $(dirname $d) ./tfactl uninstall # cd .. && rm -fr . done # Insure ALL AHF/TFA processes are stopped/inactive prior to uninstall # PERFORM THIS STEP ON ALL NODES
-showpass -tag autostart_client_oratier1 -readenvconfig COLLECTION_RETENTION = 7 AUTORUN_SCHEDULE = 3 2 * * 1,2,3,4,5,6 ------------------------------------------------------------ ------------------------------------------------------------ ID: orachk.autostart_client ------------------------------------------------------------ AUTORUN_FLAGS = -usediscovery -tag autostart_client -readenvconfig COLLECTION_RETENTION = 14 AUTORUN_SCHEDULE = 3 3 * * 0 ------------------------------------------------------------ Next auto run starts on Feb 03, 2022 02:03:00 ID:orachk.AUTOSTART_CLIENT_ORATIER1 statusahf option in tfactl is deprecated and will be removed in AHF 22.1.0. Please start using ahfctl for statusahf, Example: ahfctl statusahf status vs statusahf
Current Node List in TFA : 1. node1 2. node2 Node List in Cluster : 1. node1 2. node2 Node List to sync TFA Certificates : 1 node2 Do you want to update this node list? Y|[N]: Syncing TFA Certificates on node2 : TFA_HOME on node2 : /opt/oracle.ahf/tfa ...
| Repository Parameter | Value | +---------------------------+---------------------------------+ | Old Location | /opt/oracle.ahf/data/repository | | New Location | /some/directory/repository | | Current Maximum Size (MB) | 10240 | | Current Size (MB) | 0 | | Status | OPEN | ‘---------------------------+---------------------------------' # Repository commands are applied only on the local node
tfactl set trimfiles=ON # default tfactl set reposizeMB= # default=10240 tfactl set rtscan=ON # default tfactl set redact=mask # default=none # Disk space monitoring tfactl set diskUsageMon=ON # default=OFF tfactl set diskUsageMonInterval=240 # Depends on activity. default=60 # Log purge tfactl set autopurge=ON # If space is slim. default=OFF tfactl set manageLogsAutoPurge=ON # default=OFF tfactl set manageLogsAutoPurgeInterval=720 # Set to 12 hours. default=60 tfactl set manageLogsAutoPurgePolicyAge=30d # default=30 tfactl set minfileagetopurge=48 # default=12
| awk -F'|' '{print $2, $3}' | sort | grep -i manage Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge ) Managelogs Auto Purge ( manageLogsAutoPurge ) OFF [root@node1 ~]# tfactl set manageLogsAutoPurge=ON Successfully set manageLogsAutoPurge=ON .-------------------------------------------------------. | node1 | +-----------------------------------------------+-------+ | Configuration Parameter | Value | +-----------------------------------------------+-------+ | Managelogs Auto Purge ( manageLogsAutoPurge ) | ON | '-----------------------------------------------+-------' [root@node1 ~]# tfactl print config | egrep "^\|.*\|.*\|$" | awk -F'|' '{print $2, $3}' | sort | grep -i manage Logs older than the time period will be auto purged(days[d] hours[h]) ( manageLogsAutoPurgePolicyAge ) Managelogs Auto Purge ( manageLogsAutoPurge ) ON [root@node1 ~]# tfactl set manageLogsAutoPurgePolicyAge=30d Successfully set manageLogsAutoPurgePolicyAge=30d .----------------------------------------------------------------------------------------------------------------. | node1 | +--------------------------------------------------------------------------------------------------------+-------+ | Configuration Parameter | Value | +--------------------------------------------------------------------------------------------------------+-------+ | Logs older than the time period will be auto purged(days[d]|hours[h]) ( manageLogsAutoPurgePolicyAge ) | 30d | '--------------------------------------------------------------------------------------------------------+-------'
... autodiagcollect allow for automatic diagnostic collection when an event is observed (default ON) trimfiles allow trimming of files during diagcollection (default ON) tracelevel control the trace level of log files in /opt/oracle.ahf/data/node1/diag/tfa (default INFO for all facilities) reposizeMB=<n> set the maximum size of diagcollection repository to <n>MB repositorydir=<dir> set the diagcollection repository to <dir> logsize=<n> set the maximum size of each TFA log to <n>MB (default 50 MB) logcount=<n> set the maximum number of TFA logs to <n> (default 10) port=<n> set TFA Port to <n> maxcorefilesize=<n> set the maximum size of Core File to <n>MB (default 20 MB ) maxcompliancesize=<n> set the maximum size of Compliance Index directory <n>MB (default 150 MB ) maxcomplianceruns=<n> set the maximum number of Compliance Runs <n> to be stored (default 30) maxcorecollectionsize=<n> set the maximum collection size of Core Files to <n>MB (default 200 MB ) maxfilecollectionsize=<n> set the maximum file collection size to <n>MB (default 5 GB ) autopurge allow automatic purging of collections when less space is observed in repository (default OFF) autosynccertificates Manage TFA Auto Sync Certificates publicip allow TFA to run on public network
... redact setting for ACR redaction smtp Update SMTP Configuration minSpaceForRTScan=<n> Minimun space required to run RT Scanning(default 500) rtscan allow Alert Log Scanning diskUsageMon allow Disk Usage Monitoring diskUsageMonInterval=<n> Time interval between consecutive Disk Usage Snapshot(default 60 minutes) manageLogsAutoPurge allow Manage Log Auto Purging manageLogsAutoPurgeInterval=<n> Time interval between consecutive Managelogs Auto Purge(default 60 minutes) manageLogsAutoPurgePolicyAge=<d|h> Logs older than the time period will be auto purged(default 30 days) minfileagetopurge set the age in hours for collections to be skipped by AutoPurge (default 12 Hours) tfaIpsPoolSize set the TFA IPS pool size tfaDbUtlPurgeAge set the TFA ISA Purge Age (in seconds) tfaDbUtlPurgeMode set the TFA ISA Purge Mode (simple/resource) tfaDbUtlPurgeThreadDelay set the TFA ISA Purge Thread Delay (in minutes) tfaDbUtlCrsProfileDelay set the TFA ISA Crs Profile Delay indexRecoveryMode set the Lucene index recovery mode (recreate/restore) rediscoveryInterval set the time interval for running lite rediscovery
syntax may not match docs • Run tfactl <command> -h or tfactl <command> help • Some commands are user (root, oracle, grid) specific • Regression (usually minor) • Don’t build complex automation on new features • Don’t (always) rush to upgrade to the latest version • Example: GI can’t always see/manage DB & vice-versa
Some require quotes, some don’t, some work either way • Some take double quotes, others take single quotes • YYYY/MM/DD or YYYY-MM-DD or YYYYMMDD or … • Some take dates and times separately • Sometimes there are -d and -t flags • Some take timestamps • Some work with either, others are specific
[root@node1 ~]# tfactl diagcollect -h Collect logs from across nodes in cluster Usage : /opt/oracle.ahf/tfa/bin/tfactl diagcollect [ [component_name1] [component_name2] ... [component_nameN] | [-srdc <srdc_profile>] | [-defips]] [-sr <SR#>] [-node <all|local|n1,n2,..>] [-tag <tagname>] [-z <filename>] [-acrlevel <system,database,userdata>] [-last <n><m|h|d>| -from <time> -to <time> | -for <time>] [-nocopy] [-notrim] [-silent] [-cores][-collectalldirs][-collectdir <dir1,dir2..>][-collectfiles <file1,..,fileN,dir1,..,dirN> [-onlycollectfiles]][- examples] components:-ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-cvu|-afd|-crs|-cha|-wls|-emagenti|-emagent|-oms|-omsi|-ocm|-emplugins|-em|- acfs|-install|-cfgtools|-os|-ashhtml|-ashtext|-awrhtml|-awrtext|-sosreport|-qos|-ahf|-dataguard -srdc Service Request Data Collection (SRDC). -database Specify comma separated list of db unique names for collection -defips Include in the default collection the IPS Packages for: ASM, CRS and Databases -sr Enter SR number to which the collection will be uploaded -node Specify comma separated list of host names for collection -tag <tagname> The files will be collected into tagname directory inside repository -z <zipname> The collection zip file will be given this name within the TFA collection repository -last <n><m|h|d> Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours -since Same as -last. Kept for backward compatibility. -from "Mon/dd/yyyy hh:mm:ss" From <time> or "yyyy-mm-dd hh:mm:ss" or "yyyy-mm-ddThh:mm:ss" or “yyyy-mm-dd" ...
[root@node1 ~]# tfactl diagcollect -examples Examples: /opt/oracle.ahf/tfa/bin/tfactl diagcollect Trim and Zip all files updated in the last 1 hours as well as chmos/osw data from across the cluster and collect at the initiating node Note: This collection could be larger than required but is there as the simplest way to capture diagnostics if an issue has recently occurred. /opt/oracle.ahf/tfa/bin/tfactl diagcollect -last 8h Trim and Zip all files updated in the last 8 hours as well as chmos/osw data from across the cluster and collect at the initiating node /opt/oracle.ahf/tfa/bin/tfactl diagcollect -database hrdb,fdb -last 1d -z foo Trim and Zip all files from databases hrdb & fdb in the last 1 day and collect at the initiating node ...
[oracle@node1 ~]$ tfactl analyze -examples Examples: /opt/oracle.ahf/tfa/bin/tfactl analyze -since 5h Show summary of events from alert logs, system messages in last 5 hours. /opt/oracle.ahf/tfa/bin/tfactl analyze -comp os -since 1d Show summary of events from system messages in last 1 day. /opt/oracle.ahf/tfa/bin/tfactl analyze -search "ORA-" -since 2d Search string ORA- in alert and system logs in past 2 days. /opt/oracle.ahf/tfa/bin/tfactl analyze -search "/Starting/c" -since 2d Search case sensitive string "Starting" in past 2 days. /opt/oracle.ahf/tfa/bin/tfactl analyze -comp osw -since 6h Show OSWatcher Top summary in last 6 hours. ...
Report space variations over time # Reporting tfactl managelogs -show usage # Show all space use in ADR tfactl managelogs -show usage -gi # Show GI space use tfactl managelogs -show usage -database # Show DB space use tfactl managelogs -show usage -saveusage # Save use for variation reports # Report space use variation tfactl managelogs -show variation -since 1d tfactl managelogs -show variation -since 1d -gi tfactl managelogs -show variation -since 1d -database
can take a long time for: • Large directories • Many files • NOTE: Purge operation loops over files • Strategies for first time purge: • Delete in batches by age—365 days, 180 days, 90 days, etc. • Delete database and GI homes separately • Delete for individual SIDs, nodes
deleted if subdirectories under ADR_HOME are not owned by grid/oracle or oinstall/dba • One mis-owned subdirectory • No files under that ADR_HOME will be purged • Even subdirectories with correct ownership! • Depending on version • grid may not be able to delete files in database ADR_HOMEs • oracle may not be able to delete files in GI ADR_HOMEs
• …schema version is mismatched • …library version is mismatched • …schema version is obsolete • …is not registered • …is for an orphaned CRS event or user • …is for an inactive listener
or ORACLE_HOME not present in oratab • Duplicate ORACLE_SIDs are present in oratab • Database unique name is mismatched to its directory • Can occur during cloning operations • ADR_BASE is not set properly • $ORACLE_HOME/log/diag directory is missing • $ORACLE_HOME/log/diag/adrci_dir.mif missing • $ORACLE_HOME/log/diag/adrci_dir.mif doesn’t list ADR_BASE
OS Watcher logs/output tfactl analyze # Options: -search "pattern" # Search in DB and CRS alert logs # Sets the search period to -last 1h # Override with -last xh|xd -verbose timeline file1 file2 # Shows timeline for specified files