Slide 1

Slide 1 text

Sean Scott
Oracle ACE Director
Managing Principal Consultant, Viscosity North America
What's Missing From Your RMAN Backup?
RMOUG Training Days 2025

Slide 2

Slide 2 text

Database Reliability Engineering
MAA ⁘ RAC ⁘ RMAN
Data Guard ⁘ Sharding ⁘ Partitioning
Information Lifecycle Management
Exadata & Engineered Systems
Database Modernization
Upgrades ⁘ Patching ⁘ Migrations
Cloud ⁘ Hybrid
Automation
DevOps ⁘ IaC ⁘ Containers ⁘ Terraform
Vagrant ⁘ Ansible
Observability
AHF ⁘ TFA ⁘ CHA ⁘ CHM

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

www.viscosityna.com @ViscosityNA
Oracle on Docker: Running Oracle Databases in Linux Containers
Free sample chapter: https://oraclesean.com

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Data Protection ≠ Database Backups

Slide 7

Slide 7 text

Data protection must include everything necessary for database recovery on a new host in a new facility.

Slide 8

Slide 8 text

Recovery is complete only after full operational capability and performance are restored.

Slide 9

Slide 9 text

Data Protection Concepts
Data Protection vs. High Availability
• Data protection primarily protects against data loss
  • RMAN
  • Data Pump
  • User-managed backup

Slide 10

Slide 10 text

Data Protection Concepts
Data Protection vs. High Availability
• High availability primarily protects against system or site failure
• High availability may also offer data protection
  • Data Guard
  • GoldenGate

Slide 11

Slide 11 text

Data Protection Concepts
Events that require database recovery: Hardware failure
• Database servers
• Storage
• Networking
• Data centers

Slide 12

Slide 12 text

Data Protection Concepts
Events that require database recovery: Physical data loss or corruption
• Intra-block corruption only
• Corruption is confined to individual block(s)
• Mismatched block header and/or footer
• Invalid checksum
• Empty block
• Examples: damaged media, block overwritten with zeroes

Slide 13

Slide 13 text

Data Protection Concepts
Events that require database recovery: Logical data loss or corruption
• Intra-block and inter-block corruption
• Intra-block: Confined to one block
• Inter-block: Exists between blocks
• Block header & footer match
• Checksum is valid
• Yet data is logically inconsistent
• Example: lost write

Slide 14

Slide 14 text

Data Protection Concepts
Events that require database recovery: User and application error
• Database level:
  • Bad DDL or DML
  • Bad application logic
  • Software bugs
• OS-level:
  • Deleting/changing database files
  • May be unintentional or malicious

Slide 15

Slide 15 text

Data Protection Concepts
Events that require database recovery: User and application error
RMAN backups can address scenarios beyond "vanilla" full database recovery. Recovery KPIs and enterprise risks may require a layered backup approach beyond the standard weekly level 0/daily level 1/hourly archivelogs.
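
The "standard" weekly level 0/daily level 1/hourly archivelog baseline named above could be written down as one RMAN command per tier. This is a minimal sketch, not the presenter's script: `rman_tier_commands` is a hypothetical helper name, it only prints command text, and channels, tags, and retention are deliberately left out. Any extra layers a site needs would be added as further tiers.

```shell
# Hypothetical helper: print the RMAN command for one backup tier.
# It only emits text; sending the output to RMAN is left to the caller.
rman_tier_commands() {
  case "$1" in
    weekly) echo "BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;" ;;
    daily)  echo "BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;" ;;
    hourly) echo "BACKUP ARCHIVELOG ALL NOT BACKED UP 1 TIMES;" ;;
    *)      echo "unknown tier: $1" >&2; return 1 ;;
  esac
}
```

Assuming OS authentication is in place, a scheduler entry might run something like `rman_tier_commands daily | rman target /`.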

Slide 16

Slide 16 text

Backups are not enough

Slide 17

Slide 17 text

recovery procedure

Slide 18

Slide 18 text

“We have documentation”
~ DBA who hasn't read the documentation

Slide 19

Slide 19 text

"We have documentation"
Cool!
• Is it clear and understandable?
• How current is it?
• Is it still accurate and relevant?
• Has it been tested? If so, how recently? Was it useful?

Slide 20

Slide 20 text

"We have documentation"
Cool! Does it cover scenarios besides a full restore?
• Restore individual datafiles/tablespaces
• Restore SYSTEM tablespace, control file, etc.
• Tablespace or PDB point-in-time recovery
• Restore individual PDB

Slide 21

Slide 21 text

"We have documentation"
Cool! Does it cover scenarios besides a full restore?
• Restore/recover from different media (disk, tape)
• Restore to a different host, change DB ID/name
• Data Guard switchover, failover
• Reinstantiate a Data Guard standby

Slide 22

Slide 22 text

“We have backups”
~ Soon-to-be unemployed database administrator

Slide 23

Slide 23 text

“We have backups”
Backups ≠ Recovery!
• Are they tested and validated regularly?
• Are the recovery procedures clear and well-documented?
• Do they support multiple restore/recovery scenarios?
• Do they meet standards for RTO/RPO?
• What are the essential dependencies?
• What factors might affect recovery time?
• Is database recovery validation an isolated or coordinated exercise?
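
For the first question above, RMAN can read backup pieces without restoring anything. The following is a rough sketch under stated assumptions: the helper names `rman_validate_script` and `rman_log_ok` are hypothetical, and the pass/fail rule (any line that starts with an `RMAN-` or `ORA-` error code fails the run) is an illustration, not an official check. Validation only shows that the backups are readable; it does not replace a real restore test.

```shell
# Hypothetical helper: print RMAN commands that validate backups without restoring.
rman_validate_script() {
  cat <<'EOF'
RESTORE CONTROLFILE VALIDATE;
RESTORE DATABASE VALIDATE;
RESTORE ARCHIVELOG ALL VALIDATE;
EOF
}

# Hypothetical pass/fail check on a captured RMAN log.
# Fails when the log is unreadable or contains an RMAN-nnnnn / ORA-nnnnn line.
rman_log_ok() {
  [ -r "$1" ] && ! grep -Eq '^(RMAN|ORA)-[0-9]+' "$1"
}
```

One possible use, assuming OS authentication: `rman_validate_script | rman target / > validate.log; rman_log_ok validate.log`.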

Slide 24

Slide 24 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings in DR/HA plans
• Over-reliant on teams "instinctively" knowing what to do
• Over-optimistic RTO/RPO without empirical basis
• Narrow focus (full restore only)
• Isolated scope (DBA steps only)
• Strong cognitive bias/blindness
• Written once and rarely/never updated

Slide 25

Slide 25 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Low criteria for success
  • Database is up
  • Meets minimal status checks
  • Establish basic connectivity
• Rarely considers
  • Infrastructure availability, provisioning steps
  • Consistent topology
  • OS/software reinstallation, patching, configuration

Slide 26

Slide 26 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Intangible and human elements
  • Chaos
  • Communication
  • Chain of command
  • Dependencies
  • Cross-training

Slide 27

Slide 27 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Post-recovery performance
• Alternate recovery requirements
  • PDB/Datafile/Tablespace only
  • Block repair
  • PITR, Flashback

Slide 28

Slide 28 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Post-recovery reconfiguration
  • Hostname, IP, client changes
  • Networking, firewall reconfiguration
  • Security lists, certificates
• Does everything still work?
  • cron jobs
  • Backups

Slide 29

Slide 29 text

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Run critical post-recovery steps
  • Capture new database backups!
  • Validate the system:
    • ORAchk, DBSAT
  • Validate HA
    • Data Guard
    • GoldenGate
  • Validate ETL/integrations

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

What RMAN Doesn't Back Up
Important database configuration files!
• Password file
• Data Guard Broker configurations
• GoldenGate files
• Block change tracking file
• /etc/oratab

Slide 32

Slide 32 text

What RMAN Doesn't Back Up
Important database configuration files!
• Networking files
  • tnsnames.ora
  • listener.ora
  • sqlnet.ora
• Wallets, certificates
• Temporary configurations
  • e.g., parameter files for starting RMAN duplication
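
Configuration files like these can be captured next to the RMAN backup with ordinary OS tools. Below is a minimal sketch that assumes bash (or another shell with `local`) and a `tar` that supports `-T`, as GNU tar and bsdtar do. `archive_config_files` is a hypothetical helper, and it assumes file names contain no newlines. Missing paths are reported rather than silently skipped, because a missing file is itself a finding.

```shell
# Hypothetical helper: archive the paths that exist, report the ones that do not.
# Usage: archive_config_files <archive.tar.gz> <path>...
archive_config_files() {
  local archive=$1 list path status
  shift
  list=$(mktemp) || return 1
  for path in "$@"; do
    if [ -e "$path" ]; then
      printf '%s\n' "$path" >> "$list"
    else
      echo "MISSING: $path" >&2
    fi
  done
  tar -czf "$archive" -T "$list"   # -T reads the member list from a file
  status=$?
  rm -f "$list"
  return "$status"
}
```

A call might look like `archive_config_files /backup/db_config.tar.gz /etc/oratab "$TNS_ADMIN/tnsnames.ora" "$TNS_ADMIN/sqlnet.ora"`. The destination and the `TNS_ADMIN` locations are assumptions; actual paths vary by installation, and wallets or password files kept in ASM need a different approach.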

Slide 33

Slide 33 text

What RMAN Doesn't Back Up
Data! Code!
• Contents of dba_directories, utl_file_dir including:
• External tables
• BFILE data
• Data Pump parameter, log and dump files
• SQL*Loader control files
• Compiled Pro*C/C++, Pro*COBOL, etc.
• External procedures (e.g. EXTPROC)
• OS files/executables called via UTL_FILE

Slide 34

Slide 34 text

What RMAN Doesn't Back Up
Recovery Area!
• Flashback logs
• When using BACKUP RECOVERY AREA:
  • Current control file
  • Online redo logs

Slide 35

Slide 35 text

What RMAN Doesn't Back Up
Important non-database files!
• Scripts
• cron jobs
• Passwords
• .profile, .bashrc, .bash_profile
• Environment files and configurations

Slide 36

Slide 36 text

What RMAN Doesn't Back Up
Diagnostic data
• diagnostic_dest
• audit_file_dest
• background_dump_dest
• core_dump_dest
• user_dump_dest

Slide 37

Slide 37 text

What RMAN Doesn't Back Up
Cluster Ready Services files, contents of GRID_HOME
• ASM storage configurations, locations (Oracle Cluster Registry: OCR)
• Node-specific resources (Oracle Local Registry: OLR)
• Networking files
  • tnsnames.ora
  • listener.ora
  • sqlnet.ora

Slide 38

Slide 38 text

What RMAN Doesn't Back Up
CRS setup
• srvctl configuration:
  • Database and instance settings
  • Services and service configurations
  • Environment variables (e.g., TNS_ADMIN for EBS)
  • Listener configurations and endpoints
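
One way to keep a record of the srvctl settings listed above is to save the output of `srvctl config` and `srvctl getenv` on a schedule. This sketch assumes bash and that `srvctl` is on the PATH for the owning OS user. `capture_srvctl_config` and the output file names are hypothetical. The saved text documents the configuration for a rebuild; it is not an importable backup.

```shell
# Hypothetical helper: save srvctl's view of one database to text files.
# Usage: capture_srvctl_config <db_unique_name> <output_dir>
capture_srvctl_config() {
  local db=${1:?database unique name required} out=${2:?output directory required}
  mkdir -p "$out" || return 1
  srvctl config database -d "$db" > "$out/database.txt"    || return 1
  srvctl config service  -d "$db" > "$out/services.txt"    || return 1
  srvctl getenv database -d "$db" > "$out/environment.txt" || return 1
  srvctl config listener          > "$out/listener.txt"    || return 1
}
```

Writing each command's output to its own file keeps later comparisons (for example with `diff`) simple when checking a rebuilt cluster against the original.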

Slide 39

Slide 39 text

What RMAN Doesn't Back Up
Software and inventory
• oraInventory
• Database software and patches, gold images
• Patch manifests
• .patch_storage directories
• Media Management Libraries (MML) and drivers
• Client software
• AHF, OEM, GoldenGate, etc.

Slide 40

Slide 40 text

What RMAN Doesn't Back Up
Operating system
• Operating system software and patches
• Kernel settings and host configurations
• Application software and configuration
• Agent software

Slide 41

Slide 41 text

Disaster Recovery Procedures

Slide 42

Slide 42 text

The primary goal of recovery: "Don't make things worse."

Slide 43

Slide 43 text

Disaster Recovery Procedures
Aim to simplify
• Complexity increases the potential of:
  • Failure
  • Exceptions/variations between prod/non-prod
  • Meaningful configuration and parameter differences
• Simple procedures limit the scope of QA

Slide 44

Slide 44 text

Disaster Recovery Procedures
Aim to automate
• Automation reduces manual effort
  • Cognitive load is a finite resource
  • Automation takes care of the (technical) "How to"
  • Allows teams to focus on "What, Why, and (conceptual) How"
• Automation addresses
  • Dependencies, sequence
  • Trivial activities (easily undervalued/missed/run out of order)

Slide 45

Slide 45 text

Do not underestimate the stress and chaos of DR/HA!

Slide 46

Slide 46 text

Disaster Recovery Procedures
Aim to automate
• Add sanity checks
  • Confirm environments
  • "Are you sure..." checks
• Preserve output and timing via logging
• Include measurable pass/fail checks
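
The checks above might be wrapped in two small shell helpers. This is a sketch assuming bash and a `date` that supports `+%s`. `confirm_target` and `run_step` are hypothetical names, and "pass" here simply means exit status 0, which a real procedure would likely tighten.

```shell
# Hypothetical "Are you sure..." check: the operator must retype the target name.
# The answer is read from stdin, so the check can also be driven by a test.
confirm_target() {
  local expected=${1:?target name required} answer
  printf 'About to run recovery steps against "%s". Retype the name to continue: ' "$expected" >&2
  read -r answer || return 1
  [ "$answer" = "$expected" ]
}

# Hypothetical step runner: append the command's output, elapsed time, and a
# PASS/FAIL line to a log, then return the command's exit status.
# Usage: run_step <logfile> <description> <command> [args...]
run_step() {
  local log=$1 desc=$2 start status
  shift 2
  start=$(date +%s)
  "$@" >> "$log" 2>&1
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "$(date '+%F %T') PASS ${desc} ($(( $(date +%s) - start ))s)" >> "$log"
  else
    echo "$(date '+%F %T') FAIL ${desc} rc=${status} ($(( $(date +%s) - start ))s)" >> "$log"
  fi
  return "$status"
}
```

A procedure could then chain them, for example `confirm_target "$__db_name" && run_step recovery.log "stop database" srvctl stop database -d "$__db_name"`, so nothing runs against an unconfirmed target and every step leaves a timed pass/fail record.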

Slide 47

Slide 47 text

Disaster Recovery Procedures
Write abstract documentation
• Don't use "common" variables in scripts
  • Avoid the dreaded "Oops! I ran that in the wrong window!"
• Make scripts and documentation "copy/paste-proof"
  • Copy/paste should work correctly:
    • ...for every command!
    • ...in every database!
    • ...in every environment!

Slide 48

Slide 48 text

Don't use "common" variables in scripts/documents
# This:
srvctl stop database -d $__db_name
# Not:
srvctl stop database -d $ORACLE_SID

Slide 49

Slide 49 text

Don't include complete variable declarations
# This:
# Set recovery parameters with the correct values
export __db_name= # Add the database name
# Not:
# Set recovery parameters; change values as needed
export __db_name=test # Change the database name!

Slide 50

Slide 50 text

Make documentation "copy/paste-proof"
# This:
srvctl stop database -d $__db_name
# Not:
srvctl stop database -d testdb # Harder to spot values that must be changed!
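
An empty declaration such as `export __db_name=` only protects the operator if an unfilled value stops the procedure. One possible guard is sketched below; it assumes bash, and `require_vars` is a hypothetical helper name. It uses `eval` for portability, so the names passed in must be literal identifiers written in the script, never user input.

```shell
# Hypothetical guard: fail if any named variable is unset or empty.
# Usage: require_vars __db_name [more_names...]
require_vars() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "ERROR: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a procedure this could read `require_vars __db_name && srvctl stop database -d "$__db_name"`, so a pasted command with a blank value fails with a clear message and the `srvctl` command never runs.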

Slide 51

Slide 51 text

Disaster Recovery Procedures
Tips
• Consider adding metadata to shell prompts:
  • user
  • pid
  • date/time
  • $PWD
  • $ORACLE_SID/$ORACLE_PDB
• Send session output to a file
• Increase SSH terminal scrollback/history
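
A prompt carrying the metadata listed above could look like the following in bash. This is a sketch, not a recommended standard: the layout, the `no_sid`/`no_pdb` placeholders, the log file name, and the helper name `start_session_log` are all assumptions. Session capture uses the `script` utility, which must be installed for the function to work.

```shell
# bash prompt showing user, shell PID, date/time, directory, and Oracle context.
# Single quotes defer expansion, so $$ and the ${...} values are evaluated each
# time bash draws the prompt. \u, \D{...}, \w and \$ are bash prompt escapes.
PS1='[\u pid:$$ \D{%F %T} \w ${ORACLE_SID:-no_sid}/${ORACLE_PDB:-no_pdb}]\n\$ '

# Hypothetical helper: record the whole terminal session to a timestamped file.
# Defined as a function so nothing starts until it is called.
start_session_log() {
  script -a "${1:-$HOME/dr_session_$(date +%Y%m%d_%H%M%S).log}"
}
```

Putting the prompt on its own line (the `\n` before `\$`) keeps long paths from pushing the command off screen, and the timestamp on every prompt doubles as a rough timing record in the captured session.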

Slide 52

Slide 52 text

Disaster Recovery Procedures
Tips
• Include base configuration settings in scripts
  • env | sort
  • whoami
  • date
  • $PWD
  • ...etc.
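
The settings listed above can be emitted as a header at the top of every script's log. A minimal sketch, with `log_context` as a hypothetical function name and `uname -n` added as an assumed extra for the host name:

```shell
# Hypothetical helper: print who ran the script, when, where, and with what environment.
log_context() {
  echo "=== context: $(date '+%F %T') ==="
  echo "user: $(whoami)"
  echo "host: $(uname -n)"
  echo "pwd:  $PWD"
  echo "--- environment ---"
  env | sort
}
```

Calling `log_context >> "$logfile"` as a script's first action is one option. Because `env` can expose passwords held in environment variables, the resulting log deserves the same protection as the scripts themselves.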

Slide 53

Slide 53 text

Disaster Recovery Procedures
Consolidate and parameterize
• Things are easier to manage when they're the same
• Differentiate systems via parameters only
• Procedures that work across multiple environments are easier to:
  • Test
  • Validate
  • Practice
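
"Differentiate systems via parameters only" might be implemented by keeping one parameter file per environment and sourcing it into an otherwise identical procedure. The sketch below assumes bash. The `<environment>.env` naming and the helper name `load_env_params` are hypothetical.

```shell
# Hypothetical helper: load one environment's parameter file into the current shell.
# Usage: load_env_params <directory> <environment>   (reads <directory>/<environment>.env)
load_env_params() {
  local file="$1/$2.env"
  if [ ! -r "$file" ]; then
    echo "ERROR: no parameter file $file" >&2
    return 1
  fi
  . "$file"   # variables set by the file become visible to the calling script
}
```

With a file such as `params/test.env` containing `__db_name=testdb`, the same script runs unchanged against test and production; only the argument to `load_env_params` differs.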

Slide 54

Slide 54 text

Questions? Contact Me!
[email protected]
https://linktr.ee/oraclesean
www.viscosityna.com

Slide 55

Slide 55 text

No content