What's Missing from your RMAN Backup?

What's Missing From Your RMAN Backup?
Sean Scott
Managing Principal Consultant, Viscosity North America
Oracle ACE Director
May 8, 2025

www.viscosityna.com @ViscosityNA
Oracle ACE Director
Maximum Availability Architecture (MAA)
Database Reliability Engineering
RAC ⁘ RMAN ⁘ Data Guard ⁘ Sharding
Partitioning ⁘ Engineered Systems
Information Lifecycle Management
Database Modernization
Upgrades ⁘ Patching ⁘ Migrations
Cloud ⁘ Hybrid
Automation
DevOps ⁘ Infrastructure as Code
Containers ⁘ Terraform ⁘ Vagrant ⁘ Ansible
Observability
AHF ⁘ TFA ⁘ CHA ⁘ CHM

Oracle on Docker: Running Oracle Databases in Linux Containers
Free sample chapter: https://oraclesean.com


Data Protection ≠ Database Backups

Data protection must include everything necessary for database recovery on a new host in a new facility.

Recovery is complete only after full operational capability and performance are restored.

Data Protection Concepts
Data Protection vs. High Availability
• Data protection primarily protects against data loss
  • RMAN
  • Data Pump
  • User-managed backup
• High availability primarily protects against system or site failure
• High availability may also offer data protection
  • Data Guard
  • GoldenGate

Data Protection Concepts
Events that require database recovery: Hardware failure
• Database servers
• Storage
• Networking
• Data centers

Data Protection Concepts
Events that require database recovery: Physical data loss or corruption
• Intra-block corruption only
  • Corruption is confined to individual block(s)
• Mismatched block header and/or footer
• Invalid checksum
• Empty block
• Examples: damaged media, block overwritten with zeroes

Data Protection Concepts
Events that require database recovery: Logical data loss or corruption
• Intra-block and inter-block corruption
  • Intra-block: Confined to one block
  • Inter-block: Exists between blocks
• Block header & footer match
• Checksum is valid
• Yet data is logically inconsistent
• Example: lost write

Data Protection Concepts
Events that require database recovery: User and application error
• Database level:
  • Bad DDL or DML
  • Bad application logic
  • Software bugs
• OS-level:
  • Deleting/changing database files
• May be unintentional or malicious

RMAN backups can address scenarios beyond "vanilla" full database recovery. Recovery KPIs and enterprise risks may require a layered backup approach beyond the standard weekly level 0/daily level 1/hourly archivelogs.

Backups are not Enough

recovery procedure

“We have documentation”
~ DBA who hasn't read the documentation

"We have documentation"
Cool!
• Is it clear and understandable?
• How current is it?
• Is it still accurate and relevant?
• Has it been tested?
• If so, how recently?
• Was it useful?

"We have documentation"
Cool! Does it cover scenarios besides a full restore?
• Partial restore of tables, datafiles, tablespaces, PDBs
• Flashback, point-in-time recovery
• Restore control file, server parameter file
• Recover using a backup control file
• Restore RAC to Restart/single instance
• Restore/recover from different media (disk, tape)
• Data Guard switchover, failover
• Reinstantiate a Data Guard standby
• Change file names/paths (different filesystem, ASM to filesystem, etc.)
• Restore a database copy on the same host (change DB ID/name)
• Block media recovery
• Restore to a different host, loss of ORACLE_HOME
• Manual TSPITR using an auxiliary instance
• Data Pump export and import
• Using Guaranteed Restore Points (GRP)
• Rolling back failed patches or upgrades
• Register backup sets/pieces
• Recreate missing archive log files

"We have documentation"
Cool! Does it address contingencies and intangibles?
• Modifying monitoring and alerting
• Determining space requirements
• Running jobs with screen or nohup
• alter system set job_queue_processes=0
• Spooling output, echoing commands, setting meaningful timestamps
• Estimating recovery time
• Preparing an alternative recovery while the primary recovery is running
• Separating root cause analysis and resolution from "the blame game"
• Preserving immutable diagnostics, esp. for a suspected breach
• Responsibilities and boundaries (no C-Suite interference)
• Rotation schedules and relief teams
• Phone numbers for 24-hour pizza/food delivery
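Two of the contingencies above (running jobs with nohup, and spooling output with meaningful timestamps) can be sketched in a few lines of shell. This is a minimal illustration, not part of the original deck: `run_detached` and `restore_step.sh` are hypothetical names, and the date format is only one reasonable choice.

```shell
#!/bin/sh
# Sketch: run a long restore step so that it survives a dropped SSH session.
# run_detached is a hypothetical helper; restore_step.sh is a placeholder.

# Meaningful timestamps: RMAN picks up NLS_DATE_FORMAT from the environment,
# so set it before starting the job. The format itself is an assumption.
NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'
export NLS_DATE_FORMAT

run_detached() {
  job_log="$1"; shift
  # nohup ignores SIGHUP; stdout and stderr are spooled to the log file.
  nohup "$@" >"$job_log" 2>&1 &
  echo "$!"   # PID of the background job, for later monitoring
}

# Usage (file names are examples only):
#   pid=$(run_detached "restore_$(date +%Y%m%d_%H%M%S).log" ./restore_step.sh)
#   tail -f restore_*.log
```

A terminal multiplexer such as screen achieves the same goal interactively; the nohup form is easier to embed in a documented procedure.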

“We have backups”
~ Soon-to-be unemployed database administrator

“We have backups”
Backups ≠ Recovery!
• Are they tested and validated regularly?
• Are the recovery procedures clear and well-documented?
• Do they support multiple restore/recovery scenarios?
• Do they meet standards for RTO/RPO?
• What are the essential dependencies?
• What factors might affect recovery time?
• Is database recovery validation an isolated or coordinated exercise?

Recovery validations typically assume (unrealistically) ideal, benign circumstances

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings in DR/HA plans
• Over-reliant on teams "instinctively" knowing what to do
• Over-optimistic RTO/RPO without empirical basis
• Narrow focus (full restore only)
• Isolated scope (DBA steps only)
• Strong cognitive bias/blindness
• Written once and rarely/never updated

Lessons Learned from DR/HA Postmortems
Common oversights/shortcomings of recovery tests
• Low criteria for success
  • Database is up
  • Meets minimal status checks
  • Establish basic connectivity
• Rarely considers
  • Infrastructure availability, provisioning steps
  • Consistent topology
  • OS/software reinstallation, patching, configuration
• Intangible and human elements
  • Chaos
  • Communication
  • Chain of command
  • Dependencies
  • Cross-training
• Post-recovery performance
• Alternate recovery requirements
  • PDB/tablespace/datafile/table only
  • Block repair
  • PITR, Flashback
  • Data Pump export/import
• Post-recovery reconfiguration
  • Hostname, IP, client changes
  • Networking, firewall reconfiguration
  • Security lists, certificates
• Does everything still work?
  • cron jobs
  • Backups
• Run critical post-recovery steps
  • Capture new database backups!
• Validate the system:
  • ORAchk, DBSAT
• Validate HA
  • Data Guard
  • GoldenGate
• Validate ETL/integrations

There's a problem with RMAN

RMAN is reliable
RMAN is secure
RMAN is consistent
RMAN is trusted
RMAN is complete
RMAN is simple

So, what's the problem?

RMAN is so good, we assume it does everything

RMAN is so good, we assume it does everything and anticipates our needs

RMAN is so good, we assume it does everything in the best- and fastest-possible way

RMAN does not magically perform the fastest-possible backup

Like databases, RMAN requires configuration and tuning

Making RMAN fast takes effort

Since RMAN restore performance is a function of the original backup...

...the options for making a restore faster after the fact are limited

Although...

run {
  backup as copy
    duration 0:01 partial minimize time
    tag 'RMAN_go_fast'
    database;
}

Fully protecting databases requires an understanding of what RMAN does[n't] and can[n't] back up!

What RMAN Doesn't Back Up
Important database configuration files!
• Password file
• Data Guard Broker configurations
• GoldenGate files
• Block change tracking file
• /etc/oratab
• Networking files
  • tnsnames.ora
  • listener.ora
  • sqlnet.ora
• Wallets, certificates
• Temporary configurations
  • e.g., parameter files for starting RMAN duplication
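Because RMAN skips these files, they need their own capture step. The sketch below shows one way to bundle them with tar; it is an illustration rather than the presenter's method. `archive_config_files` is a hypothetical helper, and every path in the usage comment is an assumption based on common default locations that will differ by site.

```shell
#!/bin/sh
# Sketch: archive configuration files that RMAN does not back up.
# archive_config_files is a hypothetical helper name.

archive_config_files() {
  archive="$1"; shift
  remaining=$#
  # Rotate through the arguments, keeping only the paths that exist.
  while [ "$remaining" -gt 0 ]; do
    candidate="$1"; shift
    remaining=$((remaining - 1))
    if [ -e "$candidate" ]; then
      set -- "$@" "$candidate"
    else
      echo "WARN: not found, skipped: $candidate" >&2
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "ERROR: nothing to archive" >&2
    return 1
  fi
  tar -czf "$archive" "$@"
}

# Usage (all paths are assumptions; adjust to the actual environment):
#   archive_config_files "/backup/config_$(date +%Y%m%d).tar.gz" \
#     /etc/oratab \
#     "$ORACLE_HOME/dbs/orapw$__db_name" \
#     "$ORACLE_HOME/network/admin/tnsnames.ora" \
#     "$ORACLE_HOME/network/admin/listener.ora" \
#     "$ORACLE_HOME/network/admin/sqlnet.ora"
```

Missing files produce a warning instead of aborting, so one absent file does not prevent the rest from being captured; the warnings themselves are worth reviewing.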

What RMAN Doesn't Back Up
Data! Code!
• Contents of dba_directories, utl_file_dir including:
  • External tables
  • BFILE data
  • Data Pump parameter, log and dump files
  • SQL*Loader control files
• Compiled Pro*C/C++, Pro*COBOL, etc.
• External procedures (e.g. EXTPROC)
• OS files/executables called via UTL_FILE

What RMAN Doesn't Back Up
Recovery Area!
• Flashback logs
• When using BACKUP RECOVERY AREA:
  • Current control file
  • Online redo logs

What RMAN Doesn't Back Up
Important non-database files!
• Scripts
• cron jobs
• Passwords
• .profile, .bashrc, .bash_profile
• Environment files and configurations

What RMAN Doesn't Back Up
Diagnostic data
• diagnostic_dest
• audit_file_dest
• background_dump_dest
• core_dump_dest
• user_dump_dest

What RMAN Doesn't Back Up
Cluster Ready Services files, contents of GRID_HOME
• ASM storage configurations, locations (Oracle Cluster Registry: OCR)
• Node-specific resources (Oracle Local Registry: OLR)
• Networking files
  • tnsnames.ora
  • listener.ora
  • sqlnet.ora

What RMAN Doesn't Back Up
CRS setup
• srvctl configuration:
  • Database and instance settings
  • Services and service configurations
  • Environment variables (e.g., TNS_ADMIN for EBS)
  • Listener configurations and endpoints

What RMAN Doesn't Back Up
Software and inventory
• oraInventory
• Database software and patches, gold images
• Patch manifests
• .patch_storage directories
• Media Management Libraries (MML) and drivers
• Client software
• AHF, OEM, GoldenGate, etc.

What RMAN Doesn't Back Up
Operating system
• Operating system software and patches
• Kernel settings and host configurations
• Application software and configuration
• Agent software

Resilient, not resistant

Resilient, not resistant
Resilient = elastic. Resistant = brittle.
• Hardened systems resist pressure up to a point
• Beyond that threshold, they break
• Hardening introduces brittleness
• Over-engineering resistance is (exponentially) costly

Resilient, not resistant
Resilient = elastic. Resistant = brittle.
• Resilient systems are:
  • Elastic
  • Parameterized
  • Abstracted
  • Automated
  • Platform independent (ideally)
  • Continuously tested/validated

Disaster Recovery Procedures

The primary goal of recovery: "Don't make things worse."

Disaster Recovery Procedures
Aim to simplify
• Complexity increases the potential of:
  • Failure
  • Exceptions/variations between prod/non-prod
  • Meaningful configuration and parameter differences
• Simple procedures limit the scope of QA

Disaster Recovery Procedures
Aim to automate
• Automation reduces manual effort
  • Cognitive load is a finite resource
• Automation takes care of the (technical) "How to"
  • Allows teams to focus on "What, Why, and (conceptual) How"
• Automation addresses
  • Dependencies, sequence
  • Trivial activities (easily undervalued/missed/run out of order)

Do not underestimate the stress and chaos of DR/HA!

Disaster Recovery Procedures
Aim to automate
• Add sanity checks
  • Confirm environments
  • "Are you sure..." checks
• Preserve output and timing via logging
• Include measurable pass/fail checks
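The checks listed above might look something like the following in a recovery script. This is a minimal sketch under stated assumptions: the function names (`confirm_host`, `confirm_target`, `run_logged`, `check`), the `__log_file` variable, and its default location under /tmp are all hypothetical and not taken from the deck.

```shell
#!/bin/sh
# Sketch of sanity checks for a recovery script. All names are hypothetical.

# Assumed log location; override __log_file before sourcing if needed.
__log_file="${__log_file:-/tmp/recovery_$(date +%Y%m%d_%H%M%S).log}"

# Confirm environment: refuse to continue on an unexpected host.
confirm_host() {
  expected_host="$1"
  actual_host="$(uname -n)"
  if [ "$actual_host" != "$expected_host" ]; then
    echo "ABORT: running on $actual_host, expected $expected_host" >&2
    return 1
  fi
}

# "Are you sure..." check: the operator must type the database name.
confirm_target() {
  target_db="$1"
  printf 'Type the database name (%s) to continue: ' "$target_db"
  read -r reply
  [ "$reply" = "$target_db" ]
}

# Preserve output and timing: echo the command, run it, log everything.
run_logged() {
  rc_file="$(mktemp)"
  echo "[$(date '+%F %T')] START: $*" | tee -a "$__log_file"
  # Save the command's exit status, which the pipe to tee would hide.
  # (In bash, ${PIPESTATUS[0]} is an alternative.)
  { "$@" 2>&1 && echo 0 >"$rc_file" || echo "$?" >"$rc_file"; } |
    tee -a "$__log_file"
  rc="$(cat "$rc_file")"
  rm -f "$rc_file"
  echo "[$(date '+%F %T')] END rc=$rc: $*" | tee -a "$__log_file"
  return "$rc"
}

# Measurable pass/fail check: the label is logged with the result.
check() {
  check_label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $check_label" | tee -a "$__log_file"
  else
    echo "FAIL: $check_label" | tee -a "$__log_file"
    return 1
  fi
}
```

A procedure would typically call `confirm_host` and `confirm_target` once at the top, wrap each step in `run_logged`, and finish with a series of `check` calls so that success is defined by more than "the database is up".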

Disaster Recovery Procedures
Write abstract documentation
• Don't use "common" variables in scripts
  • Avoid the dreaded "Oops! I ran that in the wrong window!"
• Make scripts and documentation "copy/paste-proof"
• Copy/paste should work correctly:
  • ...for every command!
  • ...in every database!
  • ...in every environment!

Don't use "common" variables in scripts/documents

# This:
srvctl stop database -d $__db_name

# Not:
srvctl stop database -d $ORACLE_SID

Don't include complete variable declarations

# This:
# Set recovery parameters with the correct values
export __db_name=        # Add the database name

# Not:
# Set recovery parameters; change values as needed
export __db_name=test    # Change the database name!

Make documentation "copy/paste-proof"

# This:
srvctl stop database -d $__db_name

# Not:
srvctl stop database -d testdb
# Harder to spot values that must be changed!
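Leaving declarations empty only helps if an unset value stops the procedure before anything runs. One possible guard is sketched below; `require_vars` is a hypothetical helper, not something shown in the deck.

```shell
#!/bin/sh
# Sketch: stop before any command runs if a recovery parameter is empty.
# require_vars is a hypothetical helper name.

require_vars() {
  missing=0
  for var_name in "$@"; do
    # Indirect lookup; var_name comes from the script itself, not user input.
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
      echo "ERROR: $var_name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage in a procedure:
#   export __db_name=        # Add the database name
#   require_vars __db_name || exit 1
#   srvctl stop database -d "$__db_name"
```

With this guard, a pasted block that still contains an empty `__db_name` fails with a clear message instead of passing an empty argument to srvctl.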

Disaster Recovery Procedures
Tips
• Consider adding metadata to shell prompts:
  • user
  • pid
  • date/time
  • $PWD
  • $ORACLE_SID/$ORACLE_PDB
• Send session output to a file
• Increase SSH terminal scrollback/history
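As an illustration of the prompt metadata listed above, the following is one possible bash prompt. It assumes bash (the `\u`, `\h`, `\D{...}`, and `\w` escapes are bash prompt escapes), and the layout and the `no_sid`/`no_pdb` placeholders are arbitrary choices.

```shell
# Sketch for ~/.bashrc (bash prompt escapes; other shells differ).
# Shows user@host, shell PID, date/time, ORACLE_SID/ORACLE_PDB and the
# working directory. Single quotes defer expansion until the prompt is drawn,
# so the SID and PDB shown always reflect the current environment.
PS1='[\u@\h pid:$$ \D{%F %T} ${ORACLE_SID:-no_sid}/${ORACLE_PDB:-no_pdb}] \w\n\$ '

# Sending session output to a file (util-linux "script"; the file name is an
# example only):
#   script -a "$HOME/session_$(date +%Y%m%d_%H%M%S).log"
```

Every command in the scrollback then carries the host, time, and target database, which helps when reconstructing a timeline afterward.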

Disaster Recovery Procedures
Tips
• Include base configuration settings in scripts
  • env | sort
  • whoami
  • date
  • $PWD
  • ...etc.
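A small helper can record these settings at the top of every script. The sketch below is one way to do it; `log_context` is a hypothetical name, and the choice of Oracle variables to highlight is an assumption.

```shell
#!/bin/sh
# Sketch: record the execution context at the start of a recovery script.
# log_context is a hypothetical helper name.

log_context() {
  echo "=== context: $(date '+%F %T') ==="
  echo "user: $(whoami)"
  echo "host: $(uname -n)"
  echo "pwd: $PWD"
  echo "ORACLE_SID: ${ORACLE_SID:-<unset>}"
  echo "ORACLE_HOME: ${ORACLE_HOME:-<unset>}"
  echo "--- environment ---"
  # Note: the full environment may include secrets; protect the log file.
  env | sort
  echo "=== end context ==="
}

# Usage: log_context >> "$__log_file"
```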

Disaster Recovery Procedures
Consolidate and parameterize
• Things are easier to manage when they're the same
• Differentiate systems via parameters only
• Procedures that work across multiple environments are easier to:
  • Test
  • Validate
  • Practice
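"Differentiate systems via parameters only" could be implemented by keeping one script and one parameter file per environment. The sketch below assumes a hypothetical `params/<environment>.env` layout and a hypothetical `load_params` helper; neither comes from the deck.

```shell
#!/bin/sh
# Sketch: one procedure for every environment; only the parameter file differs.
# load_params and the params/<env>.env layout are hypothetical.

load_params() {
  env_name="$1"
  param_dir="${2:-./params}"
  param_file="$param_dir/$env_name.env"
  if [ ! -r "$param_file" ]; then
    echo "ERROR: no parameter file for '$env_name' ($param_file)" >&2
    return 1
  fi
  # Source the file so its variables are set in the current shell.
  . "$param_file"
}

# Usage:
#   load_params "$1" || exit 1
#   srvctl stop database -d "$__db_name"
```

Because the commands never change between environments, the procedure practiced in test is the same one executed in production.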

Questions? Contact Me!
https://linktr.ee/oraclesean
www.viscosityna.com
