Sunday, March 1, 2020

MAA Architecture and High Availability


Oracle Database 12.2, configured with ASM and Data Guard, incorporates clustering technology that improves availability during both unplanned outages and planned maintenance.
Oracle RAC or Oracle RAC One Node provides HA within a data center through automatic failover when there is an unrecoverable outage of a database instance or a complete failure of the server on which it runs.
Oracle RAC also delivers substantial benefits for eliminating many types of planned downtime by performing maintenance in a rolling manner across Oracle RAC nodes. 
Oracle RAC
Oracle RAC improves application availability within a data center should there be an outage of a database instance or of the server on which it runs. Server failover with Oracle RAC is near-instantaneous, with downtime measured in seconds.
There are two components: Oracle Database instances and the Oracle Database itself.
·         A database instance is defined as a set of server processes and memory structures running on a single node (or server) which makes a particular database available to clients.
·         The database is a particular set of shared files (data files, index files, control files, and initialization files) that reside on persistent storage, and together can be opened and used to read and write data.
·         Oracle RAC uses an active-active architecture that enables multiple database instances, each running on different nodes, to simultaneously read and write to the same database.
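The instance/database split described above can be observed from any node. The query below is a minimal sketch, assuming a session with access to the GV$ views; on an active-active RAC cluster it would be expected to return one row per running instance, all serving the same database.

```sql
-- One row per instance that currently has the shared database open.
-- GV$INSTANCE is the cluster-wide version of V$INSTANCE.
SELECT inst_id,
       instance_name,
       host_name,
       status,
       database_status
FROM   gv$instance
ORDER  BY inst_id;
```

With Oracle RAC One Node, the same query would normally show a single instance at a time.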

Advantages of the active-active instance cluster in Oracle RAC
·         Improved HA within the cluster
·         Scalability
·         Reliability
·         HA during planned maintenance, such as rolling patching
Oracle RAC One Node
Oracle RAC One Node provides an alternative to Oracle RAC when server HA is a requirement, but scalability and instant failover are not.
Oracle Data Guard and ADG
Oracle Active Data Guard maintains one or more synchronized physical replicas (standby databases) at a remote location that are used to eliminate a single point of failure for a production database (the primary database). 
Here at BSP, the Oracle Database hosting Oracle Flexcube is configured with ADG for the Report Database and with DG for the Standby Database (remote location).
The remaining application databases are configured with a single-node Standby Database (DG) in the remote location.
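To tell an ADG reporting standby apart from a plain DG standby, the role and open mode can be checked on each database. This is a hedged sketch using standard V$DATABASE columns; the values mentioned in the comments are what one would typically expect, not output captured from the BSP environment.

```sql
-- Run on each database in the Data Guard configuration.
-- A primary reports DATABASE_ROLE = 'PRIMARY'.
-- A physical standby opened for reporting under Active Data Guard
-- typically reports OPEN_MODE = 'READ ONLY WITH APPLY';
-- a mounted (not open) DG standby reports OPEN_MODE = 'MOUNTED'.
SELECT name,
       db_unique_name,
       database_role,
       open_mode,
       protection_mode,
       protection_level
FROM   v$database;
```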
RTO and RPO

This configuration eliminates the single point of failure and provides a higher level of data protection and HA against all types of unplanned outages, including data corruption, database failures, and site failures.
The existence of a replicated copy also provides substantial advantages for reducing downtime during periods of planned maintenance.
RTO is reduced to seconds or minutes with RPO of zero or near zero depending upon configuration.
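The RPO and RTO actually achieved depend on how far the standby lags behind the primary. As a rough sketch, assuming the query is run on the standby database, the transport lag approximates the potential data loss (RPO), while the apply lag contributes to the time needed to fail over (RTO).

```sql
-- Run on the standby database.
-- 'transport lag': redo generated on the primary but not yet received.
-- 'apply lag'    : redo received but not yet applied on the standby.
SELECT name,
       value,
       unit,
       time_computed,
       datum_time
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```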

Features during Backup/Restore

Feature: Zero or near-zero data loss potential
Details: Site failure, database failure
With ZDLRA Recovery Appliance:
Delta push (incremental forever) after the initial full backup, based on backup policies
Real-time transactions copied over for continuous data protection
Without ZDLRA Recovery Appliance:
Oracle Active Data Guard/standby database
Zero data loss in case of database failure
1 sec RPO
Near-zero data loss

Feature: Clone to non-prod database
Details: Preparing a copy of the production database
With ZDLRA Recovery Appliance: Quickly and easily clone full production databases to non-production environments using OEM

Feature: Backup offload
With ZDLRA Recovery Appliance: No overhead while taking database backups
Without ZDLRA Recovery Appliance: Overhead during the backup cycle; backups have to be scheduled on both the primary and standby databases

Feature: Validation for data corruption
Details: Data corruption on storage disks
With ZDLRA Recovery Appliance: Detects errors in a backup while it is being taken, rather than during restore
Without ZDLRA Recovery Appliance:
Manual execution of dbverify or ANALYZE commands; recreate objects from the standby or from backup in case of physical corruption
RMAN CROSSCHECK BACKUP, RMAN RESTORE VALIDATE
V$DATABASE_BLOCK_CORRUPTION
RMAN> RECOVER DATAFILE xxx BLOCK yyy;
Data Recovery Advisor recommends commands to perform restore and recovery procedures for:
Corruption | Method
Corrupted block | Block Media Recovery
Database/datafile | Point-in-Time Recovery, Flashback of the entire database

Feature: Automatic block repair
Details: Physical block corruption; corruption can be caused by a faulty disk or disk controller, an errant bit-flip on a disk, or a bug in the operating system, storage area network (SAN), or storage system
With ZDLRA Recovery Appliance:
·         Ingest of Backupsets & Redo Logs
·         Indexing Backupsets into Delta Store
·         Replicating Backupsets & Redo Logs
·         Optimization of the Delta Store
·         Timer Based Validation
·         Validation during Restore
·         Mirror Sync Validation
Without ZDLRA Recovery Appliance:
It can be achieved using Oracle DG, Data Recovery Advisor, Oracle Flashback, and Oracle RMAN.
Configure the DB_BLOCK_CHECKING=FULL, DB_BLOCK_CHECKSUM=FULL, and DB_LOST_WRITE_PROTECT=TYPICAL parameters on the Data Guard primary and standby databases (this adds overhead in the database during write operations).
Use Active Data Guard to enable Automatic Block Repair (applicable for the FCUBS database from the Report Database).

Feature: TDE Data Encryption
Details: Tablespace or table column encryption
With ZDLRA Recovery Appliance: Supported during backup/restore
Without ZDLRA Recovery Appliance: Supported for the central backup location and tape drives

Feature: Rapid Database Backup Restoration
Details:
Online database backup using RMAN as per the defined policy
Database restore using RMAN
With ZDLRA Recovery Appliance:
With the Recovery Appliance, a database backup from any point in time can be restored to the original or to a second database
Unified OEM console for tracking backup and restore/recovery jobs
Able to generate chargeback reports for each database
Without ZDLRA Recovery Appliance:
Uses Oracle Secure Backup and the SL150 Tape Library
Manually initiated or through OEM using the OSB plug-in

Feature: Backup Restoration
Details: Database restore during failure
With ZDLRA Recovery Appliance:
Creates a virtual full backup of the database every time, which reduces restore time
Use of Backup Flash Area (NVMe storage) for fast backup
Use of Delta Push using RMAN during backup
Re-assembles the delta pushes and validates the physical full backup during restore
Restores a database backup as of any day (N days back) and time
Without ZDLRA Recovery Appliance:
Uses RFS and MRP processes for redo block shipping and applying changes
Use tagged backups during restore; catalog the tape beforehand, manually or through OSB

Feature: Ownership of taking and restoring backups
Details: Responsibility for taking backups
With ZDLRA Recovery Appliance: Single OEM console; application- or stream-based policies
Without ZDLRA Recovery Appliance:
Backups are spread across DBAs, storage admins, and backup admins
Policies are attached manually at the time of backup provisioning

Feature: Backup Protection
Details: Protection against deletion, manual (ad-hoc backup execution)
With ZDLRA Recovery Appliance: Policy to protect backups against deletion, whether manual or through RMAN
Without ZDLRA Recovery Appliance: No protection
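The block-checking parameters named under Automatic block repair could be set along the following lines. This is a sketch rather than a tested change script: it assumes the databases use an spfile (required for SCOPE=BOTH) and that the same settings are applied on both the primary and the standby. Because these settings add overhead on writes, it would be sensible to measure the impact in a test environment first.

```sql
-- Apply on the Data Guard primary and on each standby.
-- SID='*' applies the setting to every RAC instance.
ALTER SYSTEM SET db_block_checking     = FULL    SCOPE = BOTH SID = '*';
ALTER SYSTEM SET db_block_checksum     = FULL    SCOPE = BOTH SID = '*';
ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE = BOTH SID = '*';

-- Confirm the values currently in effect.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checking',
                'db_block_checksum',
                'db_lost_write_protect');
```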
Handling Outages
Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Seconds | Zero
Unplanned | Recoverable server failure | Seconds | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures or site failures | Zero to minutes (instead of hours to days) | Near-zero if using ASYNC, zero if using Data Guard synchronous transport (instead of since the last backup)
Planned | Online File Move, Online Reorganization and Redefinition, Online Patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be done online | Zero | Zero
Planned | Database upgrades: patch sets and full database releases | Seconds (instead of minutes to hours) | Zero
Planned | Platform migrations | Seconds (instead of hours to a day) | Zero
Planned | Application upgrades that modify back-end database objects | Hours to days | Zero

Data protection offering
Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze commands | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime | Oracle Active Data Guard | Physical block checking at standby; strong isolation between primary and standby eliminates the single point of failure; automatic repair of physical corruptions; automatic database failover | Detect lost write corruption, auto shutdown, and failover; intra-block logical checks at standby
Runtime | Database | In-memory block and redo checksum | In-memory intra-block logical checks
Runtime | ASM | Automatic corruption detection and repair |
Runtime | Recovery Appliance | HARD checks on write | HARD checks on write
Background | Recovery Appliance | Automatic Hard Disk Scrub and Repair | n/a
Background | Recovery Appliance | Complete backup validation including control file, data file backups and REDO | n/a
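Whichever layer detects a corrupt block, the database records it in V$DATABASE_BLOCK_CORRUPTION, which is usually the first place to look before deciding on block media recovery. A minimal sketch of such a check is shown below; the view is typically refreshed by RMAN backup or validation runs, so an empty result is only meaningful after a recent validation.

```sql
-- Lists block ranges currently flagged as corrupt.
-- No rows returned means no corruption has been recorded.
SELECT file#,
       block#,
       blocks,
       corruption_change#,
       corruption_type
FROM   v$database_block_corruption
ORDER  BY file#, block#;
```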