Azure Oracle Advanced

This paper is for DBA’s, Architects and database managers contemplating using Silk + Azure for their Oracle IaaS based solution. It covers many aspects of basic Linux/Oracle administration and prep work as well reference architecture and DR strategies for using SILK + Azure in a risk aversive, highly performant and cost-effective manner. One overlying principle should be kept in mind. Silk requires no modification to the Oracle stack whatsoever. Once the Silk LUN’s are presented to Oracle via ASM all tuning, administration and life cycle management are the same as generic high performance , I/O databases. As such the topics such as huge page configuration and semaphore values though universal to Oracle performance tuning will be key to maximizing Oracle workload performance.

Oracle Licensing considerations.

The following is a link to Oracle public facing policy regarding licensing Oracle RDMS and other Oracle Corporation software on non-Oracle public clouds including Azure. For Intel and AMD processors the core factor is the same at .5. That that is 2 vCPU’s equal one Oracle processor license.

Microsoft Azure – count two vCPUs as equivalent to one Oracle Processor license if multithreading of processor cores is enabled…

Oracle IAAS VM Configuration

Choose vm based on compute not storage bandwidth.

You can use one of the Marketplace images available or build your own custom image/kernel based on your on-premise source server. The current Enterprise Edition release of Oracle on the Azure Marketplace is 19.3.0.0. Note that the current Marketplace image for 19c contains only the RDBMS software. The Grid Infrastructure software must be separately download installed and configured before ASM can be deployed. Additionally, 19.0.3 is a base installation and will have to be upgraded. Thus, it is recommended that a custom image be built on your IaaS VM rather than utilize a marketplace image. The current patch/version matrix of Oracle 19c is listed below:

(Oracle ASM support for marketplace images?) point to any available document. Describing oS/ASM setup

Which Azure VM shapes are recommended?

The vCPU count, memory configuration and perhaps most key, the I/O characteristics of the shape must be taken into consideration. The goal is to find the best, most optimal shape in terms of vCPU count, memory foot print and I/O using Silk I/O capabilities to enhance performance and reduce cost.  I/O bandwidth is key right up along with cpu count. Here is how that breaks out for some typical Oracle/Azure/Silk DB server shapes:

Shape/vCPU CountMAX IO Ops / Second
D16ads_v5 – 32vCPU51200
D32ads_v5 – 48 vCPU76800
D48ads_v5-  64 vCPU80000

 

D96ads_v5 – 96 vCPU

80000

Note that I/O ops per second increases linearly until we arrive at the 64-vCPU shape. This is key. Silk can easily handle 80000K IO’/sec even in a base, entry level deployment format. Silk does not use the same data path way that Azure disk I/O normally traverses. It takes advantage of the larger pipe size of the network interface. Note that the smaller shapes have less network bandwidth, and this can be crucial to Oracle workload right sizing – with Silk you may be able to use a smaller vCPU shape and still accommodate your I/O requirements.

+++Give an example of what I/O optimizations Silk can realize by showing a specific nodes capability with/ without tsilk.

+DATA + RCO accelerated networking at least  4 luns per diskgroup

Vmiages based

How many I/O’s how much cpu/memory required?

These questions need to be answered by something of a deep analysis of the workload, it’s status and its projected growth and development path. Sizing the Oracle instance to the current workload is best accomplished by using the AWR functionality built into Oracle Enterprise Edition of the Oracle database. An ideal set of data would be a one hour strobe of the system while it is under it’s heaviest load and then a week long interval to determine work rate patterns. The more extensive the awr analysis done the better the targeted system will be sized.

For RAC systems, one instance, preferably the heaviest loaded should be analyzed  via AWR along with the global cache reporting capability of AWR in RAC mode. The awr scripts are found under $ORACLE_HOME/rdbms/admin. The primary script is called awrrpt.sql the RAC version is called  awrgrpt.sql. AWR is key to workload analysis early on and to be aware that not all workloads will benefit from the optimizations that Silk provides. A significant I/O workload must be present for Silk to show effectiveness. Analyzing the AWR reports is the best way to determine this.

Linux installation

Oracle Enterprise Linux has several rpms and libraries added to it that are required to run Oracle. The exact manifest varies from release to release. It is based on the Fedora/RedHat branch of Linux and uses rpm’s to manage its software libraries and binaries. A number of OS tunings can be made to optimize Oracle RDBMS performance. These are discussed below.

Find ms or oracle document about grid rdbms install on azure

Oracle Grid Infrastructure Installation

Consider increasing the size of your root filesystem to reflect the additional logging and binaries added by the Oracle/ASM stack. It is a very good practice to locate the “ORACLE_BASE” (typically /u01/app/oracle/) on a non -root device. The software stack can get quite large and there is considerable logging and trace overhead from the Oracle processes.

Oracle Grid Infrastructure binaries include components required by ASM. But a number of Linux/ASM rpm’s must also be installed. Generally, Grid Infrastructure is installed first and the GRID Home is where the systems TNS Listener will be running. – Not the RDBMS ORACLE_HOME. This will not be the case with the current 19c MarketPlace image which has the Oracle RDBMS binaries pre- installed but not the Grid Infrastructure software.

There are several choices for managing ASM at the device level. Oracle corporation itself offers two different methods. ASMLIB and ASM Filter Driver. ASM Filter Driver is the newer technology. ASMLIB and ASMFD are mutually exclusive. During Grid installation you are given the option to install the ASMFD drivers and configuration. In either case root access is required to identify, prepare and present LUNS to ASM.

This is how to prepare a Silk disk ( put link to multipath config here) with ASMFD:

First install and patch the grid software – but do not run the root based configuration scripts yet. Next, immediately patch up to your most current patchlevel with a command similar to this:

opatchauto apply /home/oracle/stage/22646084 -oh $ORACLE_HOME

If performing a GUI installation there is a check box for installing ASMFD. The silent install parameter is “oracle.install.asm.configureAFD=true”. Once the silent install is completed ASM disks can be prepared. The following commands are run:

# $ORACLE_HOME/bin/asmcmd afd_configure

Verify that ASMFD is properly configured:

$ $ORACLE_HOME/bin/asmcmd afd_state

ASMCMD-9526: The AFD state is ‘LOADED’ and filtering is ‘ENABLED’ on host ‘myhost’

To  label a disk invoke the followining command:

# asmcmd afd_label DATA1 /dev/disk1a –init

This puts an ASMFD label on the device /dev/disk1a. Subsequently we will put this ASM disk into an ASM diskgroup.

Oracle Grid/RDBMS Installation

RDBMS/Grid Infrastructure Patching

Installation is an ideal time to get your binary stack up the latest patch level. Patch levels are constantly changing. Consult Oracle My Oracle Support ( Formerly Metalink) to get the latest patches for your installation environment.

SQLNET/Network Configuration

Azure networking configuration must be configured to allow communication on  Oracle SQLNETTCP Port 1521 by default. While this is the default port other ports can be specified and in fact the DOD STIG recommends a non-1521 port number to reduce attack exposure.

Azure/Oracle OS certification information can be found here:

Database Migration

Database migration from on premise or from another cloud can be accomplished in several ways. But there are typically two distinct challenges; the initial bulk – load/creation of the database and then establishing a continuous data capture (CDC) mechanism to keep it in synch with its original source until official cutover is accomplished. – We are assuming non-trivial typically mixed/primary OLTP environments that are mutating constantly and need to be kept updated. ( It is far easier to restore from backup or DataPump out a database in non-real time for lower regions.)

Oracle provides several tools to accomplish both the requirements of initial load and CDC.

RMAN, Data Pump help accomplish the initial db load. Azure Data Factory, DataGuard and Golden Gate are all tools to accomplish CDC once the base db is instantiated in the cloud.

RMAN can also be used to create a databases online via the duplicate RMAN command . This requires high network bandwidth and low latency to be successful. Typical alternatives over doing duplication ‘over the wire’ are to create multiple RMAN backup pieces that can be streamed up to the cloud in parallel– again restricted only by bandwidth and latency. Subsequently after the DB is reincarnated and recovered, a CDC mechanism will be required to synch it to its on-premise source.

Other uses for RMAN include using it to manage transportable tablespaces and doing source side endian conversions prior to migration. Finally, RMAN is key to instantiation of DataGuard. RMAN remains an invaluable tool for the DBA being able to do Point In Time Recoveries (PITR) to come back from logical operator errors/ malicious intent. RMAN is key to a robust production database environment

Data Pump

DataPump can be used to move non-critical databases that don’t need to be exactly synched up to the latest transaction. Data Pump can also be used to instantiate a production database if a CDC mechanism is used in conjunction.  DataPump can solve many problems. It is by far the easiest way to solve Big/Little Endian conversion issues. You can take a Solaris Big Endian data dump file and data pump it into a x86_64 Linux machine transparently, It can also handle many character set conversions as well. Upgrades and migration are also be done this way. The big issue is that it takes time to dump, transport and reload with impdp.

Golden Gate and DataGuard are CDC mechanisms that allow the DBA to keep two Oracle  DB’s  ( Or  in Golden Gate use cases heterogenous databases) in synch with either logical ( Golden Gate) or physical (Data Guard) replication technologies. These are critical to keep the DB’s up to date until cutover time when migrating to the cloud.

CDC Configurations

Change Data Capture sub systems are used to keep source and target databases in synch both before and after migration to the cloud.

Golden Gate

Golden Gate uses a proprietary technique to generate what it refers to as ‘trail files’ These are parsed redo log activity streams that GoldenGate can then forward to other DB targets and replay as actual SQL statements against the target Database using its so called replicat processes. Newer versions of GoldenGate are MicroService based and ar somewhat restricted as to which older RDBMS versions that are supported. For example, Oracle RDBMS 10g to 19c GoldnGate replication won’t happen but 11gR2 to 18C is possible.

DataGuard Physical Standby

DataGuard physical standby is an Oracle RDBMS only technology that uses the redo stream/and or the archive logs to replicate at a physical block level microtransactions committed against the primary(source) database to one or more standby (target) databases. These can be cascaded or multiplexed as well. Versions must be kept very close between the two hosts. The OS and version of DB must be used. A number of other criteria such as adding standby redo logs and other items need to be done.

Implementation is done using RMAN to create the initial replica. The standby database is essentially in a modified read only mode that allows the redo stream to update the DB simultaneously. The active DataGuard option allows end users to access the standby as a read only report server. This can help leverage the cost and value of a near local DR site – it can also be used for reporting purposes to affray some cost of DR.

Post DB Migration tasks

A number of tasks should be undertaken post migration/upgrade. Stats collection as deep as possible (higher sample percentages are better) —  is recommended. This includes the system schemas as well.  Once this is accomplished and before production cutover extensive performance and regression testing should be performed. Key stake holders should be called upon to validate data functionality and performance levels. UAT environments should be already present but a refresh of the UAT regions is probably in order after DB cutover.

Database Fault Tolerance

Multi-availability zone with synchronous DG configuration:

Azure multiple availability zones and Oracle synchronous DataGuard Repication can help create a highly available resilient database system. Oracle’s MAA architecture uses DataGuard as an integral part of its architecture and it fits well within the Azure ecosystem as well

While Oracle RAC can also be used for high availability on-premises, Oracle RAC alone cannot be used for high availability in the cloud as it only protects against instance level

Customers generally require a high SLA for running their mission critical applications. Oracle RAC is currently not certified or supported by Oracle on Azure. However, Azure offers features such as Azure offers Availability Zones and planned maintenance windows to help protect against instance-level failures. In addition to this, customers can use technologies such as Oracle Data Guard, Oracle and Golden Gate for high performance and resiliency. These technologies help by protect Azure Oracle  databases from rack-level as well as datacenter-level and geo-political failures.

Inter region DataGuard with asynch option

One step further with DataGuard is to place a second DataGuard instance in another region entirely. This one however, should be set up with asynchronous communication back to the primary server – or cascaded from the first standby. Synchronous DataGuard can effectively stop the primary should communication fail or the standby fails for whatever reason. Going across continents to different regions with a synchronous communications link is not recommended. But true DR capability is achieved by inter region DataGuard in asychnch or far syynchmode.

Azure Blob storage and RMAN with Silk

Silk recommends placing the Oracle Fast Recovery Area in the Silk Data Pod – and to have multiple layers of redundancy based on DataGuard and Azure availability zones and regions. Further protection can be assured by utilizing RMAN to not only backup to the FRA but also to Azure blob destination set up as a file system presentation.  On Azure RMAN can use the Disk channel rather than SBT libraries. This means one less library management overhead while still affording a near online but system unrelated “last chance/try” back up.

Oracle / Silk patching guidelines:

 

The Silk SDP will be transparently patched and maintained with no intervention needed by the DBA or system admin. The OS, the Oracle Grid Infrastructure and the Oracle RDBMS software ( as well as any application/schema changes) are still in their realm of customer/MSP responsibility. DataGuard switchover technology can be employed to maintain business continuity during OS and Oracle patching events. Oracle has published numerous white papers and articles regarding the use of DataGuard to perform patching. Here is one reference.

Here is a RACI breakdown of the different roles and responsibilities concerning an Oracle/Silk/Azure implementation.

 

RResponsible
AAccountable
CConsulted
IInformed

 

RACI Diagram for Oracle/Silk

/Azure Deployment

CustomerMSPAZURESilk  OracleAzure
Oracle RDBMSA/RRIC  II
Oracle Grid InfrastructureA/RRIC  II
Operating SystemA/RRIC  II
Silk Data PodIIIR  II
Hardware/Network/HypervisorsIIRI  IR

Conclusion

Silk + Azure offers on premise performance, SAN like features and high value process automation capabilities that make it far superior than other cloud offerings. Full control of the IaaS stack by an experienced hands-on IT team can provide equal or better service levels than some PaaS solutions. An IaaS solution though labor intensive gives the most control and customization available in the cloud. Yet cloud features such as automation via IAC, dynamic scaling and easily connected external data services and analytics remain available and at the ready.Silk + Azure IaaS is a solution for an organization’s most challenging workloads.

https://silk.us/blog/migrating-oracle-databases-from-on-prem-to-silk-azure-using-shareplex/

Scroll to Top