I recently earned the Oracle Cloud Infrastructure Foundations 2020 Associate certification. In this post, I would like to summarize what I learned on the topic.

Oracle Cloud Infrastructure (OCI) Architecture

Knowing the architecture provides a solid foundation for understanding the later topics. Here you will come across terms like Regions, Availability Domains (similar to Availability Zones in AWS), Fault Domains, High Availability Design and Compartments.

There are 21 OCI Regions available worldwide, and additional regions are scheduled to be set up. Moving on to the definitions, a Region is a localized geographic area comprising one or more Availability Domains (AD). Availability Domains are one or more fault-tolerant, isolated data centers located within a region and connected to each other by a low-latency, high-bandwidth network. Since they are physically isolated, the failure of one AD is unlikely to affect the availability of the others. Fault Domains refer to groupings of hardware and infrastructure within an Availability Domain. Each AD has three Fault Domains (FD). These act as logical data centers within an AD.

The whole idea of a Region having multiple ADs and an AD having multiple FDs is to provide High Availability and avoid single points of failure. Hence the best practice is to deploy instances that perform the same task in different FDs in a single-AD region, and in different ADs in a multi-AD region. Finally, a compartment is a collection of related resources. It helps to isolate and control access to your resources independent of region. Resources can be moved from one compartment to another, and resources in one compartment can interact with resources in another compartment.
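To make the placement rule concrete, here is a minimal Python sketch of spreading identical instances across ADs and FDs. The function name, the instance names and the AD labels are hypothetical, and this is only my illustration of the idea, not an OCI API call. It rotates across ADs first when several are available, and across the three FDs when the region has a single AD.

```python
from itertools import cycle


def spread_instances(instance_names, availability_domains, fault_domains_per_ad=3):
    """Assign each instance to an (AD, FD) slot in round-robin order.

    Hypothetical helper: with several ADs, consecutive instances land in
    different ADs; with a single AD, they land in different fault domains.
    """
    if not availability_domains:
        raise ValueError("at least one availability domain is required")

    # Order the slots so that the AD changes fastest, then the fault domain.
    slots = [
        (ad, f"FAULT-DOMAIN-{fd}")
        for fd in range(1, fault_domains_per_ad + 1)
        for ad in availability_domains
    ]
    return {name: slot for name, slot in zip(instance_names, cycle(slots))}


# Placeholder AD labels, for illustration only.
single_ad = spread_instances(["web-1", "web-2", "web-3"], ["AD-1"])
multi_ad = spread_instances(["web-1", "web-2", "web-3"], ["AD-1", "AD-2", "AD-3"])
print(single_ad)
print(multi_ad)
```

In the single-AD case each instance ends up in its own fault domain, and in the multi-AD case each ends up in its own AD, so losing one FD or one AD leaves the other copies running.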

OCI Core Services

The core services offered by OCI include Compute, Storage, Networking, IAM and Database. I will give a brief overview of each below.

Compute

There are multiple offerings under Compute, based on customer workload and application requirements: Bare metal, Dedicated virtual machine hosts, Virtual machines, Container engine and Functions. With bare metal, the physical server is provided with no virtualization, and everything else is taken care of by the customer. With dedicated virtual machine hosts, a dedicated machine with virtualization is provided, so all the VMs deployed on it belong to a single tenant. Next is the virtual machine offering. It is hypervisor based like the previous one but is multi-tenant, i.e., VMs of multiple customers are present on a single physical machine. With the container engine, the customer manages the code and the app container (the container runtime, which executes containers and manages container images on a node). Finally, Functions lets the customer write only the code while OCI takes care of the underlying infrastructure; it follows a consumption-based pricing model.

The term compute instance applies to all the offerings from bare metal to the VM types mentioned above. The size (CPU, RAM, etc.) depends on the workload. A compute instance depends on other services like networking and storage for booting and data access. Instances support vertical scaling (CPU, memory, etc.) and horizontal/auto scaling (instance/VM count). In comparison to VMs, containers include the application and all its dependencies but share the OS with other containers, unlike VMs, which each have a separate OS. Also, containers are not tied to any specific infrastructure such as on-premises or public cloud, and can run anywhere. Oracle Container Engine for Kubernetes (OKE) is a fully managed, scalable and highly available service that you can use to deploy your containerized apps in OCI. Here you come across terms like pod and node. A pod is a group of one or more containers with shared storage and network resources. Each pod runs on the node where it is scheduled. These nodes are the actual instances, which can be either bare metal or VM type.

Storage

There are multiple offerings here as well, and the choice of a particular type is based on the storage workload requirements. These requirements are expressed through parameters like persistence (persistent/non-persistent), type of data, performance (capacity, IOPS, throughput), durability, connectivity, storage protocol, etc. The storage types are block volume, local NVMe, file storage and object storage.

  • Block volume – This is block storage for compute instances. It behaves like a hard drive in a server, except that the drive is installed in a remote chassis. The 2 types are boot volume (OS) and block volume (data). It is persistent and highly durable, as it replicates the same data in 3 separate fault domains. There are 3 block volume tiers, namely Basic, Balanced and Higher Performance, based on IOPS and throughput. Usage depends on the workload, ranging from throughput-intensive big data and streaming (Basic) to IO-demanding large databases (Higher Performance).
  • Local NVMe – This is block storage attached directly to the instance; it is non-persistent and non-durable. It is designed for applications that require high-performance local storage.
  • File storage – Distributed file systems that look like local file systems and are hierarchically structured. It is frequently used as a shared file system storage for compute instances. This storage is persistent and highly durable like the block volume.
  • Object storage – Here, all data is stored in a single, flat structure without a folder hierarchy. Also, unlike block storage, each object carries metadata that makes it easier to index and access. This is a regional service, not tied to any compute instance, and is ideal for storing unlimited amounts of unstructured data like images, media files, etc. It is also persistent and highly durable like block and file storage. There are 2 object storage tiers, namely Standard Storage Tier (Hot) and Archive Storage Tier (Cold). With the hot tier, data retrieval is instantaneous, and it can’t be downgraded to archive storage. The archive tier is meant for data that is seldom accessed but must be retained for long periods of time, and it can’t be upgraded to standard storage.
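The four storage types can be read as a small decision tree. The Python sketch below is my own simplification of the descriptions above; the function and parameter names are hypothetical, and a real choice would also weigh IOPS, throughput, protocol and cost.

```python
def choose_storage(persistent, shared_file_system=False, unstructured_objects=False):
    """Pick a storage type from coarse workload requirements.

    Hypothetical helper that only encodes the distinctions made in this post.
    """
    if not persistent:
        # Only local NVMe is described as non-persistent.
        return "local NVMe"
    if unstructured_objects:
        # Flat namespace with metadata, not tied to an instance.
        return "object storage"
    if shared_file_system:
        # Hierarchical file system shared by several instances.
        return "file storage"
    # Default for boot and data volumes of a single instance.
    return "block volume"


print(choose_storage(persistent=False))
print(choose_storage(persistent=True, unstructured_objects=True))
print(choose_storage(persistent=True, shared_file_system=True))
print(choose_storage(persistent=True))
```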

Networking

Under networking services, a Virtual Cloud Network (VCN) is a software-defined private network set up in OCI that is highly available, scalable and secure. It enables OCI resources such as compute instances to securely communicate with the internet, other instances or on-premises data centers. The compute instances are placed in subnets, which are subnetworks within a VCN. The picture below depicts a VCN with subnets, instances and connectivity.

Next, let us have a look at the various gateways depicted in the above figure. The Internet gateway (A) provides a path for network traffic between your public subnet instances in the VCN and the internet. The NAT gateway (B) provides a secure connection by enabling outbound connections to the internet while blocking inbound connections initiated from the internet. The Dynamic Routing Gateway (C) is a virtual router that provides a path for private traffic between your VCN instances and destinations other than the internet, like an on-premises data center. The Service gateway (D) lets resources in the VCN access public OCI services like Object storage without an internet/NAT gateway. Security within a VCN is provided for the subnets in the form of security lists. A security list specifies the types of traffic allowed in and out of the subnet, and applies to an instance's communication with another instance in the VCN or with a host outside the VCN.
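A security list works as an allow-list: traffic is let through only if some rule permits it. Here is a rough Python sketch of that matching logic for ingress traffic, using the standard `ipaddress` module. The `IngressRule` class and `is_allowed` function are hypothetical names of mine, and real security rules carry more fields (port ranges, stateful/stateless and so on) than this simplification does.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class IngressRule:
    """Simplified, hypothetical ingress rule: source CIDR, protocol, single port."""
    source_cidr: str
    protocol: str
    port: int


def is_allowed(rules, source_ip, protocol, port):
    """Return True if any rule matches; traffic matching no rule is denied."""
    return any(
        ip_address(source_ip) in ip_network(rule.source_cidr)
        and rule.protocol == protocol
        and rule.port == port
        for rule in rules
    )


# Example rules: HTTPS from anywhere, SSH only from inside the 10.0.0.0/16 network.
rules = [
    IngressRule("0.0.0.0/0", "tcp", 443),
    IngressRule("10.0.0.0/16", "tcp", 22),
]

print(is_allowed(rules, "203.0.113.7", "tcp", 443))
print(is_allowed(rules, "203.0.113.7", "tcp", 22))
print(is_allowed(rules, "10.0.1.5", "tcp", 22))
```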

The next service is VCN peering, which is the process of connecting multiple VCNs. Local VCN peering connects two VCNs in the same region, and Remote VCN peering connects two VCNs in different regions, so that their resources can communicate using private IP addresses. The final networking service, the load balancer, sits between the clients and the backends and provides benefits like fault tolerance, HA and scale.
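To show how a load balancer gives fault tolerance, here is a small Python sketch of round-robin selection that skips backends marked unhealthy. Round robin is just one common policy that I assume for illustration; the class and method names are hypothetical and do not come from the OCI load balancer API.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips unhealthy backends."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark_down(self, backend):
        """Record a failed health check for a backend."""
        self._healthy.discard(backend)

    def mark_up(self, backend):
        """Record a passing health check for a known backend."""
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [lb.pick() for _ in range(4)]
lb.mark_down("web-2")  # simulate a failed backend
after_failure = [lb.pick() for _ in range(2)]
print(first_round, after_failure)
```

Clients keep getting answers from the remaining backends while one of them is down, which is the fault tolerance benefit mentioned above.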

Identity and Access Management (IAM)

IAM, Authentication, Authorization and Policies are the topics we shall look into. A Principal is an IAM entity that is allowed to interact with OCI resources; it can be an IAM user or an instance. In order for a user to access an OCI resource, the person/application should be part of a group, which in turn should have a policy granting permission on the tenancy or a compartment. Here tenancy refers to a secure and isolated partition within OCI where you can create, organize and administer your cloud resources. Next is authentication, which deals with user identity. The OCI IAM service authenticates a Principal by user name and password, API signing key or auth token. Authorization defines the actions that an authenticated principal can perform. It is specified by writing policies that allow a group to access the tenancy/account or a specific compartment, with conditions if any.
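OCI policies are written as readable statements of the form "Allow group ... to ... in ...". The Python sketch below just assembles such a statement as a string; the `policy_statement` helper and the group and compartment names are hypothetical, and the statement would still have to be added to a policy in OCI to take effect. The four verbs go from least to most permissive.

```python
VERBS = ("inspect", "read", "use", "manage")


def policy_statement(group, verb, resource_type, compartment=None, condition=None):
    """Build an OCI policy statement string (hypothetical helper).

    If no compartment is given, the statement applies to the whole tenancy.
    The optional condition is appended verbatim after "where".
    """
    if verb not in VERBS:
        raise ValueError(f"verb must be one of {VERBS}")
    location = f"compartment {compartment}" if compartment else "tenancy"
    statement = f"Allow group {group} to {verb} {resource_type} in {location}"
    if condition:
        statement += f" where {condition}"
    return statement


# Example group and compartment names are made up for illustration.
print(policy_statement("NetworkAdmins", "manage", "virtual-network-family",
                       compartment="Networking"))
print(policy_statement("Auditors", "inspect", "all-resources"))
```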

Database

Let me start this service by listing out the various DB options.

  • VM DB systems – A virtual machine running a managed DB instance.
  • Bare metal DB systems – The Oracle DB runs on a bare metal machine.
  • Oracle RAC – Oracle Real Application Clusters enable multiple servers to mount a single database. If any computer in the cluster fails, the database continues to provide service on the remaining computers. The Oracle DB is available from any node in the cluster, which provides High Availability.
  • Exadata DB systems – A database machine that combines Oracle database software with server hardware and acts as a computing platform for running Oracle Database.
  • Autonomous DB Shared/Dedicated – Autonomous DB is a fully managed database with 2 workload types, namely Autonomous Transaction Processing (ATP) and Autonomous Data Warehouse (ADW). With the dedicated type, the user has exclusive access to the Exadata hardware, whereas with the shared type the user only provisions and manages the autonomous DB while Oracle handles the infrastructure. Both types support ATP and ADW. Autonomous DB in general is self-driving: the DB automatically patches, updates and tunes itself without human intervention or downtime. It provides encryption by default and protects against system failure or downtime.

DB system operations include launching, starting, stopping or rebooting the BM/VM DB systems. Scaling and patching the BM/VM DB systems are also part of operations. DB system backup/restore involves manual or automatic backups from the private DB instance to object storage through the service gateway. For DB system DR, Oracle Data Guard provides standby databases that enable the Oracle DB to survive disasters and data corruption. It keeps the primary and standby DBs synchronized. It has two modes: switchover, a planned migration with no data loss, and failover, an unplanned migration with minimal data loss. HA and DR can be used together to achieve maximum availability within a single region or across multiple regions.

OCI Security

In an on-premises environment, everything from the infrastructure (networking, storage, server, virtualization) to what runs on top of it, like the OS, middleware, runtime, data and application, is handled by the customer. In the OCI shared security model, only the infrastructure is managed by Oracle, whereas the rest is the customer’s responsibility. Similar to AWS, Oracle is responsible for “security of the cloud”, which involves the physical security of the data centers, hardware, software and networking. The customer is responsible for “security in the cloud”, which involves patching applications and the OS, IAM, network security, endpoint protection, data classification and compliance.

Security Services

Under security services, Oracle provides IAM, Data Protection, OS and workload management, and Infrastructure protection.

IAM consists of OCI IAM (as discussed above), Multi-Factor Authentication (user authentication by password plus an additional factor) and Federation (federating with a supported identity provider like AD to log in to OCI).

For data protection, encryption at rest and in transit is available for block volume and file storage, along with a bring-your-own-keys feature. Object storage supports encryption at rest and pre-authenticated requests. The database supports Transparent Data Encryption, Data Safe (a managed service for protecting data in OCI databases) and Database Vault (which prevents administrators from snooping on user data) to safeguard data.

Moving on to OS and workload management, a dedicated VM host provides the security of bare metal combined with the ease and flexibility of VMs. Since it is single tenant, the hardware is not shared with another customer’s VMs. For instances, the OS Management Service executes and automates common, complex and critical tasks. A security/compliance reporting feature is also present.

Finally, considering the network protection part of infrastructure, a tiered VCN subnet strategy is used, i.e., a DMZ for the load balancers, a public subnet for web servers and a private subnet for internal hosts provide the necessary protection. Gateways also provide connectivity-related protection. Lastly, security lists and network security groups provide traffic-related protection. The OCI Web Application Firewall is a cloud-based global security service that protects applications from malicious and unwanted internet traffic.

OCI Pricing and Cost Management

There are 3 pricing models, namely:

  • Pay as you go (PAYG) – Charged only for the resources consumed, with no upfront commitment.
  • Monthly Flex – A minimum monthly charge and a minimum term commitment apply. In return, savings of 33%-60% are seen compared to PAYG pricing.
  • Bring Your Own License (BYOL) – Current on-premises Oracle licenses can be applied to the equivalent highly automated Oracle IaaS & PaaS services in the cloud.

Factors that impact pricing are resource size, resource type (VM, BM, Functions, etc.) and data transfer costs (no ingress cost; egress cost is conditional). Pricing is region independent, unlike with other cloud providers.

Block Volume (BV) pricing is the storage cost (rate x BV size) plus the performance cost (rate x BV size). For data transfer costs, transfer between instances within an AD is free for both ingress and egress. The same holds for data transfer between ADs in a region. Between regions, however, ingress is free but egress is charged. Internet access from an instance follows the same rule as between regions. Data transfer between an instance and an on-premises data center through a DRG is free for both ingress and egress.
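The block volume formula and the data transfer rules are easy to express in a few lines of Python. This is only a sketch: the function names and path labels are hypothetical, and the rates in the example are made-up numbers, not Oracle's price list.

```python
def block_volume_monthly_cost(size_gb, storage_rate, performance_rate):
    """Storage cost plus performance cost, both proportional to volume size."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    storage_cost = storage_rate * size_gb
    performance_cost = performance_rate * size_gb
    return storage_cost + performance_cost


# Whether egress is charged on each transfer path, following the rules in
# this post. Ingress is free on every path. Path labels are hypothetical.
EGRESS_CHARGED = {
    "same_ad": False,
    "same_region": False,
    "cross_region": True,
    "internet": True,
    "on_premises_via_drg": False,
}


def egress_is_charged(path):
    """Look up whether egress on the given transfer path is charged."""
    return EGRESS_CHARGED[path]


# Hypothetical per-GB monthly rates, for illustration only.
cost = block_volume_monthly_cost(size_gb=500, storage_rate=0.02, performance_rate=0.01)
print(f"{cost:.2f}")
print(egress_is_charged("cross_region"))
```

Since both terms scale with the volume size, doubling the size doubles the bill, while moving to a higher performance tier only raises the performance rate.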

OCI has a concept called cost tracking tags, which can be used to tag resources when they are created. Benefits include understanding the spending pattern and filtering costs by date, compartment and tag for better cost management. Another feature, the budget, defines a monthly threshold for your OCI spend and can be set on a compartment or a cost tracking tag. Email alerts can be set on a budget to fire when spending reaches a certain threshold.
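As a rough illustration of how tags and budgets fit together, the Python sketch below sums spend per tag value and checks it against a budget threshold. The data layout (a list of dicts with "cost" and "tags") and both function names are assumptions of mine, not the format of OCI's cost reports.

```python
from collections import defaultdict


def spend_by_tag(line_items, tag_key):
    """Sum cost per value of one tag; untagged items are grouped under None."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get(tag_key)] += item["cost"]
    return dict(totals)


def budget_alert(spend, monthly_budget, alert_percent):
    """Return True once spend reaches alert_percent of the monthly budget."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    return spend >= monthly_budget * alert_percent / 100


# Made-up line items, for illustration only.
items = [
    {"cost": 40.0, "tags": {"project": "web"}},
    {"cost": 25.0, "tags": {"project": "analytics"}},
    {"cost": 35.0, "tags": {"project": "web"}},
    {"cost": 5.0, "tags": {}},
]
totals = spend_by_tag(items, "project")
print(totals)
print(budget_alert(totals["web"], monthly_budget=100, alert_percent=75))
```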

The free tier includes $300 of free credits for 30 days, with access to a wide range of OCI services. Up to 5 TB of storage and 8 instances can be used here. The OCI services that are always free include 2 autonomous databases; 2 OCI compute VMs; block, object and archive storage; load balancer and data egress; monitoring and notifications.

OCI SLA and Support

A Service Level Agreement (SLA) is a financially backed commitment to provide a minimum level of service to customers. It is usually defined as a number of “nines” of monthly uptime, with a percentage credit based on tiers. Ex: (99.9%, 10% credit).

An example tier for an OCI service would be as follows:
Monthly uptime between 99.9 - 99.0% -> Service credit 10%
Monthly uptime between 99.0 - 95.0% -> Service credit 25%
Monthly uptime less than 95.0%      -> Service credit 100%
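The example tiers above translate directly into a lookup. In the Python sketch below, the function names are hypothetical, and I assume that each tier includes its lower bound and excludes its upper bound, since the example does not say which tier the exact boundary values fall into.

```python
def monthly_uptime_percent(downtime_minutes, days_in_month=30):
    """Uptime percentage for a month, given the total minutes of downtime."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes is out of range for the month")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_percent(uptime_percent):
    """Map monthly uptime to the service credit of the example tiers.

    Assumption: boundary values belong to the tier they open
    (e.g. exactly 99.0% uptime earns the 10% credit).
    """
    if uptime_percent >= 99.9:
        return 0  # commitment met, no credit
    if uptime_percent >= 99.0:
        return 10
    if uptime_percent >= 95.0:
        return 25
    return 100


uptime = monthly_uptime_percent(downtime_minutes=60)
print(f"{uptime:.3f}% uptime -> {service_credit_percent(uptime)}% credit")
```

One hour of downtime in a 30-day month already drops uptime below 99.9%, so it falls into the first credit tier.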

Among cloud service providers, only Oracle offers end-to-end SLAs covering performance, availability and manageability.

  • Availability applies to the data plane, where resources are utilized. For the OCI Functions service, the data plane would involve the InvokeFunction API. Availability means that services are in operation, with uptime and connectivity commitments.
  • Manageability applies to the control plane, where resources are administered. For the OCI Functions service, the control plane would involve the CreateFunction API. Manageability is about managing, monitoring and modifying OCI resources.
  • Performance refers to services consistently performing as expected.

Compute and Block Volume services are measured on all 3 SLAs, whereas services like File storage, DB Cloud Service, Data Safe, etc. are measured on the data plane and control plane SLAs. Services like API gateway, ATP, ADW, etc. are measured only on the data plane/availability SLA.

OCI Support

Moving on to OCI Support, Oracle provides the OCI status dashboard, which lists the OCI services in all regions and shows their operational status (services running/not running, incident history, etc.). Notifications are sent by email, text, RSS, etc. when OCI creates or resolves an incident. The dashboard can be accessed using the link here. First-time users need to sign up for an Oracle support account, which is different from the Oracle Cloud Infrastructure account; the two need to be linked to get the unique Customer Support Identifier (CSI) number.

Only paid accounts can open service requests. Customers using always free resources are not eligible for OCI support, but free tier account holders with free trial credits are provided limited support.

To register and log support requests, the CSI number, Tenancy OCID (Oracle Cloud Identifier) and Resource OCID are needed. Once logged in, paid account holders can open a service request for: resolving technical issues (in case Cloud Customer Connect doesn’t provide the answers), resetting/unlocking the password, adding/changing a tenancy administrator, or a service limit increase.