Azure Cosmos DB Interview Questions

Azure Cosmos DB is a fully managed NoSQL database for modern app development. Single-digit millisecond response times and automatic, instant scalability guarantee speed at any scale.

Azure Cosmos DB takes database administration off your hands with automatic management, updates, and patching as a fully managed service. It also handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs to match capacity with demand.

What are the advantages of NoSQL databases in comparison to SQL Databases?

  • High throughput – An obvious challenge when maintaining a relational database system is that most relational engines apply locks and latches to enforce strict ACID semantics. This approach helps ensure a consistent data state within the database, but it carries severe tradeoffs in concurrency, latency, and availability. Distributed databases can offer a more scalable solution. If your transaction volumes are reaching extreme levels, such as thousands of transactions per second, you should consider a distributed NoSQL database.

  • Hierarchical data – Relationships can grow significantly over time and prove difficult to manage in SQL databases, but they are simple to model with NoSQL document databases. Document-oriented databases also map much better to object-oriented approaches.

  • Fluid schema – Another characteristic of relational databases is that schemas must be defined at design time. If you are managing data whose structure changes frequently, particularly if transactions can come from external sources where it is challenging to enforce conformity across the database, you may want to consider a more schema-agnostic approach using a managed NoSQL database service like Azure Cosmos DB.

What are the disadvantages of NoSQL Databases?

The main disadvantage is the maintenance cost of denormalized data. Consider a product catalog with categories and tags. A best-practice approach in a NoSQL document database would be to denormalize the category name and tag names directly into a "product document". However, keeping categories, tags, and products in sync then adds maintenance complexity, because the data is duplicated across many product documents. In a relational design, the same change would be a simple update on one side of a "one-to-many" relationship, with a join to retrieve the data.

The tradeoff is that reads are more efficient against the denormalized record, and become more efficient still as the number of conceptually joined entities increases. However, just as read efficiency increases with the number of joined entities in a denormalized record, so too does the maintenance complexity of keeping those entities in sync.
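
The tradeoff can be illustrated with a small in-memory sketch. The documents and the field names `categoryId`, `categoryName`, and `tags` are hypothetical and not taken from any real schema. A read needs only the product document, while renaming a category has to touch every product that carries a copy of the name.

```javascript
// Hypothetical denormalized product documents: the category name is
// copied into every product instead of being joined at read time.
const products = [
  { id: "p1", name: "Road bike", categoryId: "c1", categoryName: "Bikes", tags: ["outdoor"] },
  { id: "p2", name: "Mountain bike", categoryId: "c1", categoryName: "Bikes", tags: ["outdoor", "trail"] },
  { id: "p3", name: "Helmet", categoryId: "c2", categoryName: "Accessories", tags: ["safety"] },
];

// Read path: one document holds everything needed, so no join is required.
function describeProduct(product) {
  return `${product.name} (${product.categoryName})`;
}

// Write path: renaming a category means updating every product that
// carries a copy of the name. Returns the number of documents touched.
function renameCategory(docs, categoryId, newName) {
  let updated = 0;
  for (const doc of docs) {
    if (doc.categoryId === categoryId) {
      doc.categoryName = newName;
      updated += 1;
    }
  }
  return updated;
}
```

Calling `renameCategory(products, "c1", "Bicycles")` touches two documents here; in a real container the number of writes grows with the number of products that embed the category.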

Explain about Global Distribution & Multi-Region writes in Cosmos DB?

Azure Cosmos DB is a globally distributed database system that allows you to read and write data from the local replicas of your database. In addition, Azure Cosmos DB transparently replicates the data to all the regions associated with your Cosmos account.

To lower the latency, place the data close to where your users are. Choosing the required regions depends on the global reach of your application and where your users are located.

How many different APIs are available for Azure Cosmos DB?

There are five different APIs available in Azure Cosmos DB.

  • Core SQL API
  • MongoDB API
  • Cassandra API
  • Gremlin API
  • Table API

Explain the differences between Standard, Autoscale & Serverless throughput modes?

| | Standard | Autoscale | Serverless |
| --- | --- | --- | --- |
| Best suited for | Workloads with steady or predictable traffic | Workloads with variable or unpredictable traffic | Workloads with intermittent or unpredictable traffic and a low average-to-peak traffic ratio |
| Geo-distribution | Available (unlimited number of Azure regions) | Available (unlimited number of Azure regions) | Unavailable (serverless accounts can run in only one Azure region) |
| Billing model | Billed per hour for the RU/s provisioned, regardless of how many RUs were consumed | Billed per hour for the highest RU/s the system scaled to in the hour | Billed per hour for the number of RUs consumed by your database operations |
| Maximum storage per container | Unlimited | Unlimited | 50 GB |
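
The three billing rules can be sketched as follows. This only illustrates which quantity is metered under each model; the function name `billedQuantity` and the shape of the `hour` object are hypothetical, and no prices are assumed.

```javascript
// Returns the quantity that is billed for one hour under each model.
// `hour` is a hypothetical record of one hour of activity:
//   provisionedRUs - RU/s configured on the container (standard)
//   scaledRUs      - RU/s levels the system scaled to during the hour (autoscale)
//   consumedRUs    - RUs consumed by each operation in the hour (serverless)
function billedQuantity(model, hour) {
  switch (model) {
    case "standard":
      // Billed for what was provisioned, regardless of consumption.
      return { amount: hour.provisionedRUs, unit: "RU/s" };
    case "autoscale":
      // Billed for the highest RU/s reached within the hour.
      return { amount: Math.max(...hour.scaledRUs), unit: "RU/s" };
    case "serverless":
      // Billed for the total RUs actually consumed.
      return { amount: hour.consumedRUs.reduce((a, b) => a + b, 0), unit: "RU" };
    default:
      throw new Error("Unknown model: " + model);
  }
}
```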

Explain Logical partitioning?

In general computing, the term logical partition (LPAR) refers to a subset of a computer’s hardware resources, virtualized as a separate computer: a physical machine can be partitioned into multiple logical partitions, each hosting its own operating system instance. A logical partition in Azure Cosmos DB is a different concept.

In Cosmos DB, logical partitions are formed based on the value of the partition key that is associated with each item in a container. Thus, all the items in a logical partition have the same partition key value.

A logical partition also defines the scope of a database transaction, and you can update items within a logical partition by using a transaction with snapshot isolation. Also, there is no limit to the number of logical partitions in your container.

Each logical partition can store up to 20 GB of data.
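
As a rough illustration of how items fall into logical partitions, the sketch below groups in-memory items by a partition key property. The function name `groupByPartitionKey` and the `customerId` property are hypothetical; Cosmos DB does this grouping internally.

```javascript
// Groups items into logical partitions: every item that shares the same
// partition key value lands in the same group.
function groupByPartitionKey(items, partitionKeyProperty) {
  const partitions = new Map();
  for (const item of items) {
    const keyValue = item[partitionKeyProperty];
    if (!partitions.has(keyValue)) {
      partitions.set(keyValue, []);
    }
    partitions.get(keyValue).push(item);
  }
  return partitions;
}

// Hypothetical order items partitioned by customerId.
const orders = [
  { id: "o1", customerId: "alice", total: 20 },
  { id: "o2", customerId: "bob", total: 35 },
  { id: "o3", customerId: "alice", total: 12 },
];
const logicalPartitions = groupByPartitionKey(orders, "customerId");
```

Here `logicalPartitions` holds two groups, one per distinct `customerId` value.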

Explain Physical partitioning?

Unlike logical partitions, physical partitions are an internal system implementation, and Azure Cosmos DB manages them entirely. Each physical partition can serve up to 10,000 RU/s. Because each logical partition is mapped to only one physical partition, this limit implies that a logical partition also has a 10,000 RU/s limit.

There is no limit to the total number of physical partitions in your container. Azure Cosmos DB will automatically create new physical partitions by splitting existing ones as your provisioned throughput or data size grows. Physical partition splits do not impact your application’s availability.
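
The 10,000 RU/s figure gives a simple lower bound on how many physical partitions a container needs for a given provisioned throughput. The sketch below is only an estimate under that single assumption; the function name is hypothetical, and the real number is decided by Azure Cosmos DB and also depends on data size.

```javascript
// Maximum throughput a single physical partition can serve, per the text above.
const MAX_RUS_PER_PHYSICAL_PARTITION = 10000;

// Lower bound on the number of physical partitions needed to serve the
// provisioned throughput. Storage growth can require more partitions.
function minPhysicalPartitionsForThroughput(provisionedRUs) {
  if (!Number.isFinite(provisionedRUs) || provisionedRUs < 0) {
    throw new Error("provisionedRUs must be a non-negative number");
  }
  return Math.max(1, Math.ceil(provisionedRUs / MAX_RUS_PER_PHYSICAL_PARTITION));
}
```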

Explain Replica sets?

Each physical partition consists of a set of replicas, also referred to as a replica set. Each replica hosts an instance of the database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent.

Typically, smaller containers only require a single physical partition, but they will still have at least 4 replicas.

Explain stored procedures in Azure Cosmos DB?

Stored procedures are written using JavaScript, and they can create, update, read, query, and delete items inside an Azure Cosmos container. Stored procedures are registered per collection and can operate on any document or an attachment present in that collection.

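// Definition object used to register the stored procedure.
var helloWorldStoredProc = {
  id: 'helloWorld',
  // This function runs server-side, inside the database engine.
  serverScript: function () {
    // getContext() is provided by the Cosmos DB JavaScript runtime.
    var context = getContext()
    var response = context.getResponse()

    // The body set here is returned to the calling client.
    response.setBody('Hello, World')
  },
}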

The context object provides access to all operations that can be performed in Azure Cosmos DB, as well as access to the request and response objects. In this case, you use the response object to set the body of the response to be sent back to the client.

How do you implement transactions using stored procedures in Azure Cosmos DB?

You can implement transactions on items within a container by using a stored procedure. If any errors are encountered along the way, the stored procedure throws a JavaScript exception that implicitly aborts the transaction.
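
A minimal sketch of this pattern follows. The stored procedure `createOrderWithAudit` uses the server-side calls `getContext()`, `getCollection()`, `getSelfLink()`, `createDocument()`, and `getResponse().setBody()`. Everything below the second comment banner (`getContext` as defined here and `runAsTransaction`) is a hypothetical in-memory stand-in so that the sketch can run outside Cosmos DB. It is not part of any SDK and only imitates the all-or-nothing behavior.

```javascript
// --- Stored procedure: the part that would be registered in Cosmos DB ---
// Creates two items in one transaction. Throwing aborts the transaction,
// so either both items are stored or neither is.
function createOrderWithAudit(order, audit) {
  var context = getContext();
  var container = context.getCollection();
  var link = container.getSelfLink();

  var accepted = container.createDocument(link, order, function (err) {
    if (err) throw new Error("Order insert failed: " + err.message);

    var acceptedAudit = container.createDocument(link, audit, function (err2) {
      if (err2) throw new Error("Audit insert failed: " + err2.message);
      context.getResponse().setBody("created 2 items");
    });
    if (!acceptedAudit) throw new Error("Audit insert was not accepted");
  });
  if (!accepted) throw new Error("Order insert was not accepted");
}

// --- Local stand-in (hypothetical, NOT part of Cosmos DB) ---
var currentContext = null;
function getContext() {
  return currentContext;
}

// Runs `procedure` against an array `store`. Writes are staged and only
// copied into the store if the procedure finishes without throwing.
function runAsTransaction(store, procedure, ...args) {
  var staged = [];
  var body;
  var collection = {
    getSelfLink: function () { return "dbs/demo/colls/demo"; },
    createDocument: function (link, doc, callback) {
      var exists = store.concat(staged).some(function (d) { return d.id === doc.id; });
      if (exists) callback({ message: "id conflict" });
      else { staged.push(doc); callback(null, doc); }
      return true;
    },
  };
  currentContext = {
    getCollection: function () { return collection; },
    getResponse: function () { return { setBody: function (v) { body = v; } }; },
  };
  try {
    procedure(...args);
  } catch (e) {
    return { committed: false, error: e.message }; // staged writes are dropped
  }
  staged.forEach(function (d) { store.push(d); });
  return { committed: true, body: body };
}
```

With this stand-in, passing two items that share an `id` makes the second insert fail, the exception propagates, and the store stays empty even though the first insert had already succeeded.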

How is the Request unit calculated in Azure Cosmos DB?

The cost of all database operations is normalized by Azure Cosmos DB and is expressed in Request Units (or RUs, for short). A request unit is a performance currency that abstracts the system resources, such as CPU, IOPS, and memory, required to perform the database operations supported by Azure Cosmos DB.

No matter which API you use to interact with your Azure Cosmos container, costs are always measured by RUs. Whether the database operation is a write, point read, or query, costs are always measured in RUs.

Azure Cosmos DB ensures that the number of RUs for a given database operation over a given dataset is deterministic to manage and plan capacity. You can examine the response header to track the number of RUs consumed by any database operation. As a result, you can run your application cost-effectively when you understand the factors that affect RU charges and your application’s throughput requirements.
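
As a small illustration of tracking RU consumption, the sketch below sums the request charge across several responses. The charge is reported in the `x-ms-request-charge` response header; the plain objects standing in for responses and the function name `totalRequestCharge` are assumptions made for the example.

```javascript
// Sums the RU charge reported by a list of responses. Each response is
// assumed to expose its headers as a plain object keyed by header name.
function totalRequestCharge(responses) {
  return responses.reduce(function (sum, response) {
    const raw = response.headers["x-ms-request-charge"];
    // Responses without the header contribute nothing to the total.
    return sum + (raw === undefined ? 0 : Number(raw));
  }, 0);
}

// Hypothetical responses from a point read, a query, and a write.
const sampleResponses = [
  { headers: { "x-ms-request-charge": "1" } },
  { headers: { "x-ms-request-charge": "2.5" } },
  { headers: { "x-ms-request-charge": "3" } },
];
```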

What is Azure Synapse Link for Azure Cosmos DB?

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB. Azure Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

Some of the advantages of Azure Synapse Link are listed below:

  • Direct access to the Azure Cosmos DB analytical store. Any updates made to the operational data are visible in the analytical store in near real-time, with no ETL or change feed jobs.
  • No impact on operational workloads. With Azure Synapse Link, you can run analytical queries against the Azure Cosmos DB analytical store (a separate column store) without consuming the throughput provisioned for transactional workloads.
  • The Azure Cosmos DB analytical store is optimized to provide scalability, elasticity, and performance for analytical workloads without dependency on the compute run-times.
  • It is cost-effective. It eliminates the extra layers of storage and compute required in traditional ETL pipelines for analyzing operational data.

Explain Consistency levels in Azure Cosmos DB?

Distributed databases that rely on replication for high availability, low latency, or both must make a fundamental tradeoff between read consistency, availability, latency, and throughput, as defined by the PACELC theorem.

Most commercially available distributed NoSQL databases on the market today provide only strong and eventual consistency. Azure Cosmos DB offers five well-defined levels, listed here from strongest to weakest:

  1. Strong consistency
  2. Bounded staleness consistency
  3. Session consistency
  4. Consistent prefix consistency
  5. Eventual consistency

The consistency levels are region-agnostic and guaranteed to work for all operations regardless of the region where reads and writes are served or whether your account is configured with a single or multiple write regions.

| Consistency Level | Quorum Reads | Quorum Writes |
| --- | --- | --- |
| Strong | Local Minority | Global Majority |
| Bounded Staleness | Local Minority | Local Majority |
| Session | Single Replica (using session token) | Local Majority |
| Consistent Prefix | Single Replica | Local Majority |
| Eventual | Single Replica | Local Majority |

Can we manage the consistency level for Azure Cosmos DB?

You can configure the default consistency level on your Azure Cosmos account at any time. The default consistency level configured on your account applies to all Azure Cosmos databases and containers under that account.

Can client/request override default consistency level for Azure Cosmos DB?

By default, all reads and queries issued against a container or a database use the account’s default consistency level. You can also relax the consistency level for individual requests.
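
The "relax only" rule can be expressed as a small check. This is a sketch under the assumption that a request may ask for the account default or anything weaker, using the strongest-to-weakest ordering of the five levels. The level identifiers and the function name `canOverride` are chosen for illustration.

```javascript
// The five consistency levels, ordered from strongest to weakest.
const CONSISTENCY_LEVELS = [
  "Strong",
  "BoundedStaleness",
  "Session",
  "ConsistentPrefix",
  "Eventual",
];

// Returns true when `requested` is the same as, or weaker than, the
// account default, i.e. when the request only relaxes consistency.
function canOverride(accountDefault, requested) {
  const defaultIndex = CONSISTENCY_LEVELS.indexOf(accountDefault);
  const requestedIndex = CONSISTENCY_LEVELS.indexOf(requested);
  if (defaultIndex === -1 || requestedIndex === -1) {
    throw new Error("Unknown consistency level");
  }
  return requestedIndex >= defaultIndex;
}
```

For example, with an account default of Session, a request for Eventual passes the check, while a request for Strong does not.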

Can we have multiple write regions in Azure Cosmos DB?

Yes, we can have multiple write regions enabled in Cosmos DB. However, Cosmos accounts configured with multiple write regions cannot be configured for strong consistency as a distributed system can’t provide an RPO of zero and an RTO of zero.

Explain conflict resolution policies for Azure Cosmos DB?

With multi-region writes, when multiple clients write to the same item, conflicts may occur. When a conflict occurs, you can resolve the conflict by using different conflict resolution policies.

  1. Last writer wins conflict resolution policy – As the name suggests, the most recently written version of the item is kept.
  2. Custom conflict resolution policy – We can write our own resolver function using JavaScript.
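
The idea behind last writer wins can be sketched in a few lines. The service applies the policy itself; this in-memory version only mimics it by comparing conflicting versions on a numeric property, assumed here to be the system timestamp `_ts`. The function name and the sample documents are hypothetical, and ties are simply resolved in favor of the earlier entry in the array.

```javascript
// Picks the winning version among conflicting writes: the one with the
// highest value at `path` (the system timestamp `_ts` by default).
function lastWriterWins(versions, path = "_ts") {
  if (versions.length === 0) {
    throw new Error("No versions to resolve");
  }
  return versions.reduce(function (winner, candidate) {
    return candidate[path] > winner[path] ? candidate : winner;
  });
}

// Two regions wrote the same item at slightly different times.
const conflictingVersions = [
  { id: "cart-1", region: "West US", items: 2, _ts: 1700000000 },
  { id: "cart-1", region: "North Europe", items: 3, _ts: 1700000005 },
];
```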

What is OSS NoSQL Database?

OSS stands for open-source software, so OSS NoSQL databases are NoSQL databases whose source code is openly available. Some examples are listed below:

  1. MongoDB
  2. Redis
  3. CouchDB
  4. MemcacheDB
  5. Others like RavenDB, Neo4j, etc.

Explain the Total cost of ownership of Azure Cosmos DB?

The serverless provisioning model of Azure Cosmos DB eliminates the need to over-provision the database infrastructure. Azure Cosmos DB resources are provided without any need for specialized configurations or licensing. As a result, Azure Cosmos DB-backed applications can run with a 70 percent total cost of ownership (TCO) saving compared to OSS NoSQL databases. Other benefits are listed below:

  • Great value for the price
  • No NoSQL DevOps administration is required
  • Ability to elastically scale
  • Economies of scale
  • Optimized for the cloud
  • You pay by the hour
  • You automatically get high availability

Can we have strong consistency for multi-region write Azure Cosmos DB?

Cosmos accounts configured with multiple write regions cannot be configured for strong consistency, as a distributed system can’t provide an RPO of zero and an RTO of zero. Additionally, there are no write latency benefits to using strong consistency with multiple write regions. This is because a write to any region must be replicated and committed to all configured regions within the account, which results in the same write latency as a single write region account.

What is the recovery time objective?

The recovery time objective (RTO) is the amount of time a business has to restore its processes to an acceptable service level after a disaster, in order to avoid intolerable consequences associated with the disruption.

What is the recovery point objective?

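The recovery point objective (RPO) is the maximum amount of data loss, measured in time, that is acceptable after an outage; it therefore determines how frequently you need to take backups. The RPO indicates the amount of data (updated or created) that will be lost or need to be re-entered after an outage.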

Is it possible to have an RPO & RTO of zero?

It is important to note that, even with strong consistency, a distributed database cannot have both an RPO and an RTO of zero, due to the CAP theorem.

How many replicas of data are stored within a region for Azure Cosmos DB?

Azure Cosmos DB maintains four replicas of data within a region. Azure Cosmos DB also replicates data across the regions configured within a Cosmos DB account, providing high availability.

How many RU’s are consumed while reading Strong/Bounded staleness vs. other consistency levels?

The RU cost of Local Minority reads is twice that of reads at the weaker consistency levels, because reads are made from two replicas to provide the consistency guarantees for Strong and Bounded Staleness.
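
The doubling can be written as a one-line rule. The sketch below assumes a known base charge for a read at a weaker level and applies the factor of two for Strong and Bounded Staleness; the function name and level identifiers are hypothetical.

```javascript
// Consistency levels whose reads are served from two replicas.
const TWO_REPLICA_READ_LEVELS = ["Strong", "BoundedStaleness"];

// Estimates the RU charge of a read, given the charge the same read
// would incur at a weaker (single-replica) consistency level.
function estimatedReadCharge(baseCharge, consistencyLevel) {
  return TWO_REPLICA_READ_LEVELS.includes(consistencyLevel)
    ? baseCharge * 2
    : baseCharge;
}
```

For instance, a read that is charged 1 RU at Session consistency would be estimated at 2 RUs at Strong consistency.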

What are the differences between Azure Cosmos DB serverless vs. provisioned?

  • A serverless account can only run in a single Azure region. It is not possible to add additional Azure regions to a serverless account after you create it.
  • Serverless containers can store a maximum of 50 GB of data and indexes.
  • Serverless latency is covered by a Service Level Objective (SLO) of 10 milliseconds or less for point reads and 30 milliseconds or less for writes.

What is the time limit for Stored Procedures, Triggers, User Defined Functions?

The timeout limit of 5 seconds applies to JavaScript functions – stored procedures, triggers, and user-defined functions. If an operation does not complete within that time limit, the transaction is rolled back.

What are the different types of backups in Azure Cosmos DB?

  • Periodic Backup Mode – This mode is the default backup mode for all existing accounts. Backup is taken at a regular interval, and the data is restored by creating a request with the support team. The minimum backup interval can be one hour.

  • Continuous backup mode – You choose this mode while creating the Azure Cosmos DB account. This mode allows you to restore to any point in time within the last 30 days. The restore window is fixed at 30 days and cannot be changed.

Which APIs does Continuous backup mode support?

Only Azure Cosmos DB APIs for SQL and MongoDB are supported for continuous backup. Cassandra, Table, and Gremlin APIs are not yet supported.

Is Continuous backup mode supported for Multi-region writes?

No, it only supports a single write region.

What are the constraints or limits for Serverless Cosmos DB?

| Resource | Limit |
| --- | --- |
| Maximum RU/s per container | 5,000 |
| Maximum storage across all items per (logical) partition | 20 GB |
| Maximum number of distinct (logical) partition keys | Unlimited |
| Maximum storage per container | 50 GB |

Explain the Change feed processor in Azure Cosmos DB?

The change feed processor is part of the Azure Cosmos DB SDK V3. It simplifies the process of reading the change feed and distributes the event processing across multiple consumers effectively.

The main benefit of the change feed processor library is its fault-tolerant behavior that assures an "at-least-once" delivery of all the events in the change feed.
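
Because delivery is "at-least-once", the same change can reach a handler more than once, so handlers are usually written to be idempotent. The sketch below shows one possible approach, not the processor's own API: it remembers which versions it has already applied, keyed by the item `id` and the system property `_etag`. The function names are hypothetical, the memory is in-process only, and a real handler would also need the partition key in the deduplication key.

```javascript
// Wraps `applyChange` so that each item version is applied only once,
// even if the same batch of changes is delivered again after a failure.
function createIdempotentHandler(applyChange) {
  const seen = new Set(); // in-memory only; a real system would persist this
  return function handleChanges(changes) {
    let applied = 0;
    for (const change of changes) {
      const key = `${change.id}:${change._etag}`;
      if (seen.has(key)) continue; // already processed this version
      applyChange(change);
      seen.add(key);
      applied += 1;
    }
    return applied; // number of changes actually applied
  };
}
```

Feeding the same batch to the handler twice applies each change the first time and skips all of them the second time.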

Elaborate on the monitored containers?

The monitored container has the data from which the change feed is generated. Any inserts and updates to the monitored container are reflected in the change feed of the container.

Elaborate on the lease containers?

The lease container acts as state storage and coordinates processing the change feed across multiple workers. The lease container can be stored in the same account as the monitored container or in a separate account.

Explain the concept of foreign keys in Azure Cosmos DB?

Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships you have in documents are effectively "weak links," and the database will not verify them automatically. If you want to ensure that the data a document refers to exists, then you need to do this in your application or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
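
An application-side check might look like the following sketch. It uses an in-memory `Map` in place of a real lookup against the container, and the names `assertReferenceExists`, `customerId`, and `customersById` are hypothetical.

```javascript
// Throws if `doc[refProperty]` does not point at a known target item.
// `targetsById` stands in for a lookup against the referenced data.
function assertReferenceExists(doc, refProperty, targetsById) {
  const referencedId = doc[refProperty];
  if (!targetsById.has(referencedId)) {
    throw new Error(
      `Document ${doc.id} refers to missing ${refProperty} "${referencedId}"`
    );
  }
  return true;
}

// Hypothetical customers that orders are allowed to reference.
const customersById = new Map([
  ["c1", { id: "c1", name: "Alice" }],
  ["c2", { id: "c2", name: "Bob" }],
]);
```

The application would call a check like this before writing an order, since the database itself will accept an order whose `customerId` points at nothing.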

What is the synthetic partition key in Azure Cosmos DB?

The best practice is to have a partition key with many distinct values, such as hundreds or thousands. The goal is to distribute your data and workload evenly across the items associated with these partition key values. If such a property doesn’t exist in your data, you can construct a synthetic partition key.
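
One common way to construct a synthetic partition key is to concatenate several properties of the item into a single value. The sketch below assumes a hypothetical telemetry item with `deviceId` and `date` properties and stores the result in a property named `partitionKey`; all of these names are illustrative.

```javascript
// Returns a copy of `item` with a synthetic partition key built by
// joining the values of the given properties with "-".
function withSyntheticPartitionKey(item, properties) {
  const partitionKey = properties
    .map(function (property) { return String(item[property]); })
    .join("-");
  return { ...item, partitionKey: partitionKey };
}

// Hypothetical telemetry reading.
const reading = { id: "r1", deviceId: "abc", date: "2018-08-09", temperature: 21 };
const readingWithKey = withSyntheticPartitionKey(reading, ["deviceId", "date"]);
```

Combining the two properties yields many more distinct key values than either property alone, which helps spread items across logical partitions.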