Azure Table Storage Interview Questions

What are the main characteristics of Azure table storage?

  • Tables are independent of each other.
  • Features like foreign keys, joins, and custom indexes don’t exist.
  • Only the PartitionKey and RowKey are indexed.
  • Table schemas are flexible; it’s not mandatory for all entities to have the same properties.
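The flexible-schema point can be sketched with plain Python dicts standing in for table entities (the property names here are made up for illustration):

```python
# Two entities in the same hypothetical table with different custom
# properties. Only PartitionKey and RowKey are required on every entity.
employee = {
    "PartitionKey": "Sales",    # indexed
    "RowKey": "emp-001",        # indexed
    "Name": "Alice",
    "Phone": "555-0100",        # present on this entity...
}
contractor = {
    "PartitionKey": "Sales",
    "RowKey": "con-042",
    "Name": "Bob",
    "Agency": "Acme Staffing",  # ...while this entity has different fields
}

# Every entity must carry the two key properties; the rest may vary freely.
required = {"PartitionKey", "RowKey"}
assert required <= employee.keys() and required <= contractor.keys()
```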

What are the major differences between Azure Table & Cosmos DB?

|                     | Azure Table Storage | Cosmos DB |
|---------------------|---------------------|-----------|
| Latency             | No upper bound on the latency of your operations. | Read/write latency limited to under 10 milliseconds. |
| Throughput          | Limited to 20,000 operations per second. | Supported for up to 10 million operations per second. |
| Global distribution | A single region, with an optional secondary read-only region for increased availability. | Data can be distributed across up to 30 regions. Automatic, global failover is included, and you can choose between five consistency levels for your desired combination of throughput, latency, and availability. |
| Functionality       | Limited. | Superset of Azure Table Storage functionality. |
| Billing             | Determined by your storage volume. Pricing is per GB and is affected by your selected redundancy level; the more GB you use, the cheaper the per-GB price. You are also charged per 10,000 operations performed. | Determined by throughput in request units (RUs). Your database is provisioned in increments of 100 RU per second, and you are billed hourly for provisioned units. Storage is also billed per GB, at a higher rate than Table storage. |
| Indexing            | Only a primary index on PartitionKey and RowKey; no secondary indexes. | Automatic, complete indexing on all properties, with no index management. |

How can you reduce data transfer costs between Azure resources when using Table Storage?

To minimize latency, you should try to place your client and database in the same region in Azure. This has the added benefit of eliminating bandwidth costs since data transfers within a region are free.

What is denormalization, and why is it important for Table Storage?

Unlike with relational databases, the proven practices for efficiently querying table data lead to denormalizing your data: duplicating the same data in multiple entities, one for each key you may use to find it. This minimizes the number of entities a query must scan to find the data the client needs, instead of forcing your application to scan large numbers of entities.
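A minimal sketch of the idea, using an in-memory dict as a stand-in for a table (the entity shape and lookup keys are hypothetical):

```python
# The same employee record is duplicated into two entities so that lookups
# by department and by email address are both direct point reads
# (PartitionKey + RowKey), with no table scan.
employee = {"Name": "Alice", "Department": "Sales", "Email": "alice@example.com"}

by_department = {
    "PartitionKey": employee["Department"],  # access path #1
    "RowKey": employee["Email"],
    **employee,
}
by_email = {
    "PartitionKey": "email",                 # access path #2
    "RowKey": employee["Email"],
    **employee,
}

# A simple in-memory "table", keyed the way Table Storage keys entities:
table = {(e["PartitionKey"], e["RowKey"]): e for e in (by_department, by_email)}

# Both access paths resolve to the same data with a single key lookup.
assert table[("Sales", "alice@example.com")]["Name"] == "Alice"
assert table[("email", "alice@example.com")]["Name"] == "Alice"
```

The trade-off is classic denormalization: faster reads at the cost of duplicated storage and the need to keep the copies consistent on write.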

What are the Azure Table design patterns?

These patterns are recommended by Azure (Microsoft) for working with Table storage efficiently.

https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-design-patterns

  • Intra-partition secondary index pattern
  • Inter-partition secondary index pattern
  • Eventually consistent transactions pattern
  • Index entities pattern
  • Denormalization pattern
  • Compound key pattern
  • Log tail pattern
  • High volume delete pattern
  • Data series pattern
  • Wide entities pattern
  • Large entities pattern
  • Prepend/append anti-pattern
  • Log data anti-pattern

Explain entity group transactions in Azure Table Storage.

An entity group transaction is similar to the atomic transaction concept in SQL Server. Entity group transactions can only be performed within a single partition. With this feature, we can perform multiple CRUD operations on entities within a single partition as one batch operation. In other words, these operations can be performed on entities sharing the same partition key within a single table.

You choose a partition key to ensure atomic transactions are possible on a set of entities. Consider a Customer table where customer details and invoice details are stored together. To support transactions that span the header and detail records of an invoice, a partition key that includes the CustomerID makes entity group transactions possible. Within an entity group transaction, you can perform only one operation against any given entity, and Azure Table Storage enforces this rule.

If an operation on any entity in an entity group transaction fails, the entire transaction is rolled back. A transaction can include at most 100 entities, and its total payload must not exceed 4 MB. While multiple updates are in flight, other queries see only data that has been successfully committed.
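The constraints above can be sketched as a local validation function (an illustrative stand-in; the real service enforces these rules server-side, and the JSON-size check is only an approximation of the payload size):

```python
import json

MAX_ENTITIES = 100
MAX_BYTES = 4 * 1024 * 1024  # 4 MB payload cap

def validate_batch(entities):
    """Check the entity-group-transaction rules described above."""
    if len(entities) > MAX_ENTITIES:
        raise ValueError("a batch may contain at most 100 entities")
    partition_keys = {e["PartitionKey"] for e in entities}
    if len(partition_keys) != 1:
        raise ValueError("all entities must share one PartitionKey")
    row_keys = [e["RowKey"] for e in entities]
    if len(row_keys) != len(set(row_keys)):
        raise ValueError("each entity may appear only once in a batch")
    # Approximate the payload size by serializing the entities as JSON.
    if len(json.dumps(entities).encode()) > MAX_BYTES:
        raise ValueError("batch payload must not exceed 4 MB")

# A valid batch: one partition, unique row keys, small payload.
batch = [{"PartitionKey": "cust-7", "RowKey": f"invoice-{i}"} for i in range(3)]
validate_batch(batch)
```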

What are the cost benefits of entity group transactions in Azure Table Storage?

Every transaction is charged by Azure. So it makes sense to combine multiple operations on entities in a single partition into one entity group transaction. When we do this, Azure treats the entity group transaction as a single transaction and charges for it accordingly.
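Back-of-the-envelope arithmetic makes the saving concrete (a rough illustration only; actual prices and billing granularity vary, and the 100-entity batch size is the limit stated above):

```python
# 10,000 entity writes, all landing in one partition.
ops = 10_000

# Each write issued individually is its own billed transaction.
billed_individually = ops

# Batched into full 100-entity entity group transactions, each batch
# is billed as a single transaction.
batch_size = 100
billed_as_batches = ops // batch_size

assert billed_individually == 10_000
assert billed_as_batches == 100   # a 100x reduction in billed transactions
```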

Explain the PartitionKey in Azure Table Storage.

Tables are partitioned to support load balancing across storage nodes. A table’s entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity’s primary key and may be a string value up to 1 KiB in size.

You must include the PartitionKey property in every insert, update, and delete operation.

A table comprises one or more partitions, and many of the design decisions you make will be around choosing a suitable PartitionKey and RowKey to optimize your solution.

Partitions define a scope for transactions i.e. Entity Group Transactions.

Explain the RowKey in Azure Table Storage.

The second part of the primary key is the row key, specified by the RowKey property. The row key is a unique identifier for an entity within a given partition. Together, the PartitionKey and RowKey uniquely identify every entity within a table. The row key is a string value that may be up to 1 KiB in size.

Explain the Timestamp property of an Azure Table entity.

The Timestamp property is a DateTime value that is maintained on the server-side to record the time an entity was last modified.

The Table service uses the Timestamp property internally to provide optimistic concurrency. The value of the Timestamp property advances each time the entity is modified. This property should not be set on insert or update operations.
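A minimal in-memory simulation of optimistic concurrency (a hypothetical stand-in: an integer version plays the role that Timestamp/ETag plays in the real service, where a conditional update is rejected if the entity changed since it was read):

```python
class ConcurrencyError(Exception):
    pass

class TinyStore:
    """In-memory sketch of optimistic concurrency: every write bumps a
    version, and a conditional update fails if the caller's version is
    stale, i.e. the entity was modified after the caller last read it."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def insert(self, key, value):
        self._data[key] = (1, value)
        return 1

    def update(self, key, value, if_version):
        version, _ = self._data[key]
        if version != if_version:
            raise ConcurrencyError("entity was modified by someone else")
        self._data[key] = (version + 1, value)
        return version + 1

store = TinyStore()
v1 = store.insert("k", "a")
v2 = store.update("k", "b", if_version=v1)  # ok: version still matches
try:
    store.update("k", "c", if_version=v1)   # stale version -> rejected
except ConcurrencyError:
    pass
```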

What is the maximum size of an entity in Azure table storage?

1 MB

What is the maximum number of properties in an entity of Azure table storage?

255 properties (including the three system properties: PartitionKey, RowKey, and Timestamp)

What are the data types supported by Azure table storage?

Byte array, Boolean, DateTime, Double, GUID, Int32, Int64, and String (up to 64 KB in size)

Does Azure Table Storage Support Objects and Arrays?

Azure Table Storage does not support relational tables. Furthermore, ATS does not support storing strongly typed child objects as part of parent entities. ATS is key-value, entity-based table storage. It only supports basic data types like string, date, double, boolean, etc.

If you want to store complex objects in ATS (complex meaning objects that contain other objects), it is suggested that you serialize the child objects to strings when storing the data, and deserialize the strings back into objects on retrieval.
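A simple sketch of that serialize/deserialize round trip using JSON (the entity shape and property names are made up for illustration):

```python
import json

# A hypothetical order whose line items (a list of objects) cannot be
# stored directly, so they are serialized into a string property.
line_items = [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]
entity = {
    "PartitionKey": "orders",
    "RowKey": "order-1001",
    "LineItemsJson": json.dumps(line_items),  # store as a String property
}

# On retrieval, deserialize the string back into objects.
restored = json.loads(entity["LineItemsJson"])
assert restored == line_items
```

Keep the 64 KB string limit mentioned above in mind: a large serialized child collection may need to be split across properties or stored elsewhere.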

Does Azure keep Table Storage entries sorted in order?

Entities are sorted in ascending order based on PartitionKey and then by RowKey. These keys are string values and to ensure that numeric values sort correctly, you should convert them to a fixed length and pad them with zeroes.
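The padding point is easy to demonstrate, since Python's string sort is lexicographic just like the key ordering described above:

```python
# String keys sort lexicographically, so unpadded numbers sort "wrong":
keys = ["2", "10", "1"]
assert sorted(keys) == ["1", "10", "2"]  # "10" sorts before "2"

# Zero-padding to a fixed width restores numeric order:
padded = [k.zfill(6) for k in keys]
assert sorted(padded) == ["000001", "000002", "000010"]
```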

How to encrypt Azure Table Storage?

Azure Storage encrypts all data in a storage account at rest. By default, Queue storage and Table storage use a key that is scoped to the service and managed by Microsoft.

Explain the Last Sync Time property of a storage account.

Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS) both replicate your data asynchronously to a secondary region. Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the secondary region at the time an outage occurs.

The Last Sync Time property indicates the last time that data from the primary region was written successfully to the secondary region. Writes made to the primary region after the last sync time property may or may not be available for reads yet.

The Last Sync Time property is a GMT date/time value.