Google Cloud Spanner – Fundamentals & Best Practices

Cloud Spanner is a highly scalable, globally distributed, ACID-compliant database from Google Cloud Platform. It is a distributed transactional database built for scale. Industry-leading 99.999% availability for multi-regional instances, automatic sharding for optimized performance, fully managed serverless architecture, strong transactional consistency makes Spanner the best database among others.

Let us look at some of the key concepts and best practices in Spanner.

multi-regional

In cloud spanner, the compute and storage are isolated. It relies on the distributed file system called Colossus. To use cloud spanner, we should first create an Instance that includes two important choices (the Instance configuration whether regional or multi-regional and the node count).

Replication:

Cloud spanner relies on the distributed file system which automatically provides byte-level replication. The data is divided into splits (a set of contiguous rows) and each split is replicated across zones (in case of regional configuration) or regions (in case of multi-regional configuration).

Schemas:

Like all relational databases, Spanner also stores the data in tables with the predefined schema. It supports all the concepts like primary keys, joins, and other SQL queries that a traditional RDBMS supports. For defining a relationship between two related tables, the Spanner provides an option to define them as parent-child tables (also called Table Interleaving) so that the child rows are physically stored with the corresponding parent rows.

relational databases

Transactions:

Read Types:

Cloud Spanner supports two read types

  1. Strong read: Read at the current timestamp and is guaranteed to see all the data that has been committed up until the start of this read. This is the default read type
  2. Stale read: Read at a timestamp in the past. If your application is latency-sensitive and tolerant of stale data, this reading type provides better performance

Transaction types:

A transaction in a Cloud Spanner is a set of reads and writes that are executed automically.
Cloud Spanner supports the following transaction modes:

  • Locking read-write: This is the only transaction that supports writing the data. These transactions hold the lock and may abort the applications to retry
  • Read-only: This transaction doesn’t allow writes and provides guaranteed consistency across several reads. These transactions don’t hold any locks
  • Partitioned DML: This transaction type executes DML and used for bulk updates, deletes, and backfilling

External consistency:

Traditional RDBMS provides serializability which means that all transactions
appear as if they are executed in serial order (although they might not be
executed serially). Cloud Spanner offers a stronger guarantee than serializability called “External consistency”. Under external consistency, the system behaves as if all the transactions were executed sequentially, even though Cloud Spanner runs them across multiple servers globally.

Encryption

By default, the data in Cloud Spanner is encrypted with Google-managed default Encryption. If you have specific compliance or regulatory requirements related to the keys that protect your data, you can use customer-managed encryption (CMEK) for Cloud Spanner.

Sessions

A session represents a communication channel with the Cloud Spanner database service which is used to perform transactions (read, write or modify data).

Backup and restore, Point-in-time recovery (PITR)

Cloud Spanner Backup and Restore lets you create backups of the Cloud Spanner databases on demand, and restore them to protect operator and application errors which result in logical data corruption. Cloud Spanner Point-in-time recovery (PITR) protects against accidental deletion and writes. You can recover to a point-in-time up to a maximum of 7 days.

Best Practices:

Bulk loading best practices

  1. Minimize the number of splits that are involved in each written transaction
  2. Maximize the use of partitioning to distribute writing the partitions across worker tasks.
  3. Commit between 1MB to 5MB of mutations at a time

DML best practices

  1. Use a WHERE clause to reduce the scope of lookups
  2. Use Batch DML to send multiple DMLs in a single request
  3. Use a single DML statement with an array of data to insert or delete multiple rows of the same table

Schema design best practices

  1. Choose a primary key to prevent hotspots
  2. Use descending order for timestamp-based keys
  3. Use an interleaved index on a column whose value monotonically increases or decreases

SQL best practices

  1. Use parameterized queries to speed up frequently executed queries
  2. Use secondary indexes to speed up common queries
  3. Write efficient queries for range key lookup
  4. Write efficient queries for joins using join directives.
  5. Avoid large reads inside the read-write transactions.
  6. Use STARTS_WITH instead of LIKE to speed up parameterized queries

Thought Leadership