Properties of a Database

The growth of data created the demand for databases, and ever-larger data volumes made database systems and cluster setups a necessity.

Welcome Babe!

Hmm... What's a poetic maniac without a welcoming poem?

Babe,
I am humbled and inspired
To be loved this much,
So much that this series
Is consumed by an engineer
With count(hexes) < mine.
In all reverence, I will
Ensure this is a learning
Experience to forever
Remember!

🀐
I will keep this welcome section shut in future articles for this series.

With welcoming poetic vibes emanating from your glowing, hungry eyes, I suppose it's time to kick-start this party.

What This Series Is

This series aims to uncover layers of understanding often overlooked by junior backend engineers. It won't cover everything about databases or serve as a step-by-step guide.

Topics include database properties, relational design, SQL's fantastic four (DQL, DML, DDL, and DCL), the database development lifecycle, normalization in logical design, connecting databases to web apps (using Node.js, Python, and C across multiple articles), how ORMs work (by developing a not-so-basic ORM library in Node.js), and security when deploying database-reliant applications.

From a writer's perspective, the series spans one year, with the goal of continual learning. While I'm not an expert, I'm cursed with needing to understand the 'why.' This is the curse I wish to share.

I aim to keep each article to 750 words or fewer. This article, being the first, will be the longest in the series.

👉
This series won't cover managing and working with databases in the cloud; perhaps a future series will delve into that.

Overview

This article focuses on what I consider the most crucial properties of any database. Consequently, we won't explore database history or delve into vendor-specific details of database management systems.

By examining these four properties, we aim to better understand how databases work beneath the surface.

📒
It's worth noting that this article draws inspiration from a section in the first chapter of the book "Databases: A Beginner's Guide" written by Andy Oppel.

The four properties are:

  1. A database management system (DBMS)

  2. Layers of data abstraction

  3. Physical data independence

  4. Logical data independence

Why These Properties?

The definition of a database varies among vendors such as Oracle, Microsoft, SAP, and IBM. Assigning a single definition to "database" can lead to confusion when working with different vendors.

This disparity arises from how data is stored and managed in the database systems of these vendors. Various algorithms, data structures, and architectures are employed in creating these systems. If you've been in tech for a while, you'll understand that many technologies develop terminologies and concepts based on their specific implementations.

To comprehend what constitutes a database and distinguish it from a file, it's essential to be acquainted with the properties that define databases.

With your question answered, let's look briefly into each of these properties to satisfy our hex for deeper learning.

The Database Management System

This is software provided by a database vendor that furnishes all the essential services needed to organize and maintain the database. A DBMS:

  • Allows users to define the database structure, such as creating, altering, and deleting tables and establishing relationships, using a data definition language (DDL).

  • Enables users to work with the data itself, including inserting, updating, and deleting records, using a data manipulation language (DML).

  • Provides mechanisms for retrieving information from the database through queries, written in a data query language (DQL).

  • Ensures the integrity of the database by managing transactions, allowing users to commit or roll back changes.

    🗣
    We will talk more about transactions and how they improve database performance and ensure database integrity in a future article.
  • Manages concurrent access by multiple users to prevent data inconsistencies.

  • Implements access controls that safeguard the database from unauthorized access and ensure data privacy. A data control language (DCL) is also provided to give users (often administrators or developers) control over control 😂.

  • Optimizes database performance through indexing, query optimization, and other techniques.
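To make these services concrete, here is a minimal sketch that uses SQLite, through Python's built-in sqlite3 module, as a stand-in DBMS. The authors and articles tables are hypothetical. It exercises DDL, DML, a query, and a commit/rollback pair. SQLite has no user accounts or GRANT/REVOKE statements, so data control and multi-user concurrency are not shown.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; SQLite plays the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
cur = conn.cursor()

# DDL: define the structure, including a relationship via a foreign key.
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)

# DML: insert and update records, then commit to make the changes permanent.
cur.execute("INSERT INTO authors (name) VALUES (?)", ("Ada",))
cur.execute("INSERT INTO articles (author_id, title) VALUES (?, ?)", (1, "Properties of a Database"))
cur.execute("UPDATE articles SET title = ? WHERE id = ?", ("Properties of a Database (v2)", 1))
conn.commit()

# Transaction control: this insert is rolled back and never becomes visible.
cur.execute("INSERT INTO articles (author_id, title) VALUES (?, ?)", (1, "Draft"))
conn.rollback()

# Query: extract information by joining the two tables.
rows = cur.execute(
    "SELECT authors.name, articles.title "
    "FROM articles JOIN authors ON authors.id = articles.author_id"
).fetchall()
print(rows)  # [('Ada', 'Properties of a Database (v2)')]
```

Only the committed, updated article survives; the rolled-back draft is gone, which is the DBMS guarding integrity on your behalf.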

As you can see, there isn't much to say about this property. A quick way to tell a database apart from an ordinary file is whether a DBMS manages it.

❓
Have you come across a database whose vendor doesn't provide a DBMS to manage it?

Layers of Data Abstraction

The ANSI/SPARC architecture defines three layers of data abstraction:

  1. Physical Layer

  2. Logical Layer

  3. External Layer

Let's look into each layer briefly...

Database layers of abstraction


The Physical Layer

This layer contains the data files that hold all the data for the database. Most DBMSs today store a database across multiple data files, spanning multiple physical disk drives. This enables them to accommodate many concurrent users and maximize performance, as the disk drives can work in parallel.

Note that the physical layer is usually abstracted away from you as a developer, so you do not need to know how data is stored in these files or which file contains the data item of interest.

❔
Until now, were you aware that this layer existed?
Who works with the physical layer?
The management of the physical layer is typically handled by database administrators (DBAs) or system architects who specialize in the infrastructure and storage aspects of the database system.

The DBMS collaborates with the computer's operating system to automatically manage the data files, handling all file opening, closing, reading, and writing operations. A database user should not need to know the location of these data files on the file system.
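Here is a small sketch of the DBMS managing a data file on your behalf, again assuming SQLite via Python's sqlite3 module. SQLite is a simplification: it keeps an entire database in a single file rather than spreading it across many files and disks like server DBMSs do. The temporary directory and the shop.db filename are hypothetical. The code never writes to the file itself; it only peeks at the first bytes afterwards to show that a real file with a DBMS-defined format exists.

```python
import os
import sqlite3
import tempfile

# A throwaway directory stands in for a database server's data directory.
data_dir = tempfile.mkdtemp()
db_path = os.path.join(data_dir, "shop.db")

# We only name the database; SQLite asks the OS to create and open the file.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (name) VALUES ('keyboard')")
conn.commit()  # SQLite writes the change into the data file
conn.close()   # SQLite releases the file

# The data now lives in an on-disk file whose layout SQLite controls.
with open(db_path, "rb") as f:
    header = f.read(16)
print(header)                      # b'SQLite format 3\x00'
print(os.path.getsize(db_path) > 0)  # True

# Reading it back needs SQL only, not knowledge of pages or byte offsets.
conn = sqlite3.connect(db_path)
names = conn.execute("SELECT name FROM products").fetchall()
conn.close()
print(names)  # [('keyboard',)]
```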

The Logical Layer

This is the first of the two layers that actually abstract data.
Whenever I mention this, people get confused, so let me explain. The physical layer, while counted as a layer of data abstraction, doesn't abstract anything away; it deals directly with the concrete storage and retrieval of data. The logical layer is the first to introduce abstraction: the organization and structure of the data are separated from their physical implementation, giving users and developers a conceptual view that simplifies interaction.

At this layer, the DBMS transforms the data stored in the physical layer's data files into abstract data structures, which together form a unified structure known as the schema. The schema is the collection of data items stored in a specific database, and the kinds of data structures or database objects it can contain depend on the specific DBMS.

For instance, the schema in a relational database management system includes a set of two-dimensional tables.
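In a relational DBMS, the logical layer is what you describe with CREATE TABLE. The sketch below, assuming SQLite via Python's sqlite3 module and hypothetical customers and orders tables, defines a small schema and then reads it back from SQLite's catalog table, sqlite_master, without any reference to files or pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical layer: data is described as tables (relations), never as file pages.
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
""")

# SQLite records the schema in its catalog table, sqlite_master.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(tables)  # [('customers',), ('orders',)]

# Each table is two-dimensional: named columns, and rows of values under them.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)  # ['id', 'customer_id', 'total_cents']
```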

Future articles will extensively explore this layer.

The External Layer

This is the second layer of abstraction in the ANSI/SPARC architecture. It describes the part of the database relevant to a specific user or application and introduces the concept of user views, which present the data tailored to an individual user's requirements or an application's needs.

One unique ability of a database is presenting multiple users with their own distinct views of the data while storing the underlying data only once. These views are collectively called user views.

The user views can be predefined and stored in the database for reuse, or they can be temporary items that the DBMS generates to hold the results of a single ad hoc database query until they are no longer needed by the database user.

💡
Ad hoc is a Latin phrase meaning "for this," which can be rendered as "for this purpose only." An ad hoc query, then, is one written to answer a single, immediate question rather than saved for reuse.
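User views map naturally onto SQL views. In this sketch, assuming SQLite via Python's sqlite3 module, with a hypothetical employees table and a hypothetical staff_directory view, the stored view gives a directory application a salary-free perspective on the same underlying rows, while the payroll query is ad hoc: its result exists only for that one request. The data itself is stored once, in employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    department TEXT NOT NULL,
    salary     INTEGER NOT NULL
);
INSERT INTO employees (name, department, salary) VALUES
    ('Ada',   'Engineering', 120000),
    ('Grace', 'Engineering', 125000),
    ('Linus', 'Support',      70000);

-- Stored user view for a directory app: salaries are not part of this view.
CREATE VIEW staff_directory AS
    SELECT name, department FROM employees;
""")

# The directory app queries its view as if it were a table.
directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
print(directory)  # [('Ada', 'Engineering'), ('Grace', 'Engineering'), ('Linus', 'Support')]

# Ad hoc query: the DBMS builds this result for one question, then discards it.
payroll = conn.execute(
    "SELECT department, SUM(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(payroll)  # [('Engineering', 245000), ('Support', 70000)]
```

In a server DBMS you would typically pair such a view with access controls so the directory app cannot read the base table directly; SQLite has no such permissions, so that part is left out.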

You will notice from the architectural diagram above that data independence exists between the layers of abstraction. Achieving that independence is the objective of the ANSI/SPARC architecture.

What is Data Independence?

This means that the upper layers of data abstraction are isolated from changes to the lower layers. There are two types of data independence:

  1. Physical data independence

  2. Logical data independence

Physical Data Independence

This is the ability to alter the physical file structure of a database without disrupting existing users and processes.

In essence, it means that changes to the underlying storage system, such as adding or removing data files, can be made seamlessly without impacting the ongoing functionality of the database or affecting the applications and processes relying on it.

This flexibility allows for efficient database maintenance and optimization, ensuring that modifications to the physical structure can be performed with minimal or no downtime.

📒
Every DBMS has some physical data independence, but some offer more of it than others. This is described as the degree of physical data independence: how much the physical file structure can be changed without affecting the logical layer.
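The following sketch illustrates physical data independence under a simplifying assumption. SQLite (used here via Python's sqlite3 module) keeps each database in one file, so we cannot add or remove data files; instead, the physical change is adding an index, a new storage-level access path. The query text stays the same and so does its answer; only the DBMS's execution plan changes. The readings table and idx_readings_sensor index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [(f"s{i % 10}", float(i)) for i in range(1000)],
)
conn.commit()

# The application's query is written purely against the logical schema.
QUERY = "SELECT COUNT(*), SUM(value) FROM readings WHERE sensor = 's3'"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is a readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before, plan_before = conn.execute(QUERY).fetchone(), plan(QUERY)

# Physical change: add an index. No application code or query text changes.
conn.execute("CREATE INDEX idx_readings_sensor ON readings(sensor)")
conn.commit()

after, plan_after = conn.execute(QUERY).fetchone(), plan(QUERY)

print(before)           # (100, 49800.0)
print(before == after)  # True: same answer from the unchanged query
print(plan_before)      # a full scan of readings
print(plan_after)       # a search using idx_readings_sensor
```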

Logical Data Independence

This is the ability to make changes to the logical layer without disrupting existing users and processes.

Like physical data independence, every DBMS has some logical data independence, and how much it has is expressed as its degree of logical data independence.

💡
Note that most logical changes also involve a physical change.
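A sketch of logical data independence, again assuming SQLite via Python's sqlite3 module and a hypothetical users table with a hypothetical list_names helper: adding a column is a logical change, and existing code that names the columns it needs keeps working unchanged. Code that used SELECT * and expected exactly two values per row would not survive the same change, so how much independence an application actually enjoys also depends on how it is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
conn.commit()

def list_names(connection):
    # Existing application code: it names only the columns it relies on.
    return connection.execute("SELECT id, name FROM users ORDER BY id").fetchall()

before = list_names(conn)

# Logical change: extend the schema with a new column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("INSERT INTO users (name, email) VALUES ('Grace', 'grace@example.com')")
conn.commit()

after = list_names(conn)  # the same function, untouched, still works
print(before)  # [(1, 'Ada')]
print(after)   # [(1, 'Ada'), (2, 'Grace')]
```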

Conclusion

As I ponder the number of words it took to bring this article to this point, I reiterate my commitment: future articles will be 750 words or fewer.

Hopefully, you have gained valuable insights from this article and are eager for future content. If you have any questions, feel free to leave them in the comments. I (or other readers) will address them accordingly.

Savor the memories and reap the benefits that come with a comprehensive understanding of database properties in both your life and career.
