Behind Relational Databases

You've worked with SQL Server, but what's working behind the scenes? This new series delves into the processes happening on the back end.

What's a database? OK, maybe we're starting with a very basic question. But, it might be worth examining the details of exactly what a database is and what it is supposed to do. There's a lot of talk about databases and database-related technology in the IT industry today.

For the most part, this is for good reasons: First and foremost, the lifeblood of many large companies is their information. Althought that information may be stored and managed in many different ways, it's extremely important that data be stored and managed properly. The information itself may range from a list of employees to an e-commerce application that tracks sales. Next, many businesses rely on the availability of their computing systems in order to conduct business. Think about the results of a large e-commerce site's data becoming unavailable, even only for a few minutes (especially during the all-important holiday season).

In this article, I'll focus on technical information that pertains to Microsoft's SQL Server 2000. However, due to the high-level nature of what I'm presenting, almost all of the information in this article also pertains to all major relational database platforms.

So, we can probably agree that databases are important. With that in mind, let's dive into the details of what they're really supposed to do and how they do it. I've written this article to serve as an introduction to relational database terms and concepts. However, I think that even "intermediate" level database developers and administrators could use a refresher on some of these concepts. (I'm always surprised by the number of database "experts" that have trouble trying to explain what a "view" really is!)

Features of a Database Server
Let's start by talking about the fundamentals of what a database server actually does. There are several features that are required in a database platform. All of these features lend themselves to the main goal of database servers: to store and manage data. Some of these requirements include:

  • Information accessibility — The most important data in the world is useless if people can't access it. The database server must provide interfaces for accessing the information in a usable format. In the current relational database world, a standard known as the Structured Query Language, or SQL (pronounced "sequel") is often used.

  • Reliability — The database server must be available when it is needed, and the data within it must remain consistent (more on this soon).

  • Security — The database server platform must allow administrators and users to ensure that only authorized individuals can view specific information. In some cases, the level of security required for business reasons is very granular.

  • Performance and Scalability — A database server must be able to complete transactions and requests for information quickly. Furthermore, the database platform must be able to scale to the number of transactions that a business expects to support.

  • Data protection — Database administrators must be able to backup and restore data quickly and easily, based on business requirements. · Ease of use and administration: Database systems should be easy to manage. If a database server requires hundreds of people to administer, it's clearly not meeting it's goals.

Of course, there are many more requirements that you might have based on your specific business purposes; this is only an overview of what I see as some of the most important features of database servers.

Based on this overview, let's move on to look at how "transactions" should work.

Database Transactions
One of the most important facets of database transactions is that they remain valid and consistent — that is, the information that you plan to add, update or change must do so in the manner that you expected. This may be a small task if you're the only on writing data to a specific system. But imagine the other end of the spectrum: thousands of concurrent users that may choose to update the same information at the same time (or close to it). How should this be handled? If used properly, relational database transactions can take care of a lot of the dirty work for you.

Transactions are SQL commands that are not necessarily related but must execute in an all-or-nothing fashion. Let's look at a common example. Assume that you want to carry out a financial transaction between two different banks. The basic operation should subtract money from User A's account and add the money to User B's account. For obvious reasons, you don't want one transaction to occur without the other (or you'll have made someone either very happy or very upset). To avoid this potential problem, you could combine both operations into a single transaction. If an error occurs during the transaction (for example, one of the two servers is unavailable), neither of the two transactions is performed. Another example in applications is when you're updating information such as a customer's address. You don't want an operation to update information regarding her street address without also updating her ZIP code. Again, this is a perfect place for a transaction.

In order to be safe, transactions must pass the "ACID" test, having all four of these properties:

  • Atomicity — Each transaction is represented by a single all-or-nothing operation: All steps are carried out or the entire process is aborted.
  • Consistency — No data changes should violate the structure and design of the database.
  • Independence — Transactions are not aware that other transactions are being run at the same time.
  • Durability — If an error occurs during the processing of a transaction, partial transactions should be reversed.

An Overview of Database Objects
Before diving too far into the technical guts of a relational database management system (RDBMS), let's look at the common types of database objects they support. These are the structures related to data with which users interact. The purpose of a database server is to manage these objects and the information they store and represent. Database objects supported in SQL Server 2000 are described next.

The fundamental unit of data storage in a relational database is the table. A table stores information in rows. In turn, each row contains values for one or more columns that specify related information about a single data item.

Tables are the structures in which actual information resides. Tables generally refer to a single logical entity. For example, I might create a table called "Employee." Within the Employee table, I might have rows that include information related to each employee ("First Name," "Last Name," etc.). I can have many rows, each of which represents one instance of the entity that the table describes (in this case an employee). Table 1 shows the structure of a simple database table. Each row in the table refers to an instance of the item in the table (in this case, a specific employee) and the columns refer to pieces of information about that instance.

First Name Middle Initial LastName Department Phone
1 Anil K Desai Engineering 0937
2 Jane User Marketing 1554
3 Bob Manager 1152
Table 1. A sample table containing information about employees.

In SQL Server, tables are built within a database (which, in turn, resides on a data file). We'll look at the actual storage concepts related to table data later. For now, know that the majority of user-related database functions will involve the use of information stored in tables in one way or another. These are also the fundamental objects that you'll be responsible for protecting in your backup and recovery plans.

Views are database objects that refer to data stored in one or more tables. Views are defined by SQL queries (which are commands that retrieve data; I cover this later). SQL queries specify the information to be returned to a user. Users interact with views in the same way that they interact with tables. However, views don't store data; instead, they retrieve relational information from tables (see Figure 1).

Figure 1. Using a view to access database information.
Figure 1. Using a view to access database information.

Views are used in several types of scenarios and can be helpful in managing information. One benefit is security. You can create a view of a table that allows users to see only a subset of the information stored in one or more tables. This is useful, for example, if you have an employees table that contains sensitive information. You might create a view that allows users to query against all information except salary. Furthermore, you might restrict permissions on the underlying database itself and assign permissions to the view only.

Additionally, views are useful for encapsulating business logic. By storing commonly used queries as views, you can reduce the chance that a developer will make an error in retrieving data. Views can even refer to other views, although this practice can sometimes make it difficult to debug any problems that crop up.

Overall, though, views enable you to simplify administration, increase manageability, and improve security.

Indexes are database objects that store a subset of a table's columns. They speed data searches by minimizing the amount of information that must be searched by the database server engine. Indexes works like the index of a book. Instead of flipping through all of the pages, you use the index to look up simple keywords. When you find what you're looking for, you're referred to a page number that contains the detailed information. Searching through the index is much quicker than searching through a book page by page.

Indexes can reduce the amount of time it takes the database server to find the information a user requests. Suppose you often run queries that search for employees based on their last names. However, your database table for employee information actually contains much more information (for example, address, manager's name, job title, department, etc). If an index is placed on the First Name and Last Name columns of this table, SQL Server won't have to search through all the information in the table to gain this information. Instead, it will search the index (in this case, a list of all of the Last Names) and then go to the pertinent rows for the remainder of the data.

A query that refers only to indexed columns is often called a covered query. If details on those rows are required, the index will point to the appropriate data storage areas. Placing the proper indexes on database objects is extremely important for performance reasons. It's too complicated a topic to cover completely in this article.

Stored Procedures
Stored procedures are simply named Transact-SQL statements or transactions that are stored within a database as an object. They contain procedural code that can be executed on demand. There are several benefits to using stored procedures versus performing the same queries manually. Perhaps the most important benefit is the dramatic speed increase that stored procedures can cause.

Stored procedures execute much more quickly than the same statements that execute on an ad-hoc basis. The main reason for this performance increase is that the database engine stores a predetermined optimal data recovery plan in cache memory. This is also one reason that yo'll get a much quicker response time from a query the second time you run it (provided that the data pages haven't yet expired from the cache due to other activities).

Stored procedures offer functionality improvements and allow you to perform operations that are otherwise difficult (or impossible). For example, you can loop through the records in a table one by one and take actions based on a complex algorithm that fits your business rules.

There are also security benefits to using stored procedures. Like views, stored procedures can be used to hide from users the underlying database objects that are being affected. This helps you restrict the actions that users can perform directly on database objects, and prevents you from having to rely on (and manage) complicated permissions structures.

Stored procedures also allow for better management of underlying table structures. For example, if you want to change an employee's address information in several tables at once, you can create a single stored procedure that will make sure that all the necessary operations are carried out.

Additionally, stored procedures can call other stored procedures, allowing the development of modularly coded business rules and SQL statements.

In some cases, you'll want to take some action every time data in a table is accessed or modified. Triggers allow you to automatically fire a SQL statement whenever users execute commands that access a table. The statement can then execute one or more other SQL statements that modify data or perform validity checks on the operation. For example, a trigger may be used to automatically delete all employee time sheet data whenever an employee is deleted from the database. The trigger can access both "before" and "after" images of the data when it is executed.

There are many more database objects and constructs - we just don't have enough room to discuss them all here. For more information, be sure to consult database-related resources. For example, the Books Online that are installed with Microsoft's SQL Server 7.0 and SQL Server 2000 database servers provides a good starting place for learning what you need to know.

What's Next?
Now that you have a fundamental understanding of the basic ideas behind a relational database platform, you can build on those ideas to meet business needs.

Next time, I'll cover the standard interface with SQL-based databases — the SQL language itself. Stay tuned! If you're learning about relational databases for the first time, this sequel might be as good as the original.

comments powered by Disqus
Most   Popular