Behind Relational Databases
You've worked with SQL Server, but what's working behind the scenes? This new series delves into the processes happening on the back end.
What's a database? OK, maybe we're starting with a very
basic question. But, it might be worth examining the details
of exactly what a database is and what it is supposed
to do. There's a lot of talk about databases and database-related
technology in the IT industry today.
For the most part, this is for good reasons: First and
foremost, the lifeblood of many large companies is their
information. Althought that information may be stored
and managed in many different ways, it's extremely important
that data be stored and managed properly. The information
itself may range from a list of employees to an e-commerce
application that tracks sales. Next, many businesses rely
on the availability of their computing systems in order
to conduct business. Think about the results of a large
e-commerce site's data becoming unavailable, even only
for a few minutes (especially during the all-important
holiday season).
Note
|
In this article, I'll focus
on technical information that pertains
to Microsoft's SQL Server 2000. However,
due to the high-level nature of what I'm
presenting, almost all of the information
in this article also pertains to all major
relational database platforms. |
|
|
So, we can probably agree that databases are important.
With that in mind, let's dive into the details of what
they're really supposed to do and how they do it. I've
written this article to serve as an introduction to relational
database terms and concepts. However, I think that even
"intermediate" level database developers and administrators
could use a refresher on some of these concepts. (I'm
always surprised by the number of database "experts" that
have trouble trying to explain what a "view" really is!)
Features of a Database Server
Let's start by talking about the fundamentals of what
a database server actually does. There are several features
that are required in a database platform. All of these
features lend themselves to the main goal of database
servers: to store and manage data. Some of these requirements
include:
-
Information accessibility
— The most important data in the world is useless
if people can't access it. The database server must
provide interfaces for accessing the information in
a usable format. In the current relational database
world, a standard known as the Structured Query Language,
or SQL (pronounced "sequel") is often used.
-
Reliability — The database
server must be available when it is needed, and the
data within it must remain consistent (more on this
soon).
-
Security — The database
server platform must allow administrators and users
to ensure that only authorized individuals can view
specific information. In some cases, the level of
security required for business reasons is very granular.
-
Performance and Scalability
— A database server must be able to complete transactions
and requests for information quickly. Furthermore,
the database platform must be able to scale to the
number of transactions that a business expects to
support.
-
Data protection — Database
administrators must be able to backup and restore
data quickly and easily, based on business requirements.
· Ease of use and administration: Database systems
should be easy to manage. If a database server requires
hundreds of people to administer, it's clearly not
meeting it's goals.
Of course, there are many more requirements that you
might have based on your specific business purposes; this
is only an overview of what I see as some of the most
important features of database servers.
Based on this overview, let's move on to look at how
"transactions" should work.
Database Transactions
One of the most important facets of database transactions
is that they remain valid and consistent — that is, the
information that you plan to add, update or change must
do so in the manner that you expected. This may be a small
task if you're the only on writing data to a specific
system. But imagine the other end of the spectrum: thousands
of concurrent users that may choose to update the same
information at the same time (or close to it). How should
this be handled? If used properly, relational database
transactions can take care of a lot of the dirty work
for you.
Transactions are SQL commands that are not necessarily
related but must execute in an all-or-nothing fashion.
Let's look at a common example. Assume that you want to
carry out a financial transaction between two different
banks. The basic operation should subtract money from
User A's account and add the money to User B's account.
For obvious reasons, you don't want one transaction to
occur without the other (or you'll have made someone either
very happy or very upset). To avoid this potential problem,
you could combine both operations into a single transaction.
If an error occurs during the transaction (for example,
one of the two servers is unavailable), neither of the
two transactions is performed. Another example in applications
is when you're updating information such as a customer's
address. You don't want an operation to update information
regarding her street address without also updating her
ZIP code. Again, this is a perfect place for a transaction.
In order to be safe, transactions must pass the "ACID"
test, having all four of these properties:
- Atomicity — Each transaction
is represented by a single all-or-nothing operation:
All steps are carried out or the entire process is aborted.
- Consistency — No data changes
should violate the structure and design of the database.
- Independence — Transactions
are not aware that other transactions are being run
at the same time.
- Durability — If an error
occurs during the processing of a transaction, partial
transactions should be reversed.
An Overview of Database Objects
Before diving too far into the technical guts of a relational
database management system (RDBMS), let's look at the
common types of database objects they support. These are
the structures related to data with which users interact.
The purpose of a database server is to manage these objects
and the information they store and represent. Database
objects supported in SQL Server 2000 are described next.
Tables
The fundamental unit of data storage in a relational database
is the table. A table stores information in rows. In turn,
each row contains values for one or more columns that
specify related information about a single data item.
Tables are the structures in which actual information
resides. Tables generally refer to a single logical entity.
For example, I might create a table called "Employee."
Within the Employee table, I might have rows that include
information related to each employee ("First Name," "Last
Name," etc.). I can have many rows, each of which represents
one instance of the entity that the table describes (in
this case an employee). Table 1 shows the structure of
a simple database table. Each row in the table refers
to an instance of the item in the table (in this case,
a specific employee) and the columns refer to pieces of
information about that instance.
Employee
Number |
First Name |
Middle Initial |
LastName |
Department |
Phone
Extension |
1 |
Anil |
K |
Desai |
Engineering |
0937 |
2 |
Jane |
|
User |
Marketing |
1554 |
3 |
Bob |
|
Manager |
|
1152 |
|
Table 1. A sample table containing
information about employees. |
In SQL Server, tables are built within a database (which,
in turn, resides on a data file). We'll look at the actual
storage concepts related to table data later. For now,
know that the majority of user-related database functions
will involve the use of information stored in tables in
one way or another. These are also the fundamental objects
that you'll be responsible for protecting in your backup
and recovery plans.
Views
Views are database objects that refer to data stored in
one or more tables. Views are defined by SQL queries (which
are commands that retrieve data; I cover this later).
SQL queries specify the information to be returned to
a user. Users interact with views in the same way that
they interact with tables. However, views don't store
data; instead, they retrieve relational information from
tables (see Figure 1).
|
Figure 1. Using a view to access
database information. |
Views are used in several types of scenarios and can
be helpful in managing information. One benefit is security.
You can create a view of a table that allows users to
see only a subset of the information stored in one or
more tables. This is useful, for example, if you have
an employees table that contains sensitive information.
You might create a view that allows users to query against
all information except salary. Furthermore, you might
restrict permissions on the underlying database itself
and assign permissions to the view only.
Additionally, views are useful for encapsulating business
logic. By storing commonly used queries as views, you
can reduce the chance that a developer will make an error
in retrieving data. Views can even refer to other views,
although this practice can sometimes make it difficult
to debug any problems that crop up.
Overall, though, views enable you to simplify administration,
increase manageability, and improve security.
Indexes
Indexes are database objects that store a subset of a
table's columns. They speed data searches by minimizing
the amount of information that must be searched by the
database server engine. Indexes works like the index of
a book. Instead of flipping through all of the pages,
you use the index to look up simple keywords. When you
find what you're looking for, you're referred to a page
number that contains the detailed information. Searching
through the index is much quicker than searching through
a book page by page.
Indexes can reduce the amount of time it takes the database
server to find the information a user requests. Suppose
you often run queries that search for employees based
on their last names. However, your database table for
employee information actually contains much more information
(for example, address, manager's name, job title, department,
etc). If an index is placed on the First Name and Last
Name columns of this table, SQL Server won't have to search
through all the information in the table to gain this
information. Instead, it will search the index (in this
case, a list of all of the Last Names) and then go to
the pertinent rows for the remainder of the data.
A query that refers only to indexed columns is often
called a covered query. If details on those rows are required,
the index will point to the appropriate data storage areas.
Placing the proper indexes on database objects is extremely
important for performance reasons. It's too complicated
a topic to cover completely in this article.
Stored Procedures
Stored procedures are simply named Transact-SQL statements
or transactions that are stored within a database as an
object. They contain procedural code that can be executed
on demand. There are several benefits to using stored
procedures versus performing the same queries manually.
Perhaps the most important benefit is the dramatic speed
increase that stored procedures can cause.
Stored procedures execute much more quickly than the
same statements that execute on an ad-hoc basis. The main
reason for this performance increase is that the database
engine stores a predetermined optimal data recovery plan
in cache memory. This is also one reason that yo'll get
a much quicker response time from a query the second time
you run it (provided that the data pages haven't yet expired
from the cache due to other activities).
Stored procedures offer functionality improvements and
allow you to perform operations that are otherwise difficult
(or impossible). For example, you can loop through the
records in a table one by one and take actions based on
a complex algorithm that fits your business rules.
There are also security benefits to using stored procedures.
Like views, stored procedures can be used to hide from
users the underlying database objects that are being affected.
This helps you restrict the actions that users can perform
directly on database objects, and prevents you from having
to rely on (and manage) complicated permissions structures.
Stored procedures also allow for better management of
underlying table structures. For example, if you want
to change an employee's address information in several
tables at once, you can create a single stored procedure
that will make sure that all the necessary operations
are carried out.
Additionally, stored procedures can call other stored
procedures, allowing the development of modularly coded
business rules and SQL statements.
Triggers
In some cases, you'll want to take some action every time
data in a table is accessed or modified. Triggers allow
you to automatically fire a SQL statement whenever users
execute commands that access a table. The statement can
then execute one or more other SQL statements that modify
data or perform validity checks on the operation. For
example, a trigger may be used to automatically delete
all employee time sheet data whenever an employee is deleted
from the database. The trigger can access both "before"
and "after" images of the data when it is executed.
There are many more database objects and constructs -
we just don't have enough room to discuss them all here.
For more information, be sure to consult database-related
resources. For example, the Books Online that are installed
with Microsoft's SQL Server 7.0 and SQL Server 2000 database
servers provides a good starting place for learning what
you need to know.
What's Next?
Now that you have a fundamental understanding of the basic
ideas behind a relational database platform, you can build
on those ideas to meet business needs.
Next time, I'll cover the standard interface with SQL-based
databases — the SQL language itself. Stay tuned! If you're
learning about relational databases for the first time,
this sequel might be as good as the original.