Chapter 3: Introduction to SQL¶

1. Data Definition Language¶

Let's look an example below:

CREATE TABLE instructor(
    ID char(5),
    name varchar(20) not null,
    dept_name varchar(20),
    salary numeric(8,2),
    primary key(ID)
);

The main functions of DDL: - Define the schema for each relation. - Define the domain of values associated with each attribute. - Define the integrity constraints - Define the physical storage structure of each relation on disk. (e.g. table space in Oracle) - Define the indices to be maintained for each relations. - Define the view on relations

(1)Domain Types in SQL¶

char(n). Fixed length character string, with user-specified length n.
varchar(n). Variable length character strings, with user-specified maximum length n.
int. Integer (a finite subset of the integers that is machine-dependent).
smallint. Small integer (a machine-dependent subset of the integer domain type).
numeric(p, d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
float(n). Floating point number, with user-specified precision of at least n digits.
Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null values for that attribute.
date. Dates, containing a (4 digits) year, month and date
- E.g. date ‘2007-2-27’
time. Time of day, in hours, minutes and seconds.
- E.g. time ’08:15:30’ time ’08:15:30.75’
Timestamp. date plus time of day
- E.g. timestamp ‘2007-2-27 08:15:30.75’

notice

Note: the type for date and time is “datetime” in SQL Server 2000.

(2)Create Table Construct¶

An SQL relation is defined using the create table command:

CREATE TABLE r(A1 D1, A2 D2, ..., An Dn,
    (integrity constraint1),
    ...,
    (integrity constraintk)
)

r is the name of the relation
each $A_i$ is an attribute name in the schema of relation r
$D_i$ is the data type of values in the domain of attribute $A_i$

There are two ways to define a primary key:

way1:

CREATE TABLE instructor(
    ID char(5),
    name  varchar(20) not null, 
    dept_name varchar(20),
    salary numeric(8,2),
    primary key (ID),
    check (salary >= 0)
);

way2:

CREATE TABLE instructor2(
    ID char(5) primary key,
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8,2),
    check (assets >= 0)
);

(3)Drop and Alter Table¶

The drop table command deletes all information about the dropped relation from the database.
- e.g. DROP TABLE instructor2

The alter table command is used to add attributes to an existing relation.

ALTER TABLE r ADD A D;
ALTER TABLE r ADD (A1 D1, ..., An Dn);

where A is the name of the attribute to be added to relation r and D is the domain of A.

The alter table command can also be used to drop attributes of a relation
1
ALTER TABLE r DROP A

The alter table command can also be used to modify the attributes of a relation

(4)Create index¶

CREATE INDEX <i-name> ON <table-name>;
CREATE UNIQUE INDEX <i-name> ON <table-name>;

2. Basic Structure of select¶

(1)The select clause¶

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm

This query is equivalent to the relationa algebra expression, $\prod_{A1, A2, ..., An}(\sigma_P(r_1\times r_2 \times ... \times r_m))$

something need to be noticed

SQL does not permit the '-' character in names
SQL names are case insensitive, hence you can use capital or small letters.

SQL allows duplicates(重复) in relations as well as in query results.
To force the elimination of duplicates, insert the keyword distinct after select.
Find the names of all departments in the instructor relations, and remove duplicates
1 2
SELECT distinct dept_name FROM instructor
The opposite keyword all allow duplicates. select all dept_name from instructor. By default, duplicates are allowed, i.e., ALL is the default.
An asterisk * in the select clause denotes “all attributes” :SELECT * FROM instructor. But, in the following example, SELECT ID, name, salary * 1.05 FROM instructor, where * means $\times$
- Reason: The select clause can contain arithmetic expressions involving the operation, +, –, ∗, and /, and operating on constants or attributes of tuples. (Generalized Projection)

(2)The where clause¶

The where clause specifies conditions that the result must satisfy, and it corresponds to the selection predicate of the relational algebra.
In WHERE clause comparison results can be combined using the logical connectives AND, OR, and NOT, and a BETWEEN comparison operator can be used for specifying a range.

A example below:

SELECT name FROM instructor
WHERE salary BETWEEN 90000 AND 100000;

(3)The from clause¶

The from clause lists the relations involved in the query
- corresponds to the Cartesian product operation of the relational algebra, if more than one relation is specified in the from clause.

$\prod_{name, course\_id}(\sigma_{dept\_name}='Biology'(instructor\ \bowtie \ teaches))$

which is equivalent with:

SELECT name, course_id
FROM instructor, teaches
WHERE instructor.ID = teaches.ID and dept_name = 'Biology';

(4)The Rename Operation¶

The SQL allows renaming relations and attributes using the as clause:old_name as new_name
- Example: Find the name, loan number and loan amount of all customers; rename the column name loan_number as loan_id.

something about as

as is optional.

In SQL Server, allow: new-name = column expression, E.g :

   SELECT  name as instructor_name, course_id FROM instructor, teaches
   WHERE  instructor.ID = teaches.ID and dept_name = ‘Biology’;

Tuple variables are defined in the from clause via the use of the as clause.
- Example: For all instructors in the university who have taught some course, find their names and the course ID of all courses they taught.
- an example:
  1 2
  SELECT T.name, S.course_id FROM instructor as T, teaches as S WHERE T.ID = S.ID;
- Another example: Find the names of all instructors whose salary is greater than at least one instructor in the Biology department.
  1 2
  SELECT distinct T.name FROM instructor as T, instructor as S WHERE T.salary > S.salary and S.dept_name = 'Biology';

(5)String Operation¶

SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters (通配符):
- % - matches any substring. (like * in file system)
- _ - matches any character. (like ? In file system)
Let's have a look at an example:
- Exam: Find the names of all departments whose building name includes the substring ‘Watson’.
- sql SELECT dept_name FROM department WHERE building LIKE '%Watson%'';
- Match the name “Main%”(转义符) LIKE 'Main\%' escape '\'
SQL supports a variety of string operations such as:
- concatenation(using "||")
- select '教师名=' || name from instructor where ...(把输出的内容拼接起来)
- converting from upper to lower case(and vice versa), using Function: lower(), upper()(转成大小写)
- find string length, extracting substrings

(6)Ordering the Display of Tuples¶

An example: To list in alphabetic order all instructors in the Physics department

SELECT name FROM instructor
WHERE dept_name = 'Physics'
order by name; #important

We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
- E.g. List the entire instructor relation in descending order of salary. If several instructors have the same salary, we order them in ascending order by name.

SELECT * FROM instructor
ORDER BY salary desc, name asc; #important

两个词

注意，desc和asc分别是descending和ascending的缩写，它们的意思分别是降序和升序

(7)Duplicates¶

In traditional relational theory, no duplicates, but we need duplicates in practice.
Multiset(多重集) versions of some of the relational algebra operators:$\sigma_{\theta}, \prod_A, x$, which support the multiset.
Given multiset relations $$ and $$:
1. $\sigma_\theta(r_1)$: If there are $c_1$ copies of tuple $t_1$ in $r_1$, and $t_1$ satisfies selections $\sigma_\theta$,, then there are $c_1$ copies of $t_1$ in $\sigma_\theta(r_1)$.
2. $\prod_A(r)$: For each copy of tuple $t_1$ in $r_1$, there is a copy of tuple $\prod_A(t_1)$ in $\prod_A(r_1)$ where $\prod_A(t_1)$ denotes the projection of the single tuple $t_1$.
3. $r_1 \times r_2$: If there are $c_1$ copies of tuple $t_1$ in $r_1$ and $c_2$ copies of tuple $t_2$ in $r_2$, there are $c_1 \times c_2$ copies of the tuple $t_1$. $t_2$ in $r_1 \times r_2$

如何不duplicate

在SELECT后面加上distinct即可

3. Set Operations¶

We have three examples below:

Example 1: To find the set of all courses taught either in Fall 2009 or in Spring 2010, or both:

(SELECT course_id FROM section
WHERE semester = 'Fall' and year = 2009)
UNION
(SELECT course_id FROM section
WHERE semester = 'Spring' and year = 2010);

Example 2: To find the set of all courses taught in the Fall 2009 as well as in Spring 2010:

(SELECT course_id FROM section
WHERE semester = 'Fall' and year = 2009)
INTERSECT
(SELECT course_id FROM section
WHERE semester = 'Spring' and year = 2010);

Example 3: To find all courses taught in the Fall 2009 semester but not in the Spring 2010 semester:

(SELECT course_id FROM section
WHERE semester = 'Fall' and year = 2009)
EXCEPT
(SELECT course_id FROM section
WHERE semester = 'Spring' and year = 2010);

4. Aggregate Functions¶

These functions operate on the multiset of a column of a relation, and return a value.
- avg(col): average value
- min(col): minimum value
- max(col): maximum value
- sum(col): sum of values
- count(col): number of values
example 1: Find the average account balance at the Perryridge branch(基本应用)

SELECT avg(salary) as avg_salary FROM instructor
WHERE dept_name = 'Comp.Sci.';

example 2: Find the average salary in each department.(GROUP BY的使用)

SELECT dept_name, avg(salary) as avg_salary FROM instructor
GROUP BY dept_name;

example 3: Find the number of depositors for each branch(distinct)

SELECT branch_name, count(distinct customer_name) as tot_num 
FROM depositor D, account A
WHERE D.account_number = A.account_number
GROUP BY branch_name;

example 4: Find departments where the average salary of the instructors is more than $42,000(Having Clause的使用，注意，having clause只能用在GRUOP BY语句之后，只能用聚合语句用来筛选分组后的结果)

SELECT dept_name, avg (salary) as avg_salary FROM instructor
GROUP BY dept_name
HAVING avg (salary) > 42000;

5. Null Values¶

It is possible for tuples to have a null value, denoted by null, for some of their attributes
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
- 5 + null returns null
Any comparision with null returns unknown
- 5 < null or null < or > null or null = null
The predicate is null, is not null can be used to check for null values.
"P is unknown" evaluates to true if predicate P evaluates to unknown
However, aggregate functions simply ignore nulls
Total all salary amounts
- result is null if there is no non_null salary, that is, all values of salary in instructors are null.
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.

6. Nested(嵌套) Subqueries¶

SQL provides a mechanism for the nesting of subqueries.
A subquery is a select_from_where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons.
Example 1: Find all the courses taught in the both the Fall 2009 and Spring 2010 semesters.

SELECT distinct course_id from section
WHERE semester = 'Fall' and year = 2009 and
course_id in(select course_id from section
WHERE semester = 'Spring' and year = 2010);

Example 2: Find all the courses taught in the Fall 2009 semester but not in the Spring 2010 semester.

SELECT distinct course_id from section
WHERE semester = 'Fall' and year = 2009 and
course_id not in(select course_id from section
WHERE semester = 'Spring' and year = 2010);

Example 3: Find the total number of(distinct) students who have taken course sections taught by the instructor with ID 10101

SELECT count(distinct ID) from takes
WHERE (course_id, sec_id, semester, year)
in (SELECT course_id, sec_id, semester, year from teaches
WHERE teaches.ID = 10101);

(1)Set Comparision¶

Definition of Some Clause: $C <comp> some r \leftrightarrow \exists t \in r (C <comp> t)$, WHERE can be: $\le, \leq, \ge, \geq, \neq$
Definition of all Clause: $C <comp> all r \leftrightarrow \forall t \in r(C <comp> t)$

(2)Test for Empty Relations¶

The exists construct returns the value true if the argument subquery is non-empty.
- exists $r \leftrightarrow r \neq \emptyset$
- not exists $r = \emptyset$

(3)Test for Absence of Duplicate Tuples¶

The unique construct tests whether a subquery has any duplicate tuples in it's result.

7. Views¶

Provide a mechanism to hide certain data from the view of certain users.

To create a view we use the command below:

CREATE VIEW <v_name> AS
select c1, c2, ... from ...;
CREATE VIEW <v_name> (c1, c2, ...) AS
select e1, e2, ... from ...;

Two purposes of using views:
- Security
- Easy to use
To drop view: DROP VIEW <V_NAME>;

8. Complex Queries¶

(1)Derived Relations¶

Find the average instructors' salaries of those departments where the average salary is greater than $42000.

SELECT dept_name, avg_salary
FROM (SELECT dept_name, avg(salary) FROM instructor
group by dept_name)
as dept_avg(dept_name, avg_salary)
WHERE avg_salary > 42000;

Another one:

SELECT dept_name, avg(salary) as avg_salary FROM instructor
GROUP BY dept_name
HAVING avg(salary) > 42000;

辨析

having后面限制的是MAX，MIN这样的聚合函数
where则是未经处理的attributes

(2)With Clause¶

With clause allows views to be defined locally to a query, rather than globally.

We have an example, find those departments with the maximum budget.

WITH max_budget(value) as (
SELECT max(budget) FROM department)
SELECT budget FROM department, max_budget
WHERE department.budget = max_budget.value;

注意

用with的目的是减少嵌套，接下来展示一种不用with的写法

SELECT * FROM department
WHERE budget = (SELECT MAX(budget) FROM department)

Complex Query using with clause¶

有关with

一个with后面可以跟多个as操作

Example 1: Find out the students who have enrolled more than 10 courses.

SELECT sno FROM enroll
GROUP BY sno
having count(cno) > 10

Example 2: Find out the student names who have enrolled more than 10 courses.

SELECT TT.sno, sname, c_num FROM
(SELECT sno, count(cno) as c_num FROM
enroll group by sno) as TT, student S
where TT.sno=S.sno and c_num>10;

注意

Note: the derived table must have its own alias, like TT above.(不管是否被引用，导出表(或称嵌套表)必须给出别名)

9. Modification of the Database¶

(1)Deletion¶

Delete all tuples in the instructor relation pertaining to instructors in the Finance department:

DELETE FROM instructor
WHERE dept_name = 'Finance';

Formal form: `DELETE FROM [WHERE ]

Let's see an interesting example below:

Delete the records of all instructors with salary below the average at the university.

DELETE FROM instructor
WHERE salary < (select avg(salary) FROM instructor);

Notice

Problem: as we delete tuples from instructor, the average salary changes
Solution actually used in SQL:
1. First, compute avg salary and find all tuples to delete.
2. Next, delete all tuples found above(without recomputing avg or retesting the tuples).
3. 在同一SQL语句内，除非外层查询的元组变量引入内层查询，否则内层查询只进行一次。

(2)Insertion¶

Add a new tuple to course:

INSERT INTO course VALUES(
    'CS-437', 'Database Systems', 'Comp.Sci.', 4
);
#or equivalently
INSERT INTO course(course_id, title, dept_name, credits) VALUES (
    'CS-437', 'Database Systems', 'Comp.Sci.', 4
);

We have a format below:

INSERT INTO <table|view>[(c1, c2, ...)]
VALUES (e1, e2, ...);
INSERT INTO <table|view>[(c1, c2, ...)]
SELECT e1, e2, ... FROM ...;

You can also add a new tuple to course with credits set to NULL
The 'select from where' statement is fully evaluated before any of its results are inserted into the relation. So the statement below is right:INSERT INTO table1 select * from table1;

(3)Updates¶

Format of update statement:

UPDATE <table|view> SET <c1 = e1 [, c2 = e2,...]>
[WHERE <condition>]

An example: all instructors with salary over $100,000 receive a 3 percent raise, whereas all others receive a 5 percent raise. We need two update statements:

UPDATE instructor
SET salary = salary * 1.03
WHERE salary > 100000;
UPDATE instructor
SET salary = salary * 1.05
WHERE salary <= 100000; 

Actually the order is important, and we can't change the order of two UPDATES.

Case Statement for Conditional Updates

Same query as before:

UPDATE instructor
set salary = case
WHEN salary <= 10000
THEN salary * 1.05
ELSE salary * 1.03
END;

(4)Update of a view¶

Example: Create a view of all data in instructor relatino, hiding the salary attribute

CREATE view faculty as
SELECT ID, name, dept_name FROM instructor

Then add a new tuple to faculty:

INSERT INTO faculty values(
    '30675', 'Green', 'Music'
)

This insertion will be translated into:

INSERT INTO instructor values(
    '30765', 'Green', 'Music', null
);

Updates on more complex views are difficult or impossible to translate, and hence are disallowed.

Create view all_customer as
select branch_name, customer_name from account, depositor where …
union  
select branch_name, customer_name from loan, borrower where …;

Insert INTO, DELETE FROM这张表是不被允许的，因为它的视图定义太过复杂

关于view的注意事项

View 是虚表，对其进行的所有操作都转化为对基表的操作。
查询操作时，View与基表没有区别，但对View的更新操作有严格限制。
Most SQL implementations allow updates only on simple views defined on a single relation and without aggregates.

(5)Transanctions¶

Definition: A transaction is a sequence of queries and data update statements executed as a single logical unit.
- Transactions are started implicitly and terminated by one of
  - commit work: makes all updates of the transaction permanent in the database
  - rollback work: undoes all updates performed by the transaction.
Let's talk about commit work(Atomicity):
- If one steps suceeds and the other fails, database is in an inconsistent state
- Therefore, either both steps should suceed or neither.
If any step of a transaction fails, all work done by the transaction can be undone by rollback work.
Rollback of incomplete transactions is done automatically, in case of system failures
The four properties of transaction are required:atomicity, isolation, consistency, durability(see chpt15)
By the way, in some database systems, each SQL statement that executes successfully is automatically committed.(但少见)
- Each transaction would then consist of only a single statement.
- Automatic commit can usually be turned off, allowing multi_statement transactions, but how to do so depends on the database system.
- Another option in SQL:1999: enclose statements within begin atomic …… end

10. Joined Relations¶

Join operations take two relations and return as a result another relation.
Join condition - defines which tuples in the two relations match, and what attributes are present in the result of the join
Join type - defines how tuples in each relation that do not match any tuple in other relation (based on the join condition) are treated.

说明

由于上一个chapter的笔记已经讲了几种join的定义，此处不再赘述：它们分别是Inner Join, Left Out Join, Right Out Join, Full Outer Join, 它们都是natural join.

A classification: - 自然连接: R natural {inner join, left join, right join, full join} S - 非自然连接: R {inner join, left join, right join, full join} S + on <连接条件判别式> or using (<同名的等值连接属性名>) - notice: Key word Inner, outer is optional.

E.g. Find all customers who have either an account or a loan(but not both) at the bank.

SELECT customer_name FROM (depositor natural full outer join borrower)
WHERE account_number is null or loan_number is null