Chapter 3: Introduction to SQL¶
1. Data Definition Language¶
Let's look an example below:
The main functions of DDL: - Define the schema for each relation. - Define the domain of values associated with each attribute. - Define the integrity constraints - Define the physical storage structure of each relation on disk. (e.g. table space in Oracle) - Define the indices to be maintained for each relations. - Define the view on relations
(1)Domain Types in SQL¶
- char(n). Fixed length character string, with user-specified length n.
- varchar(n). Variable length character strings, with user-specified maximum length n.
- int. Integer (a finite subset of the integers that is machine-dependent).
- smallint. Small integer (a machine-dependent subset of the integer domain type).
- numeric(p, d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
- real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
- float(n). Floating point number, with user-specified precision of at least n digits.
- Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null values for that attribute.
- date. Dates, containing a (4 digits) year, month and date
- E.g. date ‘2007-2-27’
- time. Time of day, in hours, minutes and seconds.
- E.g. time ’08:15:30’ time ’08:15:30.75’
- Timestamp. date plus time of day
- E.g. timestamp ‘2007-2-27 08:15:30.75’
notice
Note: the type for date and time is “datetime” in SQL Server 2000.
(2)Create Table Construct¶
An SQL relation is defined using the create table command:
- r is the name of the relation
- each \(A_i\) is an attribute name in the schema of relation r
- \(D_i\) is the data type of values in the domain of attribute \(A_i\)
There are two ways to define a primary key:
way1:
way2:
(3)Drop and Alter Table¶
-
The drop table command deletes all information about the dropped relation from the database.
- e.g.
DROP TABLE instructor2
- e.g.
-
The alter table command is used to add attributes to an existing relation.
where A is the name of the attribute to be added to relation r and D is the domain of A.
- The alter table command can also be used to drop attributes of a relation
The alter table command can also be used to modify the attributes of a relation
(4)Create index¶
- CREATE INDEX <i-name> ON <table-name>;
- CREATE UNIQUE INDEX <i-name> ON <table-name>;
2. Basic Structure of select¶
(1)The select clause¶
This query is equivalent to the relationa algebra expression, \(\prod_{A1, A2, ..., An}(\sigma_P(r_1\times r_2 \times ... \times r_m))\)
something need to be noticed
- SQL does not permit the '-' character in names
- SQL names are case insensitive, hence you can use capital or small letters.
- SQL allows duplicates(重复) in relations as well as in query results.
- To force the elimination of duplicates, insert the keyword distinct after select.
- Find the names of all departments in the instructor relations, and remove duplicates
- The opposite keyword all allow duplicates. select all dept_name from instructor. By default, duplicates are allowed, i.e., ALL is the default.
- An asterisk * in the select clause denotes “all attributes” :
SELECT * FROM instructor. But, in the following example,SELECT ID, name, salary * 1.05 FROM instructor, where * means \(\times\)- Reason: The select clause can contain arithmetic expressions involving the operation, +, –, ∗, and /, and operating on constants or attributes of tuples. (Generalized Projection)
(2)The where clause¶
- The where clause specifies conditions that the result must satisfy, and it corresponds to the selection predicate of the relational algebra.
- In WHERE clause comparison results can be combined using the logical connectives AND, OR, and NOT, and a BETWEEN comparison operator can be used for specifying a range.
A example below:
(3)The from clause¶
- The from clause lists the relations involved in the query
- corresponds to the Cartesian product operation of the relational algebra, if more than one relation is specified in the from clause.
\(\prod_{name, course\_id}(\sigma_{dept\_name}='Biology'(instructor\ \bowtie \ teaches))\)
which is equivalent with:
(4)The Rename Operation¶
- The SQL allows renaming relations and attributes using the as clause:
old_name as new_name- Example: Find the name, loan number and loan amount of all customers; rename the column name loan_number as loan_id.
something about as
- as is optional.
- In SQL Server, allow: new-name = column expression, E.g :
- Tuple variables are defined in the from clause via the use of the as clause.
- Example: For all instructors in the university who have taught some course, find their names and the course ID of all courses they taught.
- an example:
- Another example: Find the names of all instructors whose salary is greater than at least one instructor in the Biology department.
(5)String Operation¶
-
SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters (通配符):
- % - matches any substring. (like * in file system)
- _ - matches any character. (like ? In file system)
-
Let's have a look at an example:
- Exam: Find the names of all departments whose building name includes the substring ‘Watson’.
sql SELECT dept_name FROM department WHERE building LIKE '%Watson%'';- Match the name “Main%”(转义符)
LIKE 'Main\%'escape'\'
-
SQL supports a variety of string operations such as:
- concatenation(using "||")
select '教师名=' || name from instructor where ...(把输出的内容拼接起来)- converting from upper to lower case(and vice versa), using Function: lower(), upper()(转成大小写)
- find string length, extracting substrings
(6)Ordering the Display of Tuples¶
- An example: To list in alphabetic order all instructors in the Physics department
- We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
- E.g. List the entire instructor relation in descending order of salary. If several instructors have the same salary, we order them in ascending order by name.
两个词
注意,desc和asc分别是descending和ascending的缩写,它们的意思分别是降序和升序
(7)Duplicates¶
- In traditional relational theory, no duplicates, but we need duplicates in practice.
-
Multiset(多重集) versions of some of the relational algebra operators:\(\sigma_{\theta}, \prod_A, x\), which support the multiset.
-
Given multiset relations $$ and $$:
- \(\sigma_\theta(r_1)\): If there are \(c_1\) copies of tuple \(t_1\) in \(r_1\), and \(t_1\) satisfies selections \(\sigma_\theta\),, then there are \(c_1\) copies of \(t_1\) in \(\sigma_\theta(r_1)\).
- \(\prod_A(r)\): For each copy of tuple \(t_1\) in \(r_1\), there is a copy of tuple \(\prod_A(t_1)\) in \(\prod_A(r_1)\) where \(\prod_A(t_1)\) denotes the projection of the single tuple \(t_1\).
- \(r_1 \times r_2\): If there are \(c_1\) copies of tuple \(t_1\) in \(r_1\) and \(c_2\) copies of tuple \(t_2\) in \(r_2\), there are \(c_1 \times c_2\) copies of the tuple \(t_1\). \(t_2\) in \(r_1 \times r_2\)
如何不duplicate
- 在SELECT后面加上distinct即可
3. Set Operations¶
We have three examples below:
- Example 1: To find the set of all courses taught either in Fall 2009 or in Spring 2010, or both:
- Example 2: To find the set of all courses taught in the Fall 2009 as well as in Spring 2010:
- Example 3: To find all courses taught in the Fall 2009 semester but not in the Spring 2010 semester:
4. Aggregate Functions¶
-
These functions operate on the multiset of a column of a relation, and return a value.
- avg(col): average value
- min(col): minimum value
- max(col): maximum value
- sum(col): sum of values
- count(col): number of values
-
example 1: Find the average account balance at the Perryridge branch(基本应用)
- example 2: Find the average salary in each department.(GROUP BY的使用)
- example 3: Find the number of depositors for each branch(distinct)
- example 4: Find departments where the average salary of the instructors is more than $42,000(Having Clause的使用,注意,having clause只能用在GRUOP BY语句之后,只能用聚合语句用来筛选分组后的结果)
5. Null Values¶
- It is possible for tuples to have a null value, denoted by null, for some of their attributes
- null signifies an unknown value or that a value does not exist.
- The result of any arithmetic expression involving null is null.
- 5 + null returns null
- Any comparision with null returns unknown
- 5 < null or null < or > null or null = null
- The predicate is null, is not null can be used to check for null values.
- "P is unknown" evaluates to true if predicate P evaluates to unknown
- However, aggregate functions simply ignore nulls
- Total all salary amounts
- result is null if there is no non_null salary, that is, all values of salary in instructors are null.
- All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
6. Nested(嵌套) Subqueries¶
- SQL provides a mechanism for the nesting of subqueries.
- A subquery is a select_from_where expression that is nested within another query.
-
A common use of subqueries is to perform tests for set membership, set comparisons.
-
Example 1: Find all the courses taught in the both the Fall 2009 and Spring 2010 semesters.
- Example 2: Find all the courses taught in the Fall 2009 semester but not in the Spring 2010 semester.
- Example 3: Find the total number of(distinct) students who have taken course sections taught by the instructor with ID 10101
(1)Set Comparision¶
-
Definition of Some Clause: \(C <comp> some r \leftrightarrow \exists t \in r (C <comp> t)\), WHERE
can be: \(\le, \leq, \ge, \geq, \neq\) -
Definition of all Clause: \(C <comp> all r \leftrightarrow \forall t \in r(C <comp> t)\)
(2)Test for Empty Relations¶
- The exists construct returns the value true if the argument subquery is non-empty.
- exists \(r \leftrightarrow r \neq \emptyset\)
- not exists \(r = \emptyset\)
(3)Test for Absence of Duplicate Tuples¶
- The unique construct tests whether a subquery has any duplicate tuples in it's result.
7. Views¶
-
Provide a mechanism to hide certain data from the view of certain users.
-
To create a view we use the command below:
-
Two purposes of using views:
- Security
- Easy to use
-
To drop view:
DROP VIEW <V_NAME>;
8. Complex Queries¶
(1)Derived Relations¶
- Find the average instructors' salaries of those departments where the average salary is greater than $42000.
Another one:
辨析
- having后面限制的是MAX,MIN这样的聚合函数
- where则是未经处理的attributes
(2)With Clause¶
- With clause allows views to be defined locally to a query, rather than globally.
We have an example, find those departments with the maximum budget.
注意
- 用with的目的是减少嵌套,接下来展示一种不用with的写法
Complex Query using with clause¶
有关with
- 一个with后面可以跟多个as操作
Example 1: Find out the students who have enrolled more than 10 courses.
Example 2: Find out the student names who have enrolled more than 10 courses.
注意
- Note: the derived table must have its own alias, like TT above.(不管是否被引用,导出表(或称嵌套表)必须给出别名)
9. Modification of the Database¶
(1)Deletion¶
- Delete all tuples in the instructor relation pertaining to instructors in the Finance department:
- Formal form: `DELETE FROM
[WHERE
]
Let's see an interesting example below:
- Delete the records of all instructors with salary below the average at the university.
Notice
- Problem: as we delete tuples from instructor, the average salary changes
- Solution actually used in SQL:
- First, compute avg salary and find all tuples to delete.
- Next, delete all tuples found above(without recomputing avg or retesting the tuples).
- 在同一SQL语句内,除非外层查询的元组变量引入内层查询,否则内层查询只进行一次。
(2)Insertion¶
- Add a new tuple to course:
- We have a format below:
-
You can also add a new tuple to course with credits set to NULL
-
The 'select from where' statement is fully evaluated before any of its results are inserted into the relation. So the statement below is right:
INSERT INTO table1 select * from table1;
(3)Updates¶
- Format of update statement:
- An example: all instructors with salary over $100,000 receive a 3 percent raise, whereas all others receive a 5 percent raise. We need two update statements:
- Actually the order is important, and we can't change the order of two UPDATES.
Case Statement for Conditional Updates
- Same query as before:
(4)Update of a view¶
- Example: Create a view of all data in instructor relatino, hiding the salary attribute
Then add a new tuple to faculty:
This insertion will be translated into:
- Updates on more complex views are difficult or impossible to translate, and hence are disallowed.
Insert INTO, DELETE FROM这张表是不被允许的,因为它的视图定义太过复杂
关于view的注意事项
- View 是虚表,对其进行的所有操作都转化为对基表的操作。
- 查询操作时,View与基表没有区别,但对View的更新操作有严格限制。
- Most SQL implementations allow updates only on simple views defined on a single relation and without aggregates.
(5)Transanctions¶
-
Definition: A transaction is a sequence of queries and data update statements executed as a single logical unit.
- Transactions are started implicitly and terminated by one of
- commit work: makes all updates of the transaction permanent in the database
- rollback work: undoes all updates performed by the transaction.
- Transactions are started implicitly and terminated by one of
-
Let's talk about commit work(Atomicity):
- If one steps suceeds and the other fails, database is in an inconsistent state
- Therefore, either both steps should suceed or neither.
-
If any step of a transaction fails, all work done by the transaction can be undone by rollback work.
-
Rollback of incomplete transactions is done automatically, in case of system failures
-
The four properties of transaction are required:atomicity, isolation, consistency, durability(see chpt15)
-
By the way, in some database systems, each SQL statement that executes successfully is automatically committed.(但少见)
- Each transaction would then consist of only a single statement.
- Automatic commit can usually be turned off, allowing multi_statement transactions, but how to do so depends on the database system.
- Another option in SQL:1999: enclose statements within
begin atomic …… end
10. Joined Relations¶
- Join operations take two relations and return as a result another relation.
- Join condition - defines which tuples in the two relations match, and what attributes are present in the result of the join
- Join type - defines how tuples in each relation that do not match any tuple in other relation (based on the join condition) are treated.
说明
- 由于上一个chapter的笔记已经讲了几种join的定义,此处不再赘述:它们分别是Inner Join, Left Out Join, Right Out Join, Full Outer Join, 它们都是natural join.
A classification: - 自然连接: R natural {inner join, left join, right join, full join} S - 非自然连接: R {inner join, left join, right join, full join} S + on <连接条件判别式> or using (<同名的等值连接属性名>) - notice: Key word Inner, outer is optional.
E.g. Find all customers who have either an account or a loan(but not both) at the bank.