Chapter 2: Relational Model¶

1. The Introduction of relational model¶

a brief conclusion¶

a brief conclusion

The relational model is very simple and elegant.
A relational database is a collection of one or more relations, which are based on relational model.
A relation is a table with rows and columns.
The major advantages of the relational model are its simple data representation and the ease with which even complex queries can be expressed.
Owing to the great language SQL, the most widely used language for creating, manipulating, and querying relational database.

an example of Relation¶

distinguish

A relationship: an association among several entities.
A relation: is the mathematical concept, referred to a table

2. Basic Structure¶

definition¶

Formally, given sets \(D_1, D_2, ..., D_n\), \(D_i = a_{ij}|_{j=1...k}\)
a relation r is a subset of \(D_1 \times D_2 \times ... \times D_n\)
a Cartesian product of a list of domain \(D_i\)
Thus a relation is a set of n-tuples(\(a_{ij}, a_{2j}, ..., a_{nj}\))

An Example of Cartesian product¶

(1)Attribute Types¶

Each attribute of a relation has a name.
The set of allowed values for each attribute is called the domain (域) of the attribute.
Attribute values are (normally) required to be atomic, that is, indivisible. (1st NF, 第一范式)
E.g. multivalued attribute values are not atomic
E.g. composite attribute values are not atomic
The special value null is a member of every domain.
The null value causes complications in the definition of many operations.
- we shall ignore the effect of null values for the moment and consider their effect later.

(2)Concepts about relation¶

A Relation is concerned with two concepts : relation schema and relation instance.
The relation schema describe the structure of the relation.
The relation instance corresponds to the snapshot of the data in the relation at a given instant in time.
C.f.: Database schema and database instance.

example of relation schema

Instructor-schema = (ID: string, name: string, dept_name: string, salary: int)
Instructor-schema = (ID, name, dept_name, salary)

an one-to-one correspondence

variable <-> relation
variable type <-> relation schema
variable value <-> relation instance

(2-a) Relation Schema¶

\(A_1, A_2, ..., A_n\) are attributes
Formally expressed:
- \(R = (A_1, A_2, ..., A_n)\) is a relation schema
- E.g. Instructor-schema = (ID, name, dept_name, salary)
\(r(R)\) is a relation on the relation schema R

(2-b) Relation Instance¶

The current values (relation instance) of a relation are specified by a table.
An element t of r is a tuple, represented by a row in a table.

Let a tuple variable t stands for a tuple. Then t[name] denotes the value of t on the name attribute.

(3) Relations are Unordered¶

Order of tuples is irrelevant (tuples may be stored in an arbitrary order), and tuples in a relation are no duplicate(完全一样的).
E.g. department(dep_name, building, budget) relation with unordered tuples.

(4) Keys(码、键)¶

Let \(K \subseteq R\)
K is a superkey(超码) of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
- Example: {instructor-ID, instructor-name} and {instructor-ID} are both superkeys of instructor.
K is a candidate key(候选码) if K is minimal superkey.Example: {instructor-ID} is a candidate key for instructor, since it is a superkey, and no subset of it is a superkey.
K is a primary key(主码), if k is a candidate key and been defined by user explicitly. Primary key is usually marked by underline.

(5) Foreign key(外键，外码)¶

Assume there exists relation r and s: r(A, B, C), s(B, D), we can say that attribute B in relation r is foreign key referencing s, and r is a referencing relation(参照关系) , and s is a referenced relation(被参照关系).

an instance

学生(学号,姓名,性别,^^专业号^^,年龄) - 参照关系
专业(^^专业号^^,专业名称) - 被参照关系 (目标关系)
其中属性专业号称为关系学生的外码。

(6) Schema of the University Database¶

classroom(building,room_number,capacity)
department(dept_name,building,budget)
course(course_id,title,dept_name,credits)
instructor(ID,name,dept_name,salary)
section(course_id,sec_id,semester,year,building,room_number,time_slot_id)
teaches(ID,course_id,sec_id,semester,year)
student(ID,name,dept_name,tot_cred)
takes(ID,course_id,sec_id,semester,year,grade)
advisor(s_ID,i_ID)
time_slot(time_slot_id,day,start_time,end_time)
prereq(course_id,prereq_id)

figure 8. Schema Diagram (模式图) for the University Database

(7)Query Languages¶

Language in which user requests information from the database.
“Pure” languages:
- Relational Algebra - the basis of SQL
- Tuple Relational Calculus (元组关系演算)
- Domain Relational Calculus - (域关系演算) QBE
Pure languages form underlying basis of query languages that people use, e.g. SQL.

3. Relational Algebra¶

Procedural language (in some extent).
Six basic operators
- Select 选择
- Project 投影
- Union 并
- set difference 差（集合差）
- Cartesian product 笛卡儿积
- Rename 改名（重命名）
The operators take one or two relations as inputs and give a new relation as a result.
Additional operations
- Set intersection 交
- Natural join 自然连接
- Division 除
- Assignment 赋值

(1)Select Operation¶

Notation: \(\sigma_p(r)\), \(\sigma\) is pronounced as sigma
p is called the selection predicate
Defined as: \(\sigma_p(r) = {t|t\in r and p(t)}\)
- Where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not)
- Each term is one of: op or , where op is one of: =, ≠ , >, ≥ , < , ≤
Example of selection: \(\sigma_{dept\_name='Finance'}(department)\)

Let's show an example below:

figure 9. An example of Select Operation

(2)Project Operation¶

Notation:\(\prod_{A1, A2, ... Ak}(r)\), \(\prod\)is pronounced as pi, where \(A_1, … A_k\) are attribute names and r is a relation name
The result is defined as the relation of k columns obtained by erasing the columns that are not listed
Duplicate rows removed from result, since relations are sets.
E.g. To eliminate the building attribute of department, we can use: \(\prod_{building}(department)\)

figure 10. An example of Project Operation

(3)Union Operation¶

Notation: \(r\cup s\)
Defined as: \(r\cup s = {t|t\in r \; or \; t\in s}\)
For \(r\cup s\) to be valid:
- r, s must have the same arity (same number of same attributes)
- The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s)

figure 11. An example of Union Operation

(4)Set Difference Operation¶

Notation: \(r - s\)
Defined as: \(r - s = \{t | t\in r \ \ {and} \ \ r \notin s\}\)
Set differences must be taken between compatible relations.
- r and s must have the same arity
- attribute domains of r and s must be compatible

(5)Cartesian Product Operation(广义笛卡尔积)¶

Notation: \(r \times s\)
Defined as: \(r \times s = \{ \{t \ q \}\ t \in r \ \ {and} \ \ q \in s\}\)
Assume that attributes of r(R) and s(S) are disjoint. (That is, \(R \cap S = \emptyset\))
If attributes of r(R) and s(S) are not disjoint, then renaming for attributes must be used.

figure 12. An example of Cartesian Product Operation(disjoint)

figure 13. An example of Cartesian Product Operation(not disjoint)

(6)Composition of Operation¶

Can build expressions using multiple operations

figure 14. An example of Composition of Operation

(7)Rename Operation¶

Allows us to name, and therefore to refer to, results of relational-algebra expressions.
Allow us to refer to a relation by more than one name.
Then we can have two examples below:
- \(\rho_X(E)\), \(\rho\) is pronounced as rho returns the expression E under the name X
- If a relational-algebra expression E has arity n, then \(\rho_{X(A1, A2, ..., An)}(E)\) returns the result of expression E under the name X, and with the attributes renamed to \(A1, A2, ..., An\)
We have an example below:

Basic information:

branch (branch-name, branch-city, assets)

customer (customer-name, customer-street, customer-city)

account (account-number, branch-name, balance)

loan (loan-number, branch-name, amount)

depositor (customer-name, account-number)

borrower (customer-name, loan-number)

Queries:

Find the names of all customers who have a loan at the Perryridge branch.

Query1: \(\prod_{customer-name}(\sigma_{branch-name = "Perryridge"(\sigma_{borrower.loan-number = loan.loan-number}(borrow\times loan))})\)

Query2: \(\prod_{customer-name}(\sigma_{borrower.loan-number=loan.loan-number}(borrower\times (\sigma_{branch-name="Perryridge"}(loan))))\)

Above are two kinds of queries, which one is better?

The second is better: when it comes to \(\times\), the fewer tuples a relation has, the better it participates in the operation.

找最大/最小时，先作笛卡尔积，然后用总集减去其补集

below are some additional operations

(8)Set-Intersection Operation¶

Notation: \(r \cap s\)
Defined as: \(r \cap s = \{t|t\in r \ \ and \ \ t\in s\}\)
Assume:
- r, s have the same arity
- attributes of r and s are compatible
Note: \(r \cap s = r-(r-s)\)

(9)Natural-Join Operation¶

Notation: \(r ⋈ s\)
Example:
- R = (A,==B==,C,==D==); S = (E,==B==,==D==)
- Result schema of natural-join or r and s = (A, ==B==, C, ==D==, E);
- \(r ⋈ s = \prod_{r.A, r.B, r.C, r.D, s.E}(\sigma_{r.B = s.B \land r.D = s.D}(r \times s))\)
它和笛卡尔积的辨析：笛卡尔积对于不同relation中的同名attributes选择了换名都保留，但是这个是只取相同部分

figure 15. An example of Natural Join Operation

Theta join: \(r\ \mathop{\bowtie}\limits_{\theta}\ s = \sigma_{\theta}(r\times s)\)

(10)Division Operation¶

Let r and s be relations on schemas R and S respectively where
- R = (\(A_1\), ..., \(A_m\), \(B_1\), ..., \(B_n\))
- S = (\(B_1\), ..., \(B_n\))
The result of \(r \div s\) is a relation on schema \(R-S = (A_1, ..., A_m)\)
- \(r \div s = \{t \ | \ t\in \prod_{R-S}(r) \ \land \ \forall u \in s (tu\in r)\}\)

figure 16. An example of Natural Join Operation

(11)Assignment Operation¶

The assignment operation (\(\leftarrow\)) provides a convenient way to express complex queries
- Write query as a sequential program consisting of a series of assignments, which is followed by an expression whose value is displayed as a result of the query.
- Assignment must always be made to a temporary relation variable
Example: Write \(r\div s\) as:
- temp1 \(\leftarrow\) \(\prod_{R-S}(r)\)
- temp2 \(\leftarrow\) \(\prod_{R-S}((temp1\times s)-\prod_{R-S,S}(r))\)

4. Extended Relational-Algebra-Operations¶

(1)Generalized Projection¶

Extends the projection operation by allowing arithmetic functions to be used in the projection list: \(\prod_{F1, F2, ..., Fn}(E)\)
E is any relational-algebra expression
Each of \(F_1, F_2, ..., F_n\) are arithmetic expressions involving constants and attributes in the schema of E.

(2)Aggregate Functions and Operations¶

Aggregation function takes a collection of values and returns a single value as a result.
Aggregation operation in relational algebra: \(G_1, G_2, ..., G_n \ g \ F1(A1), F2(A2), ..., Fn(An)(E)\)
- \(E\) is any relational-algebra expression
- \(G_1, G_2, ..., G_n\) is a list of attributes on which to group(can be empty)
- Each \(F_i\) is an aggregate function
- Each \(A_i\) is an attribute name

An example:

(3)Outer Join¶

An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
Uses null values:
- null signifies that the value is unknown or does not exist
- All comparisons involving null are (roughly speaking) false by definition.
  - Will study precise meaning of comparisons with nulls later

Below I will show several kinds of outer join types: An example:

figure 18. An example of Three kinds of outer join operation

figure 19. An example of Full Outer Join

(4)Null Values¶

It is possible for tuples to have a null value, denoted by null, for some of their attributes
null signifies an unknown value or that a value does NOT exist.
The result of any arithmetic expression involving null is null.
Aggregate functions simply ignore null values
- Is an arbitrary decision. Could have returned null as result instead.
- We follow the semantics of SQL in its handling of null values
For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same
- Alternative: assume each null is different from each other
- Both are arbitrary decisions, so we simply follow SQL

unknown

If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5
Three-valued logic using the truth value unknown:
- OR
  - (unknown or true) = true,
  - (unknown or false) = unknown,
  - (unknown or unknown) = unknown
- AND:
  - (true and unknown) = unknown,
  - (false and unknown) = false,
  - (unknown and unknown) = unknown
- NOT: (not unknown) = unknown

(5)Modification of the Database¶

The content of the database may be modified using the following operations:
- Deletion: \(r \leftarrow r - E\)
- Insersion: \(r \leftarrow r \cup E\)
- Updating: \(r \leftarrow \prod_{F1, F2, ..., Fi}(r)\)