sql.js
内置类型 | 含义 |
---|---|
CHAR(n) or CHARACTER(n) | 定长度字符串 |
VARCHAR(n) or CHARACTER VARYING(n) | 变长度字符串(最大长度为 n ) |
INT or INTEGER | |
S | |
NUMERIC(p, d) | 定点数(p 位十进制小数,小数点后 d 位) |
REAL , DOUBLE PRECISION | 浮点数 |
FLOAT(n) | 浮点数(至少 n 位十进制小数) |
创建 relation:
CREATE TABLE r (
attribute_1 domain_1 <NOT NULL>, ..., attribute_n domain_n <NOT NULL>,
<integrity_constraint_1>, ..., <integrity_constraint_1>
);
其中 NOT NULL
规定该 attribute 不能取空值,integrity_constraint_i
可以是任意 integrity constraints,例如:
PRIMARY KEY (A_{j_1}, ..., A_{j_m}) -- 规定 r 的 m 个 attributes 为 r 的主键,其值唯一且不能为空
FOREIGN KEY (A_{k_1}, ..., A_{k_n}) REFERENCES s -- 规定 r 的 n 个 attributes 值必须为 s 的主键值
删除 relation:
DROP TABLE r; -- 删除 r 及其 schema
DELETE FROM r; -- 只删除 r 中的 tuples
增删 attributes:
ALTER TABLE r ADD Attribute Domain; -- 增加一列,各 tuples 的该属性值为 NULL
ALTER TABLE r DROP Attribute; -- 删除一列
查询单一 attribute:
SELECT dept_name FROM instructor; -- 结果可能含重复 tuples
SELECT DISTINCT dept_name FROM instructor; -- 从上述结果中去除重复
查询多个 attributes 并做算术运算:
SELECT ID, name, salary * 1.1 FROM instructor;
带条件(可用 AND
, OR
, NOT
相连)查询:
SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.' AND salary > 70000;
不同 relations 的同名 attributes 以 relation.attribute
的方式区分:
SELECT name, instructor.dept_name, building
FROM instructor, department
WHERE instructor.dept_name = department.dept_name;
一般形式的查询由三个 clauses 构成:
SELECT attribute_1, ..., attribute_n
FROM relation_1, ..., relation_m
WHERE predicate;
逻辑上分三步:
FROM
-clause 中的 relation
s 的 Cartesian productWHERE
-clause 中的 predicate
筛选上述 Cartesian product 的 tuplesSELECT
-clause 中的 attribute
s of 上述 tuples(可以用 *
表示所有 attributes)AS
— 重命名重命名 attribute:
SELECT name AS instructor_name, course_id
FROM instructor, teaches
WHERE instructor.ID = teaches.ID;
重命名 relations:
SELECT T.name, S.course_id
FROM instructor AS T, teaches AS S
WHERE T.ID = S.ID;
LIKE
— 字符串匹配字符串用单引号界定,字符串内的单引号用双引号代替。
模式匹配:
%
匹配任意子字符串_
匹配任意字符\
表示转义字符例如
SELECT dept_name FROM department
WHERE building LIKE '%Watson%'; -- 含 Watson 的 building
SELECT dept_name FROM department
WHERE building NOT LIKE '%Watson%'; -- 不含 Watson 的 building
ORDER BY
— 输出排序按某个 attribute 升序排列:
SELECT name FROM instructor WHERE dept_name = 'Physics' ORDER BY name;
按多个 attributes 依次排列:
-- 先按 salary 降序排列,再对相同 salary 的 tuples 按 name 升序排列
SELECT * FROM instructor ORDER BY salary DESC, name ASC;
BETWEEN
— 数值范围SELECT name FROM instructor WHERE salary BETWEEN 90000 AND 100000;
-- 等价于
SELECT name FROM instructor WHERE salary <= 100000 AND salary >= 90000;
SELECT name, course id FROM instructor, teaches
WHERE instructor.ID = teaches.ID AND dept_name = 'Biology';
-- 等价于
SELECT name, course id FROM instructor, teaches
WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology');
UNION
(SELECT ...) UNION (SELECT ...); -- 集合并运算,结果不含重复的 tuples
(SELECT ...) UNION ALL (SELECT ...); -- 结果保留重复的 tuples,重复次数 = SUM(各 queries 中的重复次数)
INTERSECT
(SELECT ...) INTERSECT (SELECT ...); -- 集合交运算,结果不含重复的 tuples
(SELECT ...) INTERSECT ALL (SELECT ...); -- 结果保留重复的 tuples,重复次数 = min(各 queries 中的重复次数)
⚠️ MySQL 不支持 INTERSECT
。
EXCEPT
(SELECT ...) EXCEPT (SELECT ...); -- 集合差运算,结果不含重复的 tuples
(SELECT ...) EXCEPT ALL (SELECT ...); -- 结果保留重复的 tuples,重复次数 = max(0, (query1 中的重复次数) - (query2 中的重复次数))
⚠️ MySQL 不支持 EXCEPT
;Oracle 用 MINUS
代替 EXCEPT
;Oracle-12c 用 MULTISET EXCEPT
代替 EXCEPT ALL
。
WHERE
-clause 中
NULL
的算术运算,结果为 NULL
;NULL
的比较运算,结果为 UNKNOWN
。TRUE AND UNKNOWN -- 结果为 UNKNOWN
FALSE AND UNKNOWN -- 结果为 FALSE
TRUE OR UNKNOWN -- 结果为 TRUE
FALSE OR UNKNOWN -- 结果为 UNKNOWN
NOT UNKNOWN -- 结果为 UNKNOWN
SELECT
-clause 中的 DISTINCT
将两个 NULL
视为相同的值。
COALESCE
以任意多个相同类型为输入,返回第一个非空值:
SELECT ID, COALESCE(salary, 0/* 将 NULL 替换为 0 */) AS salary
FROM instructor;
DECODE
in Oracle不要求类型相同,按第一个匹配替换:
DECODE(value,
match_1, replacement_1,
...,
match_n, replacement_n,
default_replacement);
⚠️ 与一般情形不同,NULL
与 NULL
视为相等。
将 NULL
替换为 N/A
:
SELECT ID, DECODE(salary, NULL, 'N/A', salary) AS salary
FROM instructor;
SQL 提供 5 个基本聚合函数,它们以集合为输入,以单值(的集合)为输出。
AVG
, SUM
的输入必须是数值的集合MIN
, MAX
, COUNT
的输入可以是其他类型数据的集合除 COUNT(*)
外,均忽略 NULL
;作用于空集时,COUNT
返回 0
,其余返回 NULL
。
SELECT AVG(salary) AS avg_salary
FROM instructor WHERE dept_name = 'Comp. Sci.';
SELECT COUNT(DISTINCT ID)
FROM teaches WHERE semester = 'Spring' AND year = 2018;
SELECT COUNT(*) FROM course;
GROUP BY
— 分组按 dept_name
分组,计算各组的 AVG(salary)
:
SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor GROUP BY dept_name;
⚠️ 未出现在 GROUP BY
-clause 里的 attributes,在 SELECT
-clause 中只能作为聚合函数的输入,不能作为输出的 attributes。
HAVING
— 组条件平均工资大于 42000 的系:
SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name
HAVING AVG(salary) > 42000;
逻辑顺序:
FROM
-clause 构造 Cartesian productWHERE
-clause 筛选 tuplesGROUP BY
-clause 分组(默认为一组)HAVING
-clause 对各 groups 进行筛选SELECT
-clause 指定的 attributes (of groups)IN
— $\in$这里的“集合”可以是形如 (SELECT ...)
的子查询结果,或形如 (v_1, ..., v_n)
的枚举集。
与 INTERSECT
等价:
SELECT DISTINCT course_id FROM section
WHERE semester = 'Fall' AND year = 2017 AND
course_id IN (SELECT course_id FROM section
WHERE semester = 'Spring' AND year = 2018);
与 EXCEPT
等价:
SELECT DISTINCT course_id FROM section
WHERE semester = 'Fall' AND year = 2017 AND
course_id NOT IN (SELECT course_id FROM section
WHERE semester = 'Spring' AND year = 2018);
SOME
— $\exists$-- salary 大于子查询结果中的某个 salary
SELECT name FROM instructor
WHERE salary > SOME (SELECT salary FROM instructor WHERE dept_name = 'Biology');
⚠️ 与 ANY
为同义词,早期版本的 SQL 只支持 ANY
。
ALL
— $\forall$-- salary 大于子查询结果中的所有 salary
SELECT name FROM instructor
WHERE salary > ALL (SELECT salary FROM instructor WHERE dept_name = 'Biology');
EXISTS
— 集合非空SELECT course_id FROM section AS S
WHERE semester = 'Fall' AND year = 2017 AND
EXISTS (SELECT * FROM section AS T
WHERE semester = 'Spring' AND year = 2018 AND S.course_id = T.course_id);
其中 S
在外层查询定义,可以在内层子查询中使用。作用域规则与高级编程语言类似。
$A\supset B$ 可以表示为
NOT EXISTS (B EXCEPT A)
上过生物系所有课程的学生:
SELECT S.ID, S.name FROM student AS S
WHERE NOT EXISTS (
(SELECT course_id FROM course WHERE dept_name = 'Biology') -- Biology 的所有课程
EXCEPT
(SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID) -- 学号为 S.ID 的学生上过的课程
);
UNIQUE
— 无重复2017 年至多开过一次的课程:
SELECT T.course_id FROM course AS T
WHERE UNIQUE (SELECT R.course_id FROM section AS R
WHERE T.course_id = R.course_id AND R.year = 2017);
等价于
SELECT T.course_id FROM course AS T
WHERE 1 >= (SELECT COUNT(R.course_id) FROM section AS R
WHERE T.course_id = R.course_id AND R.year = 2017);
⚠️ 若 $t_1$ 与 $t_2$ 至少有一个同名 attribute 的值均为 NULL
,其余同名 attributes 的值均非空且相等,则 $t_1=t_2$ 返回 UNKNOWN
;而 UNIQUE
当且仅当存在 $t_1=t_2$ 为 TRUE
时才返回 FALSE
;故在此情形下,UNIQUE
依然返回 TRUE
。
FROM
-clause 中的子查询与 HAVING
等价的写法:
SELECT dept_name, avg_salary
FROM (SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor GROUP BY dept_name)
WHERE avg_salary > 42000;
子查询结果是一个 relation,可将其命名为 dept_avg
,它含有 dept_name
, avg_salary
这两个 attributes:
SELECT dept_name, avg_salary
FROM (SELECT dept_name, AVG(salary) FROM instructor GROUP BY dept_name)
AS dept_avg (dept_name, avg_salary)
WHERE avg_salary > 42000;
⚠️ MySQL 及 PostgreSQL 规定 FROM
-clause 中的子查询结果必须被命名。
自 SQL-2003 起,支持用 LATERAL
访问 FROM
-clause 中已出现过的 relation:
SELECT name, salary, avg_salary
FROM instructor I1, LATERAL (SELECT AVG(salary) AS avg_salary
FROM instructor I2
WHERE I2.dept_name = I1.dept_name);
WITH
— Temporary Relations拥有最大预算的系:
with max_budget (value) -- 创建临时关系 max_budget,其唯一的属性名为 value
AS (SELECT MAX(budget) FROM department)
SELECT dept_name
FROM department, max_budget
WHERE department.budget = max_budget.value;
通常比嵌套的子查询更清晰,且临时关系可在多处复用。
可以创建多个临时关系:
WITH
/* 临时关系 1 */dept_total (dept_name, value)
AS (SELECT dept_name, SUM(salary) FROM instructor GROUP BY dept_name),
/* 临时关系 2 */dept_total_avg(value)
AS (SELECT AVG(value) FROM dept_total)
SELECT dept_name
FROM dept_total, dept_total_avg
WHERE dept_total.value > dept_total_avg.value; -- 总工资 > 平均总工资
返回单值(之集)的子查询,可用在 SELECT
-, WHERE
-, HAVING
-clauses 中接收单值的地方。
查询各系及其讲师人数:
SELECT dept_name,
(SELECT COUNT(*) FROM instructor
WHERE department.dept_name = instructor.dept_name
) AS num_instructors/* 该系讲师人数 */
FROM department;
若含有 WHERE
-clause,则先完成该 clause,再修改 relation。
DELETE FROM
与 SELECT
类似:
DELETE FROM relation WHERE predicate;
每次只能从一个 relation 中删除 tuples。
WHERE
-clause 可以含子查询:
DELETE FROM instructor
WHERE salary < (SELECT AVG(salary) FROM instructor);
INSERT INTO
按 attributes 在 schema 中的顺序插入 values:
INSERT INTO course -- attributes 依次为 course_id, title, dept_name, credits
VALUES ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
或显式给定顺序(可以与 schema 中的不一致):
INSERT INTO course (title, course_id, credits, dept_name)
VALUES ('Database Systems', 'CS-437', 4, 'Comp. Sci.');
更一般的,可以插入查询结果:
-- 从 student 中找到音乐系总学分超过 144 的学生,将他们插入 instructor
INSERT INTO instructor
SELECT ID, name, dept_name, 18000
FROM student
WHERE dept_name = 'Music' AND tot_cred > 144;
UPDATE ... SET
所有讲师涨薪 5%:
UPDATE instructor
SET salary = salary * 1.05;
收入小于平均收入的讲师涨薪 5%:
UPDATE instructor
SET salary = salary * 1.05
WHERE salary < (SELECT AVG(salary) FROM instructor);
条件分支:
UPDATE instructor
SET salary =
CASE
WHEN salary <= 50000 THEN salary * 1.05 -- [0, 50000]
WHEN salary <= 100000 THEN salary * 1.03 -- (50000, 100000]
ELSE salary * 1.01 -- (100000, infty)
END
标量子查询可用于 SET
-clause:
-- 将每个 student 的 tot_cred 更新为已通过(grade 非空不等于 F)课程的学分之和
UPDATE student
SET tot_cred = (
SELECT SUM(credits) -- 若未通过任何课程,则返回 NULL
FROM takes, course
WHERE student.ID = takes.ID AND takes.course_id = course.course_id
AND takes.grade <> 'F' AND takes.grade IS NOT NULL);
CROSS JOIN
表示 Cartesian product,可以用 ,
代替:
SELECT COUNT(*) FROM student CROSS JOIN takes;
-- 等价于
SELECT COUNT(*) FROM student, takes;
NATURAL JOIN
只保留 Cartesian product 中同名 attributes 取相同值的 tuples,且同名 attributes 只保留一个。
SELECT name, course_id FROM student, takes WHERE student.ID = takes.ID;
-- 等价于
SELECT name, course_id FROM student NATURAL JOIN takes;
可以用 JOIN r USING (a)
指定与 r
连接时需相等的 attribute(s):
-- (student NATURAL JOIN takes) 与 course 有两个同名 attributes (course_id, dept_name)
SELECT name, title FROM (student NATURAL JOIN takes)
JOIN course using (course_id); -- 保留 course_id 相等的 tuples
SELECT name, title FROM (student NATURAL JOIN takes)
NATURAL JOIN course; -- 保留 dept_name, course_id 均相等的 tuples
ON
— Conditional JoinSELECT * FROM student, takes WHERE student.ID = takes.ID;
-- 等价于
SELECT * FROM student JOIN takes ON student.ID = takes.ID; -- 同名 attributes 均保留
-- 几乎等价于
SELECT * FROM student NATURAL JOIN takes; -- 同名 attributes 只保留一个
INNER JOIN
以上 JOIN
s 都是 INNER JOIN
,其中 INNER
可以省略。
OUTER JOIN
OUTER JOIN
为没有参与 INNER JOIN
的单侧 tuple 提供 NULL
值配对,即:允许来自一侧 tuple 在另一侧中缺少与之匹配的 tuple。在连接后的 tuple 中,缺失的值置为 NULL
。
在连接结果中保留没有选课的学生,其选课信息置为 NULL
:
-- LEFT OUTER JOIN 允许 left tuple 缺少与之匹配的 right tuple
SELECT * FROM student NATURAL LEFT OUTER JOIN takes;
-- RIGHT OUTER JOIN 允许 right tuple 缺少与之匹配的 left tuple
SELECT * FROM takes NATURAL RIGHT OUTER JOIN student;
x FULL OUTER JOIN y
-- 等价于
(x LEFT OUTER JOIN y) UNION (x RIGHT OUTER JOIN y)
OUTER JOIN
也可以配合 ON
使用:
SELECT * FROM student LEFT OUTER JOIN takes ON student.ID = takes.ID; -- 除 ID 保留两次外,几乎等价于 NATURAL LEFT OUTER JOIN
SELECT * FROM student LEFT OUTER JOIN takes ON (1 = 1); -- 等价于 cross join(所有 tuples 均参与 inner join,不提供 NULL 值配对)
SELECT * FROM student LEFT OUTER JOIN takes ON (1 = 1) WHERE student.ID = takes.ID; -- 等价于 NATURAL JOIN
with
-clause 可在单个 query 内创建临时关系。
CREATE VIEW
CREATE VIEW view_name AS <query_expression>;
CREATE VIEW view_name(attribute_1, ..., attribute_n) AS <query_expression>;
各系系名及该系讲师的总工资:
CREATE VIEW department_total_salary(dept_name, total_salary) AS
SELECT dept_name, SUM(salary) FROM instructor GROUP BY dept_name;
为避免数据过期,view 通常在被使用时才会去执行 query。
为节省时间,某些数据库系统支持 materialized view,以负责预存并(在 query 中的 relation(s) 被更新时)更新 view 中的数据。存在多种更新策略:
满足以下条件的 view 可以被更新:
FROM
-clause 只含 1 个实际 relationSELECT
-clause 只含 attribute names,不含表达式、聚合函数、DISTINCT
修饰NULL
值GROUP BY
或 HAVING
💡 推荐用 trigger 机制更新 view。
每个 transaction 由一组不可分的 statements 构成,整体效果为 all-or-nothing,只能以以下两种方式之一结束:
MySQL、PostgreSQL 默认将每一条 statement 视为一个 transaction,且执行完后自动提交。
为创建含多条 statements 的 transaction,必须关闭自动提交机制。
BEGIN ATOMIC ... END
中,以创建 transaction。BEGIN
但不支持 END
,必须以 COMMIT
或 ROLLBACK
结尾。从 Alice’s 账户向 Bob’s 账户转账 100 元,所涉及的两步 UPDATE
操作是不可分的:
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice';
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
COMMIT; -- 如果 Alice's 账户余额为负或其他故障,可以用 ROLLBACK 回滚到交易前的状态
PostgreSQL 支持更精细的提交/回滚控制:
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
-- oops ... forget that AND use Wally's account
ROLLBACK TO my_savepoint; -- 在 my_savepoint 之后的 savepoints 将被自动释放
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Wally';
COMMIT;
可以在 CREATE TABLE
时给定,也可以向已有的 relation 中添加:
ALTER TABLE relation ADD <integrity_constraint>;
NOT NULL
— 非空值默认 NULL
属于所有 domains;若要从某个 domain 中排除 NULL
,可在 domain 后加 NOT NULL
:
name VARCHAR(20) NOT NULL
budget NUMERIC(12,2) NOT NULL
PRIMARY KEY
默认为 NOT NULL
。
UNIQUE
— SuperkeyUNIQUE (A_1, ..., A_n) -- 这组 attributes 构成一个 superkey,即不同 tuples 的取值不能重复
⚠️ NULL
不等于任何值,参见 NULL = NULL
。
CHECK
— 条件检查CREATE TABLE department
(...,
budget NUMERIC(12,2) CHECK (budget > 0)/* 预算值必须为正 */,
...);
CREATE TABLE section
(...,
semester VARCHAR(6),
CHECK (semester IN ('Fall', 'Winter', 'Spring', 'Summer')),
...);
⚠️ 除 CHECK(TRUE)
外,CHECK(UNKNOWN)
亦返回 TRUE
。
⚠️ SQL 标准支持 CHECK
中含 subquery,但多数系统尚未支持。
REFERENCES
— 外键约束FOREIGN KEY (dept_name) REFERENCES department -- PRIMARY KEY by default
FOREIGN KEY (dept_name) REFERENCES department(dept_name/* PRIMARY KEY or superkey */)
亦可在 attribute 定义中使用:
CREATE TABLE course (
...,
dept_name VARCHAR(20) REFERENCES department,
...
);
违反约束的操作默认被拒绝(transaction 回滚),但 FOREIGN KEY
允许设置 CASCADE
等操作:
FOREIGN KEY (dept_name) REFERENCES department
ON DELETE CASCADE/* 若 department 中的某个 tuple 被删除,则 course 中相应的 tuples 亦被删除 */
ON UPDATE CASCADE/* 若 department 中的某个 tuple 被更新,则 course 中相应的 tuples 亦被更新 */
除 CASCADE
外,还支持 SET NULL
或 SET DEFAULT
操作。
⚠️ 含有 NULL
的 tuple 默认满足约束。
💡 借助 triggers 可实现更一般的 referential integrity constraints。
CONSTRAINT
— 约束命名CREATE TABLE instructor
(...,
salary NUMERIC(8,2), /* 命名的约束 */CONSTRAINT minsalary CHECK (salary > 29000),
...);
ALTER TABLE instructor DROP CONSTRAINT minsalary; -- 删除该约束
某些场景必须临时违反约束,例如:
-- 夫妻二人均以对方姓名为外键,先 insert 任何一人都会违反外键约束
CREATE TABLE person (
name VARCHAR(20),
spouse VARCHAR(20),
PRIMARY KEY (name),
FOREIGN KEY (spouse) REFERENCES person(name)
);
SQL 标准支持
INITIALLY DEFERRED
修饰约束,表示该约束延迟到 transaction 末尾才检查。DEFERRABLE
修饰约束,表示该约束默认立即检查,但可以在 transaction 中用SET CONSTRAINTS <constraint_1, ..., constraint_n> DEFERRED
延迟到末尾。
ASSERTION
CREATE ASSERTION <assertion_name> CHECK <predicate>;
$\forall$ 学生,其 tot_cred
= 其已通过课程的学分之和:
CREATE ASSERTION credits_earned_constraint CHECK (
NOT EXISTS (
SELECT ID FROM student WHERE tot_cred <> (
SELECT COALESCE(SUM(credits), 0)
FROM takes NATURAL JOIN course
WHERE student.ID = takes.ID
AND grade IS NOT NULL AND grade<> 'F'
)
)
);
💡 SQL 不支持 $\forall x, P(x)$,但可以等价的表示为 $\nexists x, \lnot P(x)$。
⚠️ 因开销巨大,多数系统尚未支持 ASSERTION
。
DATE '2018-04-25'
TIME '09:30:00' -- time(3) 表示秒精确到 3 位小数,默认 0 位小数
TIMESTAMP '2018-04-25 10:29:01.45' -- 默认 6 位小数
抽取信息:
EXTRACT(f/* year, month, day, hour, minute, second */ FROM d/* date or time */)
获取当前时间:
CURRENT_DATE
CURRENT_TIME -- 含时区信息
LOCALTIME -- 不含时区信息
CURRENT_TIMESTAMP
LOCALTIMESTAMP
CAST(e AS t)
将表达式 e
转化为类型t
:
SELECT CAST(ID/* 原为 VARCHAR(5) */ AS NUMERIC(5)) AS inst_id
FROM instructor
ORDER BY inst_id -- 按数值比较
MySQL:
format
TO_CHAR(timestamp '2002-04-20 17:31:12.66', 'HH12:MI:SS') → 05:31:12
TO_CHAR(interval '15h 2m 12s', 'HH24:MI:SS') → 15:02:12
TO_CHAR(125, '999') → 125
TO_CHAR(125.8::real, '999D9') → 125.8
TO_CHAR(-125.8, '999D99S') → 125.80-
TO_DATE('05 Dec 2000', 'DD Mon YYYY') → 2000-12-05
TO_NUMBER('12,454.8-', '99G999D9S') → -12454.8
TO_TIMESTAMP('05 Dec 2000', 'DD Mon YYYY') → 2000-12-05 00:00:00-05
DEFAULT
— 默认值CREATE TABLE student (
ID VARCHAR (5),
name VARCHAR (20) NOT NULL,
dept_name VARCHAR (20),
tot_cred NUMERIC(3,0) DEFAULT 0,
PRIMARY KEY (ID)
);
INSERT INTO student(ID, name, dept_name)
VALUES ('12789', 'Newman', 'Comp. Sci.'/* 缺省 tot_cred 值,以 0 补之 */);
*LOB
— Large OBjectCLOB
— Character LOBBLOB
— Binary LOB可以定义 LOB attributes:
book_review CLOB(10KB)
image BLOB(10MB)
movie BLOB(2GB)
⚠️ LOB 的读写效率很低,一般以其 locator 作为 attribute,而非对象本身。
CREATE TYPE
美元与英镑不应当能直接比较、算术运算,可通过定义类型加以区分:
CREATE TYPE Dollars AS NUMERIC(12,2) final;
CREATE TYPE Pounds AS NUMERIC(12,2) final;
CREATE TABLE department (
dept_name VARCHAR (20),
building VARCHAR (15),
budget Dollars
);
CREATE DOMAIN
SQL-92 支持自定义 domain,以施加完整性约束、默认值:
CREATE DOMAIN DDollars AS NUMERIC(12,2) NOT NULL;
CREATE DOMAIN YearlySalary NUMERIC(8,2)
CONSTRAINT salary_value_test CHECK(value >= 29000.00);
⚠️ 不同自定义 domain 的值直接可以直接比较、算术运算。
CREATE TABLE instructor (
ID number(5) GENERATED ALWAYS AS IDENTITY/* 总是由系统自动生成 ID 值 */,
...,
PRIMARY KEY (ID)
);
INSERT INTO instructor(name, dept_name, salary)
VALUES ('Newprof', 'Comp. Sci.', 100000); -- 缺省 ID 值
若 always
替换为 BY DEFAULT
,则允许用户给定 ID 值。
CREATE TABLE instructor (
ID number(5) AUTO_INCREMENT,
...,
PRIMARY KEY (ID)
);
CREATE TABLE instructor (
ID SERIAL,
...,
PRIMARY KEY (ID)
);
相当于
CREATE SEQUENCE inst_id_seq AS INTEGER;
CREATE TABLE instructor (
ID INTEGER DEFAULT nextval('inst_id_seq')
...,
PRIMARY KEY (ID)
);
ALTER SEQUENCE inst_id_seq OWNED BY instructor.ID;
CREATE TABLE temp_instructor LIKE instructor; -- ⚠️ 尚未实现
由查询结果推断 schema:
CREATE TABLE t1 AS (SELECT * FROM instructor WHERE dept_name = 'Music')
WITH DATA/* 多数实现默认带数据,哪怕 WITH DATA 被省略 */;
CREATE SCHEMA
数据库系统 | 操作系统 |
---|---|
catalog | home directory of a user |
schema | a directory in ~ |
relations, views | files |
connect to a DBS | log into a OS |
default catalog & schema | ~ |
catalog5.univ_schema.course | /home/username/filename |
CREATE SCHEMA hollywood
CREATE TABLE films (title text, release date, awards text[])
CREATE VIEW winners AS
SELECT title, release FROM films WHERE awards IS NOT NULL;
DROP SCHEMA hollywood;
等价于
CREATE SCHEMA hollywood;
CREATE TABLE hollywood.films (title text, release date, awards text[]);
CREATE VIEW hollywood.winners AS
SELECT title, release FROM hollywood.films WHERE awards IS NOT NULL;
DROP SCHEMA hollywood;
Index 将一组 attributes 组合为一个 search key,用来避免遍历所有 tuples 从而加速查找。
Index 与物理层相关,而 SQL 标准限于逻辑层,故没有提供 index 定义命令;但很多数据库系统提供了以下命令:
CREATE INDEX <index_name> ON <relation_name> (<attribute_list>);
DROP INDEX <index_name>;
最高权限属于数据库管理员 (DataBase Administrator, DBA),其权限包括授权、重构数据库等。
GRANT <privilege_list>
ON <relation_name/view_name>
TO <user_list/role_list>;
REVOKE <privilege_list>
ON <relation_name/view_name>
FROM <user_list/role_list>;
其中
privilege_list
可以包括
SELECT
,相当于文件系统中的 read 权限。INSERT
,可以在其后附加 (attribute_list)
,表示 INSERT
时只允许提供这些 attributes 的值。UPDATE
,可以在其后附加 (attribute_list)
,表示 UPDATE
时只允许修改这些 attributes 的值。REFERENCES
,可以在其后附加 (attribute_list)
,表示这些 attributes 可以被用作 FOREIGN KEY
或出现在 CHECK
约束中。DELETE
ALL PRIVILEGES
(创建 relation
的 user
自动获得 ALL PRIVILEGES
)。user_list
可以包括
PUBLIC
,表示当前及将来所有用户同类用户应当拥有相同权限。
CREATE ROLE instructor;
GRANT SELECT ON takes TO instructor;
Role 可以被赋予某个具体的 user 或其他 role:
CREATE ROLE dean;
GRANT instructor TO dean; -- 继承 instructor 的权限
GRANT dean TO Robert;
默认当前 session 的 role 为 NULL
,但可显式指定:
SET ROLE role_name;
此后赋权时可附加 GRANTED BY CURRENT_ROLE
,以避免 cascading revocation。
默认不允许转移权限,但可以用 WITH GRANT OPTION
赋予某个 user/role 传递权限的权限:
GRANT SELECT ON department TO Alice WITH GRANT OPTION;
REVOKE OPTION GRANT FOR SELECT ON department FROM Alice;
某个权限的权限传递关系构成一个 directed graph:以 users/roles 为 nodes(其中 DBA 为 root)、以权限传递关系为 edges,每个 user/role 有一条或多条来自 root 的路径。
撤回某个 user/role 的权限可能导致其下游 users/roles 的权限亦被撤销:
REVOKE SELECT ON department FROM Alice; -- 允许 cascading revocation
REVOKE SELECT ON department FROM Alice restrict; -- 如有 cascading revocation 则报错
Java DataBase Connectivity (JDBC)
import java.sql.*;
public static void JDBCexample(String userid, String passwd) {
try (
/* try-with-resources since Java 7 */
Connection conn = DriverManager.getConnection(
"<protocol>@<url>:<port>:<database>", userid, passwd);
Statement stmt = conn.createStatement();
/* 否则需要手动 conn.close(); stmt.close(); */
) {
try {
stmt.executeUpdate("<INSERT|UPDATE|DELETE_statement>");
} catch (SQLException sqle) {
System.out.println("Could not insert tuple. " + sqle);
}
ResultSet rset = stmt.executeQuery("<SELECT_statement>");
while (rset.next()/* for each tuple */) {
System.out.println(rset.getString("<attribute_name>") + " "
+ rset.getFloat(2/* (1-based) 2nd attribute */));
}
} catch (Exception sqle) {
System.out.println("Exception : " + sqle);
}
}
若要在 Java 程序中推断某个 relation 的 schema,可以从 ResultSet
对象中提取元数据:
ResultSetMetaData rsmd = rset.getMetaData();
for(int i = 1; i <= rsmd.getColumnCount(); i++) {
System.out.println(rsmd.getColumnName(i));
System.out.println(rsmd.getColumnTypeName(i));
}
💡 推荐用 prepareStatement
方法(由 SQL 系统完成代入并处理转义),以替代更危险的 String
串联操作:
PreparedStatement pStmt = conn.prepareStatement(
"INSERT INTO instructor VALUES(?, ?, ?, ?)");
pStmt.setString(1, "88877");
pStmt.setString(2, "Perry");
pStmt.setString(3, "Finance");
pStmt.setInt(4, 125000);
pStmt.executeUpdate(); // INSERT INTO instructor VALUES(88877, Perry, Finance, 125000);
pStmt.setString(1, "88878");
pStmt.executeUpdate(); // INSERT INTO instructor VALUES(88878, Perry, Finance, 125000);
类似地,可参数化 SQL 函数、过程调用:
// 需用 registerOutParameter() 注册返回类型
CallableStatement cStmt1 = conn.prepareCall("{? = call some function(?)}"); CallableStatement cStmt2 = conn.prepareCall("{call some procedure(?, ?)}");
psycopg2
is the most popular PostgreSQL database adapter for the Python programming language. pyodbc
is an open source Python module that makes accessing ODBC databases simple.import psycopg2
def PythonDatabaseExample(userid, passwd):
try:
conn = psycopg2.connect(host, port, dbname, user, password)
cur = conn.cursor()
try:
cur.execute("INSERT INTO instructor VALUES(%s, %s, %s, %s)",
("77987", "Kim", "Physics", 98000))
conn.commit()
except Exception as sqle:
print("Could not insert tuple. ", sqle)
conn.rollback()
cur.execute("""SELECT dept_name, AVG(salary)
FROM instructor GROUP BY dept_name""")
for dept in cur:
print dept[0], dept[1]
except Exception as sqle:
print("Exception : ", sqle)
Open Database Connectivity (ODBC)
void ODBCexample() {
RETCODE error;
HENV env; SQLAllocEnv(&env); /* environment */
HDBC conn; SQLAllocConnect(env, &conn); /* database connection */
SQLConnect(conn,
"db.yale.edu", SQL_NTS/* 表示前一个实参是以 '\0' 结尾的字符串 */,
"avi", SQL_NTS, "avipasswd", SQL_NTS);
{
HSTMT stmt; SQLAllocStmt(conn, &stmt); /* statement */
char * sqlquery = "SELECT dept_name, SUM(salary) FROM instructor GROUP BY dept_name";
error = SQLExecDirect(stmt, sqlquery, SQL_NTS);
if (error == SQL_SUCCESS) {
char deptname[80]; int lenOut1;
SQLBindCol(stmt, 1/* 第 1 个 attribute */, SQL_C_CHAR, deptname,
80/* 最大长度 */, &lenOut1/* 实际长度(负值表示 null)的地址 */);
float salary; int lenOut2;
SQLBindCol(stmt, 2/* 第 2 个 attribute */, SQL_C_FLOAT, &salary, 0, &lenOut2);
while (SQLFetch(stmt) == SQL_SUCCESS) {
printf(" %s %g∖n", deptname, salary);
}
}
SQLFreeStmt(stmt, SQL_DROP); /* 所有 allocated 资源都要被 freed */
}
SQLDisconnect(conn);
SQLFreeConnect(conn);
SQLFreeEnv(env);
}
Transactions 相关:
SQLSetConnectOption(conn, SQL_AUTOCOMMIT, 0);
SQLTransact(conn, SQL_COMMIT);
SQLTransact(conn, SQL_ROLLBACK);
⚠️ 实际数据库系统给出的具体实现不同于 SQL 标准(本节)。
DECLARE <variable_name> <type>; -- 声明变量
SET <variable_name> = <value> -- 变量赋值
BEGIN <SQL_statements> END -- 复合语句
BEGIN ATOMIC <SQL_transaction> END -- 不可分的复合语句
循环(LEAVE
相当于 break
,ITERATE
相当于 continue
):
WHILE boolean_expression DO
sequence_of_statements;
END WHILE
REPEAT
sequence_of_statements;
UNTIL boolean_expression
END REPEAT
DECLARE n INTEGER DEFAULT 0;
FOR r AS -- for each row in the table
SELECT budget FROM department;
DO
SET n = n - r.budget;
END FOR
条件分支:
IF boolean_expression THEN
statement_or_compound_statement
ELSEIF boolean_expression THEN
statement_or_compound_statement
ELSE
statement_or_compound_statement
END IF
DECLARE out_of_classroom_seats CONDINTION -- 内置 SQLEXCEPTION, SQLWARNING, NOT FOUND.
DECLARE EXIT/* 或 CONTINUE */ HANDLER FOR out_of_classroom_seats
BEGIN
...
SIGNAL out_of_classroom_seats -- 抛出异常
...
END
输出某系讲师人数:
CREATE FUNCTION dept_count(dept_name VARCHAR(20))
RETURNS INTEGER
BEGIN
DECLARE d_count INTEGER;
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_name
RETURN d_count;
END
-- 或等价的 PROCEDURE
CREATE PROCEDURE dept_count_proc(IN dept_name VARCHAR(20),
OUT d_count INTEGER)
BEGIN
SELECT COUNT(*) INTO d_count
FROM instructor
WHERE instructor.dept_name = dept_count_proc.dept_name
END
-- 调用 PROCEDURE 前,需先声明返回值:
DECLARE d_count INTEGER;
CALL dept_count_proc('Physics', d_count);
输出某系讲师信息:
CREATE FUNCTION instructor_of(dept_name VARCHAR(20))
RETURNS TABLE (ID VARCHAR (5), name VARCHAR (20),
dept_name VARCHAR (20), salary NUMERIC (8,2))
RETURN TABLE (
SELECT ID, name, dept_name, salary
FROM instructor
WHERE instructor.dept_name = instructor_of.dept_name
);
⚠️ 可以同名:
PROCEDURE
s 的 arguments 个数必须不同。FUNCTION
s 的 arguments 个数可以相同,但至少有一个 argument 的类型不同。CREATE FUNCTION dept_count(dept_name VARCHAR(20))
RETURNS INTEGER
LANGUAGE C
EXTERNAL NAME 'path_to_dept_count'
CREATE PROCEDURE dept_count_proc(IN dept_name VARCHAR(20),
OUT d_count INTEGER)
LANGUAGE C
EXTERNAL NAME 'path_to_dept_count_proc'
用例:规定某商品库存的最小值,当售出该商品导致库存量小于最小值时,自动下单订购该商品。
定义 trigger 需指定:
CREATE TRIGGER timeslot_check1
/* Event: */AFTER INSERT ON section
REFERENCING NEW ROW AS nrow FOR EACH ROW -- 遍历 each inserted row
/* Condition: */WHEN (
/* inserted time_slot_id 不属于 time_slot */
nrow.time_slot_id NOT IN (SELECT time_slot_id FROM time_slot)
)
/* Action: */BEGIN ROLLBACK END;
CREATE TRIGGER timeslot_check2
/* Event: */AFTER DELETE ON timeslot
REFERENCING OLD ROW AS orow FOR EACH ROW -- 遍历 each deleted row
/* Condition: */WHEN (
/* deleted time_slot_id 不属于 time_slot */
orow.time_slot_id NOT IN (SELECT time_slot_id FROM time_slot)
AND
/* 且依然被 section 中的 tuple(s) 引用 */
orow.time_slot_id IN (SELECT time_slot_id FROM section)
)
/* Action: */BEGIN ROLLBACK END;
UPDATE
触发的 trigger 可以指定 attributes:
CREATE TRIGGER credits_earned
AFTER UPDATE OF takes ON grade
REFERENCING NEW ROW AS nrow
REFERENCING OLD ROW AS orow
FOR EACH ROW
WHEN (/* 新成绩及格且非空 */nrow.grade <> 'F' AND nrow.grade IS NOT NULL)
AND (/* 旧成绩不及格或为空 */orow.grade = 'F' OR orow.grade IS NULL)
BEGIN ATOMIC
UPDATE student SET tot_cred = tot_cred +
(SELECT credits FROM course WHERE course.course_id = nrow.course_id)
WHERE student.id = nrow.id;
END;
涉及的所有称为 transition tables:
REFERENCING NEW TABLE AS ntbl
REFERENCING OLD TABLE AS otbl
FOR EACH STATEMENT
⚠️ 只能用于 AFTER
triggres。
DISABLE
and ENABLE
Triggers 在创建时默认为启用的。可手动停用或启用:
ALTER TRIGGER <trigger_name> DISABLE;
ALTER TRIGGER <trigger_name> ENABLE;
用例:找到某一课程的所有(直接或间接)先修课程。
创建递归的临时表:
WITH RECURSIVE rec_prereq(course_id, prereq_id) AS (
/* base query */SELECT course_id, prereq_id FROM prereq
UNION
/* recursive query */SELECT rec_prereq.course_id, prereq.prereq_id
FROM rec_prereq, prereq
WHERE rec_prereq.prereq_id = prereq.course_id
)
SELECT * FROM rec_prereq WHERE rec_prereq.course_id = 'CS-347';
若以 CREATE RECURSIVE VIEW
代替 WITH RECURSIVE
,则创建递归的 view。
某些数据库系统允许省略 RECURSIVE
。
Recursive query 必须是单调的,即 $V_1\subset V_2 \implies f(V_1)\subset f(V_2)$,因此不能含有
NOT EXISTS
作用在用到 recursive view 的 subquery 上EXCEPT
右端项含有 recursive view假设 studentgr_grades
有每个学生的 ID
及其 GPA
,按 GPA
降序排序并输出排名:
SELECT ID, RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
FROM student_grades ORDER BY s_rank;
默认将 NULL
视为最大值,可手动设为最小值:
SELECT ID, RANK() OVER (ORDER BY (GPA) DESC NULLS LAST) AS s_rank
FROM student_grades ORDER BY s_rank;
假设有 dept_grades(ID, dept_name, GPA)
,则可先按 dept_name
分组,再对各组按 GPA
排名:
SELECT ID, dept_name,
RANK() OVER (PARTITION BY dept_name ORDER BY GPA DESC) AS dept_rank
FROM dept_grades
ORDER BY dept_name, dept_rank;
其他排名函数:
PERCENT_RANK
定义为分数 $(r-1)/(n-1)$,其中 $n$ 为 tuples 个数,$r$ 为 RANK
结果。CUME_DIST
定义为 $p/n$,其中 $n$ 为 tuples 个数,$p$ 为排名 $\le$ 当前值的个数。ROW_NUMBER
相当于先对各 rows 排序,在输出各 row 的序号。NTILE(n)
将 tuples 按顺序均匀(各桶 tuples 数量至多相差 1
)分入 n
个桶,返回每个 tuple 的桶号。假设 tot_credits(year, num_credits)
含有每年的总学分。
对 (year-3, year]
的值取平均:
SELECT year,
AVG(num_credits) OVER (ORDER BY year ROWS 3 PRECEDING)
AS avg_total_credits
FROM tot_credits;
对 (year-3, year+2)
的值取平均:
SELECT year,
AVG(num_credits) OVER (ORDER BY year ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING)
AS avg_total_credits
FROM tot_credits;
对每年及之前所有年份的值取平均:
SELECT year,
AVG(num_credits) OVER (ORDER BY year ROWS UNBOUNDED PRECEDING) AS avg_total_credits
FROM tot_credits;
Windowing 也支持按 PARTITION
执行:
SELECT dept_name, year,
AVG(num_credits)
OVER (PARTITION BY dept_name ORDER BY year ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
AS avg_total_credits
from tot_credits_dept;
R
导出的 table T
,其中 R
的某个 attribute A
的值被 T
用作 attribute names,相应的值通常取某些聚合函数的返回值。A
。假设有 sales(name, size, color, quantity)
,以下语句得到以 (name, size, dark, pastel, white)
为 attributes 的 pivot-table:
SELECT * FROM sales
PIVOT(
SUM(quantity) -- operations for getting new attribute values
FOR color -- the pivot attribute
IN ('dark', 'pastel', 'white') -- new attribute names
);
GROUP BY ROLLUP(attribute_list)
表示以 attribute_list
的每个 prefix 作为 GROUP BY
的 attributes 列表,再对所有结果取 UNION
:
SELECT item_name, color, SUM(quantity)
FROM sales
GROUP BY ROLLUP(item_name, color);
-- 等价于
(SELECT item_name, color, SUM(quantity)
FROM sales GROUP BY(item_name, color))
UNION
(SELECT item_name, NULL AS color, SUM(quantity)
FROM sales GROUP BY(item_name))
UNION
(SELECT NULL AS item_name, NULL AS color, SUM(quantity)
FROM sales);
GROUP BY CUBE(attribute_list)
表示以 attribute_list
的每个 subset 作为 GROUP BY
的 attributes 列表,再对所有结果取 UNION
:
SELECT item_name, color, size, SUM(quantity)
FROM sales
GROUP BY CUBE(item_name, color, size);
GROUPING
函数可判断 NULL
值是否由 ROLLUP
或 CUBE
产生:
SELECT -- 最终得到 (item_name, color, quantity)
(CASE WHEN GROUPING(item_name) = 1 THEN 'all' ELSE item_name END) AS item_name,
(CASE WHEN grouping(color) = 1 THEN 'all' ELSE color END) AS color,
SUM(quantity) AS quantity
FROM sales GROUP BY ROLLUP(item_name, color);