5 SQL Set Operations
Coding
SQL
Database
Set Operations
This lecture discusses SQL set operations, including UNION, INTERSECT, and EXCEPT. It also covers the difference between bag and set semantics in SQL.
Duplicates in SQL
- A table can have duplicate tuples, unless this would violate an integrity constraint.
- And
SELECT-FROM-WHERE (SFW)
statements leave duplicates in, unless you say not to! - Why?
- Getting rid of duplicates is expensive!
- We may want the duplicates because they tell us how many times something occurred.
Bags in SQL
- SQL treats tables as “bags” (or “multisets”) rather than sets.
- Bags are just like sets, but duplicates are allowed.
Tip 1: Example: Bag Semantics
- \(\{6, 2, 7, 1, 9\}\) is a set and a bag
- \(\{6, 2, 7, 1, 9, 1\}\) is not a set but a bag
- Let sets, order doesn’t matter: \(\{6, 2, 7, 1, 9, 1\}=\{1, 1, 2, 6, 7, 9\}\)
- Operations \(\cap\), \(\cup\), and \(-\) with bags:
- For \(\cap\), \(\cup\), and \(-\) the number of occurrences of a tuple in the result requires some thought.
- Suppose tuple \(t\) occurs:
- \(m\) times in relation \(R\), and
- \(n\) times in relation \(S\)
Operation | Number of Occurrences of \(t\) in tuples |
---|---|
\(R\cap S\) | \(\min(m,n)\) |
\(R\cup S\) | \(m+n\) |
\(R-S\) | \(\max(m-n,0)\) |
- \(\cap\), \(\cup\), and \(-\) in SQL:
<subquery>) UNION (<subquery>)
(<subquery>) INTERSECT (<subquery>)
(<subquery>) EXCEPT (<subquery>) (
- The parentheses
()
are mandatory - The operands must be queries; you can’t simply use a relation name.
Tip 2: Example: Set Operations in SQL
SELECT sid
(FROM Took
WHERE grade > 95)
UNION
SELECT sid
(FROM Took
WHERE grade < 50);
- Bag vs. Set Semantics: which is used and when
- A
SELECT-FROM-WHERE
statement uses bag semantics by default.- Duplicates are kept in the result
- The set (
INTERSECT/UNION/EXCEPT
) operations use set semantics by default- Duplicates are eliminated from the result
- A
- Motivation: Efficiency
- When doing projection, it is easier not to eliminate duplicate
- Just work one tuple at a time
- For intersection or difference, it is most efficient to sort the relations first.
- At that point you may was well eliminate the duplicates anyway
- When doing projection, it is easier not to eliminate duplicate
- However, we can control which semantic is used.
- We can force the result of a
SFW
query to be a set by usingSELECT DISTINCT
- We can force the result of a set operation to be a bag by using
ALL
.
- We can force the result of a
Tip 3: Example: Force to Use Set Operations in SQL
SELECT sid
(FROM Took
WHERE grade > 95)
UNION ALL
SELECT sid
(FROM Took
WHERE grade < 50);
Tip 4: Example: Comparison of Set and Bag Semantics
- A single occurrence of a value for x in B wipes out all occurrences of it from A:
SELECT x FROM A) EXCEPT (SELECT x FROM B) (
- With
EXCEPT ALL
, we match up the value one by one:
SELECT x FROM A) EXCEPT (SELECT x FROM B) (
Views
The idea
- A view is a relation defined in terms of stored tables (called base tables) and possibly also other views.
- Access a view like any base table.
- Two kinds of view:
- Virtual: no tuples are stored; view is just a query for constructing the relation when needed.
- Materialized: actually constructed and stored. Expensive to maintain.
- Views are particularly important when you want to give different access rights (i.e. permissions) to different users viewing data in your tables!
Tip 5: Example: Application of Views
Canvas student page vs. instructor page
Tip 6: Example: Creating a View
- A view for students who earned an 80 or higher in a CSC course:
CREATE VIEW topresults AS
SELECT firstname, surname, cnum
FROM Student, Took, Offering
WHERE
= Took.sid AND
Student.sid oid = Offering.oid AND
Took.>= 80 AND dept = 'CSC'; grade
Uses of Views
- Break down a large query
- Provide another way of looking at the same data, e.g. for one category of user
- Wrap commonly used complex queries