Find us on GitHub

Filtering

Objectives

  • Write queries that select records that satisfy user-specified conditions.
  • Explain the order in which the clauses in a query are executed.

One of the most powerful features of a database is the ability to filter data, i.e., to select only those records that match certain criteria. For example, suppose we want to list all the books in the catalogue that have been published by Wiley. We can select these records from the Works table by using a WHERE clause in our query:

%load_ext sqlitemagic
%%sqlite swclib.db
SELECT * FROM Works WHERE Publisher="Wiley";
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
11 SQL bible 9780470229064 2008 Indianapolis Wiley 2nd ed. 857
13 SQL for dummies 9780470557419 2010 Hoboken Wiley 7th ed. 440
15 Beginning SQL 0764577328 2005 Indianapolis Wiley 501
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760

The database manager executes this query in two stages. First, it checks at each row in the Works table to see which ones satisfy the WHERE. It then uses the column names following the SELECT keyword to determine what columns to display.

This processing order means that we can filter records using WHERE based on values in columns that aren't then displayed:

%%sqlite survey.db
SELECT Title FROM Works WHERE Publisher="Wiley";
SQL for dummies
Discovering SQL
SQL bible
SQL for dummies
Beginning SQL
SQL all-in-one
Access 2013 all-in-one

We can use many other Boolean operators to filter our data. For example, we can display only titles that have been published by Wiley in 2011 or later:

%%sqlite swclib.db
SELECT * FROM Works WHERE (Publisher='Wiley') AND (Date>='2011');
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760

(The parentheses around the individual tests aren't strictly required, but they help make the query easier to read.)

Most database managers have a special data type for dates. In fact, many have two: one for dates, such as "May 31, 1971", and one for durations, such as "31 days". SQLite doesn't: instead, it stores dates as either text (in the ISO-8601 standard format "YYYY-MM-DD HH:MM:SS.SSSS"), real numbers (the number of days since November 24, 4714 BCE), or integers (the number of seconds since midnight, January 1, 1970). If this sounds complicated, it is, but not nearly as complicated as figuring out historical dates in Sweden. In our library example, we make this even easier by storing only years (as integers) instead of full dates.

If we want to list all books published either by Wiley or O'Reilly, we can combine the query using OR. Note in passing how the apostrophe in the name O'Reilly is escaped:

%%sqlite swclib.db
SELECT * FROM Works WHERE Publisher='Wiley' OR Publisher='O''Reilly';
1 SQL in a nutshell 9780596518844 2009 Sebastopol O'Reilly 3rd ed. 578
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
4 Using SQLite 9780596521189 2010 Sebastopol O'Reilly 1st ed. 503
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
11 SQL bible 9780470229064 2008 Indianapolis Wiley 2nd ed. 857
12 Learning SQL 9780596520830 2009 Sebastopol O'Reilly 2nd ed. 320
13 SQL for dummies 9780470557419 2010 Hoboken Wiley 7th ed. 440
15 Beginning SQL 0764577328 2005 Indianapolis Wiley 501
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760
19 SQL in a nutshell 0596004818 2004 Cambridge O'Reilly 2nd ed. 691
20 MySQL in a nutshell 0596007892 2005 Sebastopol O'Reilly 1st ed. 321

Alternatively, we can use IN to see if a value is in a specific set:

%%sqlite swclib.db
SELECT * FROM Works WHERE Publisher IN('Wiley','O''Reilly');
1 SQL in a nutshell 9780596518844 2009 Sebastopol O'Reilly 3rd ed. 578
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
4 Using SQLite 9780596521189 2010 Sebastopol O'Reilly 1st ed. 503
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
11 SQL bible 9780470229064 2008 Indianapolis Wiley 2nd ed. 857
12 Learning SQL 9780596520830 2009 Sebastopol O'Reilly 2nd ed. 320
13 SQL for dummies 9780470557419 2010 Hoboken Wiley 7th ed. 440
15 Beginning SQL 0764577328 2005 Indianapolis Wiley 501
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760
19 SQL in a nutshell 0596004818 2004 Cambridge O'Reilly 2nd ed. 691
20 MySQL in a nutshell 0596007892 2005 Sebastopol O'Reilly 1st ed. 321

We can combine AND with OR, but we need to be careful about which operator is executed first. If we don't use parentheses, we get this:

%%sqlite swclib.db
SELECT * FROM Works WHERE Publisher='Wiley' OR Publisher='O''Reilly' AND Date>=2011;
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
11 SQL bible 9780470229064 2008 Indianapolis Wiley 2nd ed. 857
13 SQL for dummies 9780470557419 2010 Hoboken Wiley 7th ed. 440
15 Beginning SQL 0764577328 2005 Indianapolis Wiley 501
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760

which is a list of the books published either by Wiley at any given time and those by O'Reilly in 2011 or later. We probably want this instead:

%%sqlite swclib.db
SELECT * FROM Works WHERE (Publisher='Wiley' OR Publisher='O''Reilly') AND Date>=2011;
2 SQL for dummies 9781118607961 2013 Hoboken Wiley 8th ed.
3 PHP & MySQL 9781449325572 2013 Sebastopol O'Reilly 2nd ed. 532
8 Discovering SQL 9781118002674 2011 Indianapolis Wiley 400
17 SQL all-in-one 9780470929964 2011 Hoboken Wiley 2nd ed. 708
18 Access 2013 all-in-one 9781118510551 2013 Hoboken Wiley 760

Finally, we can use DISTINCT with WHERE to give a second level of filtering:

%%sqlite swclib.db
SELECT DISTINCT Date, Publisher FROM Works WHERE (Publisher='Wiley' OR Publisher='O''Reilly') AND Date>=2011;
2013 Wiley
2013 O'Reilly
2011 Wiley

But remember: DISTINCT is applied to the values displayed in the chosen columns, not to the entire rows as they are being processed.

What we have just done is how most people "grow" their SQL queries. We started with something simple that did part of what we wanted, then added more clauses one by one, testing their effects as we went. This is a good strategy—in fact, for complex queries it's often the only strategy—but it depends on quick turnaround, and on us recognizing the right answer when we get it.

The best way to achieve quick turnaround is often to put a subset of data in a temporary database and run our queries against that, or to fill a small database with synthesized records. For example, instead of trying our queries against an actual bibliographic database of 5 million items, we could run it against a sample of ten thousand, or write a small program to generate ten thousand random (but plausible) records and use that.

Challenges

  1. The library wants to weed its collection and you need to review all books published after 2004 (earlier books have already been reviewed) but before 2010. Write a query that selects the titles that need to be reviewed.

  2. The SQL test *column-name* LIKE *pattern* is true if the value in the named column matches the pattern given; the character '%' can be used any number of times in the pattern to mean "match zero or more characters".

    Expression Value
    'a' like 'a' True
    'a' like '%a' True
    'b' like '%a' False
    'alpha' like 'a%' True
    'alpha' like 'a%p%' True

    The expression *column-name* NOT LIKE *pattern* inverts the test. Using LIKE, write a query that lists all books in the database that have the word "SQL" in their title and that have been published after 2010.

Key Points

  • Use where to filter records according to Boolean conditions.
  • Filtering is done on whole records, so conditions can use fields that are not actually displayed.