Find us on GitHub

Sorting and Removing Duplicates

Objectives

  • Write queries that display results in a particular order.
  • Write queries that eliminate duplicate values from data.

Data is often redundant, so queries often return redundant information. For example, say we were interested in listing all the publishers that are represented in our catalogue. If we select the Publisher from the Works table, we get this:

%load_ext sqlitemagic
%%sqlite swclib.db
SELECT Publisher from Works;
O'Reilly
Wiley
O'Reilly
O'Reilly
Faber & Faber
Belknap Press
McGraw-Hill
Wiley
Peachpit
South-Western
Wiley
O'Reilly
Wiley
Apress
Wiley
Sams
Wiley
Wiley
O'Reilly
O'Reilly

We can eliminate the redundant output to make the result more readable by adding the DISTINCT keyword to our query:

%%sqlite swclib.db
SELECT DISTINCT Publisher from Works;
O'Reilly
Wiley
Faber & Faber
Belknap Press
McGraw-Hill
Peachpit
South-Western
Apress
Sams

If we select more than one column—for example, both the place and publisher—then the distinct pairs of values are returned:

%%sqlite swclib.db
SELECT DISTINCT Place, Publisher FROM Works;
Sebastopol O'Reilly
Hoboken Wiley
London Faber & Faber
Cambridge Belknap Press
New York McGraw-Hill
Indianapolis Wiley
Berkeley Peachpit
Mason South-Western
Berkeley Apress
Indianapolis Sams
Cambridge O'Reilly

Notice in both cases that duplicates are removed even if they didn't appear to be adjacent in the database. Again, it's important to remember that rows aren't actually ordered: they're just displayed that way.

Challenge

  1. Write a query that displays all the distinct years in which items were purchased for the library. Hint, look at the Aquired column in the Items table.

As we mentioned earlier, database records are not stored in any particular order. This means that query results aren't necessarily sorted, and even if they are, we often want to sort them in a different way, e.g., in alphabetical order instead of the order in which they were written into the database. We can do this in SQL by adding an ORDER BY clause to our query:

%%sqlite swclib.db
SELECT DISTINCT Publisher FROM Works ORDER BY Publisher;
Apress
Belknap Press
Faber & Faber
McGraw-Hill
O'Reilly
Peachpit
Sams
South-Western
Wiley

By default, results are sorted in ascending order (i.e., from least to greatest or for A to Z). We can sort in the opposite order using DESC (for "descending"):

%%sqlite swclib.db
SELECT DISTINCT Publisher FROM Works ORDER BY Publisher DESC;
Wiley
South-Western
Sams
Peachpit
O'Reilly
McGraw-Hill
Faber & Faber
Belknap Press
Apress

(And if we want to make it clear that we're sorting in ascending order, we can use ASC instead of DESC.)

We can also sort on several fields at once. For example, this query sorts the Works table first by Date (in descending order), then by Publisher in ascending order:

%%sqlite swclib.db
SELECT Title, Date, Publisher FROM Works ORDER BY Date DESC, Publisher ASC;
Capital in the 21st century 2014 Belknap Press
Geek sublime 2014 Faber & Faber
PHP & MySQL 2013 O'Reilly
Microsoft SQL server 2012 2013 Sams
SQL for dummies 2013 Wiley
Access 2013 all-in-one 2013 Wiley
Discovering SQL 2011 Wiley
SQL all-in-one 2011 Wiley
Using SQLite 2010 O'Reilly
SQL for dummies 2010 Wiley
SQL 2009 McGraw-Hill
SQL in a nutshell 2009 O'Reilly
Learning SQL 2009 O'Reilly
Beginning SQL queries 2008 Apress
A guide to SQL 2008 South-Western
SQL bible 2008 Wiley
MySQL in a nutshell 2005 O'Reilly
SQL 2005 Peachpit
Beginning SQL 2005 Wiley
SQL in a nutshell 2004 O'Reilly

Challenge

  1. Write a query that displays the Personal and Family name of the authors in the Authors table, ordered by Family name.

Key Points

  • The records in a database table are not intrinsically ordered: if we want to display them in some order, we must specify that explicitly.
  • The values in a database are not guaranteed to be unique: if we want to eliminate duplicates, we must specify that explicitly as well.