Real-world data is never complete—there are always holes. Databases represent these holes using special value called NULL
. NULL
is not zero, False
, or the empty string; it is a one-of-a-kind value that means "nothing here". Dealing with NULL
requires a few special tricks and some careful thinking.
To start, let's have a look at the Works
table. There are a total of 20 records. The page number is missing for records #2 and #26—for them, the Pages
field is null:
%load_ext sqlitemagic
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works;
1 | SQL in a nutshell | 578 |
2 | SQL for dummies | |
3 | PHP & MySQL | 532 |
4 | Using SQLite | 503 |
5 | Geek sublime | 258 |
6 | Capital in the 21st century | 685 |
7 | SQL | 534 |
8 | Discovering SQL | 400 |
9 | SQL | 460 |
10 | A guide to SQL | 309 |
11 | SQL bible | 857 |
12 | Learning SQL | 320 |
13 | SQL for dummies | 440 |
14 | Beginning SQL queries | 218 |
15 | Beginning SQL | 501 |
16 | Microsoft SQL server 2012 | |
17 | SQL all-in-one | 708 |
18 | Access 2013 all-in-one | 760 |
19 | SQL in a nutshell | 691 |
20 | MySQL in a nutshell | 321 |
Null doesn't behave like other values. If we select the works that have less than 300 pages:
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages<300
5 | Geek sublime | 258 |
14 | Beginning SQL queries | 218 |
we get two results, and if we select the ones that have 300 or more pages:
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages>=300
1 | SQL in a nutshell | 578 |
3 | PHP & MySQL | 532 |
4 | Using SQLite | 503 |
6 | Capital in the 21st century | 685 |
7 | SQL | 534 |
8 | Discovering SQL | 400 |
9 | SQL | 460 |
10 | A guide to SQL | 309 |
11 | SQL bible | 857 |
12 | Learning SQL | 320 |
13 | SQL for dummies | 440 |
15 | Beginning SQL | 501 |
17 | SQL all-in-one | 708 |
18 | Access 2013 all-in-one | 760 |
19 | SQL in a nutshell | 691 |
20 | MySQL in a nutshell | 321 |
we get sixteen results. Records #2 and #16 aren't in either set of results. The reason is that NULL<300
is neither true nor false: null means, "We don't know," and if we don't know the value on the left side of a comparison, we don't know whether the comparison is true or false. Since databases represent "don't know" as null, the value of NULL<300
is actually NULL
. NULL>=300
is also null because we can't answer to that question either. And since the only records kept by a WHERE
are those for which the test is true, records #2 and #16 aren't included in either set of results.
Comparisons aren't the only operations that behave this way with nulls. 1+NULL
is NULL
, 5*NULL
is NULL
, log(NULL)
is NULL
, and so on. In particular, comparing things to NULL with = and != produces NULL:
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages=NULL;
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages!=NULL;
To check whether a value is NULL
or not, we must use a special test IS NULL
:
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages IS NULL
2 | SQL for dummies | |
16 | Microsoft SQL server 2012 |
or its inverse IS NOT NULL
:
%%sqlite swclib.db SELECT Work_ID, Title, Pages FROM Works WHERE Pages IS NOT NULL
1 | SQL in a nutshell | 578 |
3 | PHP & MySQL | 532 |
4 | Using SQLite | 503 |
5 | Geek sublime | 258 |
6 | Capital in the 21st century | 685 |
7 | SQL | 534 |
8 | Discovering SQL | 400 |
9 | SQL | 460 |
10 | A guide to SQL | 309 |
11 | SQL bible | 857 |
12 | Learning SQL | 320 |
13 | SQL for dummies | 440 |
14 | Beginning SQL queries | 218 |
15 | Beginning SQL | 501 |
17 | SQL all-in-one | 708 |
18 | Access 2013 all-in-one | 760 |
19 | SQL in a nutshell | 691 |
20 | MySQL in a nutshell | 321 |
Null values cause headaches wherever they appear. For example, suppose we want to find all books about SQL that have less than 500 pages. It's natural to write the query like this:
%%sqlite swclib.db SELECT * FROM Works WHERE Title LIKE "%SQL%" AND Pages<500;
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
9 | SQL | 0321334175 | 2005 | Berkeley | Peachpit | 2nd ed. | 460 |
10 | A guide to SQL | 9780324597684 | 2008 | Mason | South-Western | 8th ed. | 309 |
12 | Learning SQL | 9780596520830 | 2009 | Sebastopol | O'Reilly | 2nd ed. | 320 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
14 | Beginning SQL queries | 9781590599433 | 2008 | Berkeley | Apress | 218 | |
20 | MySQL in a nutshell | 0596007892 | 2005 | Sebastopol | O'Reilly | 1st ed. | 321 |
but this query filters omits the records where we don't know the number of pages. Once again, the reason is that when Pages
is NULL
, the <
comparison produces NULL
, so the record isn't kept in our results. If we want to keep these records we need to add an explicit check:
%%sqlite swclib.db SELECT * FROM Works WHERE Title LIKE "%SQL%" AND (Pages<500 OR Pages IS NULL);
2 | SQL for dummies | 9781118607961 | 2013 | Hoboken | Wiley | 8th ed. | |
8 | Discovering SQL | 9781118002674 | 2011 | Indianapolis | Wiley | 400 | |
9 | SQL | 0321334175 | 2005 | Berkeley | Peachpit | 2nd ed. | 460 |
10 | A guide to SQL | 9780324597684 | 2008 | Mason | South-Western | 8th ed. | 309 |
12 | Learning SQL | 9780596520830 | 2009 | Sebastopol | O'Reilly | 2nd ed. | 320 |
13 | SQL for dummies | 9780470557419 | 2010 | Hoboken | Wiley | 7th ed. | 440 |
14 | Beginning SQL queries | 9781590599433 | 2008 | Berkeley | Apress | 218 | |
16 | Microsoft SQL server 2012 | 9780132977661 | 2013 | Indianapolis | Sams | ||
20 | MySQL in a nutshell | 0596007892 | 2005 | Sebastopol | O'Reilly | 1st ed. | 321 |
We still have to decide whether this is the right thing to do or not. If we want to be absolutely sure that we aren't including any books with less than 500 pages, we need to exclude all the records for which we don't know the number of pages.
Write a query that sorts the records in Works
by the number of pages, omitting entries for which this information is not known (i.e., is null).
What do you expect the query:
SELECT * FROM Works WHERE Edition in ('1st ed.', null);
to produce? What does it actually produce?
Some database designers prefer to use a sentinel value to mark missing data rather than NULL
. For example, they will use the date "0000-00-00" to mark a missing date, or -1 to mark a missing page number. What does this simplify? What burdens or risks does it introduce?
null
to represent missing information.null
produces null
as a result.null
are is null
and is not null
.