Find us on GitHub

Calculating New Values

Objectives

  • Write queries that calculate new values for each selected record.

The library is embarking on a digitization project. In order to estimate how long it would take to scan books, you want to look up what the total number of pages for all library books are. Prior experience has shown that due to poor quality binding, up to 2% of the pages will fail to be scanned for books published by Wiley and will have to be scanned anew. The total number of pages has therefore to be adjusted accordingly for this publisher:

%load_ext sqlitemagic
%%sqlite swclib.db
SELECT 1.02*Pages FROM Works WHERE Publisher = "Wiley"
 
408.0
874.14
448.8
511.02
722.16
775.2
Remember those empty cells we had in the database? Seems like one of them is showing up again here. We'll have to deal with it later so that it doesn't mess up our computations.

When we run the query, the expression 1.02 * Pages is evaluated for each row. Expressions can use any of the fields, all of usual arithmetic operators, and a variety of common functions. (Exactly which ones depends on which database manager is being used.) For example, knowing that scanning two pages takes exactly 13 seconds, we can compute how many hours would be necessary to scan each Wiley book. Since this is an estimate, more than one decimal place doesn't make much sense and so we use a rounding function:

%%sqlite swclib.db
SELECT Title, round(13/2*(1.02*Pages)/3600,1) FROM Works WHERE Publisher = "Wiley"
SQL for dummies
Discovering SQL 0.7
SQL bible 1.5
SQL for dummies 0.7
Beginning SQL 0.9
SQL all-in-one 1.2
Access 2013 all-in-one 1.3

We can also combine values from different fields, for example by using the string concatenation operator ||:

%%sqlite swclib.db
SELECT Personal || " " || Family FROM Authors LIMIT 5;
Kevin E. Kline
Daniel Kline
Brand Hunt
Allen G. Taylor
Brett McLaughlin

Challenges

  1. After further testing, we realize that the average time to scan every two pages of books published by O'Reilly is 25 seconds, because the operator seems to like reading a few paragraphs now and then. Fortunately, they do not suffer from the binding issue, though. Write a query that returns the number of hours necessary to scan each O'Reilly book, given those circumstances.

  2. The UNION operator combines the results of two queries:

%%sqlite swclib.db
SELECT * FROM Works WHERE Publisher='Peachpit' UNION SELECT * FROM Works WHERE Publisher='Faber & Faber';
5 Geek sublime 9780571310302 2014 London Faber & Faber 258
9 SQL 0321334175 2005 Berkeley Peachpit 2nd ed. 460

Use UNION to create a consolidated list of Wiley and O'Reilly titles, along with the estimated time it would take to digitize them according to the two formulas discussed above. The output should be something like:

Access 2013 all-in-one 1.3
Beginning SQL 0.9
Discovering SQL 0.7
Learning SQL 1.0
MySQL in a nutshell 1.0
PHP & MySQL 1.0
SQL all-in-one 1.2
SQL bible 1.5
SQL for dummies
SQL for dummies 0.7
SQL in a nutshell 1.0
SQL in a nutshell 2.0
Using SQLite 1.0
  1. The Works table contains ISBN numbers that are either in 10-digit or 13-digit format. The 13-digit numbers all start with the '978' prefix. The two digits following the prefix (or the first two digits in ISBN-10) form the "registration group element" (to simplify: language or country), the next four digits are the "registrant element" (publisher) and the next three are the "publication element" (title). The last digit in both formats is a checksum character.

    ISBN Details.svg
    "ISBN Details" by Sakurambo at English Wikipedia - Own work, based on en::Image:ISBN Details.jpg. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

    The substring function substr(X, I, L) returns the substring of X starting at index I and of length L (L is optional). The length() function can be used in a WHERE clause to test against the length of a field. Use these functions and the UNION statement to output a list of publishers and the 4-digit publisher codes (registrant elements) appearing in the Works table. The result should help dissolve whatever hope you still held on the usefulness of ISBNs for collection analysis...

Key Points

  • SQL can perform calculations using the values in a record as part of a query.