SQL Antipatterns: Avoiding the Pitfalls of Database Programming (Pragmatic Programmers)

Author: Bill Karwin
All Stack Overflow 30

SQL Antipatterns: Avoiding the Pitfalls of Database Programming (Pragmatic Programmers)


Review Date:


by anonymous   2018-05-09
Please allow this question to serve as an example of a pitfall. Why we do not store comma separated lists is discussed in Chapter 2 Jaywalking in Bill Karwin's book "SQL Antipatterns", available at Amazon and other fine booksellers. https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557
by anonymous   2017-12-18
@MattiaFrusca: We typically reserve `SET` datatype for special cases of strictly predetermined values. And we don't modify the definition of the column when a user request can have any new value that they want to store. Running an ALTER TABLE can kill concurrency; and we typically don't grant ALTER TABLE priv to the application (its more problems than it solves.) Also, storing comma separated lists is an anti-pattern, see Chapter 2 in Bill Karwin's book https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557(available from amazon and other fine booksellers.)
by anonymous   2017-11-27
The first pattern is the normative relational pattern. Consider a future use case, for example, find all recipe with "oil" as an ingredient. Or if you needed to enforce a foreign key constraint... you would find the second pattern to be unsuitable. Bill Karwin has written an excellent book which has a chapter that covers the topic of "storing lists" in a column. "SQL Antipatterns: Avoiding the Pitfalls of Database Programming" available from Amazon and other fine booksellers. https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ (See Chapter 2. Jaywalking)
by anonymous   2017-08-20

According to this answer by @axiac, better solution in terms of compatibility and performance is shown below.

It is also explained in SQL Antipatterns book, Chapter 15: Ambiguous Groups.

To improve performance, combined index is also added for (group_id, price, id).


SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b 
ON a.group_id = b.group_id 
AND (a.price > b.price OR (a.price = b.price and a.id > b.id))
WHERE b.price is NULL;

See explanation on how it works for more details.

By accident as a side-effect this query works in my case where I needed to include ALL records with group_id equals to NULL AND one item from each group with the lowest price.


| id | name   | group_id | price |
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 


| id | select_type | table | type | possible_keys                 | key                | key_len | ref                        | rows | Extra                    |
|  1 | SIMPLE      | a     | ALL  | NULL                          | NULL               | NULL    | NULL                       |    7 |                          | 
|  1 | SIMPLE      | b     | ref  | PRIMARY,id,items_group_id_idx | items_group_id_idx | 5       | agi_development.a.group_id |    1 | Using where; Using index | 
by anonymous   2017-08-20

I assume you have some primary key field on each joined table, since UserID is not unique. I'll assume your primary key is called ID. We'll take the records with the lowest ID. This meets your "first" criteria.

SELECT  Users.UserID, Users.FirstName, Users.LastName, hp.HomePhone,
        wp.WorkPhone, fn.FaxNumber
FROM Users
LEFT JOIN HomePhone hp ON hp.UserID = Users.UserID
LEFT JOIN HomePhone hp2 ON hp2.UserID = Users.UserID AND hp2.ID < hp.ID
LEFT JOIN WorkPhone wp ON wp.UserID = Users.UserID
LEFT JOIN WorkPhone wp2 ON wp2.UserID = Users.UserID AND wp2.ID < wp.ID
LEFT JOIN FaxNumber fn ON fn.UserID = Users.UserID
LEFT JOIN FaxNumber fn2 ON fn2.UserID = Users.UserID AND fn2.ID < fn.ID

There is a whole chapter on this type of issue, called "Ambiguous Gruops", in the book SQL Antipatterns.

by anonymous   2017-08-20

This question has been asked and answered before, but I don't want to hunt for it; this question should be closed as a duplicate. But, to answer your question:

The commas in the string, the column value, are just characters. Those are part of the string. They aren't seen as "separators" between values in the SQL text. The way SQL sees it, the column contains a single value, not a "list" of values.

So, in your query, the IN (field) is equivalent to an equals comparison. It's equivalent to comparing to a string. For example:

... WHERE 2316 = '1908,2315,2316'

And those aren't equal, so the row isn't returned. The "surprisingly" finding of a match, in the case of:

... WHERE 1908 IN ('1908,2315,2316')

that's explained because that string is being evaluated in a numeric context. That is, the comparison returns true, because all of these also true:

... WHERE 1908 = '1908,2315,2316' + 0
... WHERE 1908 = '1908xyz' + 0
... WHERE 1908 = '1907qrs' + 1

(When evaluated in a numeric context, a string gets converted to numeric. It just happens that the string evaluates to a numeric value that equals the integer value it's being comparing to.)

You may be able to make use of the MySQL FIND_IN_SET function. For example:

... WHERE FIND_IN_SET(2316,'1908,2315,2316')

But, please seriously reconsider the design of storing comma separated list. I recommend Bill Karwin's "SQL Antipatterns" book...