Archive for the ‘SQL weaknesses’ Category

Problems with the SQL operator Average

What is the average of this set of values? 4, 8, 6

The average is 6, the operation is (4+8+6) / 3.

If we use the SQL operator AVG() with that set of values the result is the same six, unless at least one more value appears in the set, I mean the “value” null, then the set become: 4, 8, 6, null

The operation would be? (4+8+6+null) / 3.

Whatever null means in this context, the common sense tells us that this operation using SQL should not give us a numeric answer because of null, maybe some other kind of answer like for ex. “I don’t know”.

The problem is that when a SQL query with operator AVG() has this later set of values the answer is 6, it means that SQL doesn’t consider null as another value, and worse it doesn’t tell us anything else, with the consequence that the answer may be misinterpreted.

We have to be consious about this problem. We can have a database table with a hundred of tuples with one numeric attribute that has the values 4 and 2, and the rest just losts of nulls,

4, 2, null, null, …, null a hundred times

then in an SQL AVG() operation we obtain the result of 3.

Obviously the answer cannot be 3, it makes no sense. Well Ok, SQL calculates (4+2) / 2 = 3, but may SQL tell us that there are losts of nulls not being considered in the operation, but no, SQL doesn’t tell us anything else but the result three.

Some other problems about the operator sum vs. count can be found in my publication:

A detailed explanation plus some other problems found with the SQL aggregate operators is found in my book:

Domain (Data Type) and null

E. F. Codd (The father of the Relational Model) defined a tree valued logic that has True, False and Unknown, this later logic value is represented in current DBMSs with the null mark that, in general, represents the concept of Missing Information. He also classified Unknown into two possible Missing Information interpretations, one is applicable and the other is inapplicable.


The only possible mark that we can use in current DBMSs is null which represents Missing applicable Information; it means that when we don’t have a value for an attribute, null is put in the attribute.


The problem is that as we have seen in my post, null can not only represent Missing Information but also inapplicable information, and maybe there are some other meanings this mark could have in a database context.


In Database literature the null mark is not considered part of an attribute domain, then null could be considered an external mark from the attribute domain, that has a meaning in the context of the tuple that is Missing Applicable Information or Missing Inapplicable Information. The point is that we have to be conscious about the troubles null can cause in order to avoid them at the time of query construction.

There is an alternative that had been raised, the especial values which had been considered instead of the null mark to deal with Missing Applicable Information and Missing Inapplicable Information, for which we can give a meaning depending on the semantics an attribute have in a database context. For example, we can have “U” to represent missing information and “I” to represent not applicable, and some other possibilities that may exists. The problem is that the DBMS will not treat these values with the meaning we are giving, so we have to consider in queries conditions that let us differentiate these values from others.

 After this analysis we could have the following two alternatives: 


  1. Be conscious about the problems null mark can cause, and build the queries accordingly, but with the problem that we cannot differentiate the meaning of one null from other.
  2. Use especial values with the advantage that those values have a concrete meaning for us, with the problem that we have to consider them in the query conditions.

 You can find a more complete analysis of the null mark in my book:

In the case of NOT IN (null)

Take a look at the example:

Let’s analyse the result of the first example in the mentioned link:

The query: “Get the Bank names which doesn’t have ATMs in Madrid”

We have that the subquery (SELECT B.BNK FROM ATM B WHERE B.CITY = ‘Madrid’) generate the result null, and can we say that A.BANK isn’t in the list where there exists only null?.


Ovbiously not, what we can say is that it’s not known if any of the Banks in the table BANK exists in the list generated by the subquery, which has the only element null.

  A detailed explanation is found in my book:

Some notes about null

Take a look at the examples:

The following are some practical issues related to what Mr. C.J. Date said about null. Let’s analyse the results in a couple of cases.Consider that we have a couple of attributes att1 and att2.

Case 1 (simple conditions in the example):

att1 = 3 and att2 = null

if we have the condition att1 = att2, the answer is the same as false. But, is the value 3 different from null?, not neccesarily, attibute att2 could be 3, then the correct answer should be unknown.

Case 2 (Intersect in the example):

The intersection of att1 = null in one table and the same att1 = null in the other table.

In this case of the Intersect, null is considered a value, that is why the intersection take place. But null is not a value is a symbol that represent unknown or some other meaning (soon an analysis of this aspect), that`s not correct, because the att1 value from one table  could or could not be the same as att1 from the other table.

A detailed explanation is found in my book:

Some SQL weaknesses explained in a video clip

In the following video clip from I explain a couple of the weaknnesses in a graphical way.

A simple SQL weakness with the Intersect operation


A more complex SQL weakness with the conditions NOT IN vs. NOT EXISTS

(If needed see the example database)

A simple SQL weaknesses with operator avg()