Archive for June, 2009|Monthly archive page

Problems with the SQL operator Average

What is the average of this set of values? 4, 8, 6

The average is 6, the operation is (4+8+6) / 3.

If we use the SQL operator AVG() with that set of values the result is the same six, unless at least one more value appears in the set, I mean the “value” null, then the set become: 4, 8, 6, null

The operation would be? (4+8+6+null) / 3.

Whatever null means in this context, the common sense tells us that this operation using SQL should not give us a numeric answer because of null, maybe some other kind of answer like for ex. “I don’t know”.

The problem is that when a SQL query with operator AVG() has this later set of values the answer is 6, it means that SQL doesn’t consider null as another value, and worse it doesn’t tell us anything else, with the consequence that the answer may be misinterpreted.

We have to be consious about this problem. We can have a database table with a hundred of tuples with one numeric attribute that has the values 4 and 2, and the rest just losts of nulls,

4, 2, null, null, …, null a hundred times

then in an SQL AVG() operation we obtain the result of 3.

Obviously the answer cannot be 3, it makes no sense. Well Ok, SQL calculates (4+2) / 2 = 3, but may SQL tell us that there are losts of nulls not being considered in the operation, but no, SQL doesn’t tell us anything else but the result three.

Some other problems about the operator sum vs. count can be found in my publication:

A detailed explanation plus some other problems found with the SQL aggregate operators is found in my book:

Domain (Data Type) and null

E. F. Codd (The father of the Relational Model) defined a tree valued logic that has True, False and Unknown, this later logic value is represented in current DBMSs with the null mark that, in general, represents the concept of Missing Information. He also classified Unknown into two possible Missing Information interpretations, one is applicable and the other is inapplicable.


The only possible mark that we can use in current DBMSs is null which represents Missing applicable Information; it means that when we don’t have a value for an attribute, null is put in the attribute.


The problem is that as we have seen in my post, null can not only represent Missing Information but also inapplicable information, and maybe there are some other meanings this mark could have in a database context.


In Database literature the null mark is not considered part of an attribute domain, then null could be considered an external mark from the attribute domain, that has a meaning in the context of the tuple that is Missing Applicable Information or Missing Inapplicable Information. The point is that we have to be conscious about the troubles null can cause in order to avoid them at the time of query construction.

There is an alternative that had been raised, the especial values which had been considered instead of the null mark to deal with Missing Applicable Information and Missing Inapplicable Information, for which we can give a meaning depending on the semantics an attribute have in a database context. For example, we can have “U” to represent missing information and “I” to represent not applicable, and some other possibilities that may exists. The problem is that the DBMS will not treat these values with the meaning we are giving, so we have to consider in queries conditions that let us differentiate these values from others.

 After this analysis we could have the following two alternatives: 


  1. Be conscious about the problems null mark can cause, and build the queries accordingly, but with the problem that we cannot differentiate the meaning of one null from other.
  2. Use especial values with the advantage that those values have a concrete meaning for us, with the problem that we have to consider them in the query conditions.

 You can find a more complete analysis of the null mark in my book:

When subquery is empty in EXISTS (subquery)

Take a look at the example:

Let’s analyse the result of the second example in the mentioned link:

The query: “Get the Bank names which doesn’t have ATMs in Madrid”

The result of the EXISTS subquery is empty, I mean that the subquery generates no tuples because the only possible tuple is the one that has Madrid in the attribute City an null in the attribute BNK, and the condition NOT EXISTS (subquery) generates True, that’s why we get a result with all the Bank names from de table BANK.

Then, how does the condition EXISTS behave from the perspective of SQL? Is this behaviour the same as in Logic?.

A detailed explanation is found in my book:

In the case of NOT IN (null)

Take a look at the example:

Let’s analyse the result of the first example in the mentioned link:

The query: “Get the Bank names which doesn’t have ATMs in Madrid”

We have that the subquery (SELECT B.BNK FROM ATM B WHERE B.CITY = ‘Madrid’) generate the result null, and can we say that A.BANK isn’t in the list where there exists only null?.


Ovbiously not, what we can say is that it’s not known if any of the Banks in the table BANK exists in the list generated by the subquery, which has the only element null.

  A detailed explanation is found in my book: