# How well does SQL live up to its name?


SQL was originally created to provide an access method for structured data that shields users from the underlying storage scheme, which is why it borrows so much English vocabulary and grammar: to make it easier to write and understand. Relational algebra, the theoretical basis of SQL, is a self-contained computing system that can in principle compute anything. So we should be able to use SQL for all kinds of data computing needs.
However, although relational databases have achieved great success, SQL has clearly not lived up to that original intention. Apart from a few simple queries that end users can write themselves, the vast majority of SQL users are technical staff, and many complex queries are not easy even for them.

Let us examine SQL's computational shortcomings through a very simple example.
Suppose there is a sales performance table, sales_amount, with three fields (date information is omitted to simplify the problem):
sales         The salesperson's name, assuming no duplicate names
product    The product sold
amount    The salesperson's sales amount for that product

Now we want to know which salespeople are in the top 10 for both air conditioner sales and TV sales. The problem is very simple, and people naturally design the following process: a. sort by air conditioner sales and find the top 10; b. sort by TV sales and find the top 10; c. take the intersection of the results of a and b.

Doing this in SQL:
a. Find the top 10 in air conditioner sales. This is very simple:
select top 10 sales from sales_amount where product='AC' order by amount desc
b. Find the top 10 in TV sales. Same approach:
select top 10 sales from sales_amount where product='TV' order by amount desc
c. Take the intersection of a and b. This is a bit of a hassle: SQL does not support step-by-step computation, so the results of the two previous steps cannot be saved and both queries have to be written out again:
select * from
( select top 10 sales from sales_amount where product='AC' order by amount desc ) a
intersect
select * from
( select top 10 sales from sales_amount where product='TV' order by amount desc ) b
A simple three-step calculation has to be written like this in SQL, while routine calculations of ten or more steps are everywhere. This is clearly beyond what many people can accept.
So we have found SQL's first important shortcoming: it does not support step-by-step computation. Splitting a complex calculation into steps greatly reduces the difficulty of a problem; conversely, merging a multi-step calculation into a single step greatly increases it.
Imagine a teacher who requires pupils to solve every word problem with a single formula: how distressed the children would be (although, of course, a few clever ones would manage).
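The single-statement query above can be tried out with a minimal sketch in Python's built-in sqlite3 module. The sample rows, the salesperson names and the helper names `make_db` and `top_in_both` are assumptions made for illustration; SQLite has no `top` keyword, so `order by ... limit` is used instead, and the cutoff is a parameter so that a small table is enough.

```python
import sqlite3

# Hypothetical sample data; the article gives no concrete rows.
ROWS = [
    ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
    ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
]

def make_db():
    """Create an in-memory sales_amount table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales_amount (sales text, product text, amount real)")
    conn.executemany("insert into sales_amount values (?, ?, ?)", ROWS)
    return conn

def top_in_both(conn, n):
    """Salespeople in the top n for both AC and TV, as one SQL statement."""
    # Both "steps" must be repeated inline as subqueries of a single statement.
    sql = """
        select sales from
          (select sales from sales_amount where product = 'AC'
           order by amount desc limit ?)
        intersect
        select sales from
          (select sales from sales_amount where product = 'TV'
           order by amount desc limit ?)
    """
    return sorted(row[0] for row in conn.execute(sql, (n, n)))

conn = make_db()
print(top_in_both(conn, 2))
```

With these sample rows and a cutoff of 2, only one salesperson appears in both lists.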

A SQL query cannot be split into steps, but a stored procedure written in SQL can. Can stored procedures then solve the problem easily?
Leaving aside how complex the technical environment for stored procedures is (enough to put most people off) and the incompatibilities between databases, let us consider, purely in theory, whether step-by-step SQL makes the calculation any simpler.
a. Compute the top 10 in air conditioner sales. The statement is the same as before, but the result has to be saved for step c, and SQL can only store a set of data in a table, so we have to create a temporary table:
create temporary table x1 as
select top 10 sales from sales_amount where product='AC' order by amount desc
b. Compute the top 10 in TV sales. Similarly:
create temporary table x2 as
select top 10 sales from sales_amount where product='TV' order by amount desc
c. The intersection, which was troublesome before, is now simple:
select * from x1 intersect select * from x2
The step-by-step approach is clear, but using temporary tables is still tedious. In batch calculations on structured data, temporary sets as intermediate results are very common; storing each of them in a temporary table is both inefficient and unintuitive.
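The three steps with temporary tables can be sketched in SQLite as follows. This is only an illustration under assumptions: the sample rows are invented, `limit 2` stands in for `top 10` so that a small table suffices, and the syntax is SQLite's rather than that of any particular database the article may have in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_amount (sales text, product text, amount real)")
# Hypothetical sample data.
conn.executemany(
    "insert into sales_amount values (?, ?, ?)",
    [
        ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
        ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
    ],
)

# Step a: the top AC salespeople, saved in temporary table x1.
conn.execute("""
    create temporary table x1 as
    select sales from sales_amount where product = 'AC'
    order by amount desc limit 2
""")

# Step b: the top TV salespeople, saved in temporary table x2.
conn.execute("""
    create temporary table x2 as
    select sales from sales_amount where product = 'TV'
    order by amount desc limit 2
""")

# Step c: the intersection of the two saved results.
both = sorted(
    row[0] for row in conn.execute("select sales from x1 intersect select sales from x2")
)
print(both)
```

Each intermediate result needs its own table, with a name and a lifetime to manage, even though it is used only once.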

Moreover, SQL does not allow the value of a field to be a set (that is, a temporary table), so some calculations cannot be done even if we are willing to tolerate the tedium.
Suppose we change the problem to finding the salespeople who are in the top 10 for every product. Continuing with the approach above, the obvious idea is:
1. Group the data by product, sort each group, and take the top 10;
2. Take the intersection of all the top 10 lists.
Because we do not know in advance how many products there are, the grouped results would also have to be stored in a temporary table, and that table would need a field holding the members of each group. SQL does not support this, so the approach does not work.
With window functions (SQL:2003) we can change tack: partition by product, count how many times each salesperson appears in the top 10 of any partition, and if that count equals the number of products, the salesperson is in the top 10 for every product.
select sales
from ( select sales
from ( select sales,
rank() over (partition by product order by amount desc ) ranking
from sales_amount)
where ranking <=10 )
group by sales
having count(*)=(select count(distinct product) from sales_amount)
How many people can write SQL like this?
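The window-function query can be run in SQLite, which supports `rank()` from version 3.25 onward. The following is a hedged sketch: the three products, the sample rows and the helper name `top_in_all` are assumptions, and the cutoff is a parameter rather than a fixed 10.

```python
import sqlite3

# Hypothetical sample data with three products.
ROWS = [
    ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
    ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
    ("Ann", "PC", 10), ("Bob", "PC", 30), ("Cid", "PC", 20), ("Dee", "PC", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_amount (sales text, product text, amount real)")
conn.executemany("insert into sales_amount values (?, ?, ?)", ROWS)

def top_in_all(conn, n):
    """Salespeople ranked within the top n for every product."""
    sql = """
        select sales
        from (select sales
              from (select sales,
                           rank() over (partition by product
                                        order by amount desc) as ranking
                    from sales_amount)
              where ranking <= ?)
        group by sales
        -- a salesperson qualifies if they made the cut once per product
        having count(*) = (select count(distinct product) from sales_amount)
    """
    return sorted(row[0] for row in conn.execute(sql, (n,)))

print(top_in_all(conn, 2))
```

The count-and-compare trick works here because each salesperson has at most one row per product; with several rows per product the counts would need to be taken over distinct products.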
Moreover, many databases do not support window functions. In that case the only option is a stored procedure with a loop that computes the top 10 for each product in turn and intersects it with the previous result. This is no simpler than programming in a high-level language, and we still have to deal with cumbersome temporary tables.
Now we know SQL's second important shortcoming: its set orientation is incomplete. Although SQL has the concept of a set, it does not provide sets as a basic data type, so many set operations have to be translated into something else, both in thinking and in writing.
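For comparison, here is a minimal sketch of the loop described above in a language where a set is an ordinary value. It uses plain Python, not SQL; the sample rows and the function name `top_in_all_products` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (salesperson, product, amount).
ROWS = [
    ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
    ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
    ("Ann", "PC", 10), ("Bob", "PC", 30), ("Cid", "PC", 20), ("Dee", "PC", 5),
]

def top_in_all_products(rows, n):
    """Return the set of salespeople in the top n of every product."""
    # Step 1: group by product; each group is itself a collection held in a value.
    by_product = defaultdict(list)
    for sales, product, amount in rows:
        by_product[product].append((amount, sales))

    # Step 2: loop over the groups, intersecting each top-n set with the result so far.
    result = None
    for entries in by_product.values():
        top = {sales for _, sales in sorted(entries, reverse=True)[:n]}
        result = top if result is None else result & top
    return result if result is not None else set()

print(top_in_all_products(ROWS, 2))
```

The grouped results and the running intersection are held in variables, so no temporary table is needed and the number of products does not have to be known in advance. Note that slicing takes exactly n members per product, like `top`, whereas `rank()` can admit more when amounts tie.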

We used the top keyword in the calculations above. In fact, relational algebra has no such thing (it can be derived from a combination of other operations), and it is not part of the SQL standard.
Let us see how hard it is to find the top 10 without top.
The idea is this: for each member, count the members with a larger amount than its own and use that count as its ranking, then keep the members whose ranking is no more than 10. The SQL is as follows:
select sales
from ( select A.sales sales, A.product product,
(select count(*)+1 from sales_amount
where A.product=product AND A.amount<amount) ranking
from sales_amount A )
where product='AC' AND ranking<=10
Or
select sales
from ( select A.sales sales, A.product product, count(*) ranking
from sales_amount A, sales_amount B
where A.product=B.product AND A.amount<=B.amount
group by A.sales,A.product )
where product='AC' AND ranking<=10
Even professional technical staff may not be able to write SQL like this well! And all it does is compute a single top 10.
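The ranking idea can be checked with a small sketch in SQLite: a row's ranking is one plus the number of rows of the same product with a strictly larger amount. The sample rows and the helper name `top_without_top` are assumptions; the alias `B` is added only to make the correlation explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_amount (sales text, product text, amount real)")
# Hypothetical sample data.
conn.executemany(
    "insert into sales_amount values (?, ?, ?)",
    [
        ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
        ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
    ],
)

def top_without_top(conn, product, n):
    """Top n salespeople for a product, using neither TOP nor LIMIT."""
    sql = """
        select sales
        from (select A.sales as sales, A.product as product,
                     -- ranking = 1 + number of rows with a strictly larger amount
                     (select count(*) + 1 from sales_amount B
                      where B.product = A.product
                        and B.amount > A.amount) as ranking
              from sales_amount A)
        where product = ? and ranking <= ?
    """
    return sorted(row[0] for row in conn.execute(sql, (product, n)))

print(top_without_top(conn, "AC", 2))
```

The comparison has to be strict: if a row also counted itself, every ranking would be shifted by one and the best salesperson would start at rank 2.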

Taking a step back: even with top, only taking the leading part becomes easy. If we change the question to finding 6th through 10th place, or to finding the salespeople whose sales exceed those of the next-ranked salesperson by more than 10%, the difficulty remains.
The reason is SQL's third important shortcoming: lack of support for ordered sets. SQL inherits the unordered sets of mathematics, which directly makes order-related calculations quite difficult, and one can imagine how common order-related calculations are (comparison with the previous month, comparison with the same period last year, the top 20%, rankings, and so on).
The window functions added in the SQL:2003 standard provide some order-related computing capability, which makes some of these problems simpler to solve and alleviates the problem to a degree. But window functions are often accompanied by subqueries, and they do not let the user directly access the members of a set by position, so many order-related operations remain hard to solve.
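As a hedged illustration of what window functions make possible, the sketch below answers the two order-related questions in SQLite 3.25 or later, using `row_number()` and `lead()`. The sample rows and the helper names `places` and `beats_next` are assumptions. Both queries need a subquery, because a window function's result cannot be filtered in the same `where` clause that computes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_amount (sales text, product text, amount real)")
# Hypothetical sample data for a single product.
conn.executemany(
    "insert into sales_amount values (?, ?, ?)",
    [("Ann", "AC", 90), ("Bob", "AC", 88), ("Cid", "AC", 70), ("Dee", "AC", 69)],
)

def places(conn, first, last):
    """Salespeople in positions first..last by AC sales, best first."""
    sql = """
        select sales
        from (select sales, row_number() over (order by amount desc) as pos
              from sales_amount where product = 'AC')
        where pos between ? and ?
        order by pos
    """
    return [row[0] for row in conn.execute(sql, (first, last))]

def beats_next(conn, factor):
    """Salespeople whose AC sales exceed the next-ranked amount times factor."""
    sql = """
        select sales
        from (select sales, amount,
                     lead(amount) over (order by amount desc) as next_amount
              from sales_amount where product = 'AC')
        -- the last row has no successor: next_amount is null and the row is dropped
        where amount > next_amount * ?
    """
    return [row[0] for row in conn.execute(sql, (factor,))]

print(places(conn, 2, 3))
print(beats_next(conn, 1.1))
```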

We now want to look at the gender ratio of the "good" salespeople computed above, that is, how many are men and how many are women. Gender is generally recorded in the employee roster (the employee table) rather than in the performance table. Simplified, the roster looks like this:
name        The employee's name, assuming no duplicate names
gender     The employee's gender
We have already computed the list of "good" salespeople, so the natural idea is to look up each one's gender in the roster and then count. In SQL, however, getting information from another table requires a join, so, building on the earlier result, the SQL is written as:
select employee.gender,count(*)
from employee,
( ( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc ) ) A
where A.sales=employee.name
group by employee.gender
Just one associated table already makes things this tedious, and in reality a great deal of information is stored across tables, often in several layers. For example, salespeople belong to departments and departments have managers. If we now want to know which managers the "good" salespeople report to, three tables have to be joined, and writing the where and group clauses clearly is no easy job.
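The gender query can be run against SQLite with a minimal sketch like the one below. The employee rows, the sales rows and the helper name `gender_counts` are assumptions, and `limit` replaces `top` as SQLite requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales_amount (sales text, product text, amount real)")
conn.execute("create table employee (name text, gender text)")
# Hypothetical sample data.
conn.executemany(
    "insert into sales_amount values (?, ?, ?)",
    [
        ("Ann", "AC", 90), ("Bob", "AC", 80), ("Cid", "AC", 70), ("Dee", "AC", 60),
        ("Ann", "TV", 50), ("Bob", "TV", 95), ("Cid", "TV", 85), ("Dee", "TV", 40),
    ],
)
conn.executemany(
    "insert into employee values (?, ?)",
    [("Ann", "F"), ("Bob", "M"), ("Cid", "M"), ("Dee", "F")],
)

def gender_counts(conn, n):
    """Count the salespeople in the top n for both AC and TV, by gender."""
    sql = """
        select employee.gender, count(*)
        from employee,
             (select sales from
                (select sales from sales_amount where product = 'AC'
                 order by amount desc limit ?)
              intersect
              select sales from
                (select sales from sales_amount where product = 'TV'
                 order by amount desc limit ?)) A
        -- the join is expressed only through matching name values
        where A.sales = employee.name
        group by employee.gender
        order by employee.gender
    """
    return conn.execute(sql, (n, n)).fetchall()

print(gender_counts(conn, 3))
```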
This is the fourth important shortcoming of SQL that we want to point out: the lack of an object reference mechanism. In relational algebra, relationships between objects are maintained by equal foreign key values. This makes lookups inefficient, and it also means that the record a foreign key points to cannot be used directly as an attribute of the referring record. Imagine if the statement above could be written like this:
select sales.gender,count(*)
from (…)      // … is the SQL above that computes the "good" salespeople
group by sales.gender
Obviously, this statement is not only clearer; it would also be more efficient to compute (no join is needed).
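To make the idea of an object reference concrete, here is a small sketch in plain Python rather than SQL. The classes `Employee` and `SaleRecord`, the helper `top` and the sample data are all hypothetical; the point is only that the `sales` field holds the employee record itself, so `record.sales.gender` needs no join.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:
    name: str
    gender: str

@dataclass(frozen=True)
class SaleRecord:
    sales: Employee   # a direct reference to the roster record, not a name to match
    product: str
    amount: float

# Hypothetical sample data.
ann, bob = Employee("Ann", "F"), Employee("Bob", "M")
cid, dee = Employee("Cid", "M"), Employee("Dee", "F")
records = [
    SaleRecord(ann, "AC", 90), SaleRecord(bob, "AC", 80),
    SaleRecord(cid, "AC", 70), SaleRecord(dee, "AC", 60),
    SaleRecord(ann, "TV", 50), SaleRecord(bob, "TV", 95),
    SaleRecord(cid, "TV", 85), SaleRecord(dee, "TV", 40),
]

def top(records, product, n):
    """Set of employees with the n largest amounts for one product."""
    chosen = sorted(
        (r for r in records if r.product == product),
        key=lambda r: r.amount,
        reverse=True,
    )[:n]
    return {r.sales for r in chosen}

# The "good" salespeople, then a count by gender through the reference.
good = top(records, "AC", 3) & top(records, "TV", 3)
counts = Counter(employee.gender for employee in good)
print(dict(counts))
```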

Through a simple example we have analyzed four important shortcomings of SQL, which I believe are the main reasons SQL has not achieved its original intention. Solving a business problem with a computing system is in fact a process of translating the business problem into a formal computational syntax (much as primary school pupils solve word problems by turning them into the four arithmetic operations). Until these difficulties are overcome, SQL's model does not match people's natural way of thinking, which creates a great obstacle when translating problems, so it is hard for SQL to be applied to business data computation on a large scale.
To give an example that programmers will readily understand: doing data computation in SQL is like doing arithmetic in assembly language. We can easily write a formula such as 3+5*7, but in assembly language (taking x86 as an example) it has to be written as:
mov ax,3
mov bx,5
mul bx,7