Order of SQL

袁晗 | Luo, Yuan Han
8 min readMay 6, 2021
img src: https://www.teepublic.com/sticker/2076545-retro-sql-programming-icon

If you are coming from a programming background, and feeling like your hands are tied when using SQL, this article will untie your hands. The article is highly conceptual assuming that you are familiar with programming and basic SQL already. The goal, therefore, is to help you better understand errors and debug with ease. Hence, I will not go over syntactical details besides a few photo illustration.

SQL is a rather high level programming language that it looks more like a frame work rather than a language. It’s formally categorized as a query language by Wikipedia for a reason and that will be discussed later. So forget about setting variables, running loops, and classes for now.

Coming from C++, Ruby, and Python I just wanted to write a loop for every problem I see in SQL. I saw columns as arrays, and tables as 2 dimensional arrays. So naturally I wanted to run a loop to return desired results. This way of thinking isn’t wrong. It’s an art of its own and a big part of SQL, altering tables, creating tables, and deleting tables. But another big part of SQL, the beginners aren’t aware of, is queries. That means, the goal is not to alter the table in any way, but rather to retrieve information with high agility and dexterity.

If this sounds like that SQL is very similar to jQuery, that’s because it is. Coming from a typical programming world, your goal might be to get the information by all means. This is because the means are not typically expensive. However, when we are dealing with large databases, Pinterest user table for example, the operation becomes very computationally expensive. And the worst part is that there might be a way easier approach to get the same result.

Now you might be convinced that the proper query skills are critical, but there is still an inner voice somewhere telling you to set variables and run loops is more important. That is true, for data engineers. As a data analyst or scientist your focus is to find trends and replicate a function out of it. Of course I am not saying that you should be completely ignoring data engineering. On the contrary, a good data scientist should be good at data engineering, if not, know a thing or two. But as a data scientist, the pro of retrieving data with agility outweighs the ability to design a database, specially if you are in a big company with clear division of labor. So lets put on our SQL glasses and see what’s the deal.

Order matters

The most important thing in approaching SQL is to understand the execution order.

If I get to redesign SQL, I would design the coding order same as the execution order. SQL won’t change, for now, so we have to deal with it. Remember the chart above, it will be useful later. Let’s go over this one by one.

From

For demonstration purposes, let’s pretend we are reading someone else’s SQL code instead of writing our own. I promise you that once you can read, the writing will come.

All “FROM” statement must be followed by the table name

First step is to skip the “SELECT” statement and read the SQL codes starting with “FROM”. This should be very intuitive. Imagine you are searching for a small town on a world map from a top-down perspective. You start from a board category, a continent for example, and slowly narrow down your focus to find what you are looking for. And this whole process begins with what table(s) we are working with.

2 Kinds of JOINs

I know that the execution order chart did not have a row for “JOIN”, but if you look at the “function” column in the “from” row, you will see “JOIN”. Let’s ignore the imperfect chart and take a look at the “JOIN” statement

All “JOIN” statement must be followed by the table we are joining with and precedes the “ON” statement

The next thing to look for is to find out if there are any other tables we are working with. The presence of “JOIN” statement tells us yes, and the quantity of the table depends on how many “JOIN” statements are presence. In theory you can join as many tables as you like. I know there are many kinds of “JOIN”, but all you have to remember is 2: Inner join and everything else, outer join.

If you are a mathematician, this is where you can flex with unions and sets. It’s the same idea.

Inner Join

When you see “ON table_1.column_X = table_2.column_Y”, what this statement is telling you is that return the rows from both tables where the values in column_X equal to the values in column_Y, and ignore the rows where 2 values don’t equal to each other.

In math, the combined data is the subset resulting from the intersection ‘∩’ between two tables.

ON Checks.Vendor_ID = Vendors.Vendor_ID

Notice that vendor ID “B” appeared 2 times in the output table “Dynaset (combined)”. Remember, contrary to the word choice of many SQL tutorials you might encounter, the second “B” row is not a duplicate value in the perspective of the output table. It is a complete different transaction with a different date and different amount of money by the same person. This is as complicated as “INNER JOIN” can get, let’s move on to outer joins.

Outer Joins

Outer joins combines the columns with matching column values and un-matching values depend on the kind of outer join: Left Join, Right Join, Full Outer Join.

In math, this is similar to the Union symbol ‘∪’.

Left/Right/Full Join

“LEFT JOIN” combines all rows with equal values in the selected columns and all the rows that follows the “FROM” statement. The “RIGHT JOIN” is exactly the opposite, so I will not go into it.

Notice that even though “CountryID” 4 does not exist in the right table, It is still in the output table (bottom table). And of course “ID” value 2 in the right table is not even a matching value, so it will not be in the output table. You can remember it as return rows with matching column value + all rows from the left/right table depends on which “JOIN” we use.

The “FULL JOIN” statement returns all the matching and un-matching column values.

img src: https://www.omnisci.com/technical-glossary/relational-database

As of now, the practical application of “FULL JOIN” is rather scarce. It is limited to a few, such as counting the overlaps between two tables.

The beauty of relational data base and SQL would not be realized without the “JOIN” statement. It turns a huge database into many smaller modular tables.

As you can see, not only this makes readers life way easier, it also turns a huge monster into many smaller manageable tables.

WHERE

The next statement to watch for is “WHERE”. “WHERE” is a filter statement that follows “JOIN”. It returns rows where the condition is TRUE. And like all conditions, it can tolerate operators such as =, !=, AND, OR, and “LIKE”. “LIKE” is basically SQL version of regular expression. I will explain other more complicated concepts that follows “WHERE”, such as “CASE” and subqueries, later.

Keep in mind the execution order, this is where people get confused. You do not filter a statement and join, you join and then filter.

GROUP BY & HAVING

You should be looking for “GROUP BY” after “WHERE”. Think of it as something that categorizes a giant table into many smaller tables. The statement that follows “GROUP BY” tells us what column we are categorizing.

For our example, we categorized the ‘genre’ column.

This way of thinking makes aggregate functions, SUM(), COUNT(), or MAX(), easier to understand. They work with individual smaller tables independently. That means, for our example above, SUM() returns 7 for red, 3 for yellow, and 8 for blue. In truth, they still exist in one table, just all ordered together. But breaking a table into many smaller tables is an abstract idea that help us to understand aggregate function. It is not what happened with SQL under the hood.

Remember order matters, if you want to filter the table after “GROUP BY”, you have to use “HAVING”. “WHERE” statement after “GROUP BY” will get you error.

SELECT/ORDER BY/LIMIT

Phew, finally we can go back to “SELECT”. As you can see, at this point the table is already very clear in our head. In other words, we narrowed down our focus.

After that, just polish with order and limit. This is the last step on the order list.

Even though “ORDER BY” and “LIMIT” can seem very minor, it’s creative usage can yield rewarding results, especially in subqueries.

CASE AND SUBQUERIES

I put “CASE” and subqueries last because they can almost follow any statement. You can put them after “SELECT”, “FROM”, “WHERE”, even in another “CASE” if it’s necessary.

SUBQUERIES

If the result of our first narrowing down isn’t sufficient, try subqueries. Subqueries is basically a whole complete query inside parenthesis. That means all queries can exist in the form of subqueries, just add parenthesis.

img src: https://www.essentialsql.com/introduction-to-subqueries/

You can insert your subqueries in the “SELECT” statement as well as “FROM”. Don’t over complicate this subject because it can get very nasty fast. Subqueries obeys exactly the same rules as any queries, just inside the parenthesis. When you are inside a subquery, repeat the execution order from step one.

CASE

Think of “CASE” statement as a function that returns a whole column. This is very important, agile usage of “CASE” requires you to understand this concept. “CASE” evaluate conditions that follows “WHEN” and outputs values that follows “THEN”. So “WHEN column_X > 5 THEN ‘yes’ ELSE ‘no’ END AS more_than5” creates a new binary column with the alias “more_than5”. “CASE” decide what values goes into “more_than5” base on the condition “column_X > 5”. If it’s true, it outputs “yes”, otherwise “no”. The end result is a brand new column name “more_than5”

The implication of “CASE” is immense. With an additional column base of the conditions and values we set, we can group, add, or count. The possibilities are limitless.

CONCLUSION

These are the most important commands of SQL. Of course I do not have the time to covered all of it. And every year you will hear some new bells and whistles that return neat results, but they are nevertheless more fitting for expert SQL users who already understood the basics.

SQL is not hard if you respect its rules: execution order. Ignore it, you will run right into errors or returning crazy things at best.

When liberty destroys order, the hunger for order will destroy liberty.

-Will Durant

--

--