Upgrade to Get Unlimited Access
($10 One Off Payment)

Get Started with SQL JOINs

To perform advanced analytical processing and data discovery, one table is often not enough to bring valuable insights, hence combining multiple tables together is unavoidable. SQL, as a tool to communicate with relational database, provides the functionality to build relationships among tables. This article introduces how to use SQL to link tables together. If you want to learn more about the basics of SQL, I suggest have a read of my first article about learning SQL in everyday language. It gives a comprehensive SQL introduction for absolute beginners.

Why we need to learn SQL JOIN?

Maybe you haven’t even realized, we frequently come across joining in Excel as well. This is achieved by VLOOKUP function as shown below. VLOOKUP function allows us to perform matching of one column and return the corresponding reference values from another column. In this case, we are able to find the country_name related to each criterion by matching the country_code to the fips_code in Country Code table.

VLOOKUP in Excel
VLOOKUP in Excel

We may encounter many situations like this when the information in one table cannot suffice. SQL join uses the exact same logic, yet it is more powerful since it is matching and merging two or more tables together rather than two columns. In the following section, we will dive deeper into how to use SQL to address this problem. Another very practical and realistic reason for learning SQL join is INTERVIEW! That’s right, SQL join is almost an inevitable question in any data scientist or data analyst interview. Some questions are more theoretical and explicit, “Can you tell me the difference between LEFT JOIN and INNER JOIN?”; others are more practical and implicit “Can you write down a SQL statement to find the name of this employee’s manager?” So it gives you another reason to learn and distinguish the implementation of each kinds of join.

How to Create JOIN?

SELECT <attributeName>, <attributeName>, <attributeName>, ...
FROM <TableA> 
ON <TableA.commonAttribute> = <TableB.commonAttribute>

SQL join follows this syntax and I break it down into three components: 1) select the attributes and tables; 2) determine the join condition; 3) choose the appropriate join type.

Attributes and Tables

Just like other SQL statements, it is necessary to specify the attribute and table name in the form of SELECT <attribute> FROM <table>. But the difference is that more than one table is required to join. If same attribute names exist in more than one table, then simply referring to the attribute by name will be ambiguous since the database is uncertain about which table you are selecting this attribute from. To solve this problem, we need to use the table name as the attribute prefix.

For example, if “name” attribute exists in both Customer and Country tables, and we only select the name of customers. Then we refer to the attribute as “Customer.name”.

JOIN Condition

Tables are joined together by at least one common attribute. This common attribute is often referred to as a foreign key. As demonstrated in the excel example, VLOOKUP function also takes advantage of this shared attribute. We use this shared attribute (foreign key) as the matching point where we can find corresponding information of each row in another table. This common attribute needs to be explicitly indicated as the join condition in the SQL statement.

Let’s continue with this country code example. The aim of this exercise is to find the country name of each criterion in the “Google_Ads_GeoTargets” table. The datasets are from Google Public Dataset. Have an exploration and try to implement it in the BigQuery if you are interested.

In the left table, we have already got the country_code for each criterion and the Country_Code table provides us with the country name of fips_code. Therefore, the logic is to match the country_code in the GeoTarget table to the fips_code in the Country_Code table and find the corresponding name. country_code and fips_code are the common attributes that build the relationship between two tables.

We write down the following statement to indicate the join condition. Notice that it is better to use the table name as the attribute prefix and don’t forget the keyword “ON”.

ON  Google_Ads_GeoTargets.country_code = Country_Code.fips_code


There are more five major join types: left join, inner join, full join, self join, and cross join. In order to communicate to the database, we need to explicitly or implicitly indicate which join type in the statement. This is achieved by using keywords “LEFT JOIN”, “INNER JOIN” or “FULL OUTER JOIN” etc. Each type has its distinct use cases. Hopefully, the comparison below will help you to distinguish their subtle differences.

1. Left Join

Left join is the most similar to VLOOKUP in excel. The table on the right can be seen as a reference table or a dictionary from which we are extending the existing knowledge stored in the left table. Therefore, left join is adopted to return all records in the left table and reference the corresponding values from the right table. Left join can also provide us with more insights on why some values failed to find a match. E.g. Is it a result of incorrect records or typos in the left table or is it because of inexhaustive data in the right table?

We use this statement to generate LEFT JOIN result shown in the picture:

SELECT criteria_id, country_code, country_name, fips_code
FROM Google_Ads_GeoTargets gt
LEFT JOIN Country_Code cc
ON gt.country_code = cc.fips_code;
LEFT JOIN result