SQL tutorial: Learn SQL on PostgreSQL

Follow this step-by-step guide to install PostgreSQL, load data, connect to a database, and execute SQL queries including table joins

SQL tutorial: Learn SQL on PostgreSQL
2488716 via Pixabay (CC0)

PostgreSQL is an open-source, object-relational (also called extended relational) database management system. Modern relational database features in PostgreSQL include complex queries, foreign keys, triggers, updatable views, transactional integrity, and multi-version concurrency control. Users can extend PostgreSQL with new data types, functions, operators, aggregate functions, index methods, and procedural languages.

With more than 20 years of development and deployment behind it, PostgreSQL is a solid open-source database that rivals even commercial relational databases in many respects. You can install it on Linux (all recent distributions), Windows (Windows 2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, MacOS, AIX, HP/UX, and Solaris. You can also find a hosted high-performance version of PostgreSQL in Amazon Aurora, and a wire-compatible distributed implementation in CockroachDB.

From here on in, I will use the names PostgreSQL and Postgres (not all-caps) interchangeably, as is the general practice. I can pronounce Postgres; I never know whether to say Post-gres-Q-L or Post-gres-sequel.

In this tutorial, we will step through installing PostgreSQL, connecting to the database, loading some data, and running a variety of SQL queries. To cap it all off, we’ll execute a query against two related tables by using SQL join clauses. Along the way, I will suggest some GUI clients for Postgres you might want to use, and provide a brief overview of useful PostgreSQL extensions. Finally, I’ll leave you with a few good references to help you take your next steps with Postgres and SQL. 

Install PostgreSQL

My InfoWorld colleague Serdar Yegulalp wrote a strong tutorial on installing and configuring PostgreSQL, “Get started with PostgreSQL 10.” I used that to inform my installation on an iMac, and wound up downloading and running the Mac-only Postgres.App shown in the screenshot below.

postgres app IDG

In addition to installing the latest production database version (10.5) with default configuration settings, I added the Postgres command-line utilities to my computer's path and installed both the cross-platform pgAdmin 4 and the Mac-only Postico GUI tools, as shown below.

postgres cli utils IDG

It’s useful to add the command-line utilities for Postgres to your path.

pgadmin IDG

pgAdmin 4 displays its UI in a web browser. The transactions being monitored in the postgres database are for maintenance purposes.

postico IDG

Postico is a commercial Postgres client product for MacOS. The free trial has limits, but it is usable and does not expire.

Since I have an active JetBrains subscription, I also installed DataGrip, a multi-database client tool. If I were doing this exercise on Windows, I might well use the visual SQL query builder and SQL command line in Alpha Anywhere.

Connect to a PostgreSQL database

The key things to know to connect to your Postgres database are its host, its assigned port, the database that you’d like to use initially, and your username and password. Several of these have well-known default values.

If you install Postgres on the same computer as the client, the host will be localhost. By default, there will be two visible databases, one named postgres and one using your user name. The port will be 5432 unless it was changed, typically at database startup.

The information above is independent of the client you use. For most purposes, the psql command line works just fine at the beginning. Later on, when you're trying to construct queries with complicated joins, you might want to use a graphical client, primarily to save yourself a lot of time and error-prone typing.

To connect with psql if your database uses the defaults and is on the computer you are using, make sure your database is running, and type psql <enter> at the command line:

Martins-iMac:~ mheller$ psql
psql (10.5)
Type "help" for help.

If you instead see an error message to the effect that the command was not found, you may either add the Postgres command-line utilities to your path and restart your shell, or fully qualify the location of the executable.

Once you have psql running, if you type help <enter> you’ll see something like this:

mheller=# help
You are using psql, the command-line interface to PostgreSQL.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
mheller=#

The prompt tells you what database you are using. You can find out more with \conninfo, and change the connection with \c:

mheller=# \conninfo
You are connected to database "mheller" as user "mheller" via socket in "/tmp" at port "5432".
From the command help (\?):
\c[onnect] {[DBNAME|- USER|- HOST|- PORT|-] | conninfo}
                 connect to new database (currently "mheller")

Let’s try changing databases:

mheller=# \c postgres
You are now connected to database "postgres" as user "mheller".
postgres=#

And now let’s switch back, in my case to database mheller:

postgres=# \c mheller
You are now connected to database "mheller" as user "mheller".
mheller=#

Your default database and user names will differ from mine.

To execute SQL commands and queries, add a semicolon (;) when the query is complete and then press <enter>. Here we create a new database, connect to it, connect back to our default database, and finally drop the new database:

mheller=# create database test;
CREATE DATABASE
mheller=# \c test
You are now connected to database "test" as user "mheller".
test=# \c mheller
You are now connected to database "mheller" as user "mheller".
mheller=# drop database test
mheller-# ;
DROP DATABASE
mheller=#

Database creates and drops can also be accomplished from the system shell. The relevant command-line utilities are createdb and dropdb.

Load data into a PostgreSQL database

Database design is an art, not a science, and takes considerable experience to get right. Even expert database designers often have to modify their designs over time to match the application data usage pattern, for example by adding indexes for common queries, or by adding foreign key constraints to enforce referential integrity.

Rather than taking weeks (or months) to learn database design, let’s load a stock database that’s already in Postgres format. I wanted to use one of the PgFoundry Sample Databases, but alas, that site is no longer active.

One alternative would be to use the three-table database from the PostgreSQL Exercises site; it’s simple enough, but I’ll leave that for you to work through later, since you can do all the exercises online, and I want you to use a database locally now. The SportsDB and DVD Rental (Pagila) samples are interesting, but a bit too complicated for a first go-round.

Instead, I want you to open https://github.com/pthom/northwind_psql and either clone or download and unpack the repository. Northwind was originally a Microsoft sample, but GitHub user pthom has adapted it for Postgres. I like Northwind mostly because I used to use it to teach SQL to developers, and became familiar with the data. The database diagram is as below:

entityrelation model IDG

Entity-relationship diagrams like this one come out of tools; this image looks a lot like it was generated by SQL Server Management Studio. Essentially, the rectangles are tables, the columns are shown within the rectangles, and the primary and foreign keys are marked with key icons. The connections between tables are the foreign key contraints.

You can load the database into PostgreSQL using the create_db script, assuming that you are on a machine with a Bash shell. If you are running on Windows, try renaming create_db.sh to create_db.cmd and deleting the first line of the script.

Open a shell in the directory where you cloned or unpacked northwind_psql and run the script. A successful database load will look something like this:

Martins-iMac:northwind_psql mheller$ ./create_db.sh
NOTICE:  database "northwind" does not exist, skipping
NOTICE:  role "northwind_user" does not exist, skipping
SET
SET
SET
SET
SET
SET
SET
SET
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
CREATE ROLE
ALTER ROLE
GRANT
GRANT

Now we can try querying the database.

Run SQL queries in PostgreSQL

To begin with, let’s try looking at a single table. From a command line run psql against the northwind database. Your default username will probably work; if not, use the credentials user northwind_user and password thewindisblowing.

What are the regions defined in the database? The select query should look like the following. I got the table name from the diagram above.

northwind=# select * from region;
region_id | region_description
-----------+--------------------
         1 | Eastern
         2 | Western
         3 | Northern
         4 | Southern
(4 rows)

That was easy. The asterisk notation says to return all fields in the table. Suppose we just wanted to see the region names in alphabetical order. That requires specifying the desired field name, the name of which we learned from the previous query, and adding an order by clause. Ascending order is the default.

northwind=# select region_description from region order by region_description;
 region_description
--------------------
 Eastern
 Northern
 Southern
 Western
(4 rows)

Now let’s look at the product categories:

northwind=# select * from categories;
 category_id | category_name  |                        description                         | picture
-------------+----------------+------------------------------------------------------------+---------
           1 | Beverages      | Soft drinks, coffees, teas, beers, and ales                | \x
           2 | Condiments     | Sweet and savory sauces, relishes, spreads, and seasonings | \x
           3 | Confections    | Desserts, candies, and sweet breads                        | \x
           4 | Dairy Products | Cheeses                                                    | \x
           5 | Grains/Cereals | Breads, crackers, pasta, and cereal                        | \x
           6 | Meat/Poultry   | Prepared meats                                             | \x
           7 | Produce        | Dried fruit and bean curd                                  | \x
           8 | Seafood        | Seaweed and fish                                           | \x
(8 rows)

And let’s look at the beverage products, which we just saw have category_id=1. That requires a where clause.

northwind=# select * from products where category_id=1;
 product_id |       product_name        | supplier_id | category_id |  quantity_per_unit  | unit_price | units_in_stock | units_on_order | reorder_level | discontinued
------------+---------------------------+-------------+-------------+---------------------+------------+----------------+----------------+---------------+--------------
          1 | Chai                      |           8 |           1 | 10 boxes x 30 bags  |         18 |             39 |              0 |            10 |            1
          2 | Chang                     |           1 |           1 | 24 - 12 oz bottles  |         19 |             17 |             40 |            25 |            1
         24 | Guaraná Fantástica        |          10 |           1 | 12 - 355 ml cans    |        4.5 |             20 |              0 |             0 |            1
         34 | Sasquatch Ale             |          16 |           1 | 24 - 12 oz bottles  |         14 |            111 |              0 |            15 |            0
         35 | Steeleye Stout            |          16 |           1 | 24 - 12 oz bottles  |         18 |             20 |              0 |            15 |            0
         38 | Côte de Blaye             |          18 |           1 | 12 - 75 cl bottles  |      263.5 |             17 |              0 |            15 |            0
         39 | Chartreuse verte          |          18 |           1 | 750 cc per bottle   |         18 |             69 |              0 |             5 |            0
         43 | Ipoh Coffee               |          20 |           1 | 16 - 500 g tins     |         46 |             17 |             10 |            25 |            0
         67 | Laughing Lumberjack Lager |          16 |           1 | 24 - 12 oz bottles  |         14 |             52 |              0 |            10 |            0
         70 | Outback Lager             |           7 |           1 | 24 - 355 ml bottles |         15 |             15 |             10 |            30 |            0
         75 | Rhönbräu Klosterbier      |          12 |           1 | 24 - 0.5 l bottles  |       7.75 |            125 |              0 |            25 |            0
         76 | Lakkalikööri              |          23 |           1 | 500 ml              |         18 |             57 |              0 |            20 |            0
(12 rows)

There are a lot of fields in employees, so let’s just look at a few of them. Here we are ordering by two fields, title and last_name.

northwind=# select first_name, last_name, title from employees order by title,last_name;
 first_name | last_name |          title          
------------+-----------+--------------------------
 Laura      | Callahan  | Inside Sales Coordinator
 Steven     | Buchanan  | Sales Manager
 Nancy      | Davolio   | Sales Representative
 Anne       | Dodsworth | Sales Representative
 Robert     | King      | Sales Representative
 Janet      | Leverling | Sales Representative
 Margaret   | Peacock   | Sales Representative
 Michael    | Suyama    | Sales Representative
 Andrew     | Fuller    | Vice President, Sales
(9 rows)

I happen to know that there are a lot of customers, so let’s just look at a few of them, using a limit clause. This table uses alphabetic customer IDs, so the default order will make sense.

northwind=# select * from customers limit 10;
 customer_id |            company_name            |    contact_name    |    contact_title     |            address            |    city     | region | postal_code | country |     phone      |      fax       
-------------+------------------------------------+--------------------+----------------------+-------------------------------+-------------+--------+-------------+---------+----------------+----------------
 ALFKI       | Alfreds Futterkiste                | Maria Anders       | Sales Representative | Obere Str. 57                 | Berlin      |        | 12209       | Germany | 030-0074321    | 030-0076545
 ANATR       | Ana Trujillo Emparedados y helados | Ana Trujillo       | Owner                | Avda. de la Constitución 2222 | México D.F. |        | 05021       | Mexico  | (5) 555-4729   | (5) 555-3745
 ANTON       | Antonio Moreno Taquería            | Antonio Moreno     | Owner                | Mataderos  2312               | México D.F. |        | 05023       | Mexico  | (5) 555-3932   |
 AROUT       | Around the Horn                    | Thomas Hardy       | Sales Representative | 120 Hanover Sq.               | London      |        | WA1 1DP     | UK      | (171) 555-7788 | (171) 555-6750
 BERGS       | Berglunds snabbköp                 | Christina Berglund | Order Administrator  | Berguvsvägen  8               | Luleå       |        | S-958 22    | Sweden  | 0921-12 34 65  | 0921-12 34 67
 BLAUS       | Blauer See Delikatessen            | Hanna Moos         | Sales Representative | Forsterstr. 57                | Mannheim    |        | 68306       | Germany | 0621-08460     | 0621-08924
 BLONP       | Blondesddsl père et fils           | Frédérique Citeaux | Marketing Manager    | 24, place Kléber              | Strasbourg  |        | 67000       | France  | 88.60.15.31    | 88.60.15.32
 BOLID       | Bólido Comidas preparadas          | Martín Sommer      | Owner                | C/ Araquil, 67                | Madrid      |        | 28023       | Spain   | (91) 555 22 82 | (91) 555 91 99
 BONAP       | Bon app'                           | Laurence Lebihan   | Owner                | 12, rue des Bouchers          | Marseille   |        | 13008       | France  | 91.24.45.40    | 91.24.45.41
 BOTTM       | Bottom-Dollar Markets              | Elizabeth Lincoln  | Accounting Manager   | 23 Tsawassen Blvd.            | Tsawassen   | BC     | T2F 8M4     | Canada  | (604) 555-4729 | (604) 555-3745
(10 rows)

How many rows are actually in the customers table? The count function, one of many aggregates available in SQL, can tell us, without generating lots of output:

northwind=# select count(*) from customers; 
count
-------
    91
(1 row)

Use SQL joins in PostgreSQL

At this point, we are going to create and run a query against multiple tables using join clauses, almost always inner join clauses (the default). Join clauses can get squirrely to write by hand even if you refer to the database diagram, so I suggest using a graphical database client with a SQL builder.

The general pattern for a join clause is that you list the two related tables, and then specify the fields that must match. In the example below we are relating employees and territories through an intermediate table, employee_territories. That’s a typical database design pattern when tables have a many to many relationship: One employee can have many territories, and one territory can belong to many employees. The on portion of the join clause defines the variables that must match for the rows to be included. Inner joins require that both sides be present in order to return the row in the result; outer joins allow either side to be present; and left and right joins require only the respective side to be present.

The query to list employees and their territories looks like this:

select last_name, first_name, territory_description from employees
join employee_territories on employees.employee_id=employee_territories.employee_id
join territories on employee_territories.territory_id = territories.territory_id
order by last_name, first_name, territory_description

In the DataGrip IDE, the query (top right) and result table (bottom left) look like this:

datagrip query 1 IDG

49 rows were returned; I could scroll through the window to see them all.

Understand PostgreSQL extensions

Beyond standard SQL, PostgreSQL supports several advanced features: views, foreign keys, transactions, window functions, and schema inheritance. In addition to these, it supports geometric types, text-search types, JSON and JSONB types, and array types. When you look at the internals of PostgreSQL, you’ll find generic index types that make it reasonably easy to add custom data types and still search them efficiently.

Views are essentially saved queries, which can be treated as though they were tables. Foreign keys enforce table relationships, for example (in the case of school enrollments) forbidding you from adding a non-existent student to a class roster, or from deleting a class roster that has students enrolled.

Transactions allow a series of SQL statements to be rolled back if any of them fail, for example (in the case of bank transfers) rolling back a credit if the corresponding debit fails (e.g. for lack of funds). Window functions are like aggregates, but still display the rows being aggregated. Schema inheritance sounds convenient for defining tables, but currently has limitations that restrict its utility.

JSON, JSONB, and array types allow PostgreSQL to be used as a MongoDB-like document database. Geometric types support spatial and geographic information. Text-search types allow PostgreSQL to be used for Google-like full-text search. There’s also an XML type, but the implementation is currently incomplete.

Take your next steps with PostgreSQL and SQL

To learn database design and SQL in depth, you will want to work through at least one book or course, and probably more than one. I happen to like Database Design for Mere Mortals and SQL Queries for Mere Mortals, but you can’t really go wrong with any of the books from Addison-Wesley or O’Reilly. You also can’t really go wrong with any of the relevant online courses from reputable publishers and academic institutions.

Note that there are many dialects of SQL that differ from the Postgres dialect, especially when you get into specialized areas such as GIS and JSON support, or support for stored procedures. An inclusive list of Postgres books can be found on the PostgreSQL site.

For Postgres-specific information, an essential reference is the PostgreSQL documentation. You can get some of the same information locally at the command line with man postgres, and from within psql with the \? and \h commands, such as the following:

mheller=# \h create table
Command:     CREATE TABLE
Description: define a new table
Syntax:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
  { column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
    | table_constraint
    | LIKE source_table [ like_option ... ] }
    [, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
[ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace_name ]

One very useful free interactive online resource is the PostgreSQL Exercises site. Additional online resources are listed on the PostgreSQL site.

Copyright © 2018 IDG Communications, Inc.