HKS

PostgreSQL Data Types – A Comprehensive Guide

Learn about the most common data types in PostgreSQL.

Author avatar

Hemanta Sundaray

Published on Saturday, August 3rd, 2024

The data types you’ll encounter the most in PostgreSQL can be categorized into three types:

  1. Characters
  2. Numbers
  3. Dates and times

In addition, I’m also going to cover Array, boolean and uuid data types.

Let's get started!

Characters

PostgreSQL provides two primary character types: varchar(n) and text.

varchar(n)

  • A variable-length character data type, where n represents the maximum number of characters that can be stored in the column.
  • If you attempt to insert fewer characters than the specified maximum, PostgreSQL stores the characters as-is, without adding any extra spaces.
  • If you attempt to insert more characters than the specified maximum, PostgreSQL will raise an error. There's one exception to this rule: if the excess characters are all spaces, PostgreSQL will truncate the spaces to the maximum length.

text

  • A variable-length character data type used to store strings of unlimited length.

Note

You can’t perform mathematical operations on numbers stored in character columns. Store numbers as character types only when they represent phone numbers, ZIP codes and similar data.

Numbers

PostgreSQL offers a variety of data types to handle whole numbers and decimal numbers.

Integers

Integers are whole numbers, both positive and negative, including zero.

The SQL standard provides three integer types:

  1. smallint
  2. integer
  3. bigint

Auto-incrementing Integers

Auto-incrementing integers are commonly used as primary keys. You can implement auto-incrementing integers in two ways:

  1. serial types
  2. identity columns
Serial

There are three serial types, which correspond to three integer types:

  1. smallserial (corresponds to smallint)
  2. serial (corresponds to integer)
  3. bigserial (corresponds to bigint)

Note that these are not true data types, but rather a notational convenience. When you use a serial type, PostgreSQL actually creates a sequence and sets up the column to use that sequence as its default value.

When you add a column with a serial type, PostgreSQL will auto-increment the value each time you insert a row, starting with 1, up to the maximum of each integer type.

Note

You can manually insert your own values into columns of serial type. If you do so, the sequence associated with the serial column is not automatically updated. For example, if the current highest value in a serial column is 5, and you manually insert a row with the value 100, the next automatic insertion will still use 6, not 101. And if you insert a value that already exists, you’ll encounter unique constraint violations, especially if the serial column is used as a primary key.

Identity

There're two types of identity columns:

  1. generated always as identity: PostgreSQL will always generate a value for this column. If you try to insert a value explicitly, it will raise an error.

  2. generated by default as identity: PostgreSQL will generate a value if you don't provide one, but it allows you to insert your own values. And since you can insert your own values, it's possible to insert duplicate values if you're not careful.

Decimals

Decimals numbers are handled by fixed-point and floating-point data types.

Fixed-point

  • Also called arbitrary precision types. Arbitrary precision means you can specify exactly how many digits you want to store, both before and after the decimal point.
  • These include numeric and decimal (which are equivalent in PostgreSQL)
  • The syntax is numeric(precision, scale) or decimal(precision, scale), where precision is the total number of significant digits in the entire number, both before and after the decimal point. Scale is the number of decimal digits in the fractional part of the number. For example, numeric(7,2) can store numbers with up to 7 total digits, 5 before the decimal point and 2 after.
  • Recommended for storing monetary types and other quantities where exactness is required.

Floating-point

  • Also called variable precision types because they don’t have a fixed number of decimal places.
  • There are two floating point types: real and double precision.
  • The real type can accurately represent numbers with at least 6 significant digits. The double precision type can accurately represent numbers with at least 15 significant digits.
  • Note that floating point numbers should not be used to handle money due to the potential for rounding errors.

Note

Here’s a breakdown of what makes a digit significant:

  • All non-zero digits are significant.
  • Zeros between non-zero digits are always significant.
  • Leading zeros (zeros before the first non-zero digit) are never significant.
  • Trailing zeros after a decimal point are significant.

Dates and times

PostgreSQL’s date and time support includes four major data types:

  1. timestamp: Records date and time. You'll almost always want to add the keywords with time zone. Timestamps with time zone store date and time values in UTC (Coordinated Universal Time) internally. As a result, when you retrieve a value, PostgreSQL automatically converts the value to the client’s time zone. This means applications in different time zones will see the time correctly adjusted for their local time.

    The format timestamp with time zone is part of the SQL standard. You can specify the same data type in PostgreSQL with timestamptz.

  2. date: Records just the date.

  3. time: Records just the time.

  4. interval: Holds a value representing a unit of time expressed in the format quantity unit. For example: 12 days or 8 hours.

JSON

PostgreSQL offers two JSON data types: json and jsonb. Both store JSON (JavaScript Object Notation) data.

When you store data using the json data type in PostgreSQL, the database saves the JSON text exactly as it was input, preserving all formatting, whitespace, and the order of keys.

Every time you need to access or manipulate this JSON data (e.g., when executing a query that processes the JSON), PostgreSQL has to read through the entire JSON text and convert it into a usable structure in memory. This process is called parsing.

This reparsing happens each time you execute a function or query that needs to work with the JSON data. This can lead to slower performance, especially for large JSON documents or frequent operations.

In contrast, the jsonb type stores the data in a pre-parsed binary format. This means the initial storage might take slightly longer (due to the parsing and conversion to binary). Subsequent operations are much faster because the data is already in a format that PostgreSQL can directly use, without needing to parse it again.

Note

JSONB is generally preferred over JSON for several reasons:

  1. jsonb is faster to process and analyze.
  2. jsonb supports indexing, which can significantly speed up queries.

Array

An array data type is named by appending square brackets ([]) to the data type name of the array elements. For example: integer[], text[], and so on.

Boolean

A boolean can have one of three possible values: true, false, or null (unknown).

In PostgreSQL, you can use the following values to represent the true state: true, yes, on, 1. For false: false, no, off, 0.

UUID

The uuid data type stores Universally Unique Identifiers (UUID). It's written as a sequence of lower-case hexadecimal digits, in five groups (in the form of 8-4-4-4-12) separated by hyphens.

An example of a UUID in this standard form is: 123e4567-e89b-12d3-a456-426614174000.

Last updated on Tuesday, May 14th, 2024