The data types you’ll encounter the most in PostgreSQL can be categorized into three types:
- Characters
- Numbers
- Dates and times
In addition, I’m also going to cover Array, boolean
and uuid
data types.
Let's get started!
Characters
PostgreSQL provides two primary character types: varchar(n)
and text
.
varchar(n)
- A variable-length character data type, where
n
represents the maximum number of characters that can be stored in the column. - If you attempt to insert fewer characters than the specified maximum, PostgreSQL stores the characters as-is, without adding any extra spaces.
- If you attempt to insert more characters than the specified maximum, PostgreSQL will raise an error. There's one exception to this rule: if the excess characters are all spaces, PostgreSQL will truncate the spaces to the maximum length.
text
- A variable-length character data type used to store strings of unlimited length.
Note
You can’t perform mathematical operations on numbers stored in character columns. Store numbers as character types only when they represent phone numbers, ZIP codes and similar data.
Numbers
PostgreSQL offers a variety of data types to handle whole numbers and decimal numbers.
Integers
Integers are whole numbers, both positive and negative, including zero.
The SQL standard provides three integer types:
smallint
integer
bigint
Auto-incrementing Integers
Auto-incrementing integers are commonly used as primary keys. You can implement auto-incrementing integers in two ways:
serial
typesidentity
columns
Serial
There are three serial
types, which correspond to three integer
types:
smallserial
(corresponds tosmallint
)serial
(corresponds tointeger
)bigserial
(corresponds tobigint
)
Note that these are not true data types, but rather a notational convenience. When you use a serial
type, PostgreSQL actually creates a sequence and sets up the column to use that sequence as its default value.
When you add a column with a serial
type, PostgreSQL will auto-increment the value each time you insert a row, starting with 1, up to the maximum of each integer type.
Note
You can manually insert your own values into columns of serial
type. If you
do so, the sequence associated with the serial
column is not automatically
updated. For example, if the current highest value in a serial
column is 5,
and you manually insert a row with the value 100, the next automatic insertion
will still use 6, not 101. And if you insert a value that already exists,
you’ll encounter unique constraint violations, especially if the serial
column is used as a primary key.
Identity
There're two types of identity
columns:
-
generated always as identity: PostgreSQL will always generate a value for this column. If you try to insert a value explicitly, it will raise an error.
-
generated by default as identity: PostgreSQL will generate a value if you don't provide one, but it allows you to insert your own values. And since you can insert your own values, it's possible to insert duplicate values if you're not careful.
Decimals
Decimals numbers are handled by fixed-point and floating-point data types.
Fixed-point
- Also called arbitrary precision types. Arbitrary precision means you can specify exactly how many digits you want to store, both before and after the decimal point.
- These include
numeric
anddecimal
(which are equivalent in PostgreSQL) - The syntax is
numeric(precision, scale)
ordecimal(precision, scale)
, whereprecision
is the total number of significant digits in the entire number, both before and after the decimal point. Scale is the number of decimal digits in the fractional part of the number. For example,numeric(7,2)
can store numbers with up to 7 total digits, 5 before the decimal point and 2 after. - Recommended for storing monetary types and other quantities where exactness is required.
Floating-point
- Also called variable precision types because they don’t have a fixed number of decimal places.
- There are two floating point types:
real
anddouble precision
. - The
real
type can accurately represent numbers with at least 6 significant digits. Thedouble precision
type can accurately represent numbers with at least 15 significant digits. - Note that floating point numbers should not be used to handle money due to the potential for rounding errors.
Note
Here’s a breakdown of what makes a digit significant:
- All non-zero digits are significant.
- Zeros between non-zero digits are always significant.
- Leading zeros (zeros before the first non-zero digit) are never significant.
- Trailing zeros after a decimal point are significant.
Dates and times
PostgreSQL’s date and time support includes four major data types:
-
timestamp: Records date and time. You'll almost always want to add the keywords
with time zone
. Timestamps with time zone store date and time values in UTC (Coordinated Universal Time) internally. As a result, when you retrieve a value, PostgreSQL automatically converts the value to the client’s time zone. This means applications in different time zones will see the time correctly adjusted for their local time.The format
timestamp with time zone
is part of the SQL standard. You can specify the same data type in PostgreSQL withtimestamptz
. -
date: Records just the date.
-
time: Records just the time.
-
interval: Holds a value representing a unit of time expressed in the format
quantity unit
. For example: 12 days or 8 hours.
JSON
PostgreSQL offers two JSON data types: json
and jsonb
. Both store JSON (JavaScript Object Notation) data.
When you store data using the json
data type in PostgreSQL, the database saves the JSON text exactly as it was input, preserving all formatting, whitespace, and the order of keys.
Every time you need to access or manipulate this JSON data (e.g., when executing a query that processes the JSON), PostgreSQL has to read through the entire JSON text and convert it into a usable structure in memory. This process is called parsing.
This reparsing happens each time you execute a function or query that needs to work with the JSON data. This can lead to slower performance, especially for large JSON documents or frequent operations.
In contrast, the jsonb
type stores the data in a pre-parsed binary format. This means the initial storage might take slightly longer (due to the parsing and conversion to binary). Subsequent operations are much faster because the data is already in a format that PostgreSQL can directly use, without needing to parse it again.
Note
JSONB is generally preferred over JSON for several reasons:
jsonb
is faster to process and analyze.jsonb
supports indexing, which can significantly speed up queries.
Array
An array data type is named by appending square brackets ([]) to the data type name of the array elements. For example: integer[]
, text[]
, and so on.
Boolean
A boolean
can have one of three possible values: true
, false
, or null
(unknown
).
In PostgreSQL, you can use the following values to represent the true state: true
, yes
, on
, 1
. For false: false
, no
, off
, 0
.
UUID
The uuid
data type stores Universally Unique Identifiers (UUID). It's written as a sequence of lower-case hexadecimal digits, in five groups (in the form of 8-4-4-4-12) separated by hyphens.
An example of a UUID in this standard form is: 123e4567-e89b-12d3-a456-426614174000
.
Last updated on Tuesday, May 14th, 2024