pg-id: generates random-looking never repeating primary key ids and more
This library contains PostgreSQL functions to generate bigint ids having the following format (examples):
# Full BIGINT versions (default)
EssssRRRRRRRRRRRRRR - 100 trillion ids per microshard
^ ^^^^^^^^^^^^^^
4 14
# Full BIGINT versions, 2 digits environment
EEssssRRRRRRRRRRRRR - 10 trillion ids
^ ^ ^^^^^^^^^^^^^
2 4 13
# MAX_SAFE_INTEGER versions
EssssRRRRRRRRRRR - 100 billion ids
^ ^^^^^^^^^^^
4 11
# MAX_SAFE_INTEGER versions, 2 digits environment
EEssssRRRRRRRRRR - 10 billion ids
^ ^ ^^^^^^^^^^
2 4 10
Also, id_gen_uuid()
generates UUIDs v4 looking like (examples):
# UUID v4, one digit environment
EssssRRR-RRRR-4RRR-NRRR-RRRRRRRRRRRR
# UUID v4, two digit environment
EEssssRR-RRRR-4RRR-NRRR-RRRRRRRRRRRR
Here,
- "E" or "EE" is an "environment number" aka "cluster number" (e.g. denoting
dev, test, staging, production, or different production regions like
us-east-1, us-west-2 etc.). The current environment number (aka cluster
number) should be returned by function
id_env_no()
: you need to create it beforehand in the schema where you install the library (or, alternatively, pass it inPG_ID_ENV_NO
environment variable before installing). - "ssss" is a 4-digit "microshard number". The range is 0..9999 by default (but
you can allocate less or more digits for it). The current value of the
microshard should be returned by function
id_shard_no()
: you need to create it beforehand in the schema where you install the library (or, alternatively, the schema name should end with that number; then, the library will auto-infer the shard number from it). - "RRRRRRRRRRRRRR" is a value which is unique within the chosen "environment
number" and "microshard number". Depending on the function used, it is either
a randomly-looking number (for
id_gen()
andid_gen_uuid()
), a number based on the current timestamp (forid_gen_timestampic()
) or just an auto-incrementing number (forid_gen_monotonic()
).
You can customize the length of "E" and "ssss" sequences in your
pg-id.config.sql
file, see below.
The range of "E", "EE", "EEE" etc. must satisfy and not overflow the MAX_BIGINT=9223372036854775807 and MAX_SAFE_INTEGER=9007199254740991 decimal prefixes. The library validates that at the installation time. Examples:
- "E" is 1..8
- "EE" is 10..89
- "EEE" is 100..899
All ids generated will have exactly the same number of decimal digits in them.
Installation
First, copy pg-id.config.sql.example
to pg-id.config.sql
in the root of your
app and update CONST_MUL
, CONST_SUM
and CONST_MOD
with some random numbers
that only you know. Those numbers will play the role of crypto constants which
will not allow people to easily guess the pattern of ids generated by
id_gen()
.
--
-- Format of id: EssssRRRRR...; for the full BIGINT:
-- * E: e.g. 1..8 (1 digit)
-- * ssss: e.g. 0000...9999 (4 digits)
-- * RRR...: e.g. remaining 14 decimal positions
-- * 2^63 = 9_2233_72036854775808, so we have 19 decimal positions in total
--
-- Once set, NEVER change the values below, or the database will blow up!!!
--
--
-- Define the length of the id prefix for all id_gen*() functions.
--
\set CONST_ENV_MUL 10
\set CONST_SHARD_MUL 10000
--
-- The minimal timestamp for id_gen_timestampic().
-- Default is "2020-01-01 00:00:00 UTC", makes ids work till 2051-01-01
-- (the difference should fit in 9 decimal digits). The world won't live
-- that long anyway due to the future AI apocalypse.
--
\set CONST_RND_TS_START 1577836800
--
-- Crypto-constants for id_gen(): all other methods don't use them. Based on
-- https://wiki.postgresql.org/wiki/Pseudo_encrypt
-- The values below are very secret. The more random and longer they are, the
-- better. Customize them once and don't change!
--
\set CONST_MUL 17141763000000
\set CONST_SUM 13795571000000
\set CONST_MOD 19458232000000
WARNING: the above numbers (with 000000
suffixes) are given only as
examples. Do not use those values as they are! Instead, generate your own
random numbers (better if they don't have too many common factors, but even if
they do, it's also fine). Then save them in pg-id.config.sql
file as explained
above.
Then run in psql console:
-- Selects the schema you want to install the library to.
SET search_path TO your_schema_1234;
-- Optional: if you don't want to pass PG_ID_ENV_NO environment variable,
-- create this configuration function manually.
--CREATE OR REPLACE FUNCTION id_env_no() RETURNS integer LANGUAGE sql
-- SET search_path FROM CURRENT AS 'SELECT 1';
-- Optional: if your schema name doesn't end up with a microshard number,
-- create this configuration function manually.
--CREATE OR REPLACE FUNCTION id_shard_no() RETURNS integer LANGUAGE sql
-- SET search_path FROM CURRENT AS 'SELECT 123';
-- Install the library to the current schema. It will load crypto constants
-- from pg-id.config.sql, get the environment number from PG_ID_ENV_NO
-- environment variable and infer microshard number from the numeric
-- suffix of the current schema.
\ir .../pg-id-up.sql
The pg-id-up.sql
installation script will search for pg-id.config.sql
file
in the parent folders (it will fail if the file is not found).
By default, environment number is read from PG_ID_ENV_NO
shell env variable.
This way, you can assign it to e.g. 1 for development, 2 for staging and 3 for
production environments. If for some reason you don't want this behavior, just
define id_env_no()
function manually.
Also, unless you created id_shard_no()
manually, the installation script will
try to infer the microshard number from the 1st sequence of numbers in the
schema name. E.g. if your schemas are named sh0000
, sh0042
,
your_schema_1234
etc., then you're good with that default behavior.
id_gen([seq_name regclass])
Generates next globally-unique randomly-looking id. The main idea is to not let external people infer the rate at which the ids are generated, even when they look at some ids sample.
By default, the function uses id_seq
sequence in the function's schema to get
the information about the next available number. (You can also pass your own
sequence name if needed.) Then, it utilizes Feistel cipher to generate a
randomly-looking non-repeating id based of it.
Examples of ids generated (underscores are just for illustration):
2_0000_17217633124378
: "environment 2, shard 0, number 17217633124378"1_0238_17493700363834
: "environment 1, shard 238, number 17493700363834"
The function always utilizes the full bigint range (19 decimal digits), so if
you store such an id in a JavaScript variable, make sure that you're using
string
type for it (JavaScript's number
doesn't have enough precision). By
default, node-postgres (and other) libraries return bigint
as a string
, exactly
as needed here.
id_gen_max_safe_integer([seq_name regclass])
Similar to id_gen()
, but produces only 16 decimal digits (all ids are less
than Number.MAX_SAFE_INTEGER
). You can assign such an id to a JavaScript
number
variable and not lose any digits.
id_gen_timestampic([seq_name regclass])
Similar to id_gen()
, but instead of generating randomly looking ids, prepends
the "sequence" part of the id with the current 9-decimal-digit timestamp (the
number of seconds passed since 2020-01-01 UTC; this makes the ids generation
work till 2051-01-01). The function also reserves up to 5 decimal digits for the
number part of the id, so within each second, up to 100k unique ids can be
generated. In total, the "random" part of the id will have 9+5=14 decimal
digits, and thus, we need the whole bigint range for such ids.
By default, the function uses id_seq_timestampic
sequence to get the
information about the next available number. You can also pass your own sequence
name instead.
The benefit of this function is performance: increasing ids are more friendly to heavy INSERTs since they maximize the chance for btree index to reuse the newly created leaf pages. At the same time, having timestamp in the prefix doesn't allow to infer the number of objects existing in the database so far.
Example of id generated (underscores are just for illustration):
2_0001_435044939_00029
: "environment 2, shard 1, seconds 435044939, num 29"
The function always utilizes the full bigint range (19 decimal digits). There is no MAX_SAFE_INTEGER-safe version, since we need at least 9 digits for the timestamp part alone.
id_gen_monotonic([seq_name regclass])
The simplest and fastest function among the above: generates next globally-unique monotonic id, without using any timestamps as a prefix. Monotonic ids are more friendly to heavy INSERTs since they maximize the chance for btree index to reuse the newly created leaf pages.
The function implicitly uses id_seq_monotonic
sequence to get the information
about the next available number.
Example of id generated (underscores are just for illustration):
2_0001_00000000000003
: "environment 2, shard 1, number 3"
The downside is that the ids of this format basically expose the number of unique objects which were created in the database so far.
The function always utilizes the full bigint range (19 decimal digits), so if
you store such an id in a JavaScript variable, make sure that you're using
string
type for it (JavaScript's number
doesn't have enough precision). By
default, node-postgres (and other) libraries return bigint
as a string
,
exactly as needed here.
id_gen_monotonic_max_safe_integer([seq_name])
Similar to id_gen_monotonic()
, but produces only 16 decimal digits (all ids
are less than Number.MAX_SAFE_INTEGER
). You can assign such an id to a
JavaScript number
variable and not lose any digits.
id_gen_uuid()
An example UUID generated by that function:
10246RRR-RRRR-4RRR-NRRR-RRRRRRRRRRRR
Here, 1 is environment number (e.g. production) and 0246 is microshard number.
Internally, the function uses PostgreSQL built-in gen_random_uuid(), but replaces the first digits with environment and microshard numbers. This trick doesn't cut too much of the UUID's entropy, but allows to use UUIDs in microsharded environment.