Sharding Table with postgres_fwd on PostgreSQL-12

In previous post, we have create table partitioning to scale out database. Table partition is stored in the same server, data will store in the single database. In case, we need to distribute data across remote database, we need to use sharding to distribute data in the table. Bellow is the figure of sharding concept.

Fig 1 Sharding Table

Environment Setup

Environment setup is set up as below.I have set up my environment as below.

  • shard01 (sharding server 01): shard01.itstikk.pro
  • shard02 (sharding server 02): shard02.itstikk.pro

Then we install postgresql to each server.

  • Benefit of Sharding - Allow system to scale out and improve performance of query.

  • Drawback of Sharding - Limitation on some certian type of queries.

Here is my deployed architecture.

Fig 02 Deploy sharding to Environment

Install PostgreSQL and Foreign Data Wrapper for Sharding

In this post, we will use pg_shard to implement sharding in PostgreSQL 12. First, we need to install PostgreSQL 12 to shard01 and shard02 server. In order to install PostgreSQL 12 you can refer here. After installing PostgreSQL, we need to install contribute package which included module for postgres_fdw.

After installing PostgreSQL 12 to Server, we need to configure postgres.conf and pg_hba.conf to all connection and permission for remote connection in PostgreSQL 12 server shard01 and shard02.

[root@shard01 ~]#vi /var/lib/pgsql/12/data/postgresql.conf
# - Connection Settings -

listen_addresses = '*'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)

For allow remote connection, we need to edit pg_hba.conf

[root@shard01 ~]#vi /var/lib/pgsql/12/data/pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             0.0.0.0/0            md5
# IPv6 local connections:
host    all             all             ::/0                 md5
# Allow replication connections from localhost, by a user with the

We apply same configuration in PostgreSQL Server shard02 as below.

[root@shard02 ~]#vi /var/lib/pgsql/12/data/postgresql.conf
# - Connection Settings -

listen_addresses = '*'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)

For allow remote connection, we need to edit pg_hba.conf

[root@shard02 ~]#vi /var/lib/pgsql/12/data/pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             0.0.0.0/0            md5
# IPv6 local connections:
host    all             all             ::/0                 md5
# Allow replication connections from localhost, by a user with the

Next we need to install contribute package which require for postgres_fdw in shard01

[root@shard01 ~]#yum install -y postgresql10-contrib.x86_64

postgres_fdw is database object, so we have to create database and create postgres_fdw in the database created at shard01

[root@shard01 ~]#su - postgres
-bash-4.2$psql -h shard01 -U postgres postgres
postgres=# create database db01;
postgres=# create user dbusr01 with encrypted password 'secret';
postgres=# grant all privileges on database db01 to dbusr01;
postgres=# \c db01

Now, we need to create database and user to store data on PostgreSQL server Shard02.

[root@shard02 ~]#su - postgres
-bash-4.2$psql -h shard01 -U postgres postgres
postgres=# create database db01;
postgres=# create user dbusr01 with encrypted password 'secret';
postgres=# grant all privileges on database db01 to dbusr01;
postgres=# \c db01

After creating database and user on shard02 PostgreSQL Server, we can create postgres_fdw, user mapping on shard01

  • Create postgres_fdw - to access remote data in external PostgreSQL server, postgres_fdw (Foreign Data Wrapper) is used. To create foreign data wrapper in PostgreSQL on shard01

    [root@shard01 ~]#su - postgres
    -bash-4.2$psql -h shard01 -U postgres -d db01
    db01=#CREATE EXTENSION postgres_fdw;
  • Create server - to create a foreign server using CREATE SERVER.

    db01=#CREATE SERVER foreign_server
            FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (host 'shard02.itstikk.pro', port '5432', dbname 'db01');
  • Create user mapping - Then we create user mapping for accessing remote external PostgreSQL

    db01=#CREATE USER MAPPING FOR dbusr01 -- local user (shard01)
            SERVER foreign_server
            OPTIONS (user 'dbusr01', password 'qf48d8uv'); -- remote user (shard02)

Create foreign table

Before you create foreign table in the shard01 server, you need to create table in the database (db01) on shard02 PostgreSQL

[root@shard02 ~]#su - postgres
-bash-4.2$psql -h shard01 -U dbusr01 -d db01
db01=>CREATE TABLE sampletbl (
        id integer NOT NULL,
        data varchar(128)
);

Now we can create foreign table in the shard01 server

[root@shard02 ~]#su - postgres
-bash-4.2$ psql -h shard01 -U postgres -d db01
db01=#CREATE FOREIGN TABLE sampletbl ( -- local table name
        id integer NOT NULL,
        data varchar(128)
)
        SERVER foreign_server
        OPTIONS (schema_name 'public', table_name 'sampletbl'); -- remote schema and table name

Next is to test whether sharding is working or not. To test our sharding, we can insert data into foreign table sampletbl created in shard01 as below

-bash-4.2$ psql -h shard01 -U postgres -d db01
db01=#insert into sampletbl (id,data) values
(1,'a'),
(2,'b'),
(3,'c'),
(4,'d'),
(5,'e');

db02=# select * from sampletbl;
 id | data
----+------
  1 | a
  2 | b
  3 | c
  4 | d
  5 | e
(5 rows)

db02=#

Then on shard02, you can select from sampletbl and see the sample result which in fact data has been inserted to sampletbl on shard02 and fetch from shard01

[root@shard02 ~]#su - postgres
-bash-4.2$ psql -h shard01 -U dbusr01 -d db01
db02=> select * from sampletbl;
 id | data
----+------
  1 | a
  2 | b
  3 | c
  4 | d
  5 | e
(5 rows)

Combining Table Partitioning and FDW

Finally, to scale out our PostgreSQL and distribute our data across remote server. We create table partition and distribute created partition on remote external PostgreSQL as following concept.

Fig 03 Table Partition with Foreign Data Wrapper

First, we create table that we will map with the partition on remote extend PostgreSQL

[root@shard02 ~]#su - postgres
-bash-4.2$ psql -h shard01 -U dbusr01 -d db01
db01=>CREATE TABLE temperatures_2016 (
    at      date,
    city    text,
    mintemp integer,
    maxtemp integer
);

db01=>CREATE TABLE temperatures_2017 (
    at      date,
    city    text,
    mintemp integer,
    maxtemp integer
);

On shard01 (master), we create table with partition and foreign tables as below

[root@shard01 ~]#su -postgres
-bash-4.2$ psql -h shard01 -U postgres -d db01
db01=#CREATE TABLE temperatures (
    at      date,
    city    text,
    mintemp integer,
    maxtemp integer
)
PARTITION BY RANGE (at);

db01=#CREATE FOREIGN TABLE temperatures_2016
    PARTITION OF temperatures
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01')
    SERVER foreign_server;
db01=#CREATE FOREIGN TABLE temperatures_2017
    PARTITION OF temperatures
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01')
    SERVER foreign_server;

Then insert data into the table in shard01 as below.

db01=#INSERT INTO temperatures (at, city, mintemp, maxtemp)  VALUES 
('2016-08-10', 'London', 63, 73),
('2017-06-12', 'London', 59, 70),
('2016-04-03', 'London', 63, 70),
('2017-03-20', 'London', 64, 75),
('2016-09-11', 'London', 54, 60),
('2016-05-09', 'London', 53, 65),
('2017-11-21', 'London', 61, 70),
('2016-12-15', 'London', 54, 66);

db02=# select * from temperatures;
     at     |  city  | mintemp | maxtemp
------------+--------+---------+---------
 2016-08-10 | London |      63 |      73
 2016-04-03 | London |      63 |      70
 2016-09-11 | London |      54 |      60
 2016-05-09 | London |      53 |      65
 2016-12-15 | London |      54 |      66
 2017-06-12 | London |      59 |      70
 2017-03-20 | London |      64 |      75
 2017-11-21 | London |      61 |      70
(8 rows)

On shard02, You can check whether data inserted in shard02 as below

[root@shard02 ~]#su -postgres
-bash-4.2$ psql -h shard01 -U dbusr01 -d db01
db01=> select * from temperatures_2016;
     at     |  city  | mintemp | maxtemp
------------+--------+---------+---------
 2016-08-10 | London |      63 |      73
 2016-04-03 | London |      63 |      70
 2016-09-11 | London |      54 |      60
 2016-05-09 | London |      53 |      65
 2016-12-15 | London |      54 |      66
(5 rows)

db01=> select * from temperatures_2017;
     at     |  city  | mintemp | maxtemp
------------+--------+---------+---------
 2017-06-12 | London |      59 |      70
 2017-03-20 | London |      64 |      75
 2017-11-21 | London |      61 |      70
(3 rows)

No comments:

Post a Comment

Feature Recently

Running Wildfly Application Server in Domain Mode

  Wildfly application server provides two modes of how to run application one wildfly application server. It is very simple if you run your ...

Most Views