I’ve found a number of articles on the web for setting up streaming replication in PostgreSQL, but none of them seemed to actually put everything together that I needed, and none of them seemed to use the new pg_basebackup in PostgreSQL 9.1 and later. So with that in mind, here are a set of steps you can use to set up streaming replication, over the internet if you wish, using an encrypted SSL connection. We’re not going to set up log archiving – we’re going to rely solely on the streaming replication for now.
I’m assuming you have a master server set up on Ubuntu 10.04 or 12.04, running PostgreSQL 9.2.x, and you have a new slave server set up on the same OS and pg version. The IP of the master is 1.2.3.4, and the IP of the slave is 5.6.7.8.
First, create the replication user on the master:
Note that we are using REPLICATION permissions, rather than creating a superuser.
Next, configure the master for streaming replication. Edit postgresql.conf (on Ubuntu, this is at /etc/postgresql/9.2/main/postgresql.conf):
We’re configuring 8 WAL segments here; each is 16MB. If you expect your database to have more than 128MB of changes in the time it will take to make a copy of it across the network to your slave, or in the time you expect your slave to be down for maintenance or something, then consider increasing those values.
Then edit the access control on the master to allow the connection from the slave; in pg_hba.conf (/etc/postgresql/9.2/main/pg_hba.conf on Ubuntu):
In this case, 5.6.7.8 is the IP address of the slave that will be connecting for replication, and hostssl means this host can only connect via SSL.
You’ll need to restart the master after making the above changes.
Now on to the slave. In the slave’s postgresql.conf, add the following:
Then restart the slave. No changes are required in the slave’s pg_hba.conf specifically to support replication. You’ll still need to make whatever entries you need in order to connect to it from your application and run queries, if you wish.
That’s all the initial setup we need to do. After you’ve done the above configuration, running the following script on the slave will copy the database over and begin replication (1.2.3.4 is the IP of the master):
That script will stop the slave, delete the old slave cluster directory, run pg_basebackup connecting to the master to copy over the databases (you’ll see the progress as it goes), create a new recovery.conf file, and start the slave. If you look at the logs after this script completes, you should hopefully see messages about it having reached a consistent recovery state.
Be careful – that script is going to delete the old database cluster on your slave, so make sure to read through it and understand what it’s doing.
At this point, you can try writing data to the master database, and you should see it mirrored over to the slave. To check the replication status, you can run the following on the master:
If you found this guide handy, you might also find the PostgreSQL 9 High Availability Cookbook useful as well!
This howto works great, also for version 9.1.
It worked great with 9.1 for me too :-) when I fixed a 9.1 streaming replication setup that had broken.
Basically there were just two thing to do: the pg_basebackup command, and then copying back my old recovery.conf.
Thanks for this article, very useful. Is creating the trigger file, then reverting back to master a simple task?
Well, creating the trigger file is easy enough. Reverting back to master, I believe, would require generally the same steps in reverse – set up the old master as a slave, synchronize, and fail over.
Very helpfull document. Thanks
Nice tutorial, i want try this postgresql streaming replication rather than using pg_cluster
I’m afraid your github gists don’t load…
Yeah…I’m not sure why it was doing that. I just switched to a different plugin for displaying gists – hopefully this one will work better.
This also works on CentOS, with the minor difference that the “main” directory is named “data”. Mind that “pg_basebackup” will overwrite postgresql.conf on CentOS, so the changes you made prior to “pg_basebackup” will be lost, unless you backup your slave’s posgresql.conf and restore it afterwards. Other that than, everything works, and thanks for all your effort in writing this!
Any chance of a ‘how to fail over’ step being added?
With the configuration in the article, at least on Ubuntu, fail over is pretty trivial:
touch /tmp/postgresql.trigger
is all it takes. That will terminate the replication, configure the slave to be writable, and will move your recovery.conf to recovery.done – at that point you should be good to go.
Ill second that … how to fail over in 10 minutes would be very helpful.
This page was helpful in figuring out failover: http://dba.stackexchange.com/questions/9587/automating-failover-in-postgresql-9-1
when i access slave db server, it says “the database system is starting up”. but replication is work. how can i check data in slave db server?
=>psql -U postgres
psql: FATAL: the database system is starting up
I have the same problem. Does anyone have a solution for starting up the database fully?
same problem. any resolution on this? did either of your systems eventually come up?
Daniel & arunnitt – I’m not sure if you were able to resolve your issue, but I had the same problem. I solved this by moving my postgresql.conf and pg_hba.conf files out of the directory which gets removed in the step “Cleaning up old cluster directory”, and then moving them back before starting the server. Otherwise your slave will startup with whatever configurations are in your master’s data directory. In my case, the configurations in my master’s data directory were not actually even being used and were incorrect (i didn’t set up the master myself).
Hello! This is occurring because the behavior of database after executing the given script makes the slave machine essentially just a back up. It is still getting the streamed XLOGs, but you can’t access it from a client in this state. If you want to instead use the slave (in a read-only manner) you need to change the hot_standby option in your postgresql.conf to on. You should be able to psql into it then.
The answer is that, in postgresql.conf, it needs to be wal_level = ‘hot_standby’ and hot_standby = ‘on’. They must be single-quoted. It took me far too long to discover this, as most blog posts on the subject do not quote their config values.
John, that must be specific to a specific OS or postgres version; I can say with absolute certainty that quotes are not required on Ubuntu 12.04 with PostgreSQL 9.3.4.
Thanks John for your comment. I modified hot_standby = ‘on’ and could connect to standby.
I just solved it by editing postgresql.conf file
max_connections = 100 should be similer on slave too. :)
Very useful tutorial, thanks! It works on Postgresql 9.3 x64.
One thing – it might be worth mentioning that you need to switch on SSL if you use the SSL connection (on Windows that’s not automatic…).
SSL is not on by default on RedHat Linux either!
Brilliant, worked as a charm. Thanks very much
My OS is Centos 5.7 and postgres-9.3. After running the backup script upon making changes related to Centos mentioned in this forum, the postgres on Slave fails to start with the following error.
>LOG: invalid checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file “/var/lib/pgsql/9.3/data/backup_label”.
LOG: startup process (PID 1646) exited with exit code 1
LOG: aborting startup due to startup process failure
Can you anyone please advise how to fix this error.
This happens because your backup took long enough that you’ve missed a few wal files before it finished. For example, when your backup starts the master is on WAL file A. The slave gets a backup_label that tells him to recover from point “A” in the WAL history. By the time the backup finishes, though, the master has moved on to WAL file E. The slave wants to begin recovery at point A, but when streams from the master he can only find WAL “E”.
One solution is to use “-X stream” or “-X fetch” in your pg_basebackup command. You may also need to increase the wal_keep_segments setting on your master to ensure enough segments are kept around to span the amount of time it takes to perform the backup.
Hi!
I tried this, but on the script’s step (pg_basebackup) I had to modify the postgres.conf because I had an error, changing archive_mode=on, then the script asked for an archive_command, so on the same postgresql.conf I put this: archive_command = ‘cp %p /opt/pgsql91/archive/%f’ but it launched an error: ‘Verify that your archive_command is running correctly’, what am I doing wrong? I think that my archive_command is wrong, also if I skip the archive_command the error is the same.
Has the folder of your archive owner and group labled as ‘postgres’? If not the archive_command can’t write on folders that not contain owner/group ‘postgres’.
how can we give write access to slave in this case?
Hi, This works in Postgresql 8.4.x ?
I’m wondering about
trigger_file = ‘/tmp/postgresql.trigger’
All files in /tmp get deleted on restart, so if the slave get rebooted for any reason it will come back up thinking it’s a master. Am I interpreting the documentation incorrectly?
Absence of the trigger file doesn’t cause any action. The trigger file is supposed to be absent on the slave until you need to do a failover. At which point you create the trigger file on the slave (or some monitoring software creates the trigger file), then the slave becomes the new master. Once it becomes the master it stops caring about the trigger file, so it doesn’t matter if the trigger file disappears at some later time after the promotion.
please show me how to do replication streaming on windows using postgresq 9.3 follow the step.thx
Great Work !! I used the same steps with PG 9.3 on CentOS 6.3, it worked perfectly. I just changed PG data path and used Postgres user instead of replicator. Thanks a lot for sharing your knowledge.
Works for me to
More thanks
Works on Ubuntu 12.10 and PostgreSQL 9.3. Great tutorial
The commands are missing..
Please, can you restore them.
Thanks
Oops! Sorry about that – should be fixed now.
This was of great help. Thanks!! Small changes that I had to do was switching SSL on in master, taking a copy of .conf files on slave before removing cluster and replacing it later, and renaming folder name ‘main’ to ‘data’ as I am using CentOS.
Just wanted to say thank you for this guide, much more succinct than others. First time setting up replication with Postgres, so it’s appreciated.
Slave server not able to start. In the logfile showing
startup process (PID 3825) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system was shut down at 2014-05-22 19:41:28 IST
LOG: entering standby mode
WARNING: WAL was generated with wal_level=minimal, data may be missing
HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
FATAL: hot standby is not possible because wal_level was not set to “hot_standby” on the master server
HINT: Either set wal_level to “hot_standby” on the master, or turn off hot_standby here.
Please help me in this regard.
Hi Mobina – Were you able to get this resolved? I am stuck with the same problem. My master configuration has wal_archive = ‘hot_standby’. Even then, the slave fails to start up with the exact error you mentioned in your post.
Trying to get Postgresql 9.3 replication going on Mac OS X 10.9. After making the changes you outline in your article, when starting Postgresql I receive the error message:
FATAL: could not load server certificate file “server.crt”: No such file or directory
How does one generate a server certificate file, and where does it get installed?
/var/lib/postgresql/9.x/main/server.crt and /var/lib/postgresql/9.x/main/server.key are symlinks to /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key you can create them and restart server.
I changed the end of start_replication.sh to this: (It creates a self signed cert, thanks to shahjapan http://ubuntuforums.org/showthread.php?t=735020 and the postgre docs he links to. Depending on your setup you may want to get your key countersigned by a proper authority.)
echo Setting up the ssl cert:
pushd /var/lib/postgresql/9.1/main/
openssl req -new -text -out server.req
openssl rsa -in privkey.pem -out server.key
rm privkey.pem
openssl req -x509 -in server.req -text -key server.key -out server.crt
chmod og-rwx server.key
chown postgres:postgres server.*
chmod 600 server.*
popd
echo Starting PostgreSQL
sudo service postgresql start
Was able to make the SSL cert. That works. But Postgres complains:
LOG: invalid IP mask “md5”: nodename nor servname provided, or not known
CONTEXT: line 114 of configuration file “/opt/local/var/db/postgresql93/defaultdb/pg_hba.conf”
FATAL: could not load pg_hba.conf
Line 114 of the pg_hba.conf file is this:
hostssl replication replicator xxx.xxx.xxx.xxx md5
…where the xxx’s represent the master’s IP address. Don’t know what it’s not liking…
Got everything working, yeah! But I can’t access the database, boo!
$ psql
psql: FATAL: the database system is starting up
FATAL: the database system is starting up
Okay, so the log file says this:
2014-05-22 13:55:04.349 PDT [ ] LOG: database system was shut down in recovery at 2014-05-22 13:53:58 PDT
2014-05-22 13:55:04.366 PDT [ ] LOG: entering standby mode
2014-05-22 13:55:04.449 PDT [ ] LOG: redo starts at 0/70028B0
2014-05-22 13:55:04.449 PDT [ ] LOG: consistent recovery state reached at 0/7002950
2014-05-22 13:55:04.449 PDT [ ] LOG: record with zero length at 0/7002950
2014-05-22 13:55:04.851 PDT [ ] LOG: started streaming WAL from primary at 0/7000000 on timeline 1
2014-05-22 13:57:48.622 PDT [alex ::1(49156)] FATAL: the database system is starting up
2014-05-22 13:57:48.623 PDT [alex ::1(49157)] FATAL: the database system is starting up
…and ps shows this:
110 ?? S 0:00.03 /opt/local/lib/postgresql93/bin/postgres -D /opt/local/var/db/postgresql93/defaultdb
134 ?? Ss 0:00.00 postgres: logger process
145 ?? Ss 0:00.01 postgres: startup process recovering 000000010000000000000007
146 ?? Ss 0:00.00 postgres: checkpointer process
147 ?? Ss 0:00.02 postgres: writer process
148 ?? Ss 0:00.13 postgres: wal receiver process streaming 0/70029F0
So what could the problem be?
Works very well on Debian 7 (master and slave)! Thx.
Just one detail: In Debian you don’t have sudo by default, so you need run the commands in the script one by one (I recommend use su -l postgres, for example, to use one terminal).
Pingback: Screen6 – Postgresql on AWS with puppet: High Availability
Working great on 9.3 also
Just to throw my 2 cents – may help someone:
1. My setup was with Ubuntu 14.04 and postgresql-9.3
2. DO NOT make my mistake – I have initially created the recovery.conf in /etc/postgresql/… – you need to create the file in /var/lib/postgresql/…
3. DO NOT start the SLAVE without having a proper recovery.conf in place – the replication breaks if you add the file later and re-start the database on SLAVE
HTH,
Georgi
Thank you so much for your 3rd tip.
After some flawless tests using docker containers, I tried to setup a slave on a development machine and by mistake put the recovery.conf in /var/lib/postgresql instead of ./main and of course getting the infamous “record with zero length at xxxx”.
Thanks for writing this nice tutorial.
Can you guide what configuration would be required to achive cascading replication? Can we transform this master-slave setup to master-slave-slave setup?
Congratulations! after many hours searching for this info, even in the Postgresql site, just in 5 minutes my SL 6.4 Postgresql 9.2.10 is replicating in streaming mode!
Thank you very much!
Great article. Succinct and accurate.
In 9.4.1, I had to include the CIDR Mask Length in the “replicator” line in pg_hba.conf on the Master, or Postgres wouldn’t launch:
5.6.7.8/32
Not sure if that’s true of all PG versions. Can only speak for 9.4.1.
Huge thanks gregr for the article, and iq9 for this little helpful addition
Worked amazingly well. Thank you for the attention to detail!!
Watch out on Linux distributions that put everything under a “data” directory, including your postgresql.conf, as the first action in the shell script is to delete the equivalent data directory. Found this out the hard way :-/
Pingback: PostgreSQL database replication
Hi Greg! I have tried all the steps mentioned above, however while running the shell script on my slave, I’m getting the below errors:
pg_basebackup: could not connect to server: could not connect to server: Connection refused
Is the server running on host “192.168.1.40” and accepting
TCP/IP connections on port 5432?
* Error: Port conflict: another instance is already running on /var/run/postgresql with port 5432
Please help me out here, as I’m quite new to PostgreSQL. I’m unsure how to proceed next.
Thanks in advance.
Great tutorial.
Is this solution working with Windows to Linux replication?
In your second code block you have
listen_address = # make sure we’re listening as appropriate
shouldn’t it be
listen_address = ‘localhost,1.2.3.4’ # make sure we’re listening as appropriate
Do you know if it’s possible to set up replication on Windows using only the UI?
the crontab affect the replication, why i make a replication and work good about 6 hour them loss conextion the, i make some uptades , i try to restar the slave and dont start the service but the master work fine
Hi,
I am trying to create a HA Pgsql server. Is there a way to automate the running of pg_basebackup command on the recovered server to copy latest data from running server.
Regards
Precise step by step tutorial. Works like a charm Ubuntu 14.04, Postgresql 9.5. Thanks.
I’ve used this procedure several times on Ubuntu {12.04|14.04|16.04} with PostgreSQL {9.2|9.6}. Works great every time. Had a prod DB that I set up (Ubuntu 14.04/PostgreSQL 9.2) when it stopped replicating (because the devs did something after I had it running perfectly, they didn’t bother telling me). I ran the script to re-set up the slave and its been fine since. Took all of 5 seconds to run the scrip & watch it sync (tiny, but important DB), then another ~half minute to log into the master and check repl status, and finally a couple of minutes to bask in success. I really should take a few more minutes to lock the devs out…
Coming up on four years later, this is still as valid today with PgSQL 9.6 (and IPv6) as it was in early 2013.
Thank you for taking the time to put this together.
Awesome. However, I don’t understand why you update postgresql.conf on slave in slave_postgresql.conf, doesn’t this file would be removed by “sudo -u postgres rm -rf /var/lib/postgresql/9.2/main” in start_replication.sh later? It seems that updating postgresql.conf on slave before pg_basebackup is meaningless. Should we update postgresql.conf in start_replication.sh, maybe just before creating recovery.conf ?
I have write a short script to set up postgresql streaming replication basing on your blog:
https://github.com/ictlyh/Utils/blob/master/shell/setup_pg_stream_replicate.sh
The conf file is in /etc/…, so it’s not removed by the rm command you mention.
Great Article !!
I’ve learned a lot from it and also successfully set up production replication schemes.
Someone please help! I have followed all the steps but in the end I am unable to psql into the slave and check whether data is replicating from the master. I get error as follows:
psql FATAL: the database system is starting up
Please, does anyone have a solution to this? I have tried changing the postgresql.conf file on slave to hot_standby=’on’ and wal_level=’hot_standby’ but it still wont work.
I am facing the same situation . did you find any solution to it ?. If so please share it .
Hi guys,
I try replication with postgresql 9.5 on Red Hat 7.2. when I try this part
sudo -u postgres pg_basebackup -h 1.2.3.4 -D /var/lib/postgresql/9.2/main -U replicator -v -P
(of course changed for redhat and version for RHEL)
there is permission error although it has permission and all configurations are ok.
Besides this I google a lot about main path and don’t understand where it is in my postgresql installed host.
in a lot of source the path is in that format /var/lib/postgresql/9.x/main
when I installed postresql 9.5 on my host I didn’t install postgresql packages when installing RHEL. I install postgresql after RHEL installation to specific directory (from source not with yum command) as /usr/local/pgsql/
As a result postgresql installed successfully and database works.
I don’t understand clearly …/main parts equal to what in my configuration path?
I guess it is equal to /usr/local/pgsql/data in my configuration. Am I right???
The error is like that when try to use pg_basebackup command
[postgres@slave ~]$ pg_basebackup -h 192.168.1.108 -D /usr/local/pgsql/data -U replica -v -P
pg_basebackup: could not access directory “/usr/local/pgsql/data”: Permission denied
The IP here is master server IP.
And I mv ../data/ to backup location on standby (slave) before replication
Please HELP ME?? It is urgent
Suggestion: replace the `rm -rf` command with a rename (e.g. `mv main main-backup`). I accidentally deleted the wrong cluster (master server rather than replica server).
FATAL: incorrect checksum in control file.
You may see this, as I did, if you try to replicate from a 32 bit to a 64 bit machine. You can’t.
https://www.postgresql.org/message-id/13614.1094787412%40sss.pgh.pa.us
Also be careful if the slave is CentOS all the paths are different and the config files are not in /etc/postgres they are in the data directory which you will delete when you convert this line:
sudo -u postgres rm -rf /var/lib/postgresql/9.2/main
to
sudo -u postgres rm -rf /var/lib/pgsql/9.3/data
I had to rebuild postgresql.conf and pg_hba.conf and the certs:
https://www.howtoforge.com/postgresql-ssl-certificates
Anyway its an excellent tutorial I it worked for me when I used it previously between two debian servers, but not under today’s circumstances.
THANKS A LOT
Version 9.6 postgres requires /32 in pg_hba.conf (or appropriate mask), checkpoint_segments is deprecated and will cause a crash on startup, and wal_level should be ‘replicate’ instead of hot_standby.
Besides those things, this guide worked fine on Debian 9.
Thank you for your put-together. It is very helpful compare to the postgresql website