What is Data Replication?

Data replication is the process of creating a partial or complete copy of a database in a separate location. Replication is a key element of a distributed database strategy.

Replication can be done to improve performance, to make data available for other users, or for swift disaster recovery.

What are the Types of Data Replication?

There are two main types of data replication:

  • Full replication: An entire relational database is replicated in another location. All relationships and dependencies are reserved. This version will be an accurate representation of the original database.
  • Partial replication: Selected tables from the database are replicated elsewhere. Often, these are the tables that are most frequently accessed by the relevant users. Fragmentation may occur, as certain values might not be properly synced between versions  

How is Data Replication Performed?

Initially, a full or partial copy of a database is created in another location.

After that, the challenge is to ensure that both instances of the database remain in agreement with each other. There are multiple syncing methods, such as those supported in Microsoft SQL Server.

In data replication, the original database is known as the Publisher, while the replicated database is called the Subscriber.

Transactional Replication

Each transaction on the Publisher is replicated in sequence on the Subscriber. This offers a near-real-time updating of the replicated database and allows users on the Subscriber system to observe changes to the Publisher data. Changes made on the Subscriber side can be preserved.

Snapshot Replication

The system takes a snapshot of the data from the Publisher and overwrites the data on the Subscriber. The Subscriber always holds a mirror image of what is contained on the Publisher. This approach is best suited to situations where Publisher data doesn’t change often, or where the Subscriber has read-only permissions.

Merge Replication

This approach allows for bidirectional communication between both instances. An update by either the Publisher or Subscriber will trigger an automated process that sends the updated value to the other process. Updates from both sides are treated with equal importance.

Challenges with Data Replication

Data replication can present some issues, such as:

  • Consistency: multiple instances of the same database may result in inconsistencies, especially if updates fail.
  • Latency: Syncing data involves some processing overheads, which can cause some lag. Values on the Subscriber database may take some time to catch up with the Publisher. 
  • Storage: Replicated databases need to be housed somewhere. For larger databases, this may present an issue.

These issues can become problematic when dealing with very large databases, especially if some degree of data transformation is required.

In this instance, many organizations will use ETL (Extract, Transform, Load) instead of data replication. ETL allows data to be replicated in another location, such as a data warehouse, with low overheads. Data can be transformed during the transfer if required.

Share This Article
facebook linkedin twitter

Glossary of Terms

A guide to the nomenclature of data integration technology.