Sometime in 2019, Netflix cracked a conundrum that had stumped it for years. The company had so much data about its content and subscribers that it had to continuously sync multiple heterogeneous data stores, such as MySQL and Elasticsearch, which brought seriously stressful challenges like dual writes and distributed transactions.
So Netflix created its own CDC tool that processes captured log events in sequence and can take dumps of specific tables or of specific primary keys within a table.
Problem sorted. Case closed.
Netflix's flashy new tool uses log-based change data capture, one of two primary CDC methods that enable more accurate data decisions in a data-driven organization. Netflix uses log-based CDC to feed database transactions to a centralized destination, presumably a data warehouse, and then to analytics applications. This process lets the company, among lots of other things, analyze how many people watch "Bridgerton" or "The Crown" without syncing loads of disparate data stores.
Life suddenly got simpler for them.
CDC is an absolute game-changer for Netflix, and enterprises like yours can use it too. But the problem is deciding when to use CDC in your organization. What scenario should you use it for? And how? And why?
Integrate.io clears up the confusion below.
Table of Contents
- CDC Minimizes Disruptions to Workloads (What Netflix Uses It for)
- CDC Makes Batch ETL Better
- CDC Prevents a Government Fine
- How Integrate.io Can Help
CDC Minimizes Disruptions to Workloads (What Netflix Uses It for)
For years, Netflix had an itch it couldn't scratch. Continuously syncing multiple unrelated data stores was a costly and time-exhausting process that tested its data engineers to the absolute limit. The company, one of the most technologically advanced on the planet, was still dealing with issues like dual writes and distributed transactions.
Then CDC came along to save the day. (Kind of like a real-life Lucifer Morningstar.) Netflix's log-based CDC tool now processes all of its captured log events in order and takes dumps, in chunks, across all tables. The tool supports any output in Netflix's complex data labyrinth — stream, database, API; you name it. Now the company moves data from disparate sources to a centralized destination and then onto analytics apps for data deep-dives.
If you have the same kind of problems Netflix had, use CDC to minimize workload interruptions and avoid dual writes, distributed transactions, and all that nasty stuff that makes you want to tear your hair out.
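To make "processes captured log events in order" a little more concrete, here is a minimal Python sketch. It is not Netflix's tool: the ChangeEvent shape, the lsn field, and the apply_events helper are all hypothetical names invented for illustration, and the sample rows are made up. The idea is simply that each change carries a log position, and a replica stays correct as long as changes are applied in that order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change. All field names are illustrative assumptions."""
    lsn: int              # position of the change in the source's transaction log
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row contents; None for deletes


def apply_events(target: dict, events: list) -> int:
    """Apply events to `target` in log order and return the last position applied."""
    last_lsn = 0
    for event in sorted(events, key=lambda e: e.lsn):
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = event.row
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_lsn = event.lsn
    return last_lsn


# Events arrive out of order here on purpose; sorting by lsn restores log order.
events = [
    ChangeEvent(3, "delete", 2, None),
    ChangeEvent(1, "insert", 1, {"title": "Bridgerton"}),
    ChangeEvent(2, "insert", 2, {"title": "The Crown"}),
    ChangeEvent(4, "update", 1, {"title": "Bridgerton", "views": 10}),
]

replica = {}
checkpoint = apply_events(replica, events)
```

The returned checkpoint is the piece that matters in practice: a consumer that remembers the last position it applied can resume from there after a restart instead of re-reading everything.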
CDC Makes Batch ETL Better
Think of all those data stores in your organization running somewhere in the background right now. Columnar databases, operational data stores, SaaS apps, ERPs, CRMs: these repositories contain hundreds of thousands, perhaps millions, of data records. The problem is that most of these records change little between two iterations of batch ETL, so it would be incredibly expensive to indiscriminately re-ingest an entire data store each time you extract, transform, and load data.
So that's where CDC comes in.
Log-based CDC (what Netflix used) and another method called trigger-based change data capture detect changes made to source tables and databases, often in near real time. Log-based CDC reads the database's transaction log, while trigger-based CDC uses database triggers to record each change as it happens. Both identify modified rows and entries so that only those changes move from a data store to a new location like a warehouse. All of that makes it so much easier to shift data from disparate sources to a central destination and then to an analytics app like a business intelligence (BI) tool.
CDC streamlines batch ETL, pure and simple. If you want to track changes to source tables and databases, log-based and trigger-based CDC can help you do just that.
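For a feel of how the trigger-based flavor works, here is a small sketch using Python's built-in sqlite3 module. The customers table, the customers_changes table, and the changes_since helper are all made-up names for illustration, and a production setup would look different. The point is that triggers write every insert, update, and delete into a change table, so a batch ETL job only has to read rows newer than the last one it processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table, change table, and one trigger per operation (all names hypothetical).
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);

CREATE TABLE customers_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    op          TEXT NOT NULL,
    customer_id INTEGER NOT NULL,
    email       TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id, email)
    VALUES ('insert', NEW.id, NEW.email);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id, email)
    VALUES ('update', NEW.id, NEW.email);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id, email)
    VALUES ('delete', OLD.id, NULL);
END;
""")

# Ordinary application writes; the triggers record each one automatically.
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO customers (id, email) VALUES (2, 'b@example.com')")
conn.execute("UPDATE customers SET email = 'a2@example.com' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")


def changes_since(connection, last_change_id):
    """Return only the changes recorded after the given change_id, in order."""
    return connection.execute(
        "SELECT change_id, op, customer_id, email "
        "FROM customers_changes WHERE change_id > ? ORDER BY change_id",
        (last_change_id,),
    ).fetchall()


changes = changes_since(conn, 0)
```

A batch job would store the highest change_id it has loaded and pass it in on the next run, which is what keeps each iteration small. The trade-off is that triggers add a little work to every write on the source table, which is one reason log-based CDC is often preferred for busy databases.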
CDC Prevents a Government Fine
"Data governance" is a phrase you can no longer ignore. As governments worldwide crack down on organizations that collect, process, and share data illegitimately, monetary penalties will continue to increase. Take GDPR, for example, which allows fines of up to 20 million euros or 4% of global annual turnover, whichever is higher, for companies that don't comply with its framework. Or HIPAA, which lets the feds fine healthcare organizations for data violations. Or CCPA, which allows the State of California to do the same thing.
CDC (both log-based and trigger-based) could be one of the most effective processes for data governance because it lets you decide which data records to sync before moving those records from a source to a destination like a warehouse. That means you have control over the information that flows through pipelines and can prevent data violations that land you in trouble with the federal or state authorities.
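Deciding which records to sync can be pictured as a filter that sits between change capture and the warehouse. The sketch below is only an illustration under stated assumptions: SYNC_RULES, the table and column names, and the filter_change function are hypothetical, and it is not how Integrate.io or any particular CDC product implements this. It does not by itself make a pipeline compliant with any regulation.

```python
from typing import Optional

# Hypothetical allow-list: which tables may be synced, and which columns of each.
# Anything not listed here never leaves the source.
SYNC_RULES = {
    "orders": {"order_id", "total", "created_at"},
    "patients": {"patient_id", "visit_date"},  # "diagnosis" is deliberately left out
}


def filter_change(table: str, row: dict) -> Optional[dict]:
    """Return the row with only approved columns, or None if the table is not synced."""
    allowed = SYNC_RULES.get(table)
    if allowed is None:
        return None
    return {column: value for column, value in row.items() if column in allowed}


captured = {"patient_id": 7, "visit_date": "2021-03-01", "diagnosis": "flu"}
to_warehouse = filter_change("patients", captured)
skipped = filter_change("audit_log", {"entry": "login"})
```

An allow-list is used here rather than a block-list so that a newly added sensitive column stays out of the pipeline until someone explicitly approves it.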
If you don't want a brown letter from the Department of Health and Human Services or the State of California Department of Justice, consider investing in a CDC solution.
How Integrate.io Can Help
You might not have as much big data as Netflix, but CDC could be the best investment you make all year. That's because you can improve your data architecture by capturing changes made in databases and ensuring those changes replicate in a warehouse or data lake.
Of course, this process doesn't magically happen by itself. You need an ETL platform like Integrate.io that executes both log-based and trigger-based CDC. The result? You can ingest the data you require time and time again.
Integrate.io is your all-in-one ETL hub and the only tool you need for CDC. Its low-code/no-code UI means you don't have to program CDC or call in professional data engineers who cost thousands of dollars. Use this software to move data from one location to another (and then another!) without loads of complicated code.
Other Integrate.io features include:
- Super simple in-pipeline data transformations.
- 100+ pre-built out-of-the-box data connectors and integrations. No coding is required!
- Incredible Salesforce-to-Salesforce integration that lets you move data to Salesforce and then back again without breaking a sweat.
- Drag-and-drop interface.
- Streamlined workflow creation.
- Pay-as-you-go pricing model. Only pay for the data connectors you use, not for data volume.
- REST API.
- World-class customer service that will make you smile.
Check out more Integrate.io features.
With Integrate.io, CDC is as easy as ABC. Contact Integrate.io to arrange a 7-day demo or pilot now.