How Do I Read Data from File Storage Incrementally?

This article shows how to design a data flow that reads data incrementally from file storage, so that each time the package runs, only new or changed files are read.

What you’ll need

A file storage bucket with a directory that contains the files to read incrementally. Integrate.io ETL needs at least read-only permissions on it.
A file storage bucket where Integrate.io ETL has read-write permissions, so that it can write its metadata (the files manifest).

How to incrementally read data from file storage

1. Start by adding a File Storage source component to your package, then click the new component to edit its properties.
2. Fill in the bucket and path for the source data (in this example, the bucket is integrate.io ETL.public and the path is /twitter).
3. Change the source action to “Process only new files (Incremental load)”, select the manifest connection, and enter a files manifest path. The path must point to a bucket where Integrate.io ETL has read-write permission. In our example, it’s integrate.io ETL.dumpster/manifests/twitter_reader.gz.
That’s essentially it! This tells Integrate.io ETL to list the files in the source path, compare the list to the manifest file, and read only the new or changed files. If the manifest file doesn’t exist, Integrate.io ETL reads all files in the source path. This happens the first time you execute the package, or after you delete the manifest file. Once the package executes successfully, the files it read are added to the manifest file.
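The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Integrate.io ETL's actual implementation. The manifest format is not documented in this article, so the sketch assumes a gzip-compressed JSON object that maps each file path to a hypothetical fingerprint (size in bytes, last-modified time), and it treats a file as "changed" when that fingerprint differs. The names `load_manifest`, `save_manifest`, and `run_incremental` are illustrative only.

```python
import gzip
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional, Tuple

# ASSUMPTION: a hypothetical fingerprint used to detect "changed" files:
# (size in bytes, last-modified time in epoch seconds).
Fingerprint = Tuple[int, float]


def load_manifest(path: str) -> Optional[Dict[str, Fingerprint]]:
    """Return the stored manifest, or None when it does not exist yet."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # JSON stores tuples as lists, so convert them back for comparison.
        return {name: tuple(fp) for name, fp in json.load(f).items()}


def save_manifest(path: str, manifest: Dict[str, Fingerprint]) -> None:
    # ASSUMPTION: gzip-compressed JSON; the real manifest format may differ.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(manifest, f)


def run_incremental(
    listing: Dict[str, Fingerprint],
    manifest_path: str,
    process: Callable[[List[str]], None],
) -> List[str]:
    """Read only new or changed files, then record them in the manifest."""
    manifest = load_manifest(manifest_path)
    if manifest is None:
        # No manifest (first run, or it was deleted): read every listed file.
        to_read = sorted(listing)
        manifest = {}
    else:
        to_read = sorted(
            name for name, fp in listing.items() if manifest.get(name) != fp
        )
    process(to_read)  # if this raises, the manifest file is left unchanged
    # Only after a successful run are the files that were read added.
    manifest.update({name: listing[name] for name in to_read})
    save_manifest(manifest_path, manifest)
    return to_read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "twitter_reader.gz")
        listing = {"twitter/a.json": (100, 1.0), "twitter/b.json": (200, 2.0)}
        print(run_incremental(listing, path, lambda files: None))  # ['twitter/a.json', 'twitter/b.json']
        listing["twitter/a.json"] = (120, 4.0)  # changed file
        listing["twitter/c.json"] = (50, 3.0)   # new file
        print(run_incremental(listing, path, lambda files: None))  # ['twitter/a.json', 'twitter/c.json']
        print(run_incremental(listing, path, lambda files: None))  # []
```

Deferring the manifest write until `process` returns mirrors the article's point that files are added to the manifest only after the package executes successfully, so a failed run is retried in full on the next execution.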

Note that your path can contain a pattern or a variable, and incremental reading will still work. However, files that are not found in the source path are removed from the manifest. This helps keep the manifest file small, but if you later add a path you previously read from back to the source component, its files will be read again.
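The pruning behavior can be illustrated with another hypothetical Python sketch. It assumes the source path is a shell-style pattern matched with `fnmatch.fnmatchcase` (where `*` also matches `/`), which may differ from how Integrate.io ETL resolves patterns and variables, and it uses the same assumed (size, last-modified) fingerprint. The function `incremental_step` is illustrative only: it returns the files to read and the manifest that would be stored after a successful run, which covers only files found in the source path.

```python
import fnmatch
from typing import Dict, List, Tuple

# ASSUMPTION: hypothetical (size in bytes, last-modified time) fingerprint.
Fingerprint = Tuple[int, float]


def incremental_step(
    bucket_files: Dict[str, Fingerprint],
    pattern: str,
    manifest: Dict[str, Fingerprint],
) -> Tuple[List[str], Dict[str, Fingerprint]]:
    """Return (files to read, manifest to store after a successful run)."""
    # ASSUMPTION: the source path is resolved with shell-style matching.
    listing = {
        name: fp
        for name, fp in bucket_files.items()
        if fnmatch.fnmatchcase(name, pattern)
    }
    to_read = sorted(name for name, fp in listing.items() if manifest.get(name) != fp)
    # The new manifest holds only files found in the source path, so entries
    # for files outside the current pattern are dropped (pruned).
    return to_read, dict(listing)


if __name__ == "__main__":
    bucket = {"twitter/2024/a.json": (1, 1.0), "twitter/2025/b.json": (2, 2.0)}
    manifest: Dict[str, Fingerprint] = {}
    read, manifest = incremental_step(bucket, "twitter/*", manifest)
    print(read)  # ['twitter/2024/a.json', 'twitter/2025/b.json']
    # Narrowing the path prunes the 2024 file from the manifest.
    read, manifest = incremental_step(bucket, "twitter/2025/*", manifest)
    print(read)  # []
    # Widening the path again causes the 2024 file to be read a second time.
    read, manifest = incremental_step(bucket, "twitter/*", manifest)
    print(read)  # ['twitter/2024/a.json']
```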

Finally, complete your data flow and execute it.

ETL & Reverse ETL
Knowledge Base