Scheduling Package Execution

 

You spoke, we listened! One of Integrate.io ETL's most requested features, Cron Expressions, is now available on all Integrate.io ETL accounts. This allows much more flexibility in your job scheduling by supporting irregular intervals. Here are some examples:

0 8 * * *     every day at 8 AM UTC

0 8 * * MON   every Monday at 8 AM UTC

0 8 1 * *     on the 1st of every month at 8 AM UTC
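To make the five fields (minute, hour, day of month, month, day of week) concrete, here is a minimal Python sketch that checks whether a given UTC time matches an expression. It is not how Integrate.io ETL evaluates schedules. The names field_matches and cron_matches are made up for this illustration, and the sketch only understands *, numbers, day names and comma lists. It also requires every field to match, which is a simplification of how standard cron combines the two day fields.

```python
from datetime import datetime, timezone

# Cron numbers weekdays from Sunday = 0.
DAY_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def field_matches(field, value, names=None):
    """Return True if one cron field ('*', a number, a name, or a comma list) matches value."""
    if field == "*":
        return True
    for part in field.split(","):
        token = part.upper()
        number = names[token] if names and token in names else int(token)
        if number == value:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if the 5-field expression matches the given datetime (hypothetical helper)."""
    minute, hour, day_of_month, month, day_of_week = expression.split()
    # Python's weekday() uses Monday = 0, so shift it to cron's Sunday = 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day_of_month, moment.day)
        and field_matches(month, moment.month)
        and field_matches(day_of_week, cron_weekday, DAY_NAMES)
    )


# 2024-01-01 was a Monday and the 1st of the month.
sample = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
for expr in ("0 8 * * *", "0 8 * * MON", "0 8 1 * *"):
    print(expr, cron_matches(expr, sample))
```

Calling cron_matches with other datetimes is a quick way to sanity-check an expression before saving it in a schedule.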

Use the scheduler to execute packages periodically, starting at a specified date and time. Packages are executed as scheduled on an existing cluster that fits the scheduled cluster size. If no such cluster exists, one is provisioned automatically with the specified number of nodes. By default, the cluster is taken down as soon as package execution is completed.

Your schedules can be viewed in the schedules list. Any schedule can be disabled or enabled from within the list.

How to create a new schedule

  1. Click Schedules on the side menu.
  2. Click New schedule to open the new schedule dialog.

Configure Schedule

      1. Enter a Name for the schedule and optionally a Description.

      2. Choose a schedule type

        • Repeat every - Used for a fixed interval between executions (minutes / hours / days / weeks / months).

        • Cron expression - Used for setting up irregular intervals between executions (e.g., every weekday at 8 AM).

          You can click the time icon for presets. If none of the presets suits your use case, you can write your own cron expression.

          If the schedule repeats every hour, subsequent executions will start at the specified minute after the hour. Note that the time is in UTC.

          Note: Only standard expressions are supported. Special characters such as # and ? are not supported.

      3. By default, a schedule will not execute a job if previous jobs executed by the same schedule are still running. Check Allow concurrent schedule executions if you want the schedule to execute jobs regardless of the status of previous jobs.
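The concurrency rule in step 3 can be sketched as a small decision function. This is only an illustration of the behavior described above. The function name and its parameters are hypothetical and do not correspond to any Integrate.io ETL API.

```python
def should_start_job(previous_jobs_running, allow_concurrent):
    """Decide whether a scheduled run should start a new job (illustrative only)."""
    if allow_concurrent:
        # "Allow concurrent schedule executions" is checked: always start.
        return True
    # Default behavior: start only when no earlier job from this schedule is still running.
    return previous_jobs_running == 0


print(should_start_job(previous_jobs_running=1, allow_concurrent=False))
print(should_start_job(previous_jobs_running=1, allow_concurrent=True))
```

With the default setting, a run that comes due while an earlier job is still in progress is simply not started.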

Configure Schedule Cluster

The schedule can create a new cluster to run jobs or use existing clusters.

      1. Move the Cluster Size slider to the number of cluster nodes to use for package execution.

      2. By default, the cluster will terminate after 1 minute of inactivity. If the package execution time and the schedule recurrence interval are both shorter than 1 hour, we recommend turning automatic cluster termination off so the cluster can be reused.

      3. Set Re-use strategy:

        • Any cluster created by this schedule - use a cluster created by this schedule if one is available. Otherwise, create a cluster.

        • Any similar cluster (default) - use any existing cluster as long as it is at least as big as the schedule's cluster size. Otherwise, create a cluster.

        • Any similar cluster with the same node count - use any existing cluster with the same node count as the schedule's cluster size. Otherwise, create a cluster.

        • Cluster with the least number of jobs running - use the cluster with the fewest running jobs and a node count at least as big as the schedule's cluster size.

          Note - when several schedules would otherwise start at the same time, set them a few minutes apart. This keeps the job count per cluster accurate and avoids a race condition.

        • Never - create a new cluster every time the schedule is running.
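The re-use strategies above can be read as a selection rule over the clusters that already exist. The following Python sketch illustrates that reading. The Cluster class, the pick_cluster function and the strategy keys ("self", "any", "same_node_count", "least_jobs", "never") are names invented for this example, not Integrate.io ETL identifiers. A return value of None stands for "create a new cluster".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    nodes: int
    running_jobs: int
    created_by_schedule: Optional[str] = None  # hypothetical schedule identifier


def pick_cluster(strategy, clusters, schedule_id, size):
    """Return a cluster to reuse, or None when a new cluster should be created."""
    if strategy == "never":
        return None
    if strategy == "self":
        candidates = [c for c in clusters if c.created_by_schedule == schedule_id]
    elif strategy == "any":
        candidates = [c for c in clusters if c.nodes >= size]
    elif strategy == "same_node_count":
        candidates = [c for c in clusters if c.nodes == size]
    elif strategy == "least_jobs":
        candidates = [c for c in clusters if c.nodes >= size]
        # Prefer the large-enough cluster with the fewest running jobs.
        return min(candidates, key=lambda c: c.running_jobs, default=None)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return candidates[0] if candidates else None


clusters = [
    Cluster("a", nodes=2, running_jobs=3, created_by_schedule="nightly"),
    Cluster("b", nodes=4, running_jobs=1),
]
chosen = pick_cluster("least_jobs", clusters, schedule_id="nightly", size=2)
print(chosen.name if chosen else "create a new cluster")
```

The sketch also shows why the note about staggering schedules matters: the "least jobs" choice depends on job counts that change as soon as another schedule starts a job.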

Select packages to run

      1. Click add package to add at least one package to execute.

      2. Choose a package from the list and click set variables.

      3. Set the value for any user variables or system variables and click Save. The variable values for a package in the schedule override the package defaults.

        Note: Variable values are expressions, so you can use them to calculate relative datetime values, which is very useful in scheduled jobs.

        For example:

        ToDate(ToString(SubtractDuration(CurrentTime(),'P1D'),'yyyy-MM-dd')) - returns a datetime value of yesterday at midnight.

        ToString(SubtractDuration(CurrentTime(),'P1D'),'yyyy/MM/dd') - returns a string in the form yyyy/MM/dd to use in a path for yesterday's data.

        AddDuration(ToDate('2000-01-01'),REPLACE('PnM','n',(chararray)MonthsBetween(CurrentTime(),ToDate('2000-01-01')))) - returns a datetime value of the first day of the current month.

      4. Optionally add additional packages.

      5. Change Status to on to enable the schedule.

      6. Click Create schedule.
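To check what the three variable expressions in step 3 should produce, it can help to compute the same values outside the platform. The sketch below is plain Python, not the Integrate.io ETL expression language. The function names are made up for this example, and it assumes the current time is taken in UTC.

```python
from datetime import datetime, timedelta, timezone


def yesterday_midnight(now):
    """Datetime for yesterday at 00:00, like the first expression."""
    return (now - timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)


def yesterday_path(now):
    """String in the form yyyy/MM/dd for yesterday, like the second expression."""
    return (now - timedelta(days=1)).strftime("%Y/%m/%d")


def first_day_of_month(now):
    """Datetime for the first day of the current month at 00:00, like the third expression."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


now = datetime.now(timezone.utc)
print(yesterday_midnight(now))
print(yesterday_path(now))
print(first_day_of_month(now))
```

Comparing these values with the output of a test job is a simple way to confirm that a schedule's variables resolve to the dates you expect.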


To view and maintain schedules

In the schedules list, you can see all of your schedules along with their execution information. You can enable, disable, edit, duplicate, and delete each schedule. You can also run a schedule once, on demand, from the schedules list.