ETL: Creating packages with MCP tools

The Integrate.io MCP server exposes a set of package-authoring tools that AI clients can call to build a working ETL package end to end. Supported clients include Claude Desktop, Claude Code, Cursor, and the MCP Inspector. The agent discovers what the catalog supports, composes a dataflow JSON, creates the package, fixes wiring errors in place, and runs it. No web UI required. These tools complement the read and run tools described in the MCP Server overview. Configure your client and mint a token there first.

When to use these tools

You want an AI assistant to build a new source-to-destination package in a single tool call from a natural-language prompt.
You want the agent to clone an existing pipeline as a template, then swap in different connections, tables, or paths.
You want the agent to add, edit, or remove components on an existing pipeline without opening the package designer.
You want the agent to rename a package or define its public variables in place.
You want the agent to fix a broken edge or component without opening the package designer.
You want the agent to detect the columns of a CSV or other delimited file on an SFTP, S3, or GCS connection before wiring it into a flow.
You want the agent to archive packages that failed an authoring attempt so they don’t clutter the active list.

If you only need read or run access, stick to the inspection and execution tools on the overview page.

Tool reference

build_pipeline (mutation)

Builds a complete source → destination pipeline in one call from a high-level intent. The server discovers the source schema, picks the matching component types for each connection, wires source → select → destination with edges, maps fields 1:1, and persists the package. The agent only passes intent (which connections, which table or file path, which fields, any computed columns), so the build turn stays fast even on wide schemas. Prefer build_pipeline over create_package for a straightforward “load X into Y” build. Use create_package or add_package_components when the shape isn’t covered here: filters, joins, multiple sources, a file or cloud-storage destination, or edits to an existing package.

Argument	Type	Required	Notes
`name`	string	Yes	Package name shown in the dashboard.
`source`	object	Yes	Source intent. See below.
`destination`	object	Yes	Destination intent. Must be a database or warehouse connection (for example Snowflake, Redshift, BigQuery, MySQL, Postgres). File and cloud-storage destinations are not yet supported here — use `create_package` for those.
`fields`	`"all"` or array	No	`"all"` (default) carries every discovered source column. An array of source column names keeps only that subset.
`computed`	array	No	Derived columns to append. Each entry: `{ "name": "<output column>", "expression": "<Pig expression>" }`. A computed `name` equal to a source column overrides that column.
`flow_type`	string	No	`dataflow` (default) or `workflow`.
`workspace_id`	integer	No	Must belong to the calling account. Omit to create the package as Unassigned.
`description`	string	No	Free-form, up to 4096 characters.

source accepts:

connection_id (integer, required).
path (string) for file or cloud-storage sources (SFTP, S3, GCS, SharePoint).
table_name (string) for database or warehouse sources.
schema_name (string) — optional, database sources only.
delimiter (string) — file field delimiter, default ,.
header_row (boolean) — whether the file has a header row, default true.
source_path_field_alias (string) — optional, file and cloud-storage sources only. When set, adds an output column with this exact name carrying the source file path each row was read from. Use this when the agent wants the filepath as a column for lineage. Never hardcode the literal path as a field value.

destination accepts:

connection_id (integer, required).
table_name (string, required).
schema_name (string) — optional.
operation_type (string) — write mode (for example append, truncate_and_insert, merge). Defaults to append. Validated against the destination’s allowed write_modes from describe_component_type.
create_table (boolean) — auto-create the table, default true.
merge_keys (array of string) — required for merge and upsert modes (merge, merge_update_and_insert, insert_or_update). Each name flags the matching destination column with is_merge_key: true. The platform never infers the key, so a merge without merge_keys will not upsert correctly. Append, overwrite, and truncate modes ignore this field.
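Because the platform never infers the merge key, it can help to check the destination intent locally before calling build_pipeline. The following is a minimal sketch, not part of the server: the function name check_destination_intent and its messages are hypothetical, and only the rules stated in the list above are checked.

```python
# Write modes that the list above says require merge_keys.
MERGE_MODES = {"merge", "merge_update_and_insert", "insert_or_update"}


def check_destination_intent(destination: dict) -> list[str]:
    """Return local (client-side) problems found in a destination intent."""
    problems = []
    for key in ("connection_id", "table_name"):  # documented as required
        if key not in destination:
            problems.append(f"destination.{key} is required")
    mode = destination.get("operation_type", "append")  # documented default
    if mode in MERGE_MODES and not destination.get("merge_keys"):
        problems.append(f"operation_type '{mode}' needs a non-empty merge_keys list")
    return problems


ok = {"connection_id": 91002, "table_name": "orders",
      "operation_type": "merge", "merge_keys": ["order_id"]}
missing_key = {"connection_id": 91002, "table_name": "orders",
               "operation_type": "merge"}

print(check_destination_intent(ok))           # []
print(check_destination_intent(missing_key))  # one problem about merge_keys
```

The server still validates operation_type against the destination’s allowed write_modes, so this check only catches the missing-key case early.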

Field names from the source are carried through exactly as discovered — no renames, recasing, or added prefixes. The tool returns the new package_id and also runs validate_package automatically so the caller knows whether the build is clean on the first response.

Example: SFTP to Snowflake with a merge key and filepath column

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "build_pipeline",
    "arguments": {
      "name": "Daily orders load",
      "source": {
        "connection_id": 47376,
        "path": "/incoming/orders/*.csv",
        "source_path_field_alias": "source_file"
      },
      "destination": {
        "connection_id": 91002,
        "table_name": "orders",
        "schema_name": "public",
        "operation_type": "merge",
        "merge_keys": ["order_id"]
      },
      "computed": [
        { "name": "loaded_at", "expression": "CurrentTime()" }
      ]
    }
  }
}

Error envelopes

Envelope	Cause
`source connection <id> not found in this account`	Source `connection_id` is unknown to this account.
`destination connection <id> not found in this account`	Destination `connection_id` is unknown to this account.
`build_pipeline currently supports database/warehouse destinations only; ...`	Destination is a file or cloud-storage connection. Use `create_package` instead.
`source.path is required for a file/cloud-storage source`	Missing `path` on a file source.
`source.table_name is required for a database source`	Missing `table_name` on a database source.
`no columns were discovered on the source — check the path/table and delimiter`	The schema importer returned no fields. Verify the path, table, or delimiter.
`operation_type '<mode>' is not valid for this destination; valid: ...`	Write mode isn’t supported by this destination. The envelope lists the valid options.

create_package (mutation)

Mints a new package from a full data_flow_json spec. Components and edges are persisted in a single transaction — partial packages are never written.

Argument	Type	Required	Notes
`name`	string	Yes	Package name shown in the dashboard.
`flow_type`	string	Yes	`dataflow` or `workflow`.
`components`	array	Yes	Array of single-key wrapper hashes (see below).
`edges`	array	Yes	Array of edge hashes referencing component `id`s.
`workspace_id`	integer	No	Must belong to the calling account. Omit to create the package as Unassigned.
`description`	string	No	Free-form, up to 4096 characters.
`flow_version`	string	No	Defaults to `2.0.0` (the modern package designer).

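variables is not accepted by create_package. To define public variables on a package that already exists, call manage_package_variables. Packages that need package-level variables set at creation time must be created through the REST API.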

Component shape

Each component is a single-key wrapper hash. The key is the wrapper type (for example database_source_component), and the inner hash carries the configuration. Common fields:

id — stable identifier referenced by edges. Convention is component-<hex>. Omit to let the server generate one.
name — human-readable label shown on the canvas.
xy — [x, y] position pair.
alias — data-flow alias. Sources expose this; destinations consume it via input_alias.
connection — connection metadata block. Uses the generic key connection, not type-specific keys. Shape: { "id": <integer>, "name": "<display name>", "type": "<connection-type-slug>" }. Without name and type, the dashboard can’t render the connection chip. When update_package_components changes a *_connection_id, the server clears this block and re-embeds it from the new connection so the dashboard chip stays in sync with the connection the job will run.
<type>_connection_id — string form of the connection id (for example "cloud_storage_connection_id": "47376").
specificComponentType — vendor sub-type (for example sftp_source_component, mysql_source_component). Required when the wrapper covers multiple vendors so the dashboard renders the correct icon and form.
schema — nested object { "fields": [{ name, alias, data_type }, ...] }. Each field needs an alias.
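The common fields above can be assembled with a small helper. This is a sketch under assumptions: wrap_component is a hypothetical client-side function, and the connection ids, names, and path are the illustrative values used elsewhere on this page.

```python
import json
import secrets


def wrap_component(wrapper_key: str, name: str, xy: tuple[int, int], **config) -> dict:
    """Build one single-key wrapper hash for the components array."""
    inner = {
        # component-<hex> follows the id convention described above.
        "id": config.pop("id", f"component-{secrets.token_hex(3)}"),
        "name": name,
        "xy": list(xy),
    }
    inner.update(config)  # alias, connection, schema, and so on
    return {wrapper_key: inner}


source = wrap_component(
    "cloud_storage_source_component",
    "nightly_drop",
    (100, 100),
    alias="raw",
    specificComponentType="sftp_source_component",
    connection={"id": 47376, "name": "[Prod] SFTP", "type": "sftp"},
    cloud_storage_connection_id="47376",  # string form of the connection id
    path="/incoming/orders/*.csv",
    schema={"fields": [{"name": "id", "alias": "id", "data_type": "string"}]},
)
print(json.dumps(source, indent=2))
```

Generating the id on the client is optional, since the server generates one when it is omitted. Doing it locally makes the id available for edges before the package exists.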

Edge shape

Edges reference components by their id field:

{
  "id": "edge-abc123",
  "label": "edge-abc123",
  "source": "component-aaa111",
  "target": "component-bbb222",
  "source_index": 1,
  "order": 1
}

id and label are optional — the server generates them when absent.

Best practice: clone from an existing package

The safest way to compose a valid data_flow_json is to read a similar existing package first:

Call list_packages and find a pipeline with the same source and destination connection types.
Call get_package(<that_id>, include_full_graph: true) and inspect its data_flow_json.
Use the result as a template. Swap in your own connection ids and table or file paths. Keep the structural keys.

Inline secrets in the source package (REST api_key query params, Authorization headers, basic-auth passwords, and similar) come back as [REDACTED] in the full graph. Structure and non-secret fields are preserved, so the result is still a valid template. When you re-author the package, reference real credentials through a connection or a secret variable. See Inline secret redaction in get_package.

Building from describe_component_type output alone is error-prone: it surfaces attr_accessor property names but can’t expose nested sub-shapes (like schema.fields) or renderer conventions like specificComponentType. Pair it with a working template whenever possible.
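Before reusing a cloned graph, an agent may want to list every place where a secret was redacted so it can be replaced with a connection or secret variable. The sketch below walks any decoded JSON value and reports paths containing the [REDACTED] placeholder. The template is hypothetical, including the wrapper name rest_api_source_component and its url and headers fields.

```python
def find_redacted(node, path="$"):
    """Yield JSON paths whose string value contains the [REDACTED] placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_redacted(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_redacted(value, f"{path}[{index}]")
    elif isinstance(node, str) and "[REDACTED]" in node:
        yield path


# Hypothetical fragment of a cloned data_flow_json.
template = {
    "components": [
        {
            "rest_api_source_component": {
                "name": "orders_api",
                "url": "https://example.com/orders?api_key=[REDACTED]",
                "headers": {"Authorization": "[REDACTED]", "Accept": "application/json"},
            }
        }
    ]
}

for hit in find_redacted(template):
    print(hit)
```

A substring match is used because a redacted query parameter may sit inside a longer URL string.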

Returns

On success: { package_id, package_version, name, flow_type, flow_version, workspace_id, created_at, warnings }. The warnings array lists shape fixups the server applied (for example rewriting a vendor-named wrapper to the canonical wrapper plus specificComponentType). On failure: { error: "..." } — workspace not in account, name too long, save failure, and so on.

list_component_types (read)

Enumerates every component registered in the platform catalog. Returns one entry per concrete subclass with:

name — internal name (for example mysql_source_component). Pass this to describe_component_type.
category — source, transformation, or destination.
class_name — Ruby class name, informational.
description — header docstring from the component’s source file.
wrapper_key — the outer key to use in data_flow_json. May be null for transformations or unmapped classes.
specific_component_type — value for the inner specificComponentType field. May be null when the vendor has its own top-level wrapper.

The optional category argument filters the result. Use list_component_types when the account has no similar package to clone from. For mature accounts, get_package of an existing pipeline is often enough.

describe_component_type (read)

Per-component introspection by internal name. Returns:

properties — writable attribute names extracted from attr_accessor declarations.
required — attribute names with presence validators.
description — header docstring.
example — fixture JSON example, or null if none exists.
wrapper_key, specific_component_type, valid_specific_types, wrapper_example — copy-paste-ready wrapper recipe.

Required argument: type (the name from list_component_types). Returns { error: "unknown component type: <type>" } for unknown names.

update_package_edges (mutation)

Replaces a package’s edges array wholesale. Pass the complete desired edges array — existing edges are overwritten.

Argument	Type	Required	Notes
`package_id`	integer	Yes	Package to update.
`edges`	array	Yes	Full edges array. Each edge needs at minimum `source` and `target` matching a component `id` in the current package.

The tool pre-validates every source and target against the package’s current components. If any edge references an unknown component, the entire call is rejected — no partial writes. Each write bumps the package version via PaperTrail and is reversible through the existing UI history. Use this when validate_package surfaces an edge-related error after create_package and you need to fix wiring in place. For component-internal edits (changing a connection, adjusting a schema), use update_package_components.
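The same all-or-nothing reference check can be run locally before sending an edges array. This is a client-side sketch only. The helper name unresolved_edge_refs is hypothetical, and the server remains the authority on what it accepts.

```python
def unresolved_edge_refs(components: list[dict], edges: list[dict]) -> list[str]:
    """Return edge endpoints that do not match any component id."""
    known_ids = set()
    for wrapper in components:          # each wrapper is a single-key hash
        for inner in wrapper.values():
            if "id" in inner:
                known_ids.add(inner["id"])

    missing = []
    for edge in edges:
        for end in ("source", "target"):
            ref = edge.get(end, f"<missing {end}>")
            if ref not in known_ids and ref not in missing:
                missing.append(ref)
    return missing


components = [
    {"cloud_storage_source_component": {"id": "component-sftp01", "name": "nightly_drop"}},
    {"cloud_storage_destination_component": {"id": "component-s3001", "name": "s3_landing"}},
]
good = [{"source": "component-sftp01", "target": "component-s3001"}]
bad = [{"source": "component-sftp01", "target": "component-missing"}]

print(unresolved_edge_refs(components, good))  # []
print(unresolved_edge_refs(components, bad))   # ['component-missing']
```

An empty list means every edge points at a component in the package, which is the condition the tool requires before it writes anything.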

remove_package_components (mutation)

Removes one or more components from an existing package’s flow by component name. Any edge whose source or target was a removed component is dropped at the same time, so the package never ends up with dangling edges.

Argument	Type	Required	Notes
`package_id`	integer	Yes	Package to update.
`component_names`	array	Yes	Inner `name` values from the package’s components. The same key `update_package_components` targets.

The call is all-or-nothing. If any name doesn’t match a component currently in the flow, the entire call is rejected with the list of unknown names and nothing is removed. The package row is locked for the read-modify-write, and each successful call bumps the package version via PaperTrail and is reversible from the version history UI. Use this to undo a mis-added component or to clean up an unused branch. To add a component, use add_package_components. To edit one in place, use update_package_components. To archive the entire package, use delete_package. Returns { package_id, components_removed, edges_removed, new_version, updated_at } on success.
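To preview what a removal would do, the behavior described above can be simulated on a local copy of the graph. This sketch assumes every component carries both id and name. The function simulate_removal is hypothetical and does not call the server.

```python
def simulate_removal(components: list[dict], edges: list[dict], component_names: list[str]):
    """Return (kept_components, kept_edges), or raise if any name is unknown."""
    id_by_name = {}
    for wrapper in components:
        for inner in wrapper.values():
            id_by_name[inner["name"]] = inner["id"]

    unknown = [name for name in component_names if name not in id_by_name]
    if unknown:
        # Mirrors the all-or-nothing rule: nothing is removed on a bad name.
        raise ValueError(f"unknown component names: {unknown}")

    removed_ids = {id_by_name[name] for name in component_names}
    kept_components = [
        wrapper for wrapper in components
        if not any(inner["id"] in removed_ids for inner in wrapper.values())
    ]
    # Drop every edge that touched a removed component.
    kept_edges = [
        edge for edge in edges
        if edge["source"] not in removed_ids and edge["target"] not in removed_ids
    ]
    return kept_components, kept_edges


flow_components = [
    {"cloud_storage_source_component": {"id": "component-sftp01", "name": "nightly_drop"}},
    {"cloud_storage_destination_component": {"id": "component-s3001", "name": "s3_landing"}},
]
flow_edges = [{"source": "component-sftp01", "target": "component-s3001"}]

kept, remaining_edges = simulate_removal(flow_components, flow_edges, ["s3_landing"])
print(len(kept), len(remaining_edges))  # 1 0
```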

rename_package (mutation)

Renames a package and, optionally, updates its description. Metadata-only. data_flow_json (components and edges) is left untouched.

Argument	Type	Required	Notes
`package_id`	integer	Yes	Package to rename.
`name`	string	Yes	New package name (3–128 characters).
`description`	string	No	Free-form, up to 1024 characters.

The change is attributed to the calling user and recorded in the package version history, the same as a rename through the dashboard. Use this when an agent needs to retitle a package after a prompt-driven authoring step rather than asking the user to open the UI. Returns { package_id, previous_name, name, updated_at } on success.

manage_package_variables (mutation)

Defines, updates, or removes a package’s public variables. Public variables are the named defaults a package exposes; per-run overrides are still passed through run_package’s variables argument.

Argument	Type	Required	Notes
`package_id`	integer	Yes	Package to update.
`set`	object	At least one of `set` or `remove`	Hash of `{ variable_name => default_value }` to add or update. Values are stored as given; for Pig string literals, follow the embedded single-quote rule used by `run_package`.
`remove`	array	At least one of `set` or `remove`	Variable names to delete.

This is a partial update of the variables store, modeled on the dashboard’s Variables modal. The pipeline graph is not touched. The package row is locked for the read-merge-write so a concurrent dashboard save can’t drop a just-set variable. Secret variables are intentionally not supported here. They are encrypted at rest and must be set through the dashboard so plaintext never flows through the MCP transcript. Putting a secret value in set would store it as a plaintext public variable. Returns { package_id, variables } (the full resulting public-variable map) on success.
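A request for this tool can be assembled as follows. This is a sketch assuming the same JSON-RPC tools/call framing as the other examples on this page. The helper name variables_call, the package id, and the variable names are hypothetical.

```python
import json


def variables_call(package_id: int, set_vars=None, remove=None, request_id: int = 1) -> dict:
    """Build a tools/call request for manage_package_variables."""
    if not set_vars and not remove:
        # The tool needs at least one of set or remove.
        raise ValueError("pass set_vars and/or remove")
    arguments = {"package_id": package_id}
    if set_vars:
        arguments["set"] = dict(set_vars)
    if remove:
        arguments["remove"] = list(remove)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "manage_package_variables", "arguments": arguments},
    }


request = variables_call(123456, set_vars={"batch_size": "500"}, remove=["legacy_flag"])
print(json.dumps(request, indent=2))
```

Only public defaults belong in set_vars. Secret values should stay out of the request entirely, for the reason given above.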

discover_file_schema (read)

Detects the columns of a delimited file (CSV, TSV, and similar) on a cloud-storage or SFTP connection. This is the file-source equivalent of discover_schema, which only handles database connections. Use it to learn a file’s columns before wiring a Select component or destination.

Argument	Type	Required	Notes
`connection_id`	integer	Yes	A cloud-storage or SFTP connection id from `list_connections`.
`path`	string	Yes	File path on the connection (for example `/data/customers.csv`).
`delimiter`	string	No	Field delimiter. Defaults to `,`; use `\t` for TSV.
`header_row`	boolean	No	`true` if the first row holds column names (default `true`).
`record_type`	string	No	`delimited` (default) for CSV/TSV.
`record_delimiter`	string	No	Defaults to `new_line`.
`char_encoding`	string	No	Defaults to `utf-8`.
`bucket`	string	No	Container or bucket. Default `""`, which is correct for SFTP since the file location lives in `path`.
`lines`	integer	No	Sample size the importer reads (default 20, max 200).
`quote`	string	No	Optional CSV quote character.
`escape`	string	No	Optional CSV escape character.

Returns { connection_id, connection_type, path, field_count, column_names, fields }. Use column_names to wire a Select. Copy fields into the source component’s schema.fields. The importer’s raw field objects are passed through unchanged, so nothing is lost in translation. Each call hits the schema importer over the network and reads a sample of the actual file, so use it sparingly.

Example: detect columns on an SFTP file

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "discover_file_schema",
    "arguments": {
      "connection_id": 47376,
      "path": "/incoming/orders/2026-06-15.csv"
    }
  }
}
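Once the call returns, the fields array can be copied into a source component’s schema. The sketch below assumes the raw field objects carry a name key, and it fills in alias only when the importer did not supply one. The sample result is hypothetical.

```python
def schema_from_discovery(result: dict) -> dict:
    """Turn a discover_file_schema result into a schema block for a source component."""
    fields = []
    for field in result["fields"]:
        copied = dict(field)  # keep the importer's keys unchanged
        # Each schema field needs an alias; default it to the column name.
        copied.setdefault("alias", copied.get("name"))
        fields.append(copied)
    return {"fields": fields}


# Hypothetical result, shaped like the documented return keys.
discovery = {
    "connection_id": 47376,
    "connection_type": "sftp",
    "path": "/incoming/orders/2026-06-15.csv",
    "field_count": 2,
    "column_names": ["id", "total"],
    "fields": [
        {"name": "id", "data_type": "string"},
        {"name": "total", "data_type": "double"},
    ],
}

schema = schema_from_discovery(discovery)
print(schema)
```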

delete_package (mutation)

Archives a package, with the same semantics as the dashboard’s Archive action. The Job row is preserved with status archived, is removed from the active package list, and remains queryable via list_packages(status: 'archived').

Argument	Type	Required
`package_id`	integer	Yes

If any active schedule still references the package, the archive transition is rejected. Disable the schedule first with toggle_schedule(enabled: false). Use this to clean up after an unsalvageable create_package attempt so failed iterations don’t accumulate as dead rows. Returns { package_id, status: 'archived', archived_at } on success, or { error: "..." } for package-not-found or rejected transitions.

Recommended agent flow

list_connections                                                           # confirm source + destination connections (never auto-pick)
build_pipeline                                                             # one-call build for source -> destination (preferred)
# OR, for shapes build_pipeline doesn't cover:
discover_schema / discover_file_schema / preview_data                      # learn the source columns
list_packages / get_package(<reference_id>)                                # find a template to clone
list_component_types / describe_component_type                             # only if no template exists
create_package                                                             # mint the new package
preview_transformation                                                     # sanity-check a transform's output mid-build (database sources)
validate_package                                                           # catch structural + per-component errors
update_package_components / update_package_edges / remove_package_components  # fix anything validate_package flagged
rename_package / manage_package_variables                                  # metadata + variable cleanup
delete_package                                                             # archive failed attempts
run_package                                                                # execute on an available cluster
get_run                                                                    # poll until completed

Example: create a minimal SFTP-to-S3 package

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_package",
    "arguments": {
      "name": "SFTP nightly drop to S3",
      "flow_type": "dataflow",
      "components": [
        {
          "cloud_storage_source_component": {
            "id": "component-sftp01",
            "name": "nightly_drop",
            "xy": [100, 100],
            "alias": "raw",
            "specificComponentType": "sftp_source_component",
            "connection": { "id": 47376, "name": "[Prod] SFTP", "type": "sftp" },
            "cloud_storage_connection_id": "47376",
            "path": "/incoming/orders/*.csv",
            "schema": {
              "fields": [
                { "name": "id",    "alias": "id",    "data_type": "string" },
                { "name": "total", "alias": "total", "data_type": "double" }
              ]
            }
          }
        },
        {
          "cloud_storage_destination_component": {
            "id": "component-s3001",
            "name": "s3_landing",
            "xy": [400, 100],
            "input_alias": "raw",
            "specificComponentType": "s3_destination_component",
            "connection": { "id": 52001, "name": "[Prod] S3", "type": "s3" },
            "cloud_storage_connection_id": "52001",
            "path": "s3://my-bucket/orders/"
          }
        }
      ],
      "edges": [
        { "source": "component-sftp01", "target": "component-s3001" }
      ]
    }
  }
}

A successful response returns the new package_id. Pass it to validate_package next, fix any errors with update_package_components or update_package_edges, then run with run_package.

Error envelopes

Mutating tools return a plain { "error": "..." } object instead of a JSON-RPC error when the failure is expected and recoverable. Common cases:

Tool	Envelope	Cause
`create_package`	`workspace not found in this account`	`workspace_id` belongs to another account.
`create_package`	record-validation message	Name too long, invalid `flow_type`, save failure.
`update_package_edges`	`unresolved component references: ...`	Edge references a component id that isn’t in the package.
`delete_package`	`package not found`	`package_id` doesn’t belong to the calling account.
`delete_package`	archive transition rejection	An active schedule still references the package.
`remove_package_components`	`component(s) not found in flow: ...`	A name in `component_names` doesn’t match any current component. The whole call is rejected.
`rename_package`	record-validation message	`name` is too short, too long, or otherwise rejected by the model.
`manage_package_variables`	``provide `set` (hash) and/or `remove` (array) — nothing to do``	Both arguments were missing or empty.
`discover_file_schema`	`connection not found, or not a cloud-storage/SFTP connection`	The id belongs to a database connection (use `discover_schema` instead) or is from another account.
`discover_file_schema`	`schema-importer error: ...`	The importer couldn’t read the file (auth failure, missing file, malformed delimiter). Fix the connection or arguments and retry.
`describe_component_type`	`unknown component type: <name>`	`type` argument doesn’t match any registered component.

Unrecoverable failures (auth, malformed JSON-RPC) return standard JSON-RPC error responses — see the MCP Server overview for HTTP-level errors.
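A client can separate the two failure kinds with a small classifier. This sketch assumes the tool’s object arrives as JSON text in the first content item of the tools/call result, which is the usual MCP result shape but is not confirmed by this page. The function name classify_response is hypothetical.

```python
import json


def classify_response(response: dict) -> tuple[str, object]:
    """Return ("fatal" | "recoverable" | "ok", detail) for a decoded JSON-RPC response."""
    if "error" in response:
        # Standard JSON-RPC error: auth failure, malformed request, and so on.
        return "fatal", response["error"].get("message", "")

    content = response.get("result", {}).get("content", [])
    payload = {}
    if content and content[0].get("type") == "text":
        try:
            payload = json.loads(content[0]["text"])
        except json.JSONDecodeError:
            payload = {}

    if isinstance(payload, dict) and "error" in payload:
        # Plain error envelope: fix the arguments and retry.
        return "recoverable", payload["error"]
    return "ok", payload


envelope = {
    "jsonrpc": "2.0", "id": 1,
    "result": {"content": [{"type": "text", "text": json.dumps(
        {"error": "unresolved component references: component-zzz999"})}]},
}
rpc_error = {"jsonrpc": "2.0", "id": 1,
             "error": {"code": -32600, "message": "Invalid Request"}}

print(classify_response(envelope))
print(classify_response(rpc_error))
```

Treating the envelope as recoverable lets the agent correct its arguments and call the tool again, while a JSON-RPC error usually points to configuration that the agent cannot fix on its own.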

Audit trail

Every write performed through these tools is captured in your account’s audit history via PaperTrail. Package creates, edge updates, and archive transitions all record the calling user as the author. Component edits made by update_package_components are versioned and reversible through the existing package version history UI.
