Skip to main content
Use the DB Lookup transformation to enrich each incoming record with columns fetched from a database table or SQL query. For every input row, DB Lookup matches one or more input fields against columns in the lookup source and appends the lookup’s return columns to the row. It is the ETL equivalent of a database join against a reference table, configured entirely in the pipeline designer with no SQL required. DB Lookup is best for small reference tables (country codes, status codes, product categories, user ID to name/email, and the like, typically under 100K rows). The entire lookup table or query result is loaded into memory, so very large tables are not a good fit. For large-to-large joins, use the Join transformation instead.
DB Lookup transformation editor showing the four configuration steps with Lookup properties expanded

Overview

The DB Lookup component has one input and one output. It is configured through four steps:
StepNameWhat you configure
1Choose input connectionThe database connection holding the lookup table or query
2Lookup propertiesThe lookup source, match type, and no-match / multiple-match behavior
3Key mappingWhich input fields match which lookup columns
4Return schemaWhich upstream and lookup columns appear in the output
DB Lookup supports the same database connection types as the Database source (MySQL, PostgreSQL, Oracle, SQL Server, Snowflake, Redshift, BigQuery, Vertica, SAP HANA, and others).

Step 1: Choose input connection

Select the database connection that holds your lookup data. This is the connection DB Lookup reads the reference table or query from. The connection must already exist in your account.

Step 2: Lookup properties

Configure how the lookup source is read and how matches are handled.
FieldDescription
Lookup sourceTable Direct reads a single table; Custom SQL Query reads the result of a SQL query.
Lookup schema / Lookup tableThe schema and table to read (Table Direct mode).
QueryThe SQL query whose result is used as the lookup data (Custom SQL Query mode).
Match typeExact match compares keys exactly; Case insensitive ignores case when comparing text keys.
When no match is foundReturn nulls outputs null for the lookup columns; Use default values outputs configured defaults; Fail pipeline stops the job.
When multiple rows matchFirst record (default), Last record, Fail pipeline, or Return null when a key matches more than one lookup row.

Step 3: Key mapping

Map one or more input fields to the lookup columns they should match against. A row in the output is matched when all of its key mappings are satisfied (logical AND). The step opens with one blank mapping row. For each mapping, pick an Upstream field (from the incoming records) and the Lookup column to compare it to. Add more rows with + Add key mapping for composite keys.
Key mapping step with Upstream field and Lookup column columns and an Add key mapping button
Make sure each mapped pair shares a compatible data type. Comparing a numeric input field against a text lookup column may produce no matches.

Step 4: Return schema

Choose which columns appear in the component’s output. The output is the combination of upstream pass-through columns and lookup return columns.
Return schema step showing Upstream fields and Lookup Table tabs, with selected fields and auto-renamed aliases
The Available fields panel has two tabs:
  • Upstream fields are the columns coming from the input. Their type is determined by the upstream component; when it cannot be determined, the type shows a dash and is read-only (Pig preserves the real type at run time).
  • Lookup Table are the columns from the lookup table or query, with the type reported by the source.
Add a field to the output with its + button, or remove a selected field with its - button. Each output column has an editable Alias that becomes its name in the output schema.
BehaviorDescription
Independent add/removeAn upstream column and a lookup column that share a name (for example both id) are treated as separate fields. Adding or removing one does not affect the other.
Automatic alias suffixWhen a lookup column’s default alias would collide with an upstream column of the same name, the lookup alias is automatically suffixed with _lookup (for example id_lookup).
Unique output aliasesEvery output column needs a unique alias. If two columns still share an alias, an inline warning blocks Preview and Save until you rename one.
Default valuesWhen Use default values is selected in Step 2, set a default for each lookup return column. Numeric columns require a default that parses as their type.

Example

A pipeline reads orders that carry a product_id. A DB Lookup against a products reference table maps the input product_id to the lookup id column and returns name and price. Input record:
order_id=1001, product_id=42, quantity=3
Lookup table products:
id=42, name="Widget", price=9.99
Output record (upstream columns passed through, lookup columns appended):
order_id=1001, product_id=42, quantity=3, name="Widget", price=9.99
If name or price already existed upstream, the lookup aliases would be auto-suffixed to name_lookup and price_lookup to keep every output column unique.

Best Practices

  • Keep the lookup table small. The whole table or query result is loaded into memory, so DB Lookup suits reference data, not large fact tables.
  • Restrict the columns. Use Custom SQL Query mode (or trim the Return schema) so only the columns you need are loaded.
  • Choose unique keys. Mapping to a non-unique lookup column triggers the multiple-match behavior; pick keys that are unique in the lookup source where possible.
  • Decide no-match handling deliberately. Use Return nulls for optional enrichment, Use default values for guaranteed output, and Fail pipeline when an unmatched row is a data error.

Preview behavior

Preview loads up to 1000 rows of the lookup table through the schema importer (its first rows). For lookup tables larger than that, rows may show in preview as “no match” that would match in a full job run, which reads the entire lookup table. Preview key-matching is also approximate (keys are compared as text), so a preview match is not a guaranteed match in the job. Verify with a real run.

FAQ

Q: How is DB Lookup different from the Join transformation? Join combines two pipeline inputs and runs on the cluster. DB Lookup reads a reference table or query from a database connection and loads it into memory to enrich a single input. Use DB Lookup for small reference data and Join for combining two large pipeline data sets. Q: What happens when a key matches more than one lookup row? The “When multiple rows match” setting decides: keep the first matching record (default), keep the last, return null for the lookup columns, or fail the pipeline. Q: Why did my lookup column get renamed with a _lookup suffix? Its default alias collided with an upstream column of the same name. To keep every output column unique, the lookup alias is automatically suffixed (for example id_lookup). You can edit the alias to any unique name. Q: Why is the type for an upstream field shown as a dash and not editable? The upstream component did not report a type for that column. The value is display-only; the real type is preserved at run time, so the output is unaffected. Q: How large can the lookup table be? The entire lookup table or query result is loaded into memory, so keep it small (typically under 100K rows). For larger data, use the Join transformation.

ETL: Join Transformation

ETL: Cross Join Transformation

ETL: Select Transformation

ETL: Database Source

Last modified on June 16, 2026