Getting Started

Deem Integrator is a data integration platform that allows you to build data pipelines and workflows to extract, transform, and load (ETL) data from a variety of sources into your target systems.

This guide will help you understand the core concepts and get started with creating your first pipeline.

A pipeline is a visual representation of a data transformation process (sketched in code after the list). It consists of:

  • Transforms: Individual steps that read, transform, or write data
  • Hops: Connections between transforms that define the data flow
  • Data Streams: The flow of data records through the pipeline
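
If it helps to see these pieces outside the visual editor, here is a minimal Python analogy: records stream from a reading step through a filtering step into a writing step, and the function calls play the role of hops. The step names and sample records are made up; this is not how Deem Integrator executes pipelines internally.

```python
# Conceptual analogy only: a pipeline as a chain of record-processing steps.
# The sample records and step names are hypothetical.

def read_rows():
    # Plays the role of an input transform: it produces records.
    yield {"id": 1, "amount": 120.0}
    yield {"id": 2, "amount": -5.0}

def keep_positive(rows):
    # Plays the role of a transformation transform: it filters the stream.
    return (row for row in rows if row["amount"] > 0)

def write_rows(rows):
    # Plays the role of an output transform: it consumes the stream.
    for row in rows:
        print("writing", row)

# The function composition stands in for hops: it defines the direction of data flow.
write_rows(keep_positive(read_rows()))
```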

A workflow (also called a job) orchestrates the execution of pipelines and other actions (see the sketch after this list). Workflows can:

  • Execute pipelines in sequence or parallel
  • Handle errors and retries
  • Set variables and parameters
  • Perform file operations
  • Send notifications
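
As a rough analogy, the sketch below shows what sequential execution, parallel execution, and a simple retry might look like if you scripted the orchestration by hand. The run_pipeline function and the pipeline names are hypothetical stand-ins, not a Deem Integrator API.

```python
# Conceptual analogy only: orchestration logic similar to what a workflow provides.
# run_pipeline and the pipeline names are hypothetical stand-ins.
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(name: str) -> bool:
    print(f"running {name}")
    return True  # pretend the pipeline succeeded

# Sequential execution with a simple retry on failure.
for pipeline in ["extract_orders", "extract_customers"]:
    for _ in range(3):
        if run_pipeline(pipeline):
            break

# Parallel execution of independent pipelines.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_pipeline, ["load_orders", "load_customers"]))
```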

Transforms are the building blocks of pipelines. Each transform performs a specific operation:

  • Input Transforms: Read data from sources (files, databases, APIs)
  • Output Transforms: Write data to targets (files, databases, tables)
  • Transformation Transforms: Modify, filter, or enrich data
  • Utility Transforms: Perform calculations, lookups, or validations

Variables allow you to parameterize your pipelines and workflows (see the example after this list). They can be:

  • Set at runtime
  • Defined in configuration files
  • Passed between workflows and pipelines
  • Used in expressions and SQL queries
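
The exact variable syntax depends on the platform, so treat the ${NAME} placeholders below as an assumption. The sketch only illustrates the general idea of substituting variables into a parameterized SQL query; the schema and variable names are invented.

```python
# Illustration only: substituting variables into a parameterized query.
# The ${NAME} placeholder style and the variable names are assumptions.
from string import Template

variables = {"SCHEMA": "staging", "LAST_RUN": "2024-01-01 00:00:00"}

query = Template("SELECT * FROM ${SCHEMA}.orders WHERE updated_at > '${LAST_RUN}'")
print(query.substitute(variables))
# SELECT * FROM staging.orders WHERE updated_at > '2024-01-01 00:00:00'
```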

Step 1: Add an Input Transform

Start by adding an input transform to read data (a short sketch follows the list). For example:

  • Data File Input: Read from CSV or text files
  • Cloud API: Read from APIs (Deem Insight, Dynamics 365, etc.)
  • Deem Datalake Input: Read from Infor Ion Datalake
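
For intuition, this is roughly what reading a delimited file produces: one record per row, with named fields. The sketch uses Python's csv module, and the inline data and column names are made up.

```python
# Illustration only: reading delimited data into records, one per row.
# The inline CSV content and column names are hypothetical.
import csv
import io

data = io.StringIO("id,name,amount\n1,alice,120.50\n2,bob,89.99\n")

for row in csv.DictReader(data):
    print(row)  # each row becomes one record on the pipeline's data stream
```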

Step 2: Add Transformation Transforms (Optional)
Add transforms to modify your data (sketched after the list):

  • Deem Java Expression: Calculate new fields or modify existing ones
  • Filter Rows: Filter data based on conditions
  • String Range: Map values using ranges
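
The sketch below shows the kind of per-record logic these transforms apply: calculating a new field and filtering on a condition. The field names and values are invented for the example.

```python
# Illustration only: calculate a new field, then filter rows on a condition.
# Field names and values are hypothetical.
rows = [
    {"qty": 3, "unit_price": 9.99, "status": "open"},
    {"qty": 2, "unit_price": 4.50, "status": "cancelled"},
]

for row in rows:
    row["line_total"] = round(row["qty"] * row["unit_price"], 2)  # new calculated field

open_rows = [row for row in rows if row["status"] == "open"]  # keep only matching rows
print(open_rows)
```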

Step 3: Add an Output Transform

Add an output transform to write your processed data (see the sketch after the list):

  • Table Output: Write to database tables
  • Data File Output: Write to files
  • Staging Upsert Output: Upsert to staging tables
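
As a rough equivalent of writing to a database table, the sketch below inserts processed rows into a local SQLite table. The database file, table, and columns are hypothetical; on the platform, the output transform and its connection settings handle this for you.

```python
# Illustration only: writing processed records to a database table.
# The SQLite file, table name, and columns are hypothetical.
import sqlite3

rows = [(1, "alice", 120.50), (2, "bob", 89.99)]

con = sqlite3.connect("target.db")
con.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)")
con.executemany("INSERT INTO customers (id, name, amount) VALUES (?, ?, ?)", rows)
con.commit()
con.close()
```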

Connect transforms using hops to define the data flow direction.

Configure each transform with the appropriate settings and run the pipeline to process your data. As you build, keep the following best practices in mind:

  • Keep pipelines focused: Each pipeline should have a single, clear purpose
  • Use descriptive names: Name your transforms and pipelines clearly
  • Document complex logic: Add notes or comments for complex transformations
  • Test incrementally: Test each transform as you build the pipeline
  • Use bulk loaders: For large datasets, use bulk loaders (Bulk Loader, MySQL Text Loader) instead of row-by-row inserts
  • Filter early: Apply filters as early as possible in the pipeline to reduce data volume
  • Optimize lookups: Use indexed columns for database lookups
  • Batch processing: Process data in batches when possible
  • Validate inputs: Check for required fields and data types
  • Handle nulls: Use transforms like “If Empty” to handle null or empty values
  • Log errors: Configure error handling to log and track issues
  • Test edge cases: Test with empty datasets, null values, and boundary conditions
  • Use variables: Parameterize file paths, connection strings, and other configuration
  • Environment-specific configs: Use different configurations for development, testing, and production
  • Secure credentials: Store sensitive information securely, not hardcoded in pipelines

A common pattern is to load data into staging tables, then process the data and move it to final tables (the upsert step is sketched after the list):

  1. Bulk Loader or Indexed Table Output → Load raw data to staging table
  2. Get Timestamp → Get last processed timestamp
  3. Table Input → Read only new/changed records
  4. Transform → Apply business logic
  5. Staging Upsert Output → Upsert to final tables
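
To show what the final upsert step accomplishes, here is a small SQLite sketch. The table names and the ON CONFLICT statement are assumptions used for illustration; on the platform, the Staging Upsert Output transform performs this step for you.

```python
# Illustration only: upserting staged rows into a final table.
# Table names and the ON CONFLICT syntax (SQLite 3.24+) are assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE staging_orders (id INTEGER, amount REAL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO staging_orders VALUES (1, 100.0), (2, 250.0);
""")

# Insert new rows; update existing ones in place.
con.execute("""
    INSERT INTO orders (id, amount)
    SELECT id, amount FROM staging_orders WHERE true
    ON CONFLICT(id) DO UPDATE SET amount = excluded.amount
""")
con.commit()
print(con.execute("SELECT * FROM orders").fetchall())  # [(1, 100.0), (2, 250.0)]
```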

For incremental data loads (sketched after the steps):

  1. Get Timestamp → Get last successful run timestamp
  2. Input Transform → Read data filtered by timestamp
  3. Transform → Process new data
  4. Output → Write to target
  5. Update Timestamp → Store new timestamp for next run
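
The sketch below walks through the same five steps with a plain text file standing in for the stored watermark. The file name, table, and query are made up; on the platform, the Get Timestamp and Update Timestamp steps manage this state for you.

```python
# Illustration only: a timestamp-driven incremental load.
# The watermark file, table name, and query are hypothetical.
import datetime
import pathlib

watermark = pathlib.Path("last_run.txt")

# 1. Get the last successful run timestamp (fall back to the epoch on the first run).
last_run = watermark.read_text().strip() if watermark.exists() else "1970-01-01T00:00:00+00:00"

# 2-4. Read only records changed since last_run, process them, write them to the target.
print(f"SELECT * FROM source_orders WHERE updated_at > '{last_run}'")

# 5. Store the new watermark for the next run.
watermark.write_text(datetime.datetime.now(datetime.timezone.utc).isoformat())
```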

For integrating with APIs (sketched after the steps):

  1. Cloud API → Read data from API
  2. Deem Java Expression → Transform API response
  3. Table Output or File Output → Store processed data
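
For intuition, the sketch below reshapes an API-style JSON response into flat rows and stores them in a local SQLite table. The payload, field names, and table are invented, and on the platform the Cloud API transform performs the actual HTTP call.

```python
# Illustration only: flatten an API-style JSON response and store the rows.
# The payload, field names, and table are hypothetical.
import json
import sqlite3

payload = json.loads('{"items": [{"id": "INV-1", "total": 120.5}, {"id": "INV-2", "total": 89.99}]}')

# Reshape the nested response into flat rows (the role of the expression transform here).
rows = [(item["id"], item["total"]) for item in payload["items"]]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoices (id TEXT, total REAL)")
con.executemany("INSERT INTO invoices (id, total) VALUES (?, ?)", rows)
con.commit()
print(con.execute("SELECT * FROM invoices").fetchall())
```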