Getting Started
Overview
Deem Integrator is a data integration platform that lets you build data pipelines and workflows to extract, transform, and load (ETL) data from various sources into target systems.
This guide will help you understand the core concepts and get started with creating your first pipeline.
Core Concepts
Pipelines
A pipeline is a visual representation of a data transformation process. It consists of:
- Transforms: Individual steps that read, transform, or write data
- Hops: Connections between transforms that define the data flow
- Data Streams: The flow of data records through the pipeline
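The relationship between these pieces can be pictured in code. The following is a minimal, purely illustrative Python sketch (not the platform's API) that models a pipeline as transforms connected by hops:

```python
from dataclasses import dataclass, field

@dataclass
class Transform:
    name: str   # e.g. "Read customers CSV"
    kind: str   # "input", "transformation", or "output"

@dataclass
class Pipeline:
    transforms: list[Transform] = field(default_factory=list)
    hops: list[tuple[str, str]] = field(default_factory=list)  # (from, to) pairs

    def add_hop(self, src: Transform, dst: Transform) -> None:
        # A hop fixes the direction the data stream flows in.
        self.hops.append((src.name, dst.name))

# Three transforms wired into a simple read -> filter -> write pipeline.
read = Transform("Read CSV", "input")
filter_rows = Transform("Filter rows", "transformation")
write = Transform("Write table", "output")

pipeline = Pipeline([read, filter_rows, write])
pipeline.add_hop(read, filter_rows)
pipeline.add_hop(filter_rows, write)
```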
Workflows
A workflow (also called a job) orchestrates the execution of pipelines and other actions. Workflows can:
- Execute pipelines in sequence or parallel
- Handle errors and retries
- Set variables and parameters
- Perform file operations
- Send notifications
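Conceptually, a workflow behaves like the orchestration loop below. This is a rough Python sketch with hypothetical pipeline names that runs two pipelines in sequence with simple retry handling; in Deem Integrator this is configured through workflow actions rather than written as code:

```python
import time

def run_pipeline(name: str) -> None:
    # Placeholder for launching a pipeline; in the platform the workflow
    # engine does this for you, and this sketch only mimics the control flow.
    print(f"running pipeline: {name}")

def run_with_retry(name: str, attempts: int = 3, delay_s: float = 5.0) -> None:
    # Retry a failing pipeline a fixed number of times before giving up.
    for attempt in range(1, attempts + 1):
        try:
            run_pipeline(name)
            return
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")
            if attempt == attempts:
                raise
            time.sleep(delay_s)

# Sequential orchestration: load the staging pipeline first, then the final load.
run_with_retry("load_staging")
run_with_retry("load_final")
```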
Transforms
Transforms are the building blocks of pipelines. Each transform performs a specific operation:
- Input Transforms: Read data from sources (files, databases, APIs)
- Output Transforms: Write data to targets (files, databases, tables)
- Transformation Transforms: Modify, filter, or enrich data
- Utility Transforms: Perform calculations, lookups, or validations
Variables
Variables allow you to parameterize your pipelines and workflows. They can be:
- Set at runtime
- Defined in configuration files
- Passed between workflows and pipelines
- Used in expressions and SQL queries
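Conceptually, variable resolution works like the sketch below. The `${NAME}` placeholder style is an assumption borrowed from common ETL tools; check the product's variable reference for the exact syntax it uses:

```python
import os
import re

def resolve(text: str, variables: dict) -> str:
    # Replace ${NAME} placeholders with values from the variable map,
    # falling back to environment variables when a name is not defined.
    def lookup(match):
        name = match.group(1)
        return variables.get(name, os.environ.get(name, match.group(0)))
    return re.sub(r"\$\{(\w+)\}", lookup, text)

runtime_vars = {"INPUT_DIR": "/data/incoming", "ENV": "test"}
print(resolve("${INPUT_DIR}/customers_${ENV}.csv", runtime_vars))
# -> /data/incoming/customers_test.csv
```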
Creating Your First Pipeline
Step 1: Add an Input Transform
Start by adding an input transform to read data. For example:
- Data File Input: Read from CSV or text files
- Cloud API: Read from APIs (Deem Insight, Dynamics 365, etc.)
- Deem Datalake Input: Read from Infor Ion Datalake
Step 2: Add Transformation Transforms (Optional)
Add transforms to modify your data:
- Deem Java Expression: Calculate new fields or modify existing ones
- Filter Rows: Filter data based on conditions
- String Range: Map values using ranges
Step 3: Add an Output Transform
Add an output transform to write your processed data:
- Table Output: Write to database tables
- Data File Output: Write to files
- Staging Upsert Output: Upsert to staging tables
Step 4: Connect Transforms
Connect transforms using hops to define the data flow direction.
Step 5: Configure and Run
Configure each transform with the appropriate settings and run the pipeline to process your data.
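To make the five steps concrete, here is a rough Python equivalent of what such a pipeline does end to end: read a hypothetical orders.csv, filter and enrich the rows, and write them to a table (SQLite here so the example is self-contained). In the platform itself this is all done by configuring and connecting transforms, not by writing code:

```python
import csv
import sqlite3

# Step 1: input - read rows from a CSV file (hypothetical path and columns).
with open("/data/incoming/orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Step 2: transformation - filter rows and derive a new field
# (the 1.25 multiplier is just an example business rule).
processed = []
for row in rows:
    amount = float(row["amount"])
    if amount <= 0:                     # Filter Rows: drop non-positive amounts
        continue
    row["amount_incl_vat"] = round(amount * 1.25, 2)
    processed.append(row)

# Step 3: output - write the processed rows to a database table.
conn = sqlite3.connect("target.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (id TEXT, amount REAL, amount_incl_vat REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, amount, amount_incl_vat) VALUES (?, ?, ?)",
    [(r["id"], float(r["amount"]), r["amount_incl_vat"]) for r in processed],
)
conn.commit()
conn.close()
```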
Best Practices
Pipeline Design
- Keep pipelines focused: Each pipeline should have a single, clear purpose
- Use descriptive names: Name your transforms and pipelines clearly
- Document complex logic: Add notes or comments for complex transformations
- Test incrementally: Test each transform as you build the pipeline
Performance
- Use bulk loaders: For large datasets, use bulk loaders (Bulk Loader, MySQL Text Loader) instead of row-by-row inserts
- Filter early: Apply filters as early as possible in the pipeline to reduce data volume
- Optimize lookups: Use indexed columns for database lookups
- Batch processing: Process data in batches when possible
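The last two points, filtering early and batching, are sketched below in plain Python with a throwaway SQLite table; the idea carries over directly to how you arrange transforms in a pipeline:

```python
import sqlite3

def batched(rows, size=1000):
    # Yield fixed-size batches instead of handling one row at a time.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

conn = sqlite3.connect("target.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id TEXT, status TEXT)")

source = ({"id": str(i), "status": "ok" if i % 2 else "skip"} for i in range(10_000))

# Filter early: discard unwanted rows before they reach the output step.
wanted = (r for r in source if r["status"] == "ok")

# Batch the writes rather than issuing one insert per row.
for batch in batched(wanted, size=1000):
    conn.executemany("INSERT INTO events (id, status) VALUES (:id, :status)", batch)
conn.commit()
conn.close()
```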
Error Handling
- Validate inputs: Check for required fields and data types
- Handle nulls: Use transforms like “If Empty” to handle null or empty values
- Log errors: Configure error handling to log and track issues
- Test edge cases: Test with empty datasets, null values, and boundary conditions
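A minimal Python sketch of the validation and null-handling ideas above, using hypothetical field names; in a pipeline the same checks would be done with validation transforms, “If Empty”, and the pipeline's error handling:

```python
from typing import Optional

def clean_row(row: dict, errors: list) -> Optional[dict]:
    # Validate required fields and handle empty values; bad rows are
    # collected with a reason instead of being silently dropped.
    if not row.get("id"):
        errors.append(("missing id", row))
        return None
    # "If Empty"-style default: substitute a fallback for null/empty values.
    row["country"] = row.get("country") or "UNKNOWN"
    try:
        row["amount"] = float(row.get("amount") or 0)
    except ValueError:
        errors.append(("invalid amount", row))
        return None
    return row

errors: list = []
rows = [{"id": "1", "amount": "9.5"}, {"id": "", "amount": "x"}]
cleaned = [r for r in (clean_row(row, errors) for row in rows) if r is not None]
print(cleaned)   # valid rows, ready for the output step
print(errors)    # rejected rows with a reason, ready to be logged
```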
Variables and Configuration
- Use variables: Parameterize file paths, connection strings, and other configuration
- Environment-specific configs: Use different configurations for development, testing, and production
- Secure credentials: Store sensitive information securely, not hardcoded in pipelines
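As an illustration of the last two points, the sketch below loads an environment-specific configuration file and pulls the database password from an environment variable instead of hardcoding it; the file layout and variable names are hypothetical:

```python
import json
import os

def load_config(env: str) -> dict:
    # One config file per environment (hypothetical layout: config/dev.json,
    # config/test.json, config/prod.json).
    with open(f"config/{env}.json") as f:
        config = json.load(f)
    # Secrets come from the environment, never from the file or the pipeline.
    config["db_password"] = os.environ["DB_PASSWORD"]
    return config

config = load_config(os.environ.get("PIPELINE_ENV", "dev"))
```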
Common Patterns
Staging Pattern
A common pattern is to load data into staging tables, then process and move the data to the final tables (sketched in code after the steps):
- Bulk Loader or Indexed Table Output → Load raw data to staging table
- Get Timestamp → Get last processed timestamp
- Table Input → Read only new/changed records
- Transform → Apply business logic
- Staging Upsert Output → Upsert to final tables
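The staging sequence roughly corresponds to the code below, shown through Python and SQLite so the example is self-contained; table and column names are hypothetical, and in the platform the same work is done by the transforms listed above:

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS stg_customers (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT);
CREATE TABLE IF NOT EXISTS customers     (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT);
""")

# Staging has already been bulk-loaded with raw data; move new/changed rows
# to the final table with an upsert (insert, or update on key conflict).
last_run = "2024-01-01T00:00:00"   # would come from the stored timestamp
conn.execute(
    """
    INSERT INTO customers (id, name, updated_at)
    SELECT id, name, updated_at FROM stg_customers WHERE updated_at > ?
    ON CONFLICT(id) DO UPDATE SET
        name = excluded.name,
        updated_at = excluded.updated_at
    """,
    (last_run,),
)
conn.commit()
conn.close()
```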
Incremental Load Pattern
For incremental data loads (see the sketch after the steps):
- Get Timestamp → Get last successful run timestamp
- Input Transform → Read data filtered by timestamp
- Transform → Process new data
- Output → Write to target
- Update Timestamp → Store new timestamp for next run
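A self-contained Python/SQLite sketch of the same sequence, using a hypothetical load_state table to remember the last successful run; the Get Timestamp and Update Timestamp steps above play the role of the first and last actions here:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS load_state (pipeline TEXT PRIMARY KEY, last_run TEXT);
CREATE TABLE IF NOT EXISTS src_orders (id TEXT, amount REAL, updated_at TEXT);
CREATE TABLE IF NOT EXISTS tgt_orders (id TEXT, amount REAL, updated_at TEXT);
""")

# Get the last successful run timestamp (fall back to the epoch on the first run).
row = conn.execute("SELECT last_run FROM load_state WHERE pipeline = 'orders'").fetchone()
last_run = row[0] if row else "1970-01-01T00:00:00"

# Read only records changed since the last run.
new_rows = conn.execute(
    "SELECT id, amount, updated_at FROM src_orders WHERE updated_at > ?",
    (last_run,),
).fetchall()

# Transform (omitted here) and write to the target.
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?, ?)", new_rows)

# Store the new timestamp for the next run.
now = datetime.now(timezone.utc).isoformat(timespec="seconds")
conn.execute(
    "INSERT INTO load_state (pipeline, last_run) VALUES ('orders', ?) "
    "ON CONFLICT(pipeline) DO UPDATE SET last_run = excluded.last_run",
    (now,),
)
conn.commit()
conn.close()
```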
API Integration Pattern
For integrating with APIs (see the sketch after the steps):
- Cloud API → Read data from API
- Deem Java Expression → Transform API response
- Table Output or File Output → Store processed data
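The same three steps expressed as a rough Python sketch with a hypothetical JSON endpoint and field names; in the platform, the Cloud API, Deem Java Expression, and output transforms handle each step through configuration:

```python
import json
import sqlite3
import urllib.request

# Read data from an API (hypothetical endpoint returning a JSON array).
with urllib.request.urlopen("https://api.example.com/v1/invoices") as resp:
    invoices = json.load(resp)

# Transform the response: flatten nested objects into the fields the target needs.
rows = [
    (item["id"], item["customer"]["id"], float(item["total"]))
    for item in invoices
]

# Store the processed data in a table.
conn = sqlite3.connect("target.db")
conn.execute("CREATE TABLE IF NOT EXISTS invoices (id TEXT, customer_id TEXT, total REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```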
Related
- Transforms Overview - All available transforms
- Workflow Actions - All available workflow actions
- Testrapporter Overview - Main documentation index