Getting Started
Overview
Deem Integrator is a data integration platform that lets you build data pipelines and workflows to extract, transform, and load (ETL) data from various sources into target systems.
This guide will help you understand the core concepts and get started with creating your first pipeline.
Core Concepts
Pipelines
A pipeline is a visual representation of a data transformation process. It consists of:
- Transforms: Individual steps that read, transform, or write data
- Hops: Connections between transforms that define the data flow
- Data Streams: The flow of data records through the pipeline
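The relationship between these pieces can be pictured in code. The following is a minimal, purely illustrative Python sketch (not the platform's API) that models a pipeline as transforms connected by hops:

```python
from dataclasses import dataclass, field

@dataclass
class Transform:
    name: str   # e.g. "Read customers CSV"
    kind: str   # "input", "transformation", or "output"

@dataclass
class Pipeline:
    transforms: list[Transform] = field(default_factory=list)
    hops: list[tuple[str, str]] = field(default_factory=list)  # (from, to) pairs

    def add_hop(self, src: Transform, dst: Transform) -> None:
        # A hop fixes the direction the data stream flows in.
        self.hops.append((src.name, dst.name))

# Three transforms wired into a simple read -> filter -> write pipeline.
read = Transform("Read CSV", "input")
filter_rows = Transform("Filter rows", "transformation")
write = Transform("Write table", "output")

pipeline = Pipeline([read, filter_rows, write])
pipeline.add_hop(read, filter_rows)
pipeline.add_hop(filter_rows, write)
```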
Workflows
A workflow (also called a job) orchestrates the execution of pipelines and other actions. Workflows can:
- Execute pipelines in sequence or parallel
- Handle errors and retries
- Set variables and parameters
- Perform file operations
- Send notifications
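Conceptually, a workflow behaves like the orchestration loop below. This is a rough Python sketch with hypothetical pipeline names that runs two pipelines in sequence with simple retry handling; in Deem Integrator this is configured through workflow actions rather than written as code:

```python
import time

def run_pipeline(name: str) -> None:
    # Placeholder for launching a pipeline; in the platform the workflow
    # engine does this for you, and this sketch only mimics the control flow.
    print(f"running pipeline: {name}")

def run_with_retry(name: str, attempts: int = 3, delay_s: float = 5.0) -> None:
    # Retry a failing pipeline a fixed number of times before giving up.
    for attempt in range(1, attempts + 1):
        try:
            run_pipeline(name)
            return
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")
            if attempt == attempts:
                raise
            time.sleep(delay_s)

# Sequential orchestration: load the staging pipeline first, then the final load.
run_with_retry("load_staging")
run_with_retry("load_final")
```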
Transforms
Transforms are the building blocks of pipelines. Each transform performs a specific operation:
- Input Transforms: Read data from sources (files, databases, APIs)
- Output Transforms: Write data to targets (files, databases, tables)
- Transformation Transforms: Modify, filter, or enrich data
- Utility Transforms: Perform calculations, lookups, or validations
Variables
Variables allow you to parameterize your pipelines and workflows. They can be:
- Set at runtime
- Defined in configuration files
- Passed between workflows and pipelines
- Used in expressions and SQL queries
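Conceptually, variable resolution works like the sketch below. The `${NAME}` placeholder style is an assumption borrowed from common ETL tools; check the product's variable reference for the exact syntax it uses:

```python
import os
import re

def resolve(text: str, variables: dict) -> str:
    # Replace ${NAME} placeholders with values from the variable map,
    # falling back to environment variables when a name is not defined.
    def lookup(match):
        name = match.group(1)
        return variables.get(name, os.environ.get(name, match.group(0)))
    return re.sub(r"\$\{(\w+)\}", lookup, text)

runtime_vars = {"INPUT_DIR": "/data/incoming", "ENV": "test"}
print(resolve("${INPUT_DIR}/customers_${ENV}.csv", runtime_vars))
# -> /data/incoming/customers_test.csv
```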
Creating Your First Pipeline
Step 1: Add an Input Transform
Start by adding an input transform to read data. For example:
- Data File Input: Read from CSV or text files
- Cloud API: Read from APIs (Deem Insight, Dynamics 365, etc.)
- Deem Datalake Input: Read from Infor Ion Datalake
Step 2: Add Transformation Transforms (Optional)
Add transforms to modify your data:
- Deem Java Expression: Calculate new fields or modify existing ones
- Filter Rows: Filter data based on conditions
- String Range: Map values using ranges
Step 3: Add an Output Transform
Add an output transform to write your processed data:
- Table Output: Write to database tables
- Data File Output: Write to files
- Staging Upsert Output: Upsert to staging tables
Step 4: Connect Transforms
Connect transforms using hops to define the data flow direction.
Step 5: Configure and Run
Configure each transform with the appropriate settings and run the pipeline to process your data.
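To make the five steps concrete, here is a rough Python equivalent of what such a pipeline does end to end: read a hypothetical orders.csv, filter and enrich the rows, and write them to a table (SQLite here so the example is self-contained). In the platform itself this is all done by configuring and connecting transforms, not by writing code:

```python
import csv
import sqlite3

# Step 1: input - read rows from a CSV file (hypothetical path and columns).
with open("/data/incoming/orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Step 2: transformation - filter rows and derive a new field
# (the 1.25 multiplier is just an example business rule).
processed = []
for row in rows:
    amount = float(row["amount"])
    if amount <= 0:                     # Filter Rows: drop non-positive amounts
        continue
    row["amount_incl_vat"] = round(amount * 1.25, 2)
    processed.append(row)

# Step 3: output - write the processed rows to a database table.
conn = sqlite3.connect("target.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (id TEXT, amount REAL, amount_incl_vat REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, amount, amount_incl_vat) VALUES (?, ?, ?)",
    [(r["id"], float(r["amount"]), r["amount_incl_vat"]) for r in processed],
)
conn.commit()
conn.close()
```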
Best Practices
Pipeline Design
- Keep pipelines focused: Each pipeline should have a single, clear purpose
- Use descriptive names: Name your transforms and pipelines clearly
- Document complex logic: Add notes or comments for complex transformations
- Test incrementally: Test each transform as you build the pipeline
Performance
- Use bulk loaders: For large datasets, use bulk loaders (Bulk Loader, MySQL Text Loader) instead of row-by-row inserts
- Filter early: Apply filters as early as possible in the pipeline to reduce data volume
- Optimize lookups: Use indexed columns for database lookups
- Batch processing: Process data in batches when possible
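The last two points, filtering early and batching, are sketched below in plain Python with a throwaway SQLite table; the idea carries over directly to how you arrange transforms in a pipeline:

```python
import sqlite3

def batched(rows, size=1000):
    # Yield fixed-size batches instead of handling one row at a time.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

conn = sqlite3.connect("target.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id TEXT, status TEXT)")

source = ({"id": str(i), "status": "ok" if i % 2 else "skip"} for i in range(10_000))

# Filter early: discard unwanted rows before they reach the output step.
wanted = (r for r in source if r["status"] == "ok")

# Batch the writes rather than issuing one insert per row.
for batch in batched(wanted, size=1000):
    conn.executemany("INSERT INTO events (id, status) VALUES (:id, :status)", batch)
conn.commit()
conn.close()
```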
Error Handling
- Validate inputs: Check for required fields and data types
- Handle nulls: Use transforms like “If Empty” to handle null or empty values
- Log errors: Configure error handling to log and track issues
- Test edge cases: Test with empty datasets, null values, and boundary conditions
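A minimal Python sketch of the validation and null-handling ideas above, using hypothetical field names; in a pipeline the same checks would be done with validation transforms, “If Empty”, and the pipeline's error handling:

```python
from typing import Optional

def clean_row(row: dict, errors: list) -> Optional[dict]:
    # Validate required fields and handle empty values; bad rows are
    # collected with a reason instead of being silently dropped.
    if not row.get("id"):
        errors.append(("missing id", row))
        return None
    # "If Empty"-style default: substitute a fallback for null/empty values.
    row["country"] = row.get("country") or "UNKNOWN"
    try:
        row["amount"] = float(row.get("amount") or 0)
    except ValueError:
        errors.append(("invalid amount", row))
        return None
    return row

errors: list = []
rows = [{"id": "1", "amount": "9.5"}, {"id": "", "amount": "x"}]
cleaned = [r for r in (clean_row(row, errors) for row in rows) if r is not None]
print(cleaned)   # valid rows, ready for the output step
print(errors)    # rejected rows with a reason, ready to be logged
```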
Variables and Configuration
- Use variables: Parameterize file paths, connection strings, and other configuration
- Environment-specific configs: Use different configurations for development, testing, and production
- Secure credentials: Store sensitive information securely, not hardcoded in pipelines
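As an illustration of the last two points, the sketch below loads an environment-specific configuration file and pulls the database password from an environment variable instead of hardcoding it; the file layout and variable names are hypothetical:

```python
import json
import os

def load_config(env: str) -> dict:
    # One config file per environment (hypothetical layout: config/dev.json,
    # config/test.json, config/prod.json).
    with open(f"config/{env}.json") as f:
        config = json.load(f)
    # Secrets come from the environment, never from the file or the pipeline.
    config["db_password"] = os.environ["DB_PASSWORD"]
    return config

config = load_config(os.environ.get("PIPELINE_ENV", "dev"))
```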
Common Patterns
Staging Pattern
A common pattern is to load data into staging tables, then process and move the data to the final tables (sketched in code after the steps):
- Bulk Loader or Indexed Table Output → Load raw data to staging table
- Get Timestamp → Get last processed timestamp
- Table Input → Read only new/changed records
- Transform → Apply business logic
- Staging Upsert Output → Upsert to final tables
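The staging sequence roughly corresponds to the code below, shown through Python and SQLite so the example is self-contained; table and column names are hypothetical, and in the platform the same work is done by the transforms listed above:

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS stg_customers (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT);
CREATE TABLE IF NOT EXISTS customers     (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT);
""")

# Staging has already been bulk-loaded with raw data; move new/changed rows
# to the final table with an upsert (insert, or update on key conflict).
last_run = "2024-01-01T00:00:00"   # would come from the stored timestamp
conn.execute(
    """
    INSERT INTO customers (id, name, updated_at)
    SELECT id, name, updated_at FROM stg_customers WHERE updated_at > ?
    ON CONFLICT(id) DO UPDATE SET
        name = excluded.name,
        updated_at = excluded.updated_at
    """,
    (last_run,),
)
conn.commit()
conn.close()
```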
Incremental Load Pattern
For incremental data loads (see the sketch after the steps):
- Get Timestamp → Get last successful run timestamp
- Input Transform → Read data filtered by timestamp
- Transform → Process new data
- Output → Write to target
- Update Timestamp → Store new timestamp for next run
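A self-contained Python/SQLite sketch of the same sequence, using a hypothetical load_state table to remember the last successful run; the Get Timestamp and Update Timestamp steps above play the role of the first and last actions here:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS load_state (pipeline TEXT PRIMARY KEY, last_run TEXT);
CREATE TABLE IF NOT EXISTS src_orders (id TEXT, amount REAL, updated_at TEXT);
CREATE TABLE IF NOT EXISTS tgt_orders (id TEXT, amount REAL, updated_at TEXT);
""")

# Get the last successful run timestamp (fall back to the epoch on the first run).
row = conn.execute("SELECT last_run FROM load_state WHERE pipeline = 'orders'").fetchone()
last_run = row[0] if row else "1970-01-01T00:00:00"

# Read only records changed since the last run.
new_rows = conn.execute(
    "SELECT id, amount, updated_at FROM src_orders WHERE updated_at > ?",
    (last_run,),
).fetchall()

# Transform (omitted here) and write to the target.
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?, ?)", new_rows)

# Store the new timestamp for the next run.
now = datetime.now(timezone.utc).isoformat(timespec="seconds")
conn.execute(
    "INSERT INTO load_state (pipeline, last_run) VALUES ('orders', ?) "
    "ON CONFLICT(pipeline) DO UPDATE SET last_run = excluded.last_run",
    (now,),
)
conn.commit()
conn.close()
```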
API Integration Pattern
For integrating with APIs (see the sketch after the steps):
- Cloud API → Read data from API
- Deem Java Expression → Transform API response
- Table Output or File Output → Store processed data
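The same three steps expressed as a rough Python sketch with a hypothetical JSON endpoint and field names; in the platform, the Cloud API, Deem Java Expression, and output transforms handle each step through configuration:

```python
import json
import sqlite3
import urllib.request

# Read data from an API (hypothetical endpoint returning a JSON array).
with urllib.request.urlopen("https://api.example.com/v1/invoices") as resp:
    invoices = json.load(resp)

# Transform the response: flatten nested objects into the fields the target needs.
rows = [
    (item["id"], item["customer"]["id"], float(item["total"]))
    for item in invoices
]

# Store the processed data in a table.
conn = sqlite3.connect("target.db")
conn.execute("CREATE TABLE IF NOT EXISTS invoices (id TEXT, customer_id TEXT, total REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```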
Related
- Transforms Overview - All available transforms
- Workflow Actions - All available workflow actions
- Testrapporter Overview - Main documentation index