What does software integration mean?
Definition
Software integration is a development process in which separate software systems—applications and components—are connected so they work together to form a new, unified system.
Phases
1.) Requirements assessment and planning
- requirements assessment: identifying user needs and business requirements
- planning: developing the integration strategy, designing the system architecture and integration points
2.) Requirements analysis and specification
- analysis: detailed assessment of required functions and the capabilities of existing systems
- specification: definition of interfaces and data exchange formats
3.) Development and implementation
- development: creation of integration code for existing systems
- implementation: deployment of new components and modification of existing ones
4.) Testing and validation
5.) Maintenance and support
Legacy Systems
Definition
The term legacy system refers to IT systems that use older (possibly obsolete) technologies but still operate actively and play an essential role in an organization's everyday operation.
Why use legacy systems?
- Long lifetime and stability: Many legacy systems have operated reliably for years or even decades. If a system functions well and is mission-critical, there is often no compelling reason to replace it.
- Cost considerations: Replacing an entire system can be extremely expensive.
- Complexity: Legacy systems are often deeply integrated into other organizational processes and systems.
- Risk avoidance: Organizations often prefer to avoid the operational risks that a full replacement would introduce.
Why Are They Not Replaced?
- Cost and lack of resources: Replacement is often not economically feasible, and risk estimation can be difficult due to strict security requirements.
- Disruption of business processes: Introducing a new system may significantly disrupt daily operations. Many organizations prioritize operational stability (e.g., banking institutions).
Solutions
- Gradual migration: Instead of replacing the entire system at once, organizations incrementally migrate to new systems by replacing specific components or functionalities.
- Outsourcing maintenance and operation: Maintenance and operation of legacy systems may be delegated to external service providers, reducing internal costs and specialized staffing requirements.
Overview of Integration Strategies
Point-to-Point Connection
Components connect directly to each other, typically via file transfer or direct database access. There is no intermediary layer, therefore communication is fast. Initially, it is easy to implement.
Disadvantages – Challenges
- Difficult to scale
- Future expansion can become complex
- The number of connections grows quadratically → n(n−1)/2 connections for n components
- Fragile architecture, as monitoring and troubleshooting errors are difficult
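The quadratic growth of direct links can be seen with a quick calculation (the component counts below are arbitrary examples):

```python
# Number of point-to-point links needed when n components
# must all talk to each other directly: n * (n - 1) / 2
def connection_count(n: int) -> int:
    return n * (n - 1) // 2

# Each new component adds links to every existing one:
for n in (3, 5, 10, 20):
    print(n, "components ->", connection_count(n), "connections")
```

With 10 components you already maintain 45 separate connections; with 20, 190 — which is why this style quickly becomes hard to scale.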
Middleware Integration
Components do not connect to each other directly; instead, they communicate through a central intermediary (e.g., API Gateway, Application Server, Enterprise Service Bus – ESB).
The intermediary layer handles different communication protocols.
- Monitoring capabilities: communication tracking and centralized supervision
- Improved scalability
- Centralized functions: authorization management and transaction handling
Disadvantages – Challenges
- Complexity: system development costs are high
- Monolithic architecture – one central server serving many clients
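A minimal sketch of the intermediary idea: components register handlers with a central middleware object and never reference each other directly. All names here (`Middleware`, `register`, `send`) are illustrative, not a real product API.

```python
# Minimal middleware sketch: one central routing point instead of
# direct component-to-component connections.
class Middleware:
    def __init__(self):
        self._handlers = {}  # destination name -> handler function

    def register(self, name, handler):
        self._handlers[name] = handler

    def send(self, destination, message):
        # Central point where monitoring, authorization, or
        # transaction handling could be added.
        print(f"[middleware] routing to {destination}: {message}")
        return self._handlers[destination](message)

bus = Middleware()
bus.register("billing", lambda msg: f"billing handled {msg}")

# The sending component only knows the middleware, not "billing":
result = bus.send("billing", "invoice #42")
```

Adding a new component means one registration with the middleware, not n new direct connections — but the central server itself becomes a single point of failure, as noted above.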
Message Queue-Based Integration
Components do not connect to each other directly; instead, they communicate via message queues.
Messages are processed asynchronously.
- Monitoring capabilities: use of specialized queues (e.g., Dead Letter Queue – DLQ)
- Highly scalable architecture
- Naturally suited for cloud-based systems
Disadvantages – Challenges
- Complexity: system development costs are high
- Requires advanced expertise and careful architectural design
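The asynchronous decoupling can be sketched with the standard library's `queue.Queue`; a real deployment would use a message broker such as RabbitMQ or Kafka instead.

```python
import queue
import threading

# Producer and consumer never call each other; they only share the queue.
q = queue.Queue()
results = []

def consumer():
    while True:
        msg = q.get()
        if msg is None:      # sentinel value: shut down
            break
        results.append(f"processed {msg}")
        q.task_done()

t = threading.Thread(target=consumer)
t.start()

# The producer puts messages on the queue and moves on (asynchronous):
for order in ("order-1", "order-2"):
    q.put(order)
q.put(None)                  # tell the consumer to stop
t.join()
```

Because the producer does not wait for processing, either side can be scaled or restarted independently — the property that makes this style a natural fit for cloud-based systems.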
Data Sharing
A simple approach to integration is data sharing. Data sharing–based integration aims to transfer and share data between systems. This enables individual systems to access and utilize data stored in other systems.
Data sharing can take several forms: data migration, data synchronization, and data sharing services.
Comparison of Data Sharing Approaches
- Data Migration is typically a one-time process: It is suitable for system replacement or major upgrades, but it does not ensure continuous consistency.
- Data Synchronization provides ongoing consistency between systems: It is appropriate when multiple systems must maintain aligned datasets over time.
- Data Sharing Services enable real-time access to shared data: They are ideal for modern distributed and cloud-based architectures that require immediate data availability.
File-Based Data Sharing
The most fundamental method of data sharing. One application writes data, while another application reads data from the same file. The data files are stored in a central location — such as a shared folder (e.g., NFS) or an (S)FTP server. The information flow is unidirectional: A → B.
Data Encoding
Most file-based integration approaches use text-based files.
The most common formats are:
- Plain text
- XML
- JSON (in modern systems)
Raw text formats may use:
- Fixed-length records
- Variable-length records (commonly used in billing and financial systems)
For variable-length records, a delimiter is required to separate data fields. The most widely known method is CSV (Comma-Separated Values).
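The two record layouts can be contrasted in a few lines; the field names and widths below are made up for the example.

```python
import csv
import io

# Variable-length records (CSV): a delimiter separates the fields.
csv_data = "1001,Alice,250.00\n1002,Bob,99.50\n"
rows = list(csv.reader(io.StringIO(csv_data)))

# Fixed-length records: each field occupies a fixed character range,
# so parsing is pure slicing (id: 4 chars, name: 10, amount: 7).
fixed_line = "1001Alice     0250.00"
record = {
    "id": fixed_line[0:4],
    "name": fixed_line[4:14].strip(),
    "amount": fixed_line[14:21],
}
```

Fixed-length files need no delimiter and parse very fast, but waste space on padding; CSV adapts to varying field lengths, which is why billing and financial systems commonly use it.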
File-Based Integration with Lock Mechanism
State Files
State files can be used to track the processing status of data files.
These files may contain the current processing state, such as:
- “in progress”
- “completed”
- “failed”
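A state file can be as simple as a companion text file next to the data file; the `.state` suffix and the helper names below are illustrative assumptions.

```python
import os
import tempfile

# The writing system records its progress in a companion state file,
# which the reading system can inspect before touching the data file.
datadir = tempfile.mkdtemp()
state_path = os.path.join(datadir, "data.csv.state")

def set_state(state):
    with open(state_path, "w") as f:
        f.write(state)

def get_state():
    with open(state_path) as f:
        return f.read()

set_state("in progress")
# ... System A writes the data file here ...
set_state("completed")
```

System B simply refuses to read the data file until `get_state()` returns "completed".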
Lock File Mechanism
1) Lock file creation: System A begins processing a data file and creates a lock file, for example: data.lock.
2) Writing phase: System A creates or writes the data file while data.lock exists. System B attempts to access the data file but detects that the lock file exists, therefore it waits.
3) Completion: System A finishes processing and removes the data.lock file.
4) Reading phase: System B detects the data.lock file has been removed, and it can begin its own processing.
Purpose of the Lock Mechanism
This method ensures that only one system processes the data file at a time, preventing data conflicts and inconsistencies.
The use of lock files is a simple and effective technique for process synchronization and coordination in file-based integration.
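The four steps above can be sketched as follows, assuming both systems see the same directory. Atomic creation with `O_CREAT | O_EXCL` fails if the lock file already exists, so two writers can never both hold the lock.

```python
import os
import tempfile

datadir = tempfile.mkdtemp()
data_path = os.path.join(datadir, "data.csv")
lock_path = os.path.join(datadir, "data.lock")

def acquire_lock():
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

# 1-2) System A creates data.lock and writes the data file:
assert acquire_lock()
assert not acquire_lock()        # a concurrent writer would be rejected here
with open(data_path, "w") as f:
    f.write("1001,Alice\n")

# 3) Completion: System A removes the lock file.
os.remove(lock_path)

# 4) System B only reads once the lock file is gone:
if not os.path.exists(lock_path):
    with open(data_path) as f:
        content = f.read()
```

Note that a crash between steps 1 and 3 leaves a stale lock file behind; real implementations usually add a timeout or record the owner's process ID in the lock file.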
Limitations of File-Based Integration
This method remains widely used today, but it has several significant disadvantages:
- The data sharing is not real-time. It is typically suitable for daily, weekly, or monthly batch data exchange. If data is modified between cycles — for example, if a customer changes their address — the invoicing application may still send the invoice to the old address because it receives the update only later.
- It may become unreliable when transferring a large number of files (although tools such as rsync can help).
- Successful integration requires that developers of both applications (in most cases) agree on and understand:
- the file format
- file naming conventions
- file storage location
- how file deletion is handled
- the lock mechanism used
- the file transfer method
Database-Based Data Sharing
Database-based integration is a method that enables data sharing and synchronization between different systems directly through databases.
In this approach, multiple applications and systems use either:
- a shared database, or
- database replication
to access and manage data.
The same setup can also rely on database replication: each system works against its own replica, and the replication mechanism keeps the copies consistent.
Example
E-commerce platform and Warehouse Management System: The e-commerce platform can be directly integrated with the warehouse database to provide real-time inventory information.
Similarities to File-Based Integration
- Platform-independent connectivity (e.g., JDBC, ODBC)
- Multiple instances of identical components may access the same database
- Synchronization issue: Who processes the next record in the queue?
- However, it can be an ideal solution for data collection scenarios.
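The "who processes the next record" problem is usually solved by letting each worker claim a row atomically. A sketch with the standard library's `sqlite3` (table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("new",)] * 3)

def claim_next(worker):
    # Selecting and marking the row in a single UPDATE makes the claim
    # atomic, so two workers cannot grab the same record.
    cur = conn.execute(
        "UPDATE jobs SET status = ? "
        "WHERE id = (SELECT id FROM jobs WHERE status = 'new' LIMIT 1)",
        (f"taken by {worker}",),
    )
    return cur.rowcount          # 1 if a row was claimed, 0 if none left

claimed = [claim_next("worker-A"), claim_next("worker-B")]
remaining = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE status = 'new'"
).fetchone()[0]
```

Each worker claims a different row, and the database's transactional guarantees do the coordination — one of the main advantages this approach has over file-based sharing.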
Limitations
- Not real-time by default. If one application writes to the database, another application does not automatically receive notification.
- Possible solutions:
- Database notification mechanisms (e.g., PostgreSQL LISTEN and NOTIFY)
- Triggers
- Polling mechanisms
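Of the three options, polling is the simplest to sketch: the reader remembers the highest `id` it has seen and periodically asks only for newer rows. Table and column names below are illustrative; `sqlite3` stands in for the shared database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

last_seen = 0

def poll():
    # Fetch only rows newer than the last one we processed.
    global last_seen
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen,),
    ).fetchall()
    if rows:
        last_seen = rows[-1][0]
    return [payload for _, payload in rows]

# Another application writes a row; nothing notifies the reader.
conn.execute("INSERT INTO events (payload) VALUES ('address changed')")

first = poll()    # the next poll cycle picks up the new row
second = poll()   # nothing new since the last poll
```

The trade-off is latency versus load: a short polling interval approximates real-time behavior but increases database traffic, which is why push mechanisms such as PostgreSQL's LISTEN/NOTIFY are preferred when available.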
- Security considerations: Properly defined access rights are required (table access, permitted operations). Developers often restrict direct visibility of database structures.
- Lack of well-defined interfaces — this approach provides only data-level integration.
When to Use Database-Based Integration?
Database-based integration is appropriate in the following scenarios:
- When multiple applications need direct access to the same structured data.
- When strong transactional consistency is required.
- When systems operate within the same organizational or security boundary.
- When the data model is stable and well-defined.
- When high-performance querying and reporting are necessary.
- When database replication can support load balancing or read scalability.
However, it may NOT be the best choice:
- In loosely coupled, distributed architectures (e.g., microservices).
- When clear service-level interfaces are required.
- When strict decoupling between systems is a design goal.
- In highly scalable cloud-native environments where message-based communication is preferred.
Integration Strategy Comparison
| Aspect | File-Based Integration | Database-Based Integration | Message Queue-Based Integration |
|---|---|---|---|
| Coupling | Tight coupling (shared file format) | Tight to medium coupling (shared schema) | Loose coupling |
| Communication Style | Batch, unidirectional | Data-level sharing | Asynchronous message exchange |
| Real-Time Capability | No | Not by default | Yes (naturally asynchronous) |
| Scalability | Limited | Moderate | High |
| Monitoring | Difficult | Database-level monitoring | Built-in queue monitoring (DLQ, metrics) |
| Complexity | Low initial complexity | Medium | High |
| Transaction Support | No native support | Strong ACID support | Depends on message broker |
| Typical Use Case | Periodic data exchange | Shared enterprise systems | Distributed / cloud-native systems |
| Interface Definition | File format agreement | Shared database schema | Message contract / schema definition |
| Cloud-Native Suitability | Low | Medium | High |
