--- tanszek:oktatas:iss_t:software_integration [2026/02/15 21:45] – [Comparison of Data Sharing Approaches] knehez
+++ tanszek:oktatas:iss_t:software_integration [2026/02/15 22:18] (current) – [Data Sharing] knehez
@@ Line 208: / Line 208: @@
 ===== Data Sharing =====
-A simple approach to integration is data sharing.
+A simple approach to integration is data sharing. Data sharing–based integration aims to transfer and share data between systems. This enables individual systems to access and utilize data stored in other systems.
-Data sharing–based integration aims to transfer and share data between systems.
-This enables individual systems to access and utilize data stored in other systems.
 Data sharing can take several forms:
@@ Line 220: / Line 217: @@
   * **Data Sharing Services** enable real-time access to shared data. They are ideal for modern distributed and cloud-based architectures that require immediate data availability.
-==== File-based data sharing ====
+===== File-Based Data Sharing =====
+This is the most fundamental method of data sharing: one application writes data to a file, and another application reads data from the same file. The data files are stored in a central location, such as a shared folder (e.g., NFS) or an (S)FTP server. The information flow is unidirectional: A → B.
+==== Data Encoding ====
+Most file-based integration approaches use text-based files.
+The most common formats are:
+  * Plain text
+  * XML
+  * JSON (in modern systems)
+Raw text formats may use:
+  * Fixed-length records
+  * Variable-length records (commonly used in billing and financial systems)
+For variable-length records, a delimiter is required to separate the data fields.
+The most widely known delimiter-based format is CSV (Comma-Separated Values).
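The difference between the two record styles can be sketched in Python. The record layout (a 6-character customer ID, a 20-character name and a 10-character amount) and the sample values are hypothetical. The name deliberately contains a comma: the fixed-length record needs no special handling for it, while the CSV record has to quote the field so the comma is not mistaken for a delimiter.

```python
import csv
import io

# Hypothetical fixed-length layout: (field name, start column, end column).
FIXED_LAYOUT = [("customer_id", 0, 6), ("name", 6, 26), ("amount", 26, 36)]


def parse_fixed_length(line: str) -> dict:
    """Cut each field out of the record by character position and trim the padding."""
    return {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}


def parse_csv(text: str) -> list:
    """Parse comma-delimited records; the first row holds the field names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# The same record in both encodings.
fixed_record = "000042" + "Smith, Alice".ljust(20) + "120.50".rjust(10)
csv_records = 'customer_id,name,amount\n000042,"Smith, Alice",120.50\n'

print(parse_fixed_length(fixed_record))
print(parse_csv(csv_records))
```

Fixed-length records are located purely by position, so every producer must pad fields to exactly the agreed width; delimited records tolerate varying field lengths but need a quoting rule.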
+<mermaid>
+flowchart LR
+    A[System A]
+    STORAGE[[Shared Folder / FTP Server]]
+    B[System B]
+    A -- writes --> STORAGE
+    STORAGE -- reads --> B
+</mermaid>
+===== File-Based Integration with Lock Mechanism =====
+==== State Files ====
+State files can be used to track the processing status of data files.
+These files may contain the current processing state, such as:
+  * "in progress"
+  * "completed"
+  * "failed"
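A minimal sketch of state files, assuming a hypothetical convention in which each data file gets a sidecar file named after it with a `.state` suffix, holding one of the three states listed above. The function and file names are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Optional

# The three processing states listed above.
VALID_STATES = {"in progress", "completed", "failed"}


def state_path(data_file: Path) -> Path:
    """Hypothetical naming convention: invoices.csv -> invoices.csv.state"""
    return data_file.with_name(data_file.name + ".state")


def set_state(data_file: Path, state: str) -> None:
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    state_path(data_file).write_text(state, encoding="utf-8")


def get_state(data_file: Path) -> Optional[str]:
    """Return the recorded state, or None if processing has not started yet."""
    path = state_path(data_file)
    return path.read_text(encoding="utf-8") if path.exists() else None


def process(data_file: Path) -> int:
    """Process a data file, recording its state before and after; returns the line count."""
    set_state(data_file, "in progress")
    try:
        lines = data_file.read_text(encoding="utf-8").splitlines()
        # ... the real business logic would go here ...
    except Exception:
        set_state(data_file, "failed")
        raise
    set_state(data_file, "completed")
    return len(lines)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "invoices.csv"
    data.write_text("id,amount\n1,120.50\n", encoding="utf-8")
    print(get_state(data))  # None: no state file yet
    process(data)
    print(get_state(data))  # completed
```

Another system can read the state file to decide whether a data file is ready, still being processed, or needs attention after a failure.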
+<mermaid>
+flowchart LR
+    A[System A]
+    STORAGE[[Shared Folder / FTP Server]]
+    LOCK[(data.lock)]
+    B[System B]
+    A -- "1) create lock" --> LOCK
+    A -- "2) write data" --> STORAGE
+    B -- "3) detect lock" --> LOCK
+    B -- "waits" --> STORAGE
+    A -- "4) remove lock" --> LOCK
+    B -- "5) read data" --> STORAGE
+</mermaid>
+==== Lock File Mechanism ====
+  - **Lock file creation:** System A begins processing a data file and creates a lock file, for example //data.lock//.
+  - **Writing phase:** System A creates or writes the data file while //data.lock// exists. System B attempts to access the data file, detects the lock file, and therefore waits.
+  - **Completion:** System A finishes processing and removes //data.lock//.
+  - **Reading phase:** System B detects that //data.lock// has been removed and begins its own processing.
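The mechanism can be sketched in Python as follows, with step numbers matching the diagram. The file names, polling interval and timeout are assumptions. The lock is created with `os.open` and the `O_CREAT | O_EXCL` flags, which turns "create the file only if it does not already exist" into a single atomic operation, so two writers cannot both acquire it. On a shared folder this depends on the file system: the Linux `open(2)` manual page notes that `O_EXCL` on NFS requires NFSv3 or later, so it is worth checking on the actual deployment.

```python
import os
import tempfile
import time
from pathlib import Path


def lock_path(data_file: Path) -> Path:
    """data.csv -> data.lock (hypothetical naming, following the data.lock example)."""
    return data_file.with_name(data_file.stem + ".lock")


def acquire_lock(lock_file: Path) -> None:
    """Step 1: create the lock atomically; raises FileExistsError if it already exists."""
    fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)


def write_with_lock(data_file: Path, content: str) -> None:
    """System A: steps 1, 2 and 4."""
    lock_file = lock_path(data_file)
    acquire_lock(lock_file)                              # 1) create lock
    try:
        data_file.write_text(content, encoding="utf-8")  # 2) write data
    finally:
        lock_file.unlink()                               # 4) remove lock, even on error


def read_when_unlocked(data_file: Path, poll_interval: float = 0.5,
                       timeout: float = 30.0) -> str:
    """System B: steps 3 and 5, giving up after `timeout` seconds."""
    lock_file = lock_path(data_file)
    deadline = time.monotonic() + timeout
    while lock_file.exists():                            # 3) lock detected: wait
        if time.monotonic() > deadline:
            raise TimeoutError(f"{lock_file} still present after {timeout} s")
        time.sleep(poll_interval)
    return data_file.read_text(encoding="utf-8")         # 5) read data


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    write_with_lock(data, "id,amount\n1,120.50\n")
    print(read_when_unlocked(data))
```

The reader's "check, then read" is still a simplification: if System A starts a new write right after System B's check, B could read a partially written file. Protocols that need stronger guarantees typically combine the lock with the state files described above.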
+==== Purpose of the Lock Mechanism ====
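+This method ensures that only one system processes the data file at a time,
+preventing data conflicts and inconsistencies. The guarantee holds only if creating the lock file is atomic:
+if a system first checks whether //data.lock// exists and creates it in a separate step, two systems can both pass the check and start writing.
+The use of lock files is a simple and effective technique for
+process synchronization and coordination in file-based integration.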
+===== Limitations of File-Based Integration =====
+This method remains widely used today, but it has several significant disadvantages:
+  * The data sharing is not real-time. It is typically suitable for daily, weekly, or monthly batch data exchange. If data is modified between cycles — for example, if a customer changes their address — the invoicing application may still send the invoice to the old address because it receives the update only later.
+  * It may become unreliable when transferring a large number of files (although tools such as rsync can help).
+  * In most cases, successful integration requires the developers of both applications to agree on and understand:
+    - the file format
+    - file naming conventions
+    - file storage location
+    - how file deletion is handled
+    - the lock mechanism used
+    - the file transfer method
+----
+===== Database-Based Data Sharing =====
+Database-based integration is a method that enables data sharing and synchronization between different systems directly through databases.
+In this approach, multiple applications and systems access and manage data through either:
+  * a shared database, or
+  * database replication.
+<mermaid>
+flowchart LR
+    A[Application A]
+    B[Application B]
+    DB[(Shared Database)]
+    A <--> DB
+    B <--> DB
+</mermaid>
+----
+The same system with database replication:
+<mermaid>
+flowchart LR
+    A[Application A]
+    C[Application C]
+    DB1[(Primary Database)]
+    B1[Application B]
+    B2[Application B]
+    DB2[(Replica Database)]
+    A <--> DB1
+    C <--> DB1
+    DB1 -- replication --> DB2
+    DB2 -- reads --> B1
+    DB2 -- reads --> B2
+</mermaid>
+==== Example ====
+E-commerce platform and Warehouse Management System:
+The e-commerce platform can be directly integrated with the warehouse database to provide real-time inventory information.
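A minimal sketch of this example, assuming SQLite as a stand-in for whatever database the two systems actually share and a hypothetical `inventory` table. The warehouse system writes stock levels, and the e-commerce platform reads them from the same table instead of waiting for an exported file.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Assumption: SQLite stands in for the shared database; the schema is hypothetical.
SCHEMA = "CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"


def create_schema(db_path: str) -> None:
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)


def warehouse_set_stock(db_path: str, sku: str, quantity: int) -> None:
    """Warehouse Management System: records the current stock level."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO inventory (sku, quantity) VALUES (?, ?)",
                (sku, quantity),
            )


def shop_available_stock(db_path: str, sku: str) -> int:
    """E-commerce platform: reads the same table directly; unknown SKUs count as 0."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT quantity FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
    return row[0] if row else 0


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "shared.db")
    create_schema(db)
    warehouse_set_stock(db, "SKU-1001", 12)
    print(shop_available_stock(db, "SKU-1001"))  # 12
```

Both functions depend on the exact table and column names, which is the shared-schema coupling listed in the comparison table below.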
+==== Similarities to File-Based Integration ====
+  * Platform-independent connectivity (e.g., JDBC, ODBC)
+  * Multiple instances of identical components may access the same database
+    * Synchronization issue: which instance processes the next record in the queue?
+    * However, it can be an ideal solution for data collection scenarios.
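One way to answer the synchronization question, sketched with SQLite and a hypothetical `jobs` table: each instance claims the next `new` row inside a transaction started with `BEGIN IMMEDIATE`, which takes SQLite's write lock before the `SELECT`, so no other instance can claim the same row between the `SELECT` and the `UPDATE`. Connections are opened with `isolation_level=None` so that the code, not the `sqlite3` module, decides when transactions begin and end.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def claim_next_job(conn: sqlite3.Connection, worker: str) -> Optional[int]:
    """Atomically mark the oldest 'new' job as claimed by `worker`; None if the queue is empty."""
    # BEGIN IMMEDIATE acquires the write lock up front, so the SELECT and the
    # UPDATE below cannot interleave with another connection's claim.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'new' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'claimed', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row is not None else None


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "queue.db")
    # Two connections play the role of two instances of the same component.
    worker_1 = sqlite3.connect(db, isolation_level=None)
    worker_2 = sqlite3.connect(db, isolation_level=None)
    worker_1.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    worker_1.executemany("INSERT INTO jobs (status) VALUES (?)", [("new",), ("new",)])
    print(claim_next_job(worker_1, "w1"))  # 1
    print(claim_next_job(worker_2, "w2"))  # 2
    print(claim_next_job(worker_1, "w1"))  # None
    worker_1.close()
    worker_2.close()
```

On a server database such as PostgreSQL, the usual equivalent is `SELECT ... FOR UPDATE SKIP LOCKED`, which lets concurrent workers skip rows that another transaction has already locked.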
+==== Limitations ====
+  * Not real-time by default: when one application writes to the database, other applications are not notified automatically.
+    * Possible solutions:
+      * Database notification mechanisms (e.g., PostgreSQL LISTEN and NOTIFY)
+      * Triggers
+      * Polling mechanisms
+  * Security considerations: properly defined access rights are required (which tables each application may access and which operations it may perform). Developers often restrict direct visibility of the database structures.
+  * Lack of well-defined interfaces — this approach provides only data-level integration.
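Of the notification options listed above, polling is the simplest to sketch. The example assumes a hypothetical insert-only `orders` table whose integer `id` only grows, so the consumer can remember the highest `id` it has seen and query only for newer rows. The trade-off is latency of up to one polling interval, plus one extra query against the database per poll.

```python
import os
import sqlite3
import tempfile
import time
from contextlib import closing
from typing import Callable, List, Tuple


def poll_new_orders(db_path: str, last_seen_id: int) -> Tuple[int, List[tuple]]:
    """Return (new last_seen_id, rows added since last_seen_id)."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_seen_id,)
        ).fetchall()
    return (rows[-1][0] if rows else last_seen_id), rows


def run_poller(db_path: str, handle: Callable[[tuple], None],
               interval_s: float = 5.0, max_polls: int = 3) -> int:
    """Poll every interval_s seconds and pass each new row to `handle`.
    max_polls bounds the loop for this example; a real consumer would keep running."""
    last_seen = 0
    for _ in range(max_polls):
        last_seen, rows = poll_new_orders(db_path, last_seen)
        for row in rows:
            handle(row)
        time.sleep(interval_s)
    return last_seen


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "shop.db")
    with closing(sqlite3.connect(db)) as conn:
        with conn:
            conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
            conn.execute("INSERT INTO orders (item) VALUES ('book')")
    last_id, rows = poll_new_orders(db, 0)        # rows == [(1, 'book')]
    with closing(sqlite3.connect(db)) as conn:
        with conn:
            conn.execute("INSERT INTO orders (item) VALUES ('lamp')")
    last_id, rows = poll_new_orders(db, last_id)  # rows == [(2, 'lamp')]
    print(last_id, rows)
```

Mechanisms such as PostgreSQL `LISTEN`/`NOTIFY` avoid the repeated queries by pushing a notification to the consumer instead.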
+==== When to Use Database-Based Integration? ====
+Database-based integration is appropriate in the following scenarios:
+  * When multiple applications need direct access to the same structured data.
+  * When strong transactional consistency is required.
+  * When systems operate within the same organizational or security boundary.
+  * When the data model is stable and well-defined.
+  * When high-performance querying and reporting are necessary.
+  * When database replication can support load balancing or read scalability.
+However, it may NOT be the best choice:
+  * In loosely coupled, distributed architectures (e.g., microservices).
+  * When clear service-level interfaces are required.
+  * When strict decoupling between systems is a design goal.
+  * In highly scalable cloud-native environments where message-based communication is preferred.
+===== Integration Strategy Comparison =====
+^ Aspect ^ File-Based Integration ^ Database-Based Integration ^ Message Queue-Based Integration ^
+| Coupling | Tight coupling (shared file format) | Tight to medium coupling (shared schema) | Loose coupling |
+| Communication Style | Batch, unidirectional | Data-level sharing | Asynchronous message exchange |
+| Real-Time Capability | No | Not by default | Yes (naturally asynchronous) |
+| Scalability | Limited | Moderate | High |
+| Monitoring | Difficult | Database-level monitoring | Built-in queue monitoring (DLQ, metrics) |
+| Complexity | Low initial complexity | Medium | High |
+| Transaction Support | No native support | Strong ACID support | Depends on message broker |
+| Typical Use Case | Periodic data exchange | Shared enterprise systems | Distributed / cloud-native systems |
+| Interface Definition | File format agreement | Shared database schema | Message contract / schema definition |
+| Cloud-Native Suitability | Low | Medium | High |