tanszek:oktatas:iss_t:modern_data_integration_based_on_protocol_buffer

Differences

This shows you the differences between two versions of the page.

tanszek:oktatas:iss_t:modern_data_integration_based_on_protocol_buffer [2023/03/05 20:41] knehez
tanszek:oktatas:iss_t:modern_data_integration_based_on_protocol_buffer [2024/03/18 10:07] (current) knehez
Line 1: Line 1:
-==== Protocol Buffer ====
+==== Protocol Buffers (Protobuf) ====
  
-This is a solution for the serialization of structured data, developed by Google. The interface description is also displayed for this data integration method.
+Protocol Buffers (Protobuf) is a method developed by Google for //serializing structured data//, similar to XML or JSON. It is especially beneficial in applications that communicate with servers or store data, where efficiency and the //speed of data transmission// are crucial. Protobuf is designed to be simpler and more efficient than XML and JSON, offering both smaller message sizes and faster processing.
  
-The protocol buffer is a binary serialization method. However, its big advantage is that it supports many technologies, thereby increasing platform independence.
+**Protobuf** requires you to define your structured data in a standard format in a **.proto** file, which is then used to generate source code in your //chosen programming language//. This generated code is used to write your structured data to, and read it from, a variety of data streams, in a variety of languages.
 + 
 +==== Key Features of Protobuf ==== 
 +    * **Efficiency**: Protobuf is designed to be more efficient than XML and JSON, both in terms of speed and the size of the serialized data.
 +    * **Cross-language**: Protobuf supports generated code in various programming languages, allowing for easy data exchange between systems written in different languages. 
 +    * **Backward compatibility**: Protobuf is designed to maintain compatibility even if the structure of the data changes, allowing old code to read new data formats and vice versa. 
 +    * **Less verbose**: Protobuf messages are much less verbose than XML, leading to significant bandwidth savings. 
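
The size difference claimed in the list above can be illustrated with a minimal sketch. It does not use the protobuf library: it hand-encodes a hypothetical two-field record (`id = 1` as a varint, `title = 2` as a length-delimited string, both names invented for this illustration) following the Protobuf wire format and compares the result with the JSON text for the same data. The helper names `encode_varint` and `encode_record` are assumptions, not part of any Protobuf API.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_record(record_id: int, title: str) -> bytes:
    """Hand-encode a hypothetical message {id = 1, title = 2}."""
    title_bytes = title.encode("utf-8")
    return (
        b"\x08" + encode_varint(record_id)            # tag: field 1, wire type 0 (varint)
        + b"\x12" + encode_varint(len(title_bytes))   # tag: field 2, wire type 2 (length-delimited)
        + title_bytes
    )


record = {"id": 150, "title": "Dune"}
binary = encode_record(record["id"], record["title"])
text = json.dumps(record).encode("utf-8")

print("binary bytes:", len(binary), binary.hex())
print("JSON bytes:  ", len(text))
```

The binary form carries no field names, only small numeric tags, which is where most of the saving comes from. In real projects the generated classes perform this encoding; the sketch only shows why the output is compact.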
 + 
 +==== Using Protobuf in Data Integration ==== 
 + 
 +//Protobuf// can be particularly useful in data integration scenarios where different systems or components need to exchange data efficiently. 
 + 
 +    * **Cross-Language Communication**: since Protobuf supports various languages (Java, C++, Python, etc.), it's an excellent choice for integrating systems that are //developed in different programming languages//.
 +    * **Microservices Architecture**: in a //microservices// architecture, different services might need to communicate with each other over a network. Protobuf can be used to serialize the messages exchanged between services, ensuring efficient communication. 
 +    * **API Development**: when developing APIs, especially those that are used heavily or exposed to external users, Protobuf can be used to efficiently serialize request and response bodies. This can be particularly beneficial for mobile clients where bandwidth might be limited. 
 +    * **Big Data and Streaming**: for systems that process large volumes of data or stream data in real-time, Protobuf can be used to serialize data points efficiently. This ensures that the system can handle high volumes of data with minimal overhead.
 +    * **Data Storage**: Protobuf can also be used for serializing data before storing it in databases or file systems. Its efficient serialization can lead to performance improvements and reduced storage costs.
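
For the streaming and storage scenarios above, one practical detail is that a serialized Protobuf message does not mark its own end, so several messages written to one stream or file need some framing. The following is a minimal sketch of one common convention, a 4-byte big-endian length prefix; this framing choice and the helper names `write_frame` and `read_frames` are assumptions for illustration, and the payloads are placeholder bytes standing in for serialized messages.

```python
import io
import struct


def write_frame(stream, payload: bytes) -> None:
    """Write one payload preceded by its length as a 4-byte big-endian integer."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield each payload from a stream written with write_frame."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack(">I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated payload")
        yield payload


# Placeholder byte strings standing in for serialized messages.
buffer = io.BytesIO()
for message in (b"\x08\x01", b"\x08\x02", b""):
    write_frame(buffer, message)

buffer.seek(0)
frames = list(read_frames(buffer))
print(frames)
```

The same two helpers work on a file opened in binary mode. With a socket, the reader would additionally have to loop until the requested number of bytes has arrived.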
 + 
 +To implement Protobuf in a data integration project, you would typically:
 + 
 +  * Define your data structures in a **.proto** file. 
 +  * Use the Protobuf compiler (**protoc**) to generate data access classes in your preferred programming language from your .proto files. 
 +  * Use these //generated classes// to serialize and deserialize your data structures for communication between systems or services.
  
 More details can be found here:
Line 9: Line 31:
 https://developers.google.com/protocol-buffers/docs/tutorials
  
-1.) Install the translator from the official website. https://github.com/protocolbuffers/protobuf/releases - in the case of Windows, unzip the file protoc-XXX.zip.
+1.) Install the compiler from the official website. https://github.com/protocolbuffers/protobuf/releases - in the case of Windows, unzip the file protoc-XXXXXX-win64.zip.
  
 2.) Create a directory called ./proto and the file book.proto with the following content:
 
-<code python>
+<sxh python>
 syntax = "proto3";
    
Line 26: Line 48:
     repeated Book books = 1;
 }
-</code>
+</sxh>
  
 We have created two messages named Book and Books. A Books message can contain several Book messages. The = 1, = 2 at the end of each line is the field number, which identifies the field inside the serialized structure; numbering starts from one.
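
To make the role of the field numbers concrete, here is a minimal sketch that does not use the protobuf library. In the wire format each field is preceded by a tag computed as `(field_number << 3) | wire_type`, and a reader can skip fields whose numbers it does not know, which is what makes old and new code interoperable. The helper names and the sample bytes (fields 1, 2 and 3 of a hypothetical message) are assumptions for illustration, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a tag value."""
    return (field_number << 3) | wire_type


def split_tag(tag: int):
    """Split a tag value back into (field_number, wire_type)."""
    return tag >> 3, tag & 0x07


def read_varint(data: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(data: bytes, known_fields: set) -> dict:
    """Return {field_number: value} for known fields, skipping all others."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = split_tag(tag)
        if wire_type == 0:                      # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:                    # length-delimited
            size, pos = read_varint(data, pos)
            value = data[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known_fields:
            fields[number] = value
    return fields


# Hypothetical message: field 1 = 150, field 2 = b"Dune", field 3 = 7.
data = bytes.fromhex("089601120444756e651807")
old_reader = decode_known(data, {1, 2})      # written before field 3 existed
new_reader = decode_known(data, {1, 2, 3})
print(old_reader)
print(new_reader)
```

Because the tag carries the number rather than the name, renaming a field in the .proto file does not change the encoding, while changing its number does.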
Line 36: Line 58:
 After running, book_pb2.py is created, which is generated source code and contains the data interface. This can be used to manage (serialize and de-serialize) the data.
  
-4.) Run
-   pip install --upgrade protobuf
+4.) Upgrade the protobuf interface
+   pip install protobuf
  
 5.) Create the server.py file with the following content:
 
-<code python>
+<sxh python>
 import socket
 import book_pb2
Line 77: Line 99:
  
 s.close()
-</code>
+</sxh>
  
 6.) Create the create_books.py file with the following content:
 
-<code python>
+<sxh python>
 import book_pb2
  
Line 106: Line 128:
  
     return books
-</code>
+</sxh>
  
 7.) Create the client.py file with the following content:
-<code python>
+<sxh python>
 import socket
 import book_pb2
Line 142: Line 164:
     fb.write(msg)
     print("client> data.bytes saved\n")
-</code>
+</sxh>
 8.) Run the server and then the client with the commands python server.py and python client.py, then observe and analyze what happens.
tanszek/oktatas/iss_t/modern_data_integration_based_on_protocol_buffer.1678048874.txt.gz · Last modified: 2023/03/05 20:41 by knehez