Transcript
Distributed Sys Systems tems Message Passing Prof. Aniruddha S Rumale Assistant Professor, Comp. Engg. Dept.
Introduction Process means program in execution. Two comps in DS are communicating means two processes communicating with each other. Process communication is necessary to achieve some common goal. DS needs to provide InterProcess Communication ( IPC)
IPC IPC requires information sharing among two or more processes. 1) Original sharing or shared data approach information to be shared is stored in common memory area
2) Copy Sharing or message passing information to to be shared is physically copied from sender process¶s address space to address spaces of all receiver processes in form of message.
P1
Shared common memory
Shared data approach
P2
P1
P2
Message-passing approach
Generally Computers in N/W do not n ot share memory. memory. So IPC in DS uses Message-passing Message-p assing over over shared data Approach. Message-passing system system provides message based IPC protocols by shielding the complex N/W protocols and also by shielding the multiple heterogeneous platforms from user user..
Features
of Message-passing system
1. Si Simp mpli lici city ty should be simple and easy to use. It must be straightforward to construct new applications and communicate with existing ones using systems primitives.
2. Uni Unifor form m Sema Semanti ntics cs semantics of remote communications( communicating processes are on different nodes) should be as close as possible to local communications( communication s( communicating processes are on same node).
Features
of Message-passing system «continue
3.Efficiency an IPC protocol of a message-passing system system can be made efficient by reducing the number of message exchanges, as far as practicable, during the communication process. Avoiding the cost of establishing and terminating connections between same pair of processes each and every message exchange between betw een them. Minimizing the cost of maintaining the connections. Piggybacking the acknowledgement of previous messages with the next message during a connection involving several message exchanges between sender and receiver rec eiver..
Features
of Message-passing system «continue
4. Reliabilty DS is prone to catastrophic catastroph ic events.( node crash, communication link (s) failures etc«) A reliable IPC protocol can cope with failure problems and guarantees the delivery of message. Handling of lost messages involves ACKs ACKs and retransmissions on the basis of timeout. Duplicate messages may be sent in the event of failures or timeouts. A reliable IPC protocol must be capable of detecting and handling of duplicates. It involves generating and assigning sequence numbers to messages.
Features
of Message-passing system «continue
5.Correctness Correctness is a feature related to IPC protocols for group communication. Atomicity : message sent to a group of receivers will be delivered to either all of them or none of them. Ordered Delivery: messages arrive at all receivers in order acceptable to an application. Survivability: guarantees that messages will be delivered correctly despite partial failures of processes, machines, or communication links.
Features
of Message-passing system «continue
6.Flexibilty Not all applications require the same degree of correctness and reliability of IPC protocols. Thus , the IPC protocol of message passing system must be flexible enough to cater to the various needs of different applications. IPC primitives must also have the flexibility to permit any kind of control flow between the cooperating processes, including synchronous and asynchronous send/receive .
Features
of Message-passing system «continue
7. Security A good message-passing system must be capable of providing a secure end-to-end communication. A message in transit on the network should not be accessible to any user other than those to whom it is addressed and the sender. This involves a) authentication of receiver(s) of a message by sender b) authentication of sender of message by receiver(s) c) encryption of message before sending it over o ver N/W. N/W.
Features
of Message-passing system «continue
8. Portability the message-passing system system should itself be portable. the applications written by using the primitives of IPC protocols of message-passing system should be portable.
Issues in IPC by message-passing Message is a block of information formatted by a sending process in such a manner that it is meaningful to the receiving process. Structural information Sequence Addresses Actual Number Sending Data or Number Receiving Or Process Pointer Of bytes/ Type P rocess Message Address To data elements Address ID
A typical message structure
Issues in IPC by message-passing «continued w ho is receiver? who is sender and who How many receivers? One or many? Is the message guaranteed to have been accepted by its receiver(s)? Does the sender need to wait w ait for reply? How to handle catastrophic events ( node crash, link(s) failure(s), etc«) occurring during communication? If receiver is not ready; what to do with w ith message? Discard or store in buffer? What to do if buffer is full? Can receiver choose order of acceptance to serve outstanding messages?
Synchronization Central issue in communication structure is synchronization imposed on the communicating processes by the communication primitives. Two semantics Blocking and non-blocking can be used. A primitive is said to have non-blocking semantics if its invocation does not block the execution of its invoker invoker.. If execution of invoker is blocked, it is blocking semantics. semantics. These semantics are primarily used for send for send and receive primitives. Incase of blocking send primitive, sending process is blocked after execution of send until it receives an ACK from receiver that the message is received. Incase of non-blocking send, process proceeds with its execution as soon as the message get copied to buffer .( transferred if Null-buffer is used)
Synchronization« Synchronization «continued Incase of block-receive, receiving process is blocked until it receives a message (ACK). Incase of non-blocking receive, process proceeds with its execution as soon as receive primitive is executed. How non-blocking receive knows that message is arrived in buffer? 1. Polling : a test primitive is provided to allow allo w receiver to check the buffer status. A periodic execution of test is carried out called as polling. 2. Int nte errupt : when buffer get filled and becomes ready to be used by receiving process a software interrupt notifies this to receiver. receiver. Saves repeated unsuccessful check of polling.
Synchronization «continued A variant variant to non-blocking non-blo cking receive primitive is conditional receive primitive. primitive. This returns control con trol to invoking process immediately,, either with a message or with an indicator of noimmediately message. In blocking-send primitive, sending process could get blocked forever if receiver crashes or if message loss due to other reasons. To To avoid this blocking primitives uses time-out value ( time-stamp, waiting-time) specifying interval of time after which the operation of blocking-primitive blocking-p rimitive ( blockingsend) is terminated with an error status. Time-out value is either default ( system calculated ) or user defined ( human-time) with respect to communication criteria.
Synchronization «continued
Sender
Receiver
When both send and receive Send (message) Receive (message) Execution primitives of a communication Execution suspended between two processes use suspended blocking semantics, the Execution message communication is said to be Blocked Resumed synchronous; otherwise it is State said to be asynchronous. Send (ACK)
Synchronous communication Execution ACK comparatively easy to Resumed Executing State implement than asynchronous communication.
Synchronous mode of communication with both send and receive having blocking semantics
Synchronization «continued Synchronous communication is more reliable. If message get lost or is undelivered, un delivered, no backward error recovery is necessary. It limits concurrency It is subject to communication deadlocks. Less flexible than asynchronous communication. Requires unnecessary waits for ACK. And it is slower than asynchronous communication.
Buffering In communication, in some cases receiving process may not be ready to receive a message.Such messages need to be stored somewhere, usually in the buffer of receiver receiver,, for later reception and processing.
Sending Process
Message
Null Buffering ( No Buffering) [ Synchronous] No temporary storage at receiver to store the message. The message remains in the sender process¶s address space and execution of send is delayed until the receiver executes the corresponding receive.
ACK
Or message is simply discarded and time-out mechanism Receiving Process is used to resend the message after a time-out period. Null Buffering
Buffering« continued Single-Message Buffer [ Synchronous] Null buffer is not suitable for communication in DS if receiver is not ready, ready, it may require more than two repeated message transfer of same message. Also receiver need to wait for time taken to transfer the message across the N/W N/W.. To avoid this a buffer with w ith a capacity to store one message is used at receivers end. This is because in synchronous mode an application module may have at most one message outstanding at a time.
Sender
Buffer to Store One Message
The message buffer may either be located in the kernel¶s address space or in the receiver process¶s address Receiver space. Logical path of message transfer involves two copy operations.
Buffering« continued Unbounded capacity buffer [ asynchronous]
Sender
In asynchronous communication sender never wait for receiver to be ready; causing many pending messages that yet not have been accepted by receiver; and thus requires unbounded unbou nded capacity buffer to store all unreceived messages. Unbounded capacity of buffer is practically impossible. So in practice asynchronous communication uses finite bound buffers. buffers.
Buffer to Store Many Messages
Receiver
Buffering« continued Finite
bound or Multiple-message Multiple-message buffer [ asynchronous]
1. Unsu Unsucce ccessf ssful ul com communi municat cation ion : message transfers simply fail whenever there is no more buffer space.send normally returns an error message to sender : receiver buffer is full, message can¶t be delivered. makes message passing less reliable.
2. Flow-controlled communication : sender is blocked until the receiver accepts some messages. introduces synchronization : may results in deadlocks Communication based on Finite bound buffer implementation is more complex to implement and use than null buffer or single message buffer.
Multidatagram messages All N/W has upper bound on data to be transmitted at a time, known as Maximum Transfer Unit (MTU). If sizeof ( message) > MTU : fragmentMTU( sizeof (eachfragment)<=MTU) : number each fragment serially : if each fragment numbered : send them in packets ( datagram) Packet = control information + message data. If MTU> message data to be send : single-datagram message, else multidatagram messages.
Multidatagram messages « continued At receiver side check packets for sequence number If packet is numbered store it in buffer b uffer Receive all packets with sequence numbers based on common control information. report any error to sender for retransmission of missing packet arrange packets in order accepts packets in order reassemble packets to form complete message acknowledge the sender
Encoding and Decoding A message data should be meaningful to the receiving process. This needs preservation of program objects while transmission from senders address space to receivers address space. It is not not easy to achieve this on homogeneous systems and it is impossible to achieve this on heterogeneous systems. An absolute pointer value loses its meaning when w hen transferred from one process address space to another another.. Different program objects occupy varying amount of storage space. And to make a message meaningful to the receiver re ceiver,, there must be some way receiver to identify which program object is stored where w here in the message buffer and how much space each program object occupies.
Encoding and Decoding «continued Due to problems in transferring program objects in their th eir original form, they get converted to a more mo re suitable stream form for transmission. Conversion process from original to stream form taking place at sender side is known as encoding. Conversion of stream form of program objects in to their original form at receiver side prior to their use is known as Decoding. There are two types of representation used for encoding encodin g and decoding. Tagged Tagged and untagged representation.
Encoding and Decoding «continued Tagged representation : type of each program object along with w ith its value is encoded in the message. Receiving process checks the type of each program object in message due to self-describing nature of coded data. More expensive than untagged representation.
Untagged representation representation : message data only contains program objects. No information about type of any program object is given. Receiver must have prior knowledge to decode the received data as coded data format is not self-describing.
Process Addressing Addressing ( Naming ) of parties p arties involved in communication is an important issue in message based communication.
Explicit addressing : the process with which communication is desired is explicitly named as a parameter in the communication primitive used. Eg .
Send ( process_id, message) Receive( process_id, message)
In above we are sending/receiving message to/from a process identified using process_id
Process Addressing «continued Implicit Addressing : a process willing to communicate does not explicitly name a process for communication in communication primitive used. Useful in client server communication when client is concerned with service and not the server from set of server-farm, who is going to serve its purpose. This is also known know n as functional addressing as address used is of service and not of process. Eg. Send_any( service_id, message) send message to any process which can provide the service identified by service_id. Receive_any(process_id, message) Receive message from any process and return the process_id on reception r eception of message.
Failure
handling
DS is prone to following follow ing failures. sender
Receiver
Send message request
sender Receiver sender Receiver Send Send message received requestmessage request message
lost lost Request message is Lost either because Of link failure or Because receiver Node may be down At time of request For communication
Send response
Crash
restarted Receiver¶s node Response loss either due Crashed before To down sender node or Receiving request Failed link of For communication comunication
Failure
handling«continued
Client
To cope with these problems, IPC protocols are designed based on the idea of internal retransmissions of messages after time-outs. And prompt return of ACK messages from receiver. Four-message
Server
Request
ACK
IPC protocol for client-server:
1. Cli Client ent send sends s reques requestt messa message ge to serv server er..
Reply
2. On rece recepti ption on of reque request st serve serverr sends sends ACK ACK to client. ACK
3. On proce processi ssing ng of client client¶¶s request request server server sends sends Reply containing results of processing to client.
Blocked state
4. On recept reception ion of reply reply client client sends ACK to server server
Executing state
Failure
handling«continued
Three message reliable IPC protocol for client server :
Client
Server Request
1. Clie Client nt sen sends ds a requ request est me mess ssag age e to server 2. On rec recept eption ion ser server ver pro proces cesses ses the request and prepares a reply and sends it to client; meanwhile client remain blocked. 3. On rec recept eption ion of repl reply y from from ser server ver client resumes its execution and sends a ACK to server server.. Server remain blocked until the ACK from client.
Reply
ACK Blocked state Executing state
Failure
handling«continued
Two message IPC protocol p rotocol for client server communication :
Client
Server Request
1. Clie Client nt sen sends ds a requ request est me mess ssag age e to server and enters in block state for time=time-out. 2. On re rece cept ptio ion n of req reque uest st fro from m client server processes it and prepares a reply and sends it to client. 3. Serve Servers rs ker kerne nell waits waits for for ACK ACK from clients kernel for time=timeout; In absence server retransmits the reply to client.
Reply
Blocked state Executing state
Failure
handling «continued
Unsuccessful Request server execution
client Request message Send request Time-out lost Send request Retransmit request message Time-out
crash
Send request
Retransmit request message Time-out
lost Send request
ACK
Successful request Executions May produce Different results
Send response
Retransmit request message ACK
Send response
Example of fault tolerant communication between client-server
Idempotency & handling duplicate request messages Idempotency means repeatabilty repeatabilty..
Idempotent operation produces same results without any side effects no matter how many times it is performed with the same arguments. Eg. sqrt(64)=8 for any repeated execution.
Nonidempotent Nonidempote nt operations do not necessarily produce the same results when executed repeatedly with the same arguments. Eg. See the following follow ing code Debit(amount) { if ((balance-amount) >=( minbalance)) { balance=balance-amount; return(³success´,balance); } else return(³failure´,balance); } // produces different results for amount=100 for every operation
Nonidempotentt operation Nonidempoten client Send request Send request Time-out
Debit(100) Debit( 100)
Server Minbalance Balance=1000 =200
lost Balance=1000--100 Balance=1000 =900 =9 00
Retransmit Debit(100) Debit(100) crash
Send request
Retransmit Debit(100) Debit(100) Send request Received Balance =700 =700 Desired=900 Desired=9 00
Balance=900 Balance=9 00--100 lost ACK( Success,8 Success,800) 00) =800 =8 00 Retransmit Debit(100) Debit(100) ACK( Success,700) Success,700)
Balance=800--100 Balance=800 =700 =7 00
Example of nonidempotent operation without any measures for f ault detection
Idempotency & handling duplicate request messages«continued Problem of nonidempotency can be solved using by avoiding orphan executions ( executions of client request done at server side, results of which won¶t reach to client and so may client keep retransmitting the same request; yielding in wrong result(s) ) of
requests from client. This can be achieved by using exactly-once semantics, semantics, which ensures that only one execution of server¶s server ¶s operation is performed for one request. Requires identification of orphan executions. Primitives based on exactly-once semantics are most desired but difficult to implement.
Exactly-once semantics Request ID
uses unique identifier for identifier for every request that a client makes. Sets up a reply cache in the kernel¶s address space on the server machine to cache replies.
Execution Status Executed / Before forwarding a request to server server,, kernel of server Not machine checks to see if a reply already exists in reply received
cache or not.
If yes, yes, that means request is duplicate and already executed. So previously computed result is extracted from reply cache and new response is send to client. If no, no, request is forwarded to appropriate approp riate server server by kernel.
Result obtained
Reply-cache contents of Exactlyonce semantics
Exactly-once operation
Minbalance Server =200 Balance=1000 client Debit(100) Debit( 100) Send request-1 request-1 lost Reqest-1 Reqest1 Send request-1 request-1 Retransmit Debit(100) Debit(100) Balance=1000 Balance= 1000--100 Time-out =900 =9 00 crash Send request-1 request-1 Retransmit Debit(100) Debit(100)
Send request-1 request-1 Received Balance =900 =900 Desired=900 Desired=9 00
Reqest-1 Reqest-1 lost already executed ACK( Success,9 Success,900) 00) Balance=900 Balance=9 00 Retransmit Debit(100) Debit(100) Reqest-1 Reqest1 already executed ACK( Success,900) Success,900) Balance=900 Balance=9 00
Example of exactly-once operation
Lost and out of sequence packets Keeping track of lost and out o ut of sequence packets is a issue in multidatagram messages. In multidatagram message transmission is said to be complete iff all packets are received by a process to which w hich it is sent. Simple way is acknowledge each packet p acket separately. separately. ( stopand-wait protocol) . This leads to communication overhead. Better approach is sending one acknowledgement ackno wledgement for complete multidatagram message when all packets get received at receiver end.( end.( blast protocol)
Lost and out of sequence packets «continued In blast protocol a node crash or a link failure may lead to following problems: one or more packets of multidatagram message are lost in communication the packets are received out of sequence by the receiver. Efficient mechanism is to use a bitmap to identify the packets of a message. In this approach header part of each eac h packet consists of two of two extra fields, fields, one of which w hich specifies the total number of packets in multidatagram message and other is the bitmap field that specifies the position of this packect in the complete message. Sender Receiver Type of address address message
Message ID
No of packets
Packet Sequence no Rest of Or bitmap message
Lost and out of sequence packets «continued In multidatagram message a suitable buffer is set aside by receiver rece iver using No_of_packets field in first packet. Bitmap field gives information where exactly a received packet must be stored in set aside buffer for the particular message.
selective repeat : After time-out, if all packets are not received, r eceived, Bitmap ids of nonreceived packets are communicated with the sender. On receiving this information sender sends only those packets that have not been received by receiver. The process get repeated until transmission of multidatagram message won¶t get completed. i.e. when all packets of message get received by receiver this retransmission of select packets stops.
Sender of multidatagram multidatagram Message Mess age that consists of Five packets
Receiver of multidatagram Message Send request message
(M1,5,P1)
lost Packets of The Response Message
Time-out (M1,5,P2)
(M1,5,P3) Place this packet In position 3
(M1,5,P4)
lost (M1,5,P5)
Create buffer for five Packets and store this Packet in position 2
Buffer For 5 Packets
Place this packet In position 5
Missing packets info. (M1,5,P1)
Resend missing packets (M1,5,P4)
ACK
Retransmit request Retransmit For missing packets
Place this packet In position 1 Place this packet In position 4 M1- Messea Messeage ge ID=1 5=packets in M1 Send ACK P1,P2«= Ith packet
1 2 3 4 5
Use of bitmap to keep track of lost and out of sequence packets in multidatagram message transmission
Group communication Elementary form of communication is one-to-one or unicast communication. DS require group communication (in addition to unicast) those are 1. One to many ( single sender and multiple multiple recei receiver ver ) multicast ( no of receivers are predefined and known) broadcast(( no of receivers are unknown and broadcast undefined) 2. Many to one ( multiple multiple sender sender and single single receiv receiver) er) 3. Many to many many ( multiple multiple sender and multiple multiple receiv receiver) er)
Group Management In one to many communication, Receiver processes of a message form a group; closed and open. A closed group is one in which only the members of the group can send a message to the the group. In close group, group, an outside process cannot send a message to group as a whole, although it may send a message to an individual member of group. Open group is one in which w hich any process in system can send message to group as a whole. Usage of close/open group is application specific and any flexible message passing system must support both types of groups. Facility
of Dynamic of Dynamic creation and deletion of groups is must. And a process must be allowed to enter or leave the group at any time.
Group Managemen Managementt Simple mechanism for this is to use centralized group server to manage groups and their membership information. Centralized server approach suffers from the problems of poor reliability and poor scalability common to all centralized systems. Replication of group servers adds communication overhead in keeping group information of all group servers consistent.
Group addressing Two level naming scheme is normally used. High level group name is in ASCII string that is independent of the location information of the processes in group. Low level group name depends to a large extent on underlying hardware. Special N/W address to which multiple machines can listen is called as multicast address, address, possible on some N/Ws. Packet sent to multicast address is delivered to the machines linked to multicast address. N/Ws, which can not create multicast address may have broadcasting facility by declaring a particular address such as a s zero as broadcast address.. Packet sent to broadcast address is delivered to all address machines in entire N/W N/W..
Message delivery to receiver processes User uses high level group names in programs. Centralized group server maintains mapping between high and low level group addresses ( names) along with w ith the process identifiers of all processes for each group. Kernels of sender, receiver , and group server does appropriate mapping and unmapping operations with rest of other operations like encoding/decoding to deliver message to correctly to receiver. Sender is not at all aware of either size of group or actual mechanism used for group addressing. Sender simply sends the message to a group by specifying its high level name, and the OS takes the responsibility r esponsibility to deliver the message to all the group members.
Buffered and unbuffered multicast Multicasting is asynchronous communication mechanism due to following reasons. 1. It is is unrealisti unrealistic c to expect expect a sending sending proces process s to wait until until all all the receiv receiving ing processes that belong to multicast multicast group are ready to receive rece ive the multicast message. 2. Sending Sending process may not be aware aware of all all receivi receiving ng processes processes that belon belong g to the multicast group. For
an unbuffered multicast, multicast, the message is not buffered for the receiving process and is lost if receiving process is not in a state ready to receive it. So the message is received only by those processes of multicast group that are ready to receive it.
or a a For
buffered multicast, multicast, the message is buffered for the receiving processes, so each process of multicast group eventually eventually receive the message.
Same is true for broadcasting communication.
Send to all and bulletin-board semantics
Send-to-all semantics : a copy of message is sent to each process of the multicast group and message is buffered until it is accepted by the process. Following
two factors are ignored by send-to-all semantics.
Relevance of a message to a particular receiver may depend on the receiver¶s state. Message not accepted within a certain time after transmission may no longer be useful; their value may depend on sender¶s state.
Bulletin-board semantics : a message to be multicast is addressed to a channel instead of being sent to every individual process of multicast group. Receiving process copies message from channel instead of removing it when it makes a receive request on the channel. Process that have receive access right on the channel constitute the multicast group.Thus channel acts as a bulletin-board.
Flexible
reliability in multicast communication
1. The 00-re reli liabi abili lity ty : no response is expected by sender from any of the receiver. Useful in asynchronous multicast. 2. The 11-re reli liabi abili lity ty : sender expects the reply from any of receivers. 3. m-out m-out-o -off-nn-re reli liabl able e : the multicast group consists of n receivers and sender expects a response from m ( 1