Cluster Gossip
A gossip protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread.Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members of a group. Some ad-hoc networks have no central registry and the only way to spread common data is to rely on each member to pass it along to their neighbours. The term epidemic protocol is sometimes used as a synonym for a gossip protocol, as gossip spreads information in a manner similar to the spread of a virus in a biological community.
Gossip in Proto.Actor
GossipState
Each member has a GossipState
, this state is made up of a dictionary of MemberId
to GossipMemberState
objects
GossipMemberState
objects are in turn dictionaries of keys, system and user defined, mapping to a Protobuf.Any
state.
This means we can have a GossipMemberState
object for “member123”.
Inside this member state object, we can have entries for known payloads.
e.g. “topology”, “heartbeat”, “banned-members” etc.
This state is replicated from the owner member to other members of the cluster via gossip. State is always defined per member, allowing us to know what other members see and know.
To get a complete view of say banned-members, we could take the banned-member entry from each member, and merge those results. giving us an eventual consistent set of banned members.
Committed Offsets
Each member has a dictionary of committed offsets for itself, and all other known members.
These offsets represent the highest SequenceNumber
this node has seen for a given key.
Keys are made up of MemberId
and StateEntryKey
e.g.
- “Member1234.topology” : 123
- “Member1234.heartbeat” : 567
During the gossip transmission, the sender member will transfer a delta of all state changes from what the sender knows that the target knows, upto what the sender believes the target doesn’t know.
The sender member can know that some state has been transmitted to the target, we know this via the CommittedOffsets
, the sender does not however know if any of the state after this point, has already been transmitted by any other node, unless they have gossipped this to the sender that is.
Gossip fan-out
Gossip between member nodes occur at intervals and target members are picked at random with a selection of ClusterConfig.GossipFanout
number of members.
graph LR
Topology(Topology)
class Topology message
ClusterProvider
class ClusterProvider red
Gossip2(Gossip)
class Gossip2 message
Gossip4(Gossip)
class Gossip4 message
Gossip5(Gossip)
class Gossip5 message
ClusterProvider --- Topology
Topology --> Member1
Member1 --- Gossip2 --> Member2
Member1 --> empty --> Member3
Member1 --- Gossip4 --> Member4
Member1 --- Gossip5 --> Member5
Member1 --> empty2 --> Member6
Sending user state
To set cluster state, there is a Cluster.Gossip.SetKey(key, value)
method.
This method takes a key for the state you wish to set, e.g. “MyState” and a value, in the form of a Protobuf.Any
message.
Once set, the cluster will start to sync this information over to other cluster members.
Query gossip state
//get the heartbeat entry in the gossip state
var memberHeartbeats = await system.Cluster().Gossip.GetState<MemberHeartbeat>(GossipKeys.Heartbeat);
//create a list with tuple (MemberId, Kind, Count)
var stats = (from x in memberHeartbeats
let memberId = x.Key
from y in x.Value.ActorStatistics.ActorCount
select (MemberId: memberId, Kind: y.Key, Count: y.Value))
.ToList();
Subscribing to gossip state
system
.EventStream
.Subscribe<GossipUpdate>(
x => x.Key == GossipKeys.MemberHeartbeat,
update => {....});
Configuring cluster gossip interval
The default interval between gossip updates is 300 ms. Sometimes we might need to increase this interval, based on the overall load on the system. So, we can configure this interval using ClusterConfig.WithGossipInterval(interval)
, where interval
is of type TimeSpan
. But setting the interval
to a higher value might lead to some unexpected results, because it would take interval
amount of time for all the members to come to a consensus, in case of updates in the actor system.
Configuring cluster gossip request timeout
The default timeout for sending the gossip to other members is 1500 ms. This can also be configured to the desired value using ClusterConfig.WithGossipRequestTimeout(timeout)
, where timeout
is of type TimeSpan
.