Using Replicated Entities
Conflict-free Replicated Data Types (CRDTs) are data structures that can be used to support highly available and scalable sharing of state in a distributed system. Cloudstate entities that use CRDTs as their persistence model are called Replicated Entities. To use Replicated Entities, you must be able to represent data using the available replicated types.
Replicated entity data is replicated to every node in the system. Each node can read and update without requiring any coordination from other nodes. If two or more nodes modify a Replicated Entity’s state concurrently, the modifications can be merged together. Replicated entity data structures guarantee that all nodes eventually agree on what the current state of that entity is.
Replicated entities are useful in cases where strong consistency is not needed—only eventual consistency is required. There is no guarantee that an update made on one node will immediately be visible on all nodes. Rather, such an update will eventually be visible on all nodes.
They work well for very low latency reads and writes. Reading data requires no network communication, since the value is simply read from the node’s local replica. Writing also requires no network communication, since writing only updates the node’s local replica. Updates to the local replica are later replicated to other nodes.
Replicated entities can also help meet high availability requirements. If one or more nodes become unresponsive in the cluster, this in no way impacts other nodes ability to read and update the state they hold. In the case of network partitions, any updates performed on a Replicated Entity will be replicated to the once unreachable nodes once the network partition is resolved.
Finally, Replicated Entities excel in some cases where very high throughput writes are required. This depends on the particular data type being used. Counters and votes, for example, can support very high throughput writes because they don’t need to replicate every single update to every node, it is sufficient to just replicate their updates occasionally.
The following data types are available to Replicated Entities:
A Grow-only Counter, or GCounter, is a counter that can only be incremented. It works by tracking a separate counter value for each node, and taking the sum of the values for all the nodes to get the current counter value. Since each node only updates its own counter value, each node can coordinate those updates to ensure they are consistent. If the merge function sees two different values for the same node, it simply takes the highest value, because that has to be the most recent value that the node published.
A Positive-Negative Counter, or PNCounter, is a counter that can both be incremented and decremented. It works by combining two GCounters, a positive one, that tracks increments, and a negative one, that tracks decrements. The final counter value is computed by subtracting the negative GCounter from the positive GCounter.
A Grow-only Set, or GSet, is a set that can only have items added to it. A GSet is a very simple Replicated Entity data type, its merge function is defined by taking the union of the two GSets being merged.
An Observed-Removed Set, or ORSet, is a set that can have items both added and removed from it. It is implemented by maintaining a set of unique tags for each element which are generated on addition into the set. When an element is removed, all the tags that that node currently observes are added to the removal set, so as long as there haven’t been any new additions that the node hasn’t seen when it removed the element, the element will be removed. The Cloudstate reference implementation cleans up tombstones.
A Flag is a boolean value that starts as false, and can be set to true. Once set to true, it cannot be set back to false. A flag is a very simple Replicated Entity data type, the merge function is simply a boolean or over the two flag values being merged.
A Last-Write-Wins Register, or LWWRegister, is a data type that can hold any value, along with a clock value and node id to indicate when it was updated by which node. If two nodes have two different versions of the value, the one with the highest clock value wins. If the clock values are equal, then a stable function on the nodes is used to determine it (eg, the node with the lowest address). Note that LWWRegisters do not support partial updates of their values. If the register holds a person object, and one node updates the age property, while another concurrently updates the name property, only one of those updates will eventually win. By default, LWWRegister’s are vulnerable to clock skew between nodes. Cloudstate supports optionally providing a custom clock value should a more trustworthy ordering for updates be available.
An Observed-Removed Map, or ORMap, is similar to an ORSet, with the addition that the values of the set serve as keys for a map, and the values of the map are themselves, Replicated Entities. When a value for the same key in an ORMap is modified concurrently on two different nodes, the values from the two nodes are merged together.
A Vote is a Replicated Entity which allows nodes to vote on a condition. It’s similar to a GCounter, each node has its own counter, and an odd value is considered a vote for the condition, while an even value is considered a vote against. The result of the vote is decided by taking the votes of all nodes that are currently members of the cluster (when a node leave, its vote is discarded). Multiple decision strategies can be used to decide the result of the vote, such as at least one, majority and all.
Replicated entities support handling server streamed calls when the gRPC service call marks the return type as
streamed. When a service implementation receives a streamed message, it is allowed to update the Replicated Entity, on two occasions - when the call is first received, and when the client cancels the stream. If it wishes to make updates at other times, it can do so by emitting effects with the streamed messages that it sends down the stream.
A service implementation can send a message down a stream in response to anything, however the Cloudstate supplied support libraries only allow sending messages in response to the Replicated Entities changing. In this way, use cases that require monitoring the state of a Replicated Entity can be implemented.
By default, updates are performed on the local node, and are asynchronously replicated to other nodes. Updates can also be performed at other write consistencies, Cloudstate supports majority and all. Note that when non-local write consistencies are used, then many of the advantages of Replicated Entities are no longer available, writes will require network communication and can be impacted by failures on other nodes. Hence, non local write consistencies should be used with caution, they are primarily useful in situations where the rare case of a write being lost due to a node crashing before it replicates it to another node is unacceptable.