Coherence Performance Tuning: Network, Communication Delay

There is a page of Performance Tuning tips in the Coherence wiki. Besides following it, you should use a JMX client tool to confirm that the settings you changed were actually applied. For example, when you get a warning that a buffer size is too small, change the setting and then make sure the node picks it up.

UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
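
The figures in this warning imply a packet size of 1468 bytes (2096304 / 1428 = 1468), and 131071 bytes holds only 89 whole packets of that size. Below is a minimal sketch of the conversion. It assumes the 1468-byte packet size inferred from this one log line, so substitute your own if your packet size is configured differently. The conversion is handy for working out how large the OS socket buffer limit needs to be. On Linux, the ceiling for receive buffers is controlled by the net.core.rmem_max sysctl.

```java
// Converts between socket buffer sizes expressed in packets and in bytes.
class BufferSizeMath {
    // Assumption: packet size inferred from the warning (2096304 bytes / 1428 packets).
    static final int ASSUMED_PACKET_SIZE = 1468;

    // Bytes needed for a buffer that holds the given number of packets.
    static long bytesFor(int packets, int packetSize) {
        return (long) packets * packetSize;
    }

    // Whole packets that fit in a buffer of the given size (integer division rounds down).
    static int packetsFor(long bytes, int packetSize) {
        return (int) (bytes / packetSize);
    }

    public static void main(String[] args) {
        int requestedPackets = 1428; // what Coherence asked for
        long grantedBytes = 131071;  // what the OS actually granted
        System.out.println("requested bytes: " + bytesFor(requestedPackets, ASSUMED_PACKET_SIZE));
        System.out.println("granted packets: " + packetsFor(grantedBytes, ASSUMED_PACKET_SIZE));
    }
}
```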

Run jconsole and open the Node MBeans. You should be able to see whether the new setting has been picked up.
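
You can also script this check with the standard javax.management remote API instead of clicking through jconsole. The sketch below reads BufferReceiveSize and BufferPublishSize (see the attribute table) from one node's MBean. It makes three assumptions: the Coherence object name follows the pattern Coherence:type=Node,nodeId=<id>, the target JVM has remote JMX enabled, and you pass its service URL as the first argument. Node id 7 is only a placeholder. With no argument, the sketch reads the local JVM's own Runtime MBean, so you can try the code without a cluster.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class NodeSettingsCheck {
    // Reads the named attributes of one MBean into an insertion-ordered map.
    static Map<String, Object> readAttributes(MBeanServerConnection conn, String objectName,
                                              String... attributes) throws Exception {
        ObjectName name = new ObjectName(objectName);
        Map<String, Object> values = new LinkedHashMap<>();
        for (String attribute : attributes) {
            values.put(attribute, conn.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: read the local JVM's Runtime MBean so the code runs without a cluster.
            System.out.println(readAttributes(ManagementFactory.getPlatformMBeanServer(),
                    "java.lang:type=Runtime", "Name", "VmName"));
            return;
        }
        // args[0] is a JMX service URL such as service:jmx:rmi:///jndi/rmi://host:9999/jmxrmi
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Assumed Coherence object name; nodeId=7 is a placeholder for the node you changed.
            System.out.println(readAttributes(conn, "Coherence:type=Node,nodeId=7",
                    "BufferReceiveSize", "BufferPublishSize"));
        }
    }
}
```

Example: `java NodeSettingsCheck service:jmx:rmi:///jndi/rmi://host:9999/jmxrmi`, where host and port are placeholders for wherever the node exposes JMX.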


If you get a communication delay warning like the one below, something is usually wrong with either the network or the remote node. The message itself names a remote GC pause as the probable cause.

Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7, Timestamp=2006-10-20 12:15:47.511, Address=, MachineId=13838); 320 packets rescheduled, PauseRate=0.31, Threshold=512
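
When these warnings appear repeatedly, it helps to extract the delay, the member id and the rescheduled packet count. That way you can see whether one member keeps showing up. Below is a minimal sketch. It assumes the message layout of the single example line above, and other Coherence versions may word the message differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelayLogParser {
    // Pattern assumes the layout of the example warning; adjust it if your version differs.
    private static final Pattern DELAY = Pattern.compile(
            "Experienced a (\\d+) ms communication delay.*?Member\\(Id=(\\d+).*?(\\d+) packets rescheduled");

    // The three numbers of interest from one warning line.
    static final class Delay {
        final long millis;
        final int memberId;
        final int packetsRescheduled;

        Delay(long millis, int memberId, int packetsRescheduled) {
            this.millis = millis;
            this.memberId = memberId;
            this.packetsRescheduled = packetsRescheduled;
        }
    }

    // Returns null when the line is not a communication-delay warning.
    static Delay parse(String line) {
        Matcher m = DELAY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Delay(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        String line = "Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7, "
                + "Timestamp=2006-10-20 12:15:47.511, Address=, MachineId=13838); 320 packets rescheduled, "
                + "PauseRate=0.31, Threshold=512";
        Delay d = parse(line);
        System.out.println("member " + d.memberId + ": " + d.millis + " ms, "
                + d.packetsRescheduled + " packets rescheduled");
    }
}
```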

If you suspect a network issue (that is, the remote node looks fine, with no CPU spike and no heavy GC activity), check the following attributes.


Then go to Member 7 and check its statistics (CPU, network):

PacketPublisher: Cpu=641ms (0.0%), PacketsSent=4945, PacketsResent=5, SuccessRate=0.9989, Throughput=7714pkt/sec
PacketSpeaker  : Cpu=0ms (0.0%), PacketsSent=63, Bursts=3, Throughput=0pkt/sec, Queued=0
PacketReceiver : PacketsReceived=5292, PacketsRepeated=2, SuccessRate=0.9996
TcpRing        : TotalPings=3382, Timeouts=0, Failures=0, SuccessRate=1.0
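
The Node MBean exposes this same text as its Statistics attribute (see the attribute table). The sketch below splits the text into sections and key=value fields, then lists the sections whose SuccessRate is below 1.0. It assumes the layout shown above (section name, colon, comma-separated fields) and is not an official parser. The sample string reproduces the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeStatsParser {
    // Sample copied from the statistics output shown above.
    static final String SAMPLE =
            "PacketPublisher: Cpu=641ms (0.0%), PacketsSent=4945, PacketsResent=5, SuccessRate=0.9989, Throughput=7714pkt/sec\n"
          + "PacketSpeaker  : Cpu=0ms (0.0%), PacketsSent=63, Bursts=3, Throughput=0pkt/sec, Queued=0\n"
          + "PacketReceiver : PacketsReceived=5292, PacketsRepeated=2, SuccessRate=0.9996\n"
          + "TcpRing        : TotalPings=3382, Timeouts=0, Failures=0, SuccessRate=1.0";

    // A section is "Name :" followed by everything up to the next "Name :" or the end of input.
    private static final Pattern SECTION =
            Pattern.compile("([A-Za-z]+)\\s*:\\s*(.*?)(?=\\s+[A-Za-z]+\\s*:|$)");
    // Inside a section, fields are comma-separated key=value pairs.
    private static final Pattern FIELD = Pattern.compile("(\\w+)=([^,]+)");

    static Map<String, Map<String, String>> parse(String stats) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Matcher s = SECTION.matcher(stats.trim());
        while (s.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher f = FIELD.matcher(s.group(2));
            while (f.find()) {
                fields.put(f.group(1), f.group(2).trim());
            }
            sections.put(s.group(1), fields);
        }
        return sections;
    }

    // Sections reporting a SuccessRate below 1.0, i.e. some packets needed another attempt.
    static Map<String, Double> belowFullSuccess(Map<String, Map<String, String>> sections) {
        Map<String, Double> weak = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : sections.entrySet()) {
            String rate = e.getValue().get("SuccessRate");
            if (rate != null && Double.parseDouble(rate) < 1.0) {
                weak.put(e.getKey(), Double.parseDouble(rate));
            }
        }
        return weak;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> sections = parse(SAMPLE);
        System.out.println(sections.get("PacketPublisher"));
        System.out.println("below 1.0: " + belowFullSuccess(sections));
    }
}
```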

If you find something wrong with Node 7, run a performance monitor on its server to make sure no other CPU-hungry application is running. If another application is consuming all the CPU, the other nodes cannot communicate with any of the Coherence nodes on that server, and vice versa.
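
To find which process is consuming the CPU, use a performance monitor such as top or Task Manager. For a quick host-level signal, the standard OperatingSystemMXBean exposes the system load average and the processor count. The sketch below flags the host when the load average exceeds the core count multiplied by a factor. The factor of 1.0 is an arbitrary assumption. The check only sees the machine it runs on, so run it on the same server as the node you suspect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class CpuPressureCheck {
    // Flags the host when the load average exceeds cpuCount * factor.
    // getSystemLoadAverage() returns a negative value when it is unavailable (e.g. on Windows).
    static boolean looksSaturated(double loadAverage, int cpuCount, double factor) {
        if (loadAverage < 0) {
            return false; // no data, so do not claim saturation
        }
        return loadAverage > cpuCount * factor;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();
        // Assumed threshold factor of 1.0: on average, every core is busy.
        System.out.printf("load=%.2f cpus=%d saturated=%b%n", load, cpus, looksSaturated(load, cpus, 1.0));
    }
}
```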

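Here is a summary of the Node MBean attributes and their meanings. Each entry gives the attribute name, its type, whether it is read-only (RO) or read-write (RW), and a description.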

BufferPublishSize Integer RW The buffer size of the unicast datagram socket used by the Publisher, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services.
BufferReceiveSize Integer RW The buffer size of the unicast datagram socket used by the Receiver, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services.
BurstCount Integer RW The maximum number of packets to send without pausing. Anything less than one (e.g. zero) means no limit.
BurstDelay Integer RW The number of milliseconds to pause between bursts. Anything less than one (e.g. zero) is treated as one millisecond.
CpuCount Integer RO Number of CPU cores for the machine this Member is running on.
FlowControlEnabled Boolean RO Indicates whether or not FlowControl is enabled.
Id Integer RO The short Member id that uniquely identifies the Member at this point in time and does not change for the life of this Member.
LoggingDestination String RO The output device used by the logging system. Valid values are stdout, stderr, jdk, log4j, or a file name.
LoggingFormat String RW Specifies how messages will be formatted before being passed to the log destination.
LoggingLevel Integer RW Specifies which logged messages will be output to the log destination. Valid values are non-negative integers or -1 to disable all logger output.
LoggingLimit Integer RW The maximum number of characters that the logger daemon will process from the message queue before discarding all remaining messages in the queue. Valid values are integers in the range [0...]. Zero implies no limit.
MachineId Integer RO The Member's machine Id.
MachineName String RO A configured name that should be the same for all Members that are on the same physical machine, and different for Members that are on different physical machines.
MemberName String RO A configured name that must be unique for every Member.
MemoryAvailableMB Integer RO The total amount of memory in the JVM available for new objects in MB.
MemoryMaxMB Integer RO The maximum amount of memory that the JVM will attempt to use in MB.
MulticastAddress String RO The IP address of the Member's MulticastSocket for group communication.
MulticastEnabled Boolean RO Specifies whether or not this Member uses multicast for group communication. If false, this Member will use the WellKnownAddresses to join the cluster and point-to-point unicast to communicate with other Members of the cluster.
MulticastPort Integer RO The port of the Member's MulticastSocket for group communication.
MulticastTTL Integer RO The time-to-live for multicast packets sent out on this Member's MulticastSocket.
MulticastThreshold Integer RW The percentage (0 to 100) of the servers in the cluster that a packet will be sent to, above which the packet will be multicasted and below which it will be unicasted.
NackEnabled Boolean RO Indicates whether or not the early packet loss detection protocol is enabled.
NackSent Long RO The total number of NACK packets sent since the node statistics were last reset.
PacketDeliveryEfficiency Float RO The efficiency of packet loss detection and retransmission. A low efficiency is an indication that there is a high rate of unnecessary packet retransmissions.
PacketsBundled Long RO The total number of packets which were bundled prior to transmission. The total number of network transmissions is equal to (PacketsSent - PacketsBundled).
PacketsReceived Long RO The number of packets received since the node statistics were last reset.
PacketsRepeated Long RO The number of duplicate packets received since the node statistics were last reset.
PacketsResent Long RO The number of packets resent since the node statistics were last reset. A packet is resent when there is no ACK received within a timeout period.
PacketsResentEarly Long RO The total number of packets resent ahead of schedule. A packet is resent ahead of schedule when there is a NACK indicating that the packet has not been received.
PacketsResentExcess Long RO The total number of packet retransmissions which were later proven unnecessary.
PacketsSent Long RO The number of packets sent since the node statistics were last reset.
Priority Integer RO The priority or "weight" of the Member; used to determine tie-breakers.
ProcessName String RO A configured name that should be the same for Members that are in the same process (JVM), and different for Members that are in different processes. If not explicitly provided, for processes running with JRE 1.5 or higher the name will be calculated internally as the Name attribute of the system RuntimeMXBean, which normally represents the process identifier (PID).
ProductEdition String RO The product edition this Member is running. Possible values are: Standard Edition (SE), Enterprise Edition (EE), Grid Edition (GE).
PublisherPacketUtilization Float RO The publisher packet utilization for this cluster node since the node socket was last reopened. This value is a ratio of the number of bytes sent to the number that would have been sent had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network.
PublisherSuccessRate Float RO The publisher success rate for this cluster node since the node statistics were last reset. Publisher success rate is a ratio of the number of packets successfully delivered in a first attempt to the total number of sent packets. A failure count is incremented when there is no ACK received within a timeout period. It could be caused by either very high network latency or a high packet drop rate.
RackName String RO A configured name that should be the same for Members that are on the same physical "rack" (or frame or cage), and different for Members that are on different physical "racks".
ReceiverPacketUtilization Float RO The receiver packet utilization for this cluster node since the socket was last reopened. This value is a ratio of the number of bytes received to the number that would have been received had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network.
ReceiverSuccessRate Float RO The receiver success rate for this cluster node since the node statistics were last reset. Receiver success rate is a ratio of the number of packets successfully acknowledged in a first attempt to the total number of received packets. A failure count is incremented when a re-delivery of a previously received packet is detected. It could be caused by either very high inbound network latency or lost ACK packets.
RefreshTime Date RO The timestamp when this model was last retrieved from a corresponding node. For local servers it is the local time.
ResendDelay Integer RW The minimum number of milliseconds that a packet will remain queued in the Publisher's re-send queue before it is resent to the recipient(s) if the packet has not been acknowledged. Setting this value too low can overflow the network with unnecessary repetitions. Setting the value too high can increase the overall latency by delaying the re-sends of dropped packets. Additionally, a change to this value may need to be accompanied by a change in the SendAckDelay value.
RoleName String RO A configured name that can be used to indicate the role of a Member to the application. While managed by Coherence, this property is used only by the application.
SendAckDelay Integer RW The minimum number of milliseconds between the queueing of an Ack packet and the sending of the same. This value should be no more than half of the ResendDelay value.
SendQueueSize Integer RO The number of packets currently scheduled for delivery. This number includes both packets that are to be sent immediately and packets that have already been sent and are awaiting acknowledgment. Packets that do not receive an acknowledgment within the ResendDelay interval will be automatically resent.
SiteName String RO A configured name that should be the same for Members that are on the same physical site (e.g. data center), and different for Members that are on different physical sites.
SocketCount Integer RO Number of CPU sockets for the machine this Member is running on.
Statistics String RO Statistics for this cluster node in a human readable format.
TcpRingFailures Long RO The number of recovered TcpRing disconnects since the node statistics were last reset. A recoverable disconnect is an abnormal event that is registered when the TcpRing peer drops the TCP connection but recovers after no more than the maximum configured number of attempts. This value will be -1 if the TcpRing is disabled.
TcpRingTimeouts Long RO The number of TcpRing timeouts since the node statistics were last reset. A timeout is a normal, but relatively rare event that is registered when the TcpRing peer did not ping this node within a heartbeat interval. This value will be -1 if the TcpRing is disabled.
Timestamp Date RO The date/time value (in cluster time) that this Member joined the cluster.
TrafficJamCount Integer RW The maximum total number of packets in the send and resend queues that forces the publisher to pause client threads. Zero means no limit.
TrafficJamDelay Integer RW The number of milliseconds to pause client threads when a traffic jam condition has been reached. Anything less than one (e.g. zero) is treated as one millisecond.
UnicastAddress String RO The IP address of the Member's DatagramSocket for point-to-point communication.
UnicastPort Integer RO The port of the Member's DatagramSocket for point-to-point communication.
WeakestChannel Integer RO The id of the cluster node to which this node is having the most difficulty communicating, or -1 if none is found. A channel is considered to be weak if either the point-to-point publisher or receiver success rates are below 1.0.
WellKnownAddresses String[] RO An array of well-known socket addresses that this Member uses to join the cluster.
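
Several RW attributes above carry simple rules in their descriptions:
- BurstDelay and TrafficJamDelay values below one are treated as one millisecond.
- A BurstCount below one means no limit.
- A TrafficJamCount of zero means no limit.
- SendAckDelay should be no more than half of ResendDelay.

The sketch below encodes those rules as the table states them. You can use it to check values you read from the Node MBean before changing anything. It does not reproduce how Coherence applies the rules internally, and the numbers in main are hypothetical.

```java
class NetworkSettingsCheck {
    // BurstDelay and TrafficJamDelay: anything less than one is treated as one millisecond.
    static int effectiveDelayMillis(int configured) {
        return Math.max(1, configured);
    }

    // BurstCount: anything less than one means no limit on packets per burst.
    static boolean burstUnlimited(int burstCount) {
        return burstCount < 1;
    }

    // TrafficJamCount: zero means no limit.
    static boolean trafficJamUnlimited(int trafficJamCount) {
        return trafficJamCount == 0;
    }

    // SendAckDelay should be no more than half of ResendDelay; comparing 2x avoids integer rounding.
    static boolean ackDelayWithinHalf(int sendAckDelay, int resendDelay) {
        return 2L * sendAckDelay <= resendDelay;
    }

    public static void main(String[] args) {
        // Hypothetical values, e.g. as read from a Node MBean.
        int burstCount = 0, burstDelay = 0, trafficJamCount = 0, resendDelay = 200, sendAckDelay = 16;
        System.out.println("burst unlimited: " + burstUnlimited(burstCount));
        System.out.println("effective BurstDelay: " + effectiveDelayMillis(burstDelay) + " ms");
        System.out.println("traffic jam unlimited: " + trafficJamUnlimited(trafficJamCount));
        System.out.println("SendAckDelay within half of ResendDelay: "
                + ackDelayWithinHalf(sendAckDelay, resendDelay));
    }
}
```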

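Two of the descriptions above also imply small calculations. First, the number of actual network transmissions is PacketsSent - PacketsBundled. Second, WeakestChannel can be approximated as the node with the lowest success rate below 1.0. The sketch below computes both from values you supply. The per-member rate map is a hypothetical input shape. Treating "most difficulty" as the lower of the publisher and receiver rates is an assumption, and Coherence's own WeakestChannel calculation may differ.

```java
import java.util.HashMap;
import java.util.Map;

class ChannelMetrics {
    // From the PacketsBundled description: transmissions = PacketsSent - PacketsBundled.
    static long networkTransmissions(long packetsSent, long packetsBundled) {
        return packetsSent - packetsBundled;
    }

    // Input: member id -> {publisherSuccessRate, receiverSuccessRate} (hypothetical shape).
    // Returns the member with the lowest rate below 1.0, or -1 if no channel is weak.
    static int weakestChannel(Map<Integer, double[]> ratesByMember) {
        int weakest = -1;
        double lowest = 1.0;
        for (Map.Entry<Integer, double[]> e : ratesByMember.entrySet()) {
            double rate = Math.min(e.getValue()[0], e.getValue()[1]);
            if (rate < lowest) { // only rates strictly below 1.0 count as weak
                lowest = rate;
                weakest = e.getKey();
            }
        }
        return weakest;
    }

    public static void main(String[] args) {
        // Hypothetical per-member success rates, for illustration only.
        Map<Integer, double[]> rates = new HashMap<>();
        rates.put(3, new double[] {0.9989, 0.9996});
        rates.put(7, new double[] {0.95, 1.0});
        rates.put(9, new double[] {1.0, 1.0});
        System.out.println("weakest channel: " + weakestChannel(rates));
        System.out.println("transmissions: " + networkTransmissions(4945, 945));
    }
}
```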
