The Coherence wiki has a Performance Tuning page. Beyond following it, you should use a JMX client tool to confirm that your settings have actually been applied.
For example, if you get a warning that a buffer size is too small, change the setting and then make sure the node has picked it up:
UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
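To see what the numbers in that warning mean, here is a small Java sketch that pulls them out of the log line. `BufferWarning` and its methods are hypothetical names, and the regular expression assumes the exact wording shown above. Dividing the requested bytes by the requested packets gives the packet size in use (2096304 / 1428 = 1468 bytes), and 131071 / 1468 rounds down to the 89 packets reported.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the "failed to set receive buffer size" warning shown above.
// Assumes the exact wording of this message; other Coherence versions may differ.
final class BufferWarning {
    private static final Pattern WARNING = Pattern.compile(
        "buffer size to (\\d+) packets \\((\\d+) bytes\\); "
        + "actual size is (\\d+) packets \\((\\d+) bytes\\)");

    final int requestedPackets;
    final long requestedBytes;
    final int actualPackets;
    final long actualBytes;

    private BufferWarning(int requestedPackets, long requestedBytes,
                          int actualPackets, long actualBytes) {
        this.requestedPackets = requestedPackets;
        this.requestedBytes = requestedBytes;
        this.actualPackets = actualPackets;
        this.actualBytes = actualBytes;
    }

    static BufferWarning parse(String line) {
        Matcher m = WARNING.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a buffer size warning: " + line);
        }
        return new BufferWarning(
            Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)),
            Integer.parseInt(m.group(3)), Long.parseLong(m.group(4)));
    }

    // Packet size implied by the request: 2096304 bytes / 1428 packets = 1468 bytes.
    long packetSizeBytes() {
        return requestedBytes / requestedPackets;
    }

    // Fraction of the requested buffer that the OS actually granted.
    double grantedFraction() {
        return (double) actualBytes / requestedBytes;
    }
}
```

Here the OS granted only about 6% of the requested buffer, so the operating system's maximum socket buffer size (on Linux, for example, the `net.core.rmem_max` sysctl) should allow at least 2096304 bytes for the requested size to take effect.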
Run jconsole and open the Node MBean; you should see that the new buffer settings (for example BufferReceiveSize) have been picked up.
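jconsole is the quickest check, but the same attributes can be read programmatically with the standard `javax.management` remote API. This is a sketch, assuming the Node MBean is registered as `Coherence:type=Node,nodeId=<id>` (verify the exact name in jconsole's MBean tree) and that the target JVM exposes remote JMX on a host:port you supply; `NodeBufferCheck` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads the socket buffer settings of one cluster node over JMX, the same
// values jconsole shows under the Node MBean. The ObjectName pattern is an
// assumption; confirm it in jconsole for your Coherence version.
final class NodeBufferCheck {
    static final String[] BUFFER_ATTRIBUTES = {"BufferReceiveSize", "BufferPublishSize"};

    static ObjectName nodeName(int nodeId) throws Exception {
        return new ObjectName("Coherence:type=Node,nodeId=" + nodeId);
    }

    static Map<String, Object> readBuffers(MBeanServerConnection conn, int nodeId) throws Exception {
        ObjectName name = nodeName(nodeId);
        Map<String, Object> values = new LinkedHashMap<>();
        for (String attribute : BUFFER_ATTRIBUTES) {
            values.put(attribute, conn.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: NodeBufferCheck <host:port> <nodeId>");
            return;
        }
        // Standard JDK RMI connector URL; host:port is the remote JMX port of a cluster JVM.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            Map<String, Object> values =
                readBuffers(connector.getMBeanServerConnection(), Integer.parseInt(args[1]));
            // Both attributes are measured in packets, like the startup warning.
            values.forEach((k, v) -> System.out.println(k + " = " + v + " packets"));
        }
    }
}
```

Run it as `java NodeBufferCheck host:port 7`; if the values still show the old sizes, the node has not picked up the new configuration.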
If you get a communication delay warning like the one below, something is wrong with either the network or the remote node:
Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7, Timestamp=2006-10-20 12:15:47.511, Address=192.168.0.10:8089, MachineId=13838); 320 packets rescheduled, PauseRate=0.31, Threshold=512
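When a log contains many of these warnings, it helps to extract which member was slow and how long the delay was. Below is a hypothetical parser (`CommunicationDelay` is an illustrative name) for the line above; the pattern assumes this exact message format, and the fields are simply copied out without interpreting PauseRate or Threshold.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the fields of a "communication delay" warning so you know which
// member to investigate. Assumes the message wording shown above.
final class CommunicationDelay {
    private static final Pattern DELAY = Pattern.compile(
        "Experienced a (\\d+) ms communication delay \\(([^)]*)\\) with Member\\(Id=(\\d+),"
        + ".*?Address=([^,]+),.*?\\); (\\d+) packets rescheduled, "
        + "PauseRate=([0-9.]+), Threshold=(\\d+)");

    final long delayMs;
    final String reason;
    final int memberId;
    final String address;
    final int packetsRescheduled;
    final double pauseRate;
    final int threshold;

    private CommunicationDelay(Matcher m) {
        delayMs = Long.parseLong(m.group(1));
        reason = m.group(2);
        memberId = Integer.parseInt(m.group(3));
        address = m.group(4);
        packetsRescheduled = Integer.parseInt(m.group(5));
        pauseRate = Double.parseDouble(m.group(6));
        threshold = Integer.parseInt(m.group(7));
    }

    static CommunicationDelay parse(String line) {
        Matcher m = DELAY.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a communication delay warning: " + line);
        }
        return new CommunicationDelay(m);
    }

    // True when the message itself points at a GC pause on the remote member.
    boolean suspectsRemoteGc() {
        return reason.contains("GC");
    }
}
```

For the warning above, this yields member 7 at 192.168.0.10:8089 with a 4172 ms delay, which is the member to examine next.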
If you suspect a network issue (that is, the remote node itself looks fine: no CPU spike and no heavy GC activity), check the following attributes.
Then go to Member 7 and check its statistics (CPU, network):
PacketPublisher: Cpu=641ms (0.0%), PacketsSent=4945, PacketsResent=5, SuccessRate=0.9989, Throughput=7714pkt/sec
PacketSpeaker : Cpu=0ms (0.0%), PacketsSent=63, Bursts=3, Throughput=0pkt/sec, Queued=0
PacketReceiver : PacketsReceived=5292, PacketsRepeated=2, SuccessRate=0.9996
TcpRing : TotalPings=3382, Timeouts=0, Failures=0, SuccessRate=1.0
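The Statistics attribute is plain text, so comparing it across members is easier after splitting it into sections. This sketch infers the format from the sample above (a section name, a colon, then comma-separated key=value pairs) rather than from a documented grammar; `NodeStatistics` and its helpers are hypothetical, and the two ratios are simple derived values, not Coherence attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits the human-readable Statistics attribute into sections
// (PacketPublisher, PacketSpeaker, PacketReceiver, TcpRing) of key=value pairs.
final class NodeStatistics {
    // A capitalized section name at the start or after whitespace, then a colon.
    private static final Pattern SECTION = Pattern.compile("(?:^|\\s)([A-Z]\\w*)\\s*:");
    // A key=value pair; the value ends at a comma or whitespace, so "641ms (0.0%)" yields "641ms".
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^,\\s]+)");

    static Map<String, Map<String, String>> parse(String stats) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Matcher sec = SECTION.matcher(stats);
        String name = null;
        int bodyStart = 0;
        boolean found;
        do {
            found = sec.find();
            // The current section's body runs until the next header (or the end).
            int bodyEnd = found ? sec.start() : stats.length();
            if (name != null) {
                sections.put(name, pairs(stats.substring(bodyStart, bodyEnd)));
            }
            if (found) {
                name = sec.group(1);
                bodyStart = sec.end();
            }
        } while (found);
        return sections;
    }

    private static Map<String, String> pairs(String body) {
        Map<String, String> kv = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(body);
        while (m.find()) {
            kv.put(m.group(1), m.group(2));
        }
        return kv;
    }

    // Fraction of published packets that had to be resent (PacketsResent / PacketsSent).
    static double resendFraction(Map<String, Map<String, String>> stats) {
        Map<String, String> pub = stats.get("PacketPublisher");
        return Double.parseDouble(pub.get("PacketsResent")) / Double.parseDouble(pub.get("PacketsSent"));
    }

    // Fraction of received packets that were duplicates (PacketsRepeated / PacketsReceived).
    static double repeatFraction(Map<String, Map<String, String>> stats) {
        Map<String, String> rcv = stats.get("PacketReceiver");
        return Double.parseDouble(rcv.get("PacketsRepeated")) / Double.parseDouble(rcv.get("PacketsReceived"));
    }
}
```

For the sample, `resendFraction` is 5 / 4945, about 0.1%, and `repeatFraction` is 2 / 5292, about 0.04%, roughly consistent with the SuccessRate values of 0.9989 and 0.9996 reported above.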
If you find something wrong with Node 7, run a performance monitor on its machine to make sure no other application is hogging the CPU. If another application is consuming all the CPU, the other nodes cannot communicate with any node on that server, and vice versa.
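A performance monitor is the right tool here, but a quick scripted check is also possible from Java with the standard `OperatingSystemMXBean`, which reports the one-minute system load average and the number of available processors. Treating a load average at or above the processor count as saturated is an assumption for this sketch, not a Coherence rule, and `getSystemLoadAverage()` returns a negative value on platforms where it is unavailable (such as Windows). `CpuCheck` is a hypothetical name; run it on the suspect member's machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Rough CPU saturation check for the machine this JVM runs on.
// The "load >= processors" threshold is an assumption for illustration.
final class CpuCheck {
    static boolean isSaturated(double loadAverage, int processors) {
        if (loadAverage < 0) {
            throw new IllegalArgumentException("load average not available");
        }
        return loadAverage >= processors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // One-minute system load average; negative when the platform does not provide it.
        double load = os.getSystemLoadAverage();
        int processors = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("Load average unavailable here; use the OS performance monitor instead.");
        } else {
            System.out.printf("load=%.2f processors=%d saturated=%b%n",
                load, processors, isSaturated(load, processors));
        }
    }
}
```

A load average alone does not say which process is busy, so a positive result still calls for the performance monitor to find the CPU-hungry application.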
Here is a summary of the Node MBean attributes and their meanings (RO = read-only, RW = read/write):
Attribute | Type | Access | Description |
BufferPublishSize | Integer | RW | The buffer size of the unicast datagram socket used by the Publisher, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services. |
BufferReceiveSize | Integer | RW | The buffer size of the unicast datagram socket used by the Receiver, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services. |
BurstCount | Integer | RW | The maximum number of packets to send without pausing. Anything less than one (e.g. zero) means no limit. |
BurstDelay | Integer | RW | The number of milliseconds to pause between bursts. Anything less than one (e.g. zero) is treated as one millisecond. |
CpuCount | Integer | RO | Number of CPU cores for the machine this Member is running on. |
FlowControlEnabled | Boolean | RO | Indicates whether or not FlowControl is enabled. |
Id | Integer | RO | The short Member id that uniquely identifies the Member at this point in time and does not change for the life of this Member. |
LoggingDestination | String | RO | The output device used by the logging system. Valid values are stdout, stderr, jdk, log4j, or a file name. |
LoggingFormat | String | RW | Specifies how messages will be formatted before being passed to the log destination. |
LoggingLevel | Integer | RW | Specifies which logged messages will be output to the log destination. Valid values are non-negative integers or -1 to disable all logger output. |
LoggingLimit | Integer | RW | The maximum number of characters that the logger daemon will process from the message queue before discarding all remaining messages in the queue. Valid values are integers in the range [0...]. Zero implies no limit. |
MachineId | Integer | RO | The Member's machine Id. |
MachineName | String | RO | A configured name that should be the same for all Members that are on the same physical machine, and different for Members that are on different physical machines. |
MemberName | String | RO | A configured name that must be unique for every Member. |
MemoryAvailableMB | Integer | RO | The total amount of memory in the JVM available for new objects in MB. |
MemoryMaxMB | Integer | RO | The maximum amount of memory that the JVM will attempt to use in MB. |
MulticastAddress | String | RO | The IP address of the Member's MulticastSocket for group communication. |
MulticastEnabled | Boolean | RO | Specifies whether or not this Member uses multicast for group communication. If false, this Member will use the WellKnownAddresses to join the cluster and point-to-point unicast to communicate with other Members of the cluster. |
MulticastPort | Integer | RO | The port of the Member's MulticastSocket for group communication. |
MulticastTTL | Integer | RO | The time-to-live for multicast packets sent out on this Member's MulticastSocket. |
MulticastThreshold | Integer | RW | The percentage (0 to 100) of the servers in the cluster that a packet will be sent to, above which the packet will be multicasted and below which it will be unicasted. |
NackEnabled | Boolean | RO | Indicates whether or not the early packet loss detection protocol is enabled. |
NackSent | Long | RO | The total number of NACK packets sent since the node statistics were last reset. |
PacketDeliveryEfficiency | Float | RO | The efficiency of packet loss detection and retransmission. A low efficiency is an indication that there is a high rate of unnecessary packet retransmissions. |
PacketsBundled | Long | RO | The total number of packets which were bundled prior to transmission. The total number of network transmissions is equal to (PacketsSent - PacketsBundled). |
PacketsReceived | Long | RO | The number of packets received since the node statistics were last reset. |
PacketsRepeated | Long | RO | The number of duplicate packets received since the node statistics were last reset. |
PacketsResent | Long | RO | The number of packets resent since the node statistics were last reset. A packet is resent when there is no ACK received within a timeout period. |
PacketsResentEarly | Long | RO | The total number of packets resent ahead of schedule. A packet is resent ahead of schedule when there is a NACK indicating that the packet has not been received. |
PacketsResentExcess | Long | RO | The total number of packet retransmissions which were later proven unnecessary. |
PacketsSent | Long | RO | The number of packets sent since the node statistics were last reset. |
Priority | Integer | RO | The priority or "weight" of the Member; used to determine tie-breakers. |
ProcessName | String | RO | A configured name that should be the same for Members that are in the same process (JVM), and different for Members that are in different processes. If not explicitly provided, for processes running with JRE 1.5 or higher the name will be calculated internally as the Name attribute of the system RuntimeMXBean, which normally represents the process identifier (PID). |
ProductEdition | String | RO | The product edition this Member is running. Possible values are: Standard Edition (SE), Enterprise Edition (EE), Grid Edition (GE). |
PublisherPacketUtilization | Float | RO | The publisher packet utilization for this cluster node since the node socket was last reopened. This value is a ratio of the number of bytes sent to the number that would have been sent had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network. |
PublisherSuccessRate | Float | RO | The publisher success rate for this cluster node since the node statistics were last reset. Publisher success rate is a ratio of the number of packets successfully delivered in a first attempt to the total number of sent packets. A failure count is incremented when there is no ACK received within a timeout period. It could be caused by either very high network latency or a high packet drop rate. |
RackName | String | RO | A configured name that should be the same for Members that are on the same physical "rack" (or frame or cage), and different for Members that are on different physical "racks". |
ReceiverPacketUtilization | Float | RO | The receiver packet utilization for this cluster node since the socket was last reopened. This value is a ratio of the number of bytes received to the number that would have been received had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network. |
ReceiverSuccessRate | Float | RO | The receiver success rate for this cluster node since the node statistics were last reset. Receiver success rate is a ratio of the number of packets successfully acknowledged in a first attempt to the total number of received packets. A failure count is incremented when a re-delivery of a previously received packet is detected. It could be caused by either very high inbound network latency or lost ACK packets. |
RefreshTime | Date | RO | The timestamp when this model was last retrieved from a corresponding node. For local servers it is the local time. |
ResendDelay | Integer | RW | The minimum number of milliseconds that a packet will remain queued in the Publisher's re-send queue before it is resent to the recipient(s) if the packet has not been acknowledged. Setting this value too low can overflow the network with unnecessary repetitions. Setting the value too high can increase the overall latency by delaying the re-sends of dropped packets. Additionally, a change to this value may need to be accompanied by a change to the SendAckDelay value. |
RoleName | String | RO | A configured name that can be used to indicate the role of a Member to the application. While managed by Coherence, this property is used only by the application. |
SendAckDelay | Integer | RW | The minimum number of milliseconds between the queueing of an Ack packet and the sending of the same. This value should be no more than half of the ResendDelay value. |
SendQueueSize | Integer | RO | The number of packets currently scheduled for delivery. This number includes both packets that are to be sent immediately and packets that have already been sent and are awaiting acknowledgment. Packets that do not receive an acknowledgment within the ResendDelay interval will be automatically resent. |
SiteName | String | RO | A configured name that should be the same for Members that are on the same physical site (e.g. data center), and different for Members that are on different physical sites. |
SocketCount | Integer | RO | Number of CPU sockets for the machine this Member is running on. |
Statistics | String | RO | Statistics for this cluster node in a human readable format. |
TcpRingFailures | Long | RO | The number of recovered TcpRing disconnects since the node statistics were last reset. A recoverable disconnect is an abnormal event that is registered when the TcpRing peer drops the TCP connection, but recovers after no more than the maximum configured number of attempts. This value will be -1 if the TcpRing is disabled. |
TcpRingTimeouts | Long | RO | The number of TcpRing timeouts since the node statistics were last reset. A timeout is a normal, but relatively rare event that is registered when the TcpRing peer did not ping this node within a heartbeat interval. This value will be -1 if the TcpRing is disabled. |
Timestamp | Date | RO | The date/time value (in cluster time) that this Member joined the cluster. |
TrafficJamCount | Integer | RW | The maximum total number of packets in the send and resend queues that forces the publisher to pause client threads. Zero means no limit. |
TrafficJamDelay | Integer | RW | The number of milliseconds to pause client threads when a traffic jam condition has been reached. Anything less than one (e.g. zero) is treated as one millisecond. |
UnicastAddress | String | RO | The IP address of the Member's DatagramSocket for point-to-point communication. |
UnicastPort | Integer | RO | The port of the Member's DatagramSocket for point-to-point communication. |
WeakestChannel | Integer | RO | The id of the cluster node to which this node is having the most difficulty communicating, or -1 if none is found. A channel is considered to be weak if either the point-to-point publisher or receiver success rates are below 1.0. |
WellKnownAddresses | String[] | RO | An array of well-known socket addresses that this Member uses to join the cluster. |
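Several rows in the table state relationships between attributes: SendAckDelay should be no more than half of ResendDelay, PacketsSent - PacketsBundled is the number of network transmissions, WeakestChannel is -1 when no weak channel is found, and TcpRingFailures counts abnormal (recovered) disconnects, with -1 meaning the TcpRing is disabled. Below is a sketch that applies those rules to values read from a Node MBean; `NodeHealth`, its methods, and the example values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Applies a few rules stated in the Node MBean attribute table.
// Class, method, and parameter names are illustrative only.
final class NodeHealth {
    // Actual network transmissions, per the PacketsBundled description.
    static long transmissions(long packetsSent, long packetsBundled) {
        return packetsSent - packetsBundled;
    }

    static List<String> check(int resendDelayMs, int sendAckDelayMs,
                              int weakestChannel, long tcpRingFailures) {
        List<String> warnings = new ArrayList<>();
        // SendAckDelay should be no more than half of ResendDelay.
        if (2L * sendAckDelayMs > resendDelayMs) {
            warnings.add("SendAckDelay=" + sendAckDelayMs
                + " ms is more than half of ResendDelay=" + resendDelayMs + " ms");
        }
        // WeakestChannel is -1 when no weak channel is found.
        if (weakestChannel != -1) {
            warnings.add("weakest channel is to node " + weakestChannel);
        }
        // TcpRingFailures counts recovered (abnormal) disconnects; -1 means TcpRing is disabled.
        if (tcpRingFailures > 0) {
            warnings.add(tcpRingFailures + " recovered TcpRing disconnect(s)");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Hypothetical values, as you might copy them from jconsole.
        System.out.println(check(200, 100, 7, 0));
    }
}
```

With the hypothetical values in `main`, it prints `[weakest channel is to node 7]`: the SendAckDelay of 100 ms is exactly half of 200 ms, which is still within the rule.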