Showing posts with label Oracle Coherence. Show all posts

Wednesday, February 16, 2011

Decode Oracle Coherence cache extend protocol, Tangosol.Net.RequestTimeoutException

One day, C# applications that use Coherence cache extend started getting a Tangosol.Net.RequestTimeoutException. I spent a lot of time figuring out what happened and what could cause the request timeout. Here is the whole stack trace:

Tangosol.Net.RequestTimeoutException: request timed out after 30000 millis

at Tangosol.Net.Messaging.Impl.Request.RequestStatus.get_Response()

at Tangosol.Net.Messaging.Impl.Request.RequestStatus.WaitForResponse(Int64 millis)

at Tangosol.Net.Messaging.Impl.Channel.Request(IRequest request, Int64 millis)

at Tangosol.Net.Messaging.Impl.Channel.Request(IRequest request)

at Tangosol.Net.Impl.RemoteNamedCache.BinaryNamedCache.get_Item(Object key)

at Tangosol.Util.ConverterCollections.ConverterDictionary.get_Item(Object key)

at Tangosol.Net.Impl.RemoteNamedCache.get_Item(Object key)

at Tangosol.Net.Impl.SafeNamedCache.get_Item(Object key)

at CacheClient.CustomerProfile.button12_Click(Object sende



Question 1: is it a network problem? The C# application talks to a proxy node (a JVM on another machine), and the proxy node then talks to a storage node to get the data back.
So I captured the network traffic between the C# application and the proxy JVM node. You can use TCPView to figure out which proxy node has the TCP connection established with the client.

The traffic looks good: the request goes out and the response comes back immediately (so no firewall blackout, no packet drop).

image
 

Here, 10.1.111.22 is the client running the C# application. It sends the request to the proxy node (10.1.25.151),
   and after 0.024 seconds it gets the response back (the client sends the ACK in packet 7).

Everything looks fine at the network level. That brings me to the second question: is the data in the response valid?

image

In this particular case, the server returns a 38-byte array:

25:90:d0:92:cb:05:00:02:00:6b:03:4c:18:15:a4:37:01:00:4e:04:74:65:73:74:01:42:cf:ff:ce:91:b8:f7:b1:b2:ee:01:40:40

What is the encoding? At first I could only figure out that the first byte, 0x25 (37), is the packet data length: 38 - 1 = 37. What about the rest?

Then I used .NET Reflector to read Coherence.dll and figured out the raw format.

[datalength][channelid][typeid][versionid][objectitself]

The header fields use the packed int32 format.
I wrote a simple C# program to decode the length, channel ID, type ID and version ID:

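// Captured response bytes as colon-separated hex (lower case).
string s = "25:90:d0:92:cb:05:00:02:00:6b:03:4c:18:15:a4:37:01:00:4e:04:74:65:73:74:01:42:cf:ff:ce:91:b8:f7:b1:b2:ee:01:40:40";
byte[] data = new byte[(s.Length + 1) / 3];
for (int i = 0; i < data.Length; i++)
{
    // Convert.ToByte handles upper- and lower-case hex digits; an IndexOf lookup
    // in "0123456789ABCDEF" returns -1 for the lower-case digits in this capture.
    data[i] = Convert.ToByte(s.Substring(i * 3, 2), 16);
}

MemoryStream ms = new MemoryStream(data);

// DataReader comes from Coherence.dll and exposes ReadPackedInt32.
DataReader dr = new DataReader(ms);
// Read into locals so the order of the four reads is explicit.
int length = dr.ReadPackedInt32();
int channelId = dr.ReadPackedInt32();
int typeId = dr.ReadPackedInt32();
int versionId = dr.ReadPackedInt32();
string msg = string.Format("Package Length  {0} \n Channel ID {1} \n TypeID {2}\n Version id {3} ",
    length, channelId, typeId, versionId);
MessageBox.Show(msg);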


When you run the code, you will see the decoded values.
image
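For readers without the .NET client at hand, here is a minimal Java sketch of the same header decoding. The packed-int layout in it is my reading of the format (first byte: continuation flag 0x80, sign flag 0x40, 6 value bits; later bytes: continuation flag plus 7 value bits), so treat it as an assumption and not as the library's implementation. The class and method names are made up for illustration.

```java
// Sketch only: the bit layout below is an assumption about the packed int32
// format, not code taken from Coherence.
class PackedIntDemo {

    // Turn "25:90:d0" into {0x25, 0x90, 0xd0}; accepts upper- and lower-case hex.
    static byte[] parseHex(String colonSeparated) {
        String[] parts = colonSeparated.split(":");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Decode the first `count` packed ints found at the start of the frame.
    static int[] decodeHeader(byte[] frame, int count) {
        int[] fields = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int b = frame[pos++] & 0xFF;
            int value = b & 0x3F;               // low 6 bits of the first byte
            boolean negative = (b & 0x40) != 0; // sign flag
            int shift = 6;
            while ((b & 0x80) != 0) {           // continuation flag
                b = frame[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;   // 7 more bits per following byte
                shift += 7;
            }
            fields[i] = negative ? ~value : value;
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] frame = parseHex("25:90:d0:92:cb:05:00:02:00:6b:03:4c:18:15:a4:37:01:00:4e:04:"
                + "74:65:73:74:01:42:cf:ff:ce:91:b8:f7:b1:b2:ee:01:40:40");
        int[] h = decodeHeader(frame, 4);
        System.out.println("length=" + h[0] + " channel=" + h[1]
                + " type=" + h[2] + " version=" + h[3]);
    }
}
```

Under this layout the first field of the captured frame decodes to 37, which matches the length worked out by hand, and the five bytes 90:d0:92:cb:05 decode to 749884432 for the channel ID.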

If you decode a request packet sent to the proxy, it has the same format.

The client sends a packet with a channel ID and receives the response with the same channel ID. If the channel state is not maintained correctly on the client side, you will get the exception.
Here is the Reflector code for the client-side message decoding:
image

If the channel is null or closed, the response is dropped even though the server returned it, and the waiting request ends in a RequestTimeoutException.
Stupid code ;(
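To make the failure mode concrete, here is a small Java model of it. It is a hypothetical simulation, not the Coherence client code: a response is routed by channel ID, and if the channel is missing the response is silently discarded, so the caller only ever sees a timeout. All names in it are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of client-side response dispatch; not the library source.
class ChannelDispatchDemo {

    // channel id -> the response the caller is waiting for
    private final Map<Integer, CompletableFuture<String>> openChannels = new ConcurrentHashMap<>();

    CompletableFuture<String> open(int channelId) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        openChannels.put(channelId, pending);
        return pending;
    }

    void close(int channelId) {
        openChannels.remove(channelId);
    }

    // Called for every incoming frame; returns false when the frame is dropped.
    boolean onResponse(int channelId, String payload) {
        CompletableFuture<String> pending = openChannels.get(channelId);
        if (pending == null) {
            return false; // unknown or closed channel: nobody is told
        }
        return pending.complete(payload);
    }

    // Returns the payload, or "TIMEOUT" if nothing was delivered in time.
    static String await(CompletableFuture<String> pending, long millis) {
        try {
            return pending.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";
        } catch (Exception e) {
            return "ERROR: " + e;
        }
    }
}
```

In this model the network capture would look perfectly healthy, which is the situation described above: the bytes arrive, but the waiting thread never gets them.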

You can also inspect the traffic at runtime.
Define a filter:

// Pass-through filter that logs message sizes and the first packed int of each response.
public class MyFilterDump : IWrapperStreamFactory
{
    public MyFilterDump()
    {
    }

    public System.IO.Stream GetInputStream(System.IO.Stream stream)
    {
        System.Diagnostics.Debug.WriteLine("Get Response " + stream.Length);

        DataReader dr = new DataReader(stream);
        System.Diagnostics.Debug.WriteLine("Channel ID " + dr.ReadPackedInt32());

        // Rewind so the client still sees the whole message.
        stream.Seek(0, SeekOrigin.Begin);
        return stream;
    }

    public System.IO.Stream GetOutputStream(System.IO.Stream stream)
    {
        // The request could be dumped here in the same way as the response above.
        System.Diagnostics.Debug.WriteLine("Send Request " + stream.Length);
        return stream;
    }
}
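The filter reads from the stream and then rewinds it with Seek, which only works when the stream is seekable. As a hedged illustration of the same "peek, then hand the stream on untouched" idea, here is a Java sketch that uses mark/reset instead. The packed-int layout (first byte: 0x80 continuation, 0x40 sign, 6 value bits; later bytes: 7 value bits each) is an assumption, and the class is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: peek at the first packed int without consuming it.
class PeekingFilterDemo {

    // Make sure mark/reset is available; raw socket streams usually lack it.
    static InputStream wrap(InputStream raw) {
        return raw.markSupported() ? raw : new BufferedInputStream(raw);
    }

    static int peekPackedInt(InputStream in) {
        try {
            in.mark(5); // a packed int32 occupies at most 5 bytes
            int b = readByte(in);
            int value = b & 0x3F;
            boolean negative = (b & 0x40) != 0;
            for (int shift = 6; (b & 0x80) != 0; shift += 7) {
                b = readByte(in);
                value |= (b & 0x7F) << shift;
            }
            in.reset(); // put the bytes back for the real consumer
            return negative ? ~value : value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("stream ended inside a packed int");
        }
        return b;
    }

    // Demo helper: returns {peeked value, first byte read after the peek}.
    static int[] peekThenRead(byte[] frame) {
        InputStream in = wrap(new ByteArrayInputStream(frame));
        int peeked = peekPackedInt(in);
        try {
            return new int[] { peeked, in.read() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The second element returned by peekThenRead is the first byte of the frame again, which shows that the peek left the stream position unchanged.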

Then register it in client-coherence.xml and reference it from client-cache-control.xml.

<coherence xmlns="http://schemas.tangosol.com/coherence">
<cluster-config>
<filters>
<filter>
<filter-name>debugfilter</filter-name>
<filter-class>ExtendLib.MyFilterDump, ExtendLib</filter-class>
</filter>
</filters>

….

client-cache-control.xml

<remote-cache-scheme>
      <scheme-name>extend-dist</scheme-name>
      <service-name>ExtendTcpCacheService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>localhost</address>
              <port>9999</port>
            </socket-address>
           
          </remote-addresses>
        </tcp-initiator>
      
        <outgoing-message-handler>
        <request-timeout>30s</request-timeout>
      </outgoing-message-handler>
    <use-filters>
        <filter-name>debugfilter</filter-name>
        </use-filters>

Have fun!

Monday, August 2, 2010

SQlServer StreamInsight , Unable to Create Application Microsoft.ComplexEventProcessing.ConnectionException: Access is denied

StreamInsight is Microsoft's CEP (Complex Event Processing) solution. Even though it is called SQL Server 2008 R2 StreamInsight, it has nothing to do with SQL Server 2008 R2 except in terms of licensing. So you don't need to install SQL Server 2008 R2 in order to run the CEP engine.

After installing StreamInsight and setting up one instance called StreamInside, it is time to write a hello world application.

System.ServiceModel.WSHttpBinding binding = new System.ServiceModel.WSHttpBinding(System.ServiceModel.SecurityMode.Message);
System.ServiceModel.EndpointAddress ea=new System.ServiceModel.EndpointAddress("http://localhost/StreamInsight/StreamInsght");

using (Server server = Server.Connect(ea,binding  ))
{

try
{
// Create application in the server. The application will serve
// as a container for actual CEP objects and queries.
Console.WriteLine("Creating CEP Application");
Application application = server.CreateApplication("TrafficJoinSample");

Then build and run it, and you get the following error: access is denied.

image

How come? I ran cordbg, attached it to the host server, and got the error stack trace. It requires the user to be a member of the group StreamInsightUser$[instanceName].
  image
 

Can I override this and just remove the security enforcement? The short answer is no. The role check is hardcoded.
image

image

What is the PrincipalPermission?

image

image

Even if you override the .config and change the binding security mode to None, the role check is still required.

 

What you have to do is make sure you are in the named group. It looks like I had to restart the PC to get my group membership refreshed.

You can use the following C# code to list all the groups you belong to.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Security.Principal;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {

            System.Security.Principal.WindowsIdentity.GetCurrent().Groups.ToList().ForEach

                (e => Console.WriteLine(((System.Security.Principal.NTAccount)e.Translate(typeof(System.Security.Principal.NTAccount)))));

        }
    }
}

More troubleshooting entries.

Coherence Performance tuning, Network , Communication Delay

There is a Performance Tuning tips page in the Coherence wiki. Besides that, you should use a JMX client tool to make sure the settings you applied actually took effect.
   For example, when you get a warning that the buffer size is too small, after you change the setting, make sure the node picks it up.

UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
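The numbers in that warning can be checked by hand: 2096304 bytes over 1428 packets is 1468 bytes per packet, and the 131071 bytes the OS actually granted hold only 89 such packets. The Java sketch below does that arithmetic and also shows how to ask the OS what it really grants for a UDP receive buffer. The class and method names are illustrative; only the java.net.DatagramSocket calls are real API.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Illustrative helper for reading the "failed to set receive buffer size" warning.
class SocketBufferCheck {

    // Packet size implied by the log line: requested bytes / requested packets.
    static int packetSize(int requestedBytes, int requestedPackets) {
        return requestedBytes / requestedPackets;
    }

    // How many whole packets fit into the buffer the OS actually granted.
    static int packetsThatFit(int actualBytes, int packetSize) {
        return actualBytes / packetSize;
    }

    // Ask the OS for a UDP receive buffer and report the size it really set.
    static int grantedReceiveBuffer(int requestedBytes) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        } catch (SocketException e) {
            throw new IllegalStateException("could not open a UDP socket", e);
        }
    }

    public static void main(String[] args) {
        int size = packetSize(2096304, 1428);
        System.out.println("packet size: " + size + " bytes");
        System.out.println("packets in 131071 bytes: " + packetsThatFit(131071, size));
        System.out.println("OS grants: " + grantedReceiveBuffer(2096304) + " bytes");
    }
}
```

If the last line prints much less than the requested size on the cache server host, the OS limit is still in place and the warning will come back after a restart.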

Run jconsole and click the Node MBeans; you should be able to see that the setting got picked up.

image
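The same check can be scripted instead of clicked. The sketch below assumes it runs inside a JVM where Coherence has registered its MBeans with the platform MBean server, and that the node MBeans are named with the pattern Coherence:type=Node,*; both are assumptions about your setup. In a JVM without Coherence it simply returns an empty map. The class name is made up.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: list BufferReceiveSize for every Coherence Node MBean in this JVM.
class NodeBufferReport {

    static Map<String, Object> receiveBufferSizes() {
        Map<String, Object> result = new TreeMap<>();
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Assumed naming pattern for the node MBeans.
            ObjectName pattern = new ObjectName("Coherence:type=Node,*");
            for (ObjectName name : server.queryNames(pattern, null)) {
                result.put(name.getCanonicalName(),
                        server.getAttribute(name, "BufferReceiveSize"));
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        receiveBufferSizes().forEach((node, size) ->
                System.out.println(node + " BufferReceiveSize=" + size));
    }
}
```

For a remote node you would obtain an MBeanServerConnection through a JMX connector instead of the platform MBean server; the query and attribute read stay the same.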

If you get the communication delay warning, there must be something wrong with the network or the remote node.

Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7, Timestamp=2006-10-20 12:15:47.511, Address=192.168.0.10:8089, MachineId=13838); 320 packets rescheduled, PauseRate=0.31, Threshold=512

Check the following attributes if you suspect a network issue (the remote node is fine: no CPU spike, no heavy GC activity).

image

  Then go to Member 7 and check its statistics (CPU, network).

PacketPublisher: Cpu=641ms (0.0%), PacketsSent=4945, PacketsResent=5, SuccessRate=0.9989, Throughput=7714pkt/sec
PacketSpeaker  : Cpu=0ms (0.0%), PacketsSent=63, Bursts=3, Throughput=0pkt/sec, Queued=0
PacketReceiver : PacketsReceived=5292, PacketsRepeated=2, SuccessRate=0.9996
TcpRing        : TotalPings=3382, Timeouts=0, Failures=0, SuccessRate=1.0
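The success rates in that output follow from the counters next to them. As a rough sketch (my reading of the figures, not the library's code): the publisher rate is the share of sent packets that did not need a resend, and the receiver rate is the share of received packets that were not repeats.

```java
// Sketch of how the SuccessRate figures relate to the packet counters.
class PacketStats {

    // Fraction of packets that got through on the first attempt.
    static double successRate(long total, long failed) {
        return total == 0 ? 1.0 : (double) (total - failed) / total;
    }

    public static void main(String[] args) {
        // Publisher: PacketsSent=4945, PacketsResent=5
        System.out.println(successRate(4945, 5));
        // Receiver: PacketsReceived=5292, PacketsRepeated=2
        System.out.println(successRate(5292, 2));
    }
}
```

With the counters above this gives roughly 0.99899 and 0.99962, in line with the 0.9989 and 0.9996 shown in the statistics. A rate that drops well below 1.0 on one member is a hint to look at that member's network path first.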

If you find something wrong with Node 7, run Performance Monitor to make sure there is no other CPU-hungry application. If another application is consuming all the CPU, then the other nodes cannot communicate with any node on this server, and vice versa.

Here is a summary of the Node MBean attributes and their meaning. Each line gives the attribute name, its type, whether it is read-only (RO) or read-write (RW), and a description.

BufferPublishSize Integer RW The buffer size of the unicast datagram socket used by the Publisher, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services.
BufferReceiveSize Integer RW The buffer size of the unicast datagram socket used by the Receiver, measured in the number of packets. Changing this value at runtime is an inherently unsafe operation that will pause all network communications and may result in the termination of all cluster services.
BurstCount Integer RW The maximum number of packets to send without pausing. Anything less than one (e.g. zero) means no limit.
BurstDelay Integer RW The number of milliseconds to pause between bursts. Anything less than one (e.g. zero) is treated as one millisecond.
CpuCount Integer RO Number of CPU cores for the machine this Member is running on.
FlowControlEnabled Boolean RO Indicates whether or not FlowControl is enabled.
Id Integer RO The short Member id that uniquely identifies the Member at this point in time and does not change for the life of this Member.
LoggingDestination String RO The output device used by the logging system. Valid values are stdout, stderr, jdk, log4j, or a file name.
LoggingFormat String RW Specifies how messages will be formatted before being passed to the log destination
LoggingLevel Integer RW Specifies which logged messages will be output to the log destination. Valid values are non-negative integers or -1 to disable all logger output.
LoggingLimit Integer RW The maximum number of characters that the logger daemon will process from the message queue before discarding all remaining messages in the queue. Valid values are integers in the range [0...]. Zero implies no limit.
MachineId Integer RO The Member`s machine Id.
MachineName String RO A configured name that should be the same for all Members that are on the same physical machine, and different for Members that are on different physical machines.
MemberName String RO A configured name that must be unique for every Member.
MemoryAvailableMB Integer RO The total amount of memory in the JVM available for new objects in MB.
MemoryMaxMB Integer RO The maximum amount of memory that the JVM will attempt to use in MB.
MulticastAddress String RO The IP address of the Member`s MulticastSocket for group communication.
MulticastEnabled Boolean RO Specifies whether or not this Member uses multicast for group communication. If false, this Member will use the WellKnownAddresses to join the cluster and point-to-point unicast to communicate with other Members of the cluster.
MulticastPort Integer RO The port of the Member`s MulticastSocket for group communication.
MulticastTTL Integer RO The time-to-live for multicast packets sent out on this Member`s MulticastSocket.
MulticastThreshold Integer RW The percentage (0 to 100) of the servers in the cluster that a packet will be sent to, above which the packet will be multicasted and below which it will be unicasted.
NackEnabled Boolean RO Indicates whether or not the early packet loss detection protocol is enabled.
NackSent Long RO The total number of NACK packets sent since the node statistics were last reset.
PacketDeliveryEfficiency Float RO The efficiency of packet loss detection and retransmission. A low efficiency is an indication that there is a high rate of unnecessary packet retransmissions.
PacketsBundled Long RO The total number of packets which were bundled prior to transmission. The total number of network transmissions is equal to (PacketsSent - PacketsBundled).
PacketsReceived Long RO The number of packets received since the node statistics were last reset.
PacketsRepeated Long RO The number of duplicate packets received since the node statistics were last reset.
PacketsResent Long RO The number of packets resent since the node statistics were last reset. A packet is resent when there is no ACK received within a timeout period.
PacketsResentEarly Long RO The total number of packets resent ahead of schedule. A packet is resent ahead of schedule when there is a NACK indicating that the packet has not been received.
PacketsResentExcess Long RO The total number of packet retransmissions which were later proven unnecessary.
PacketsSent Long RO The number of packets sent since the node statistics were last reset.
Priority Integer RO The priority or "weight" of the Member; used to determine tie-breakers.
ProcessName String RO A configured name that should be the same for Members that are in the same process (JVM), and different for Members that are in different processes. If not explicitly provided, for processes running with JRE 1.5 or higher the name will be calculated internally as the Name attribute of the system RuntimeMXBean, which normally represents the process identifier (PID).
ProductEdition String RO The product edition this Member is running. Possible values are: Standard Edition (SE), Enterprise Edition (EE), Grid Edition (GE).
PublisherPacketUtilization Float RO The publisher packet utilization for this cluster node since the node socket was last reopened. This value is a ratio of the number of bytes sent to the number that would have been sent had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network.
PublisherSuccessRate Float RO The publisher success rate for this cluster node since the node statistics were last reset. Publisher success rate is a ratio of the number of packets successfully delivered in a first attempt to the total number of sent packets. A failure count is incremented when there is no ACK received within a timeout period. It could be caused by either very high network latency or a high packet drop rate.
RackName String RO A configured name that should be the same for Members that are on the same physical "rack" (or frame or cage), and different for Members that are on different physical "racks".
ReceiverPacketUtilization Float RO The receiver packet utilization for this cluster node since the socket was last reopened. This value is a ratio of the number of bytes received to the number that would have been received had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network.
ReceiverSuccessRate Float RO The receiver success rate for this cluster node since the node statistics were last reset. Receiver success rate is a ratio of the number of packets successfully acknowledged in a first attempt to the total number of received packets. A failure count is incremented when a re-delivery of previously received packet is detected. It could be caused by either very high inbound network latency or lost ACK packets.
RefreshTime Date RO The timestamp when this model was last retrieved from a corresponding node. For local servers it is the local time.
ResendDelay Integer RW The minimum number of milliseconds that a packet will remain queued in the Publisher`s re-send queue before it is resent to the recipient(s) if the packet has not been acknowledged. Setting this value too low can overflow the network with unnecessary repetitions. Setting the value too high can increase the overall latency by delaying the re-sends of dropped packets. Additionally, change of this value may need to be accompanied by a change in SendAckDelay value.
RoleName String RO A configured name that can be used to indicate the role of a Member to the application. While managed by Coherence, this property is used only by the application.
SendAckDelay Integer RW The minimum number of milliseconds between the queueing of an Ack packet and the sending of the same. This value should be not more then a half of the ResendDelay value.
SendQueueSize Integer RO The number of packets currently scheduled for delivery. This number includes both packets that are to be sent immediately and packets that have already been sent and awaiting for acknowledgment. Packets that do not receive an acknowledgment within ResendDelay interval will be automatically resent.
SiteName String RO A configured name that should be the same for Members that are on the same physical site (e.g. data center), and different for Members that are on different physical sites.
SocketCount Integer RO Number of CPU sockets for the machine this Member is running on.
Statistics String RO Statistics for this cluster node in a human readable format.
TcpRingFailures Long RO The number of recovered TcpRing disconnects since the node statistics were last reset. A recoverable disconnect is an abnormal event that is registered when the TcpRing peer drops the TCP connection, but recovers after no more then maximum configured number of attempts.This value will be -1 if the TcpRing is disabled.
TcpRingTimeouts Long RO The number of TcpRing timeouts since the node statistics were last reset. A timeout is a normal, but relatively rare event that is registered when the TcpRing peer did not ping this node within a heartbeat interval. This value will be -1 if the TcpRing is disabled.
Timestamp Date RO The date/time value (in cluster time) that this Member joined the cluster.
TrafficJamCount Integer RW The maximum total number of packets in the send and resend queues that forces the publisher to pause client threads. Zero means no limit.
TrafficJamDelay Integer RW The number of milliseconds to pause client threads when a traffic jam condition has been reached. Anything less than one (e.g. zero) is treated as one millisecond.
UnicastAddress String RO The IP address of the Member`s DatagramSocket for point-to-point communication.
UnicastPort Integer RO The port of the Member`s DatagramSocket for point-to-point communication.
WeakestChannel Integer RO The id of the cluster node to which this node is having the most difficulty communicating, or -1 if none is found. A channel is considered to be weak if either the point-to-point publisher or receiver success rates are below 1.0.
WellKnownAddresses String[] RO An array of well-known socket addresses that this Member uses to join the cluster.

Wednesday, July 28, 2010

Oracle Coherence 3.6: ad-hoc queries without filters, aggregators or entry processors, using an SQL-like syntax called CohQL

One of the enhancements in Coherence 3.6 is the Coherence Query Language, which they call CohQL. Shall I spell it CQL, along with LINQ and SQL?
 

Before this version, you had to hardcode different filters to do the filtering that corresponds to the WHERE clause in SQL. It is more intuitive to write a query just like SQL, and now that is possible in 3.6.

SELECT (properties* aggregators* | * | alias) FROM "cache-name" [[AS] alias] [WHERE conditional-expression] [GROUP [BY] properties+]

Given a POJO class PurchaseOrder with three attributes: PoAmount, State and PMName.
   If you want to query the POs in a given state whose amount is above a given minimum, you need three filters. It might be:

Filter gt=new GreaterFilter("getPoAmount", 7.0f);
Filter et=new EqualsFilter("getState" , "CA");
Filter caandgreate7 =new AndFilter(gt,et);
System.out.println(pCache.entrySet(caandgreate7).size());

In 3.6, you can put the query in directly, using WHERE-clause syntax. The following code does the same query:

Filter AdHoc=com.tangosol.util.QueryHelper.createFilter("PoAmount >7.0f AND State='CA'");
System.out.println(pCache.entrySet(AdHoc).size());

If you use Coherence to store a lot of data for analytics, another piece of good news is that 3.6 comes with a new utility, like a SQL client for a database.
  Before, if you wanted to group by State and get the average PoAmount, you wrote:

EntryAggregator gp=GroupAggregator.createInstance("getState", new DoubleAverage("getPoAmount"));
Object o=pCache.aggregate((Filter)null, gp);
System.out.println(o);

You will get a result like:

{CA=46.42647790186333, OR=51.46033203601837, WA=46.86704759886771}
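If the aggregator API is unfamiliar, it may help to see what that result means in plain Java, without Coherence. The sketch below computes the same kind of "average PoAmount per State" map over a local list with the streams API. It only illustrates the semantics; the PurchaseOrder class here is a stand-in with just the two fields that matter, and the real aggregator runs inside the cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local illustration of "GROUP BY State, AVG(PoAmount)"; not Coherence code.
class GroupByAverageDemo {

    // Stand-in for the PurchaseOrder POJO, reduced to the two relevant fields.
    static final class PurchaseOrder {
        private final String state;
        private final double poAmount;

        PurchaseOrder(String state, double poAmount) {
            this.state = state;
            this.poAmount = poAmount;
        }

        String getState() { return state; }
        double getPoAmount() { return poAmount; }
    }

    // Group by state (sorted by key) and average the amounts within each group.
    static Map<String, Double> averageByState(List<PurchaseOrder> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                PurchaseOrder::getState,
                TreeMap::new,
                Collectors.averagingDouble(PurchaseOrder::getPoAmount)));
    }

    public static void main(String[] args) {
        List<PurchaseOrder> orders = Arrays.asList(
                new PurchaseOrder("CA", 10.0),
                new PurchaseOrder("CA", 30.0),
                new PurchaseOrder("OR", 5.0));
        System.out.println(averageByState(orders)); // prints {CA=20.0, OR=5.0}
    }
}
```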

Now, with the new query client, you just run an SQL-like GROUP BY:
image 
Is it sweet? I think so.

There are even some management features, such as a DB-style backup:
image


For more query syntax and support, check http://download.oracle.com/docs/cd/E15357_01/coh.360/e15723/api_cq.htm#CEGDBJBI

Thursday, July 15, 2010

asp.net 4.0 , writing Custom outputcache provider for Oracle coherence Memory Cache

One of my favorite features of ASP.NET 4.0 is that we can offload the output cache from in-process memory to external storage, i.e. disk or a big in-memory cache. Then we can cache more data instead of having it flushed out of memory when the ASP.NET worker process comes under memory pressure. This will be even more helpful during the holiday season.

Coherence is a popular in-memory distributed cache cluster. I just spent 30 minutes building a simple output cache provider in less than 100 lines of code, much easier than the disk cache example :)
http://weblogs.asp.net/gunnarpeipman/archive/2009/11/19/asp-net-4-0-writing-custom-output-cache-providers.aspx

Key points:

  • To keep the objects in memory, we need to take care of object serialization/deserialization.
    • This can be done easily in .NET; there are 5+ different serializers/formatters.
    • I will use the BinaryFormatter, which is more space-efficient.
  • Every cache item has an expiry setting, so it needs to be flushed and purged when it reaches its lifetime.
    • Coherence has the overflow setting.
    • You can specify the TTL when inserting an object into the cache.

Steps:

  • Set up your distributed cache cluster and one dedicated cache.
  • Reference the Coherence assembly and fill in the 4 methods required by OutputCacheProvider.
  • Configure the ASP.NET web.config to point the provider to ours.
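The four methods in question are Get, Add, Set and Remove. The real provider is C# and stores its entries in the Coherence cache; the sketch below is only a Java model of the behaviour those four methods need, with an in-memory map standing in for the remote cache and already-serialized byte arrays standing in for the BinaryFormatter output. Every name in it is hypothetical, and the current time is passed in explicitly so the expiry logic is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the four output-cache operations; not the C# provider.
class OutputCacheSketch {

    private static final class Item {
        final byte[] data;          // serialized cache entry
        final long expiresAtMillis; // absolute expiry time

        Item(byte[] data, long expiresAtMillis) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Stand-in for the remote cache; a real provider would call the cache client.
    private final Map<String, Item> store = new ConcurrentHashMap<>();

    // Get: return the entry, or null if it is missing or past its expiry.
    byte[] get(String key, long nowMillis) {
        Item item = store.get(key);
        if (item == null) {
            return null;
        }
        if (item.expiresAtMillis <= nowMillis) {
            store.remove(key, item); // purge the expired entry
            return null;
        }
        return item.data;
    }

    // Add: keep an existing live entry, otherwise store the new one.
    // Returns whichever entry ends up cached.
    byte[] add(String key, byte[] entry, long expiresAtMillis, long nowMillis) {
        byte[] existing = get(key, nowMillis);
        if (existing != null) {
            return existing;
        }
        store.put(key, new Item(entry, expiresAtMillis));
        return entry;
    }

    // Set: always overwrite.
    void set(String key, byte[] entry, long expiresAtMillis) {
        store.put(key, new Item(entry, expiresAtMillis));
    }

    // Remove: drop the entry if present.
    void remove(String key) {
        store.remove(key);
    }
}
```

With a cache that supports a per-entry TTL, the manual expiry check in get becomes unnecessary: the provider passes the remaining lifetime on insert and lets the cache purge the entry.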

Set up your distributed cache cluster and one dedicated cache. Here I will define one cache called outputcache to hold all the cached data.

Build the CoherencCacheOutputCacheProvider.

  • Create a class library project named CoherencCacheOutputCacheProviderLib.
  • Reference several DLLs:
    • System.Web.dll (where OutputCacheProvider lives)
    • System.Configuration
    • Coherence.dll
  • Create a class named CoherencCacheOutputCacheProvider.
  • Compile.
  • Here is the source code; you can just copy and paste it.

Change your ASP.NET application to use the new cache provider.

  • Reference the DLL we just compiled.
  • Change the web.config.
  • Tune your output cache settings.

Here is the web.config:

Test and make sure the cache works as it should.

First, I created a simple page that just prints out the current time, enabled the output cache with a duration of 30 seconds, and made it vary by the URL parameter x.

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="testcache.aspx.cs" Inherits="testcache" %>
<%@ OutputCache Duration="30" VaryByParam="x" %>
Last Update :<% =System.DateTime.Now %>

When I browse http://localhost:39847/WebSite1/testcache.aspx?x=1

The runtime creates two cache items: one is the cache-vary entry, the other is the page itself.
image

When you try different x values, you get more cached objects.

image

The key for the item is the URL plus the x value.
Let me change the cache logic to vary by browser.
<%@ OutputCache Duration="30" VaryByParam="none" VaryByHeader="user-agent" %>

Then try IE, Chrome and Opera.
image

Since we configured the cache expiry to 30 seconds, let's verify that the cache items are gone once they expire.
They are!

image

Conclusion:

The ASP.NET 4.0 provider model is very convenient to extend. OutputCacheProvider is now a full citizen of the provider model.
  Just like the Coherence session provider for ASP.NET, it is plug and play: write once, and enjoy the benefit everywhere.

Tuesday, July 13, 2010

using Btrace to Make sure the filter is using the index you created for Oracle coherence

I have always wondered whether there is any way to tell the execution plan of a given filter. Take the following example, all in C# code. When the cache is set up to run in distributed mode or replicated mode, will the query pick up the index? Will the index get refreshed promptly?
C# sample code:

INamedCache cache = CacheFactory.GetCache(CacheName);
int ctt = 10;
Random rand = new Random();
for (int i = 0; i < ctt; i++)
{
    PurchaseOrder o = new PurchaseOrder();
    o.PoAmount=rand.Next(50000);
    cache.Add( i , o);
}

//Add one Index
INamedCache cache = CacheFactory.GetCache(CacheName);
IValueExtractor extractor = new ReflectionExtractor("getPoAmount");
cache.AddIndex(extractor, true, null);

// Try one query, e.g. a GreaterFilter.

INamedCache cache = CacheFactory.GetCache(CacheName);
GreaterFilter filter1 = new GreaterFilter("getPoAmount",10000f);
MessageBox.Show("F" + cache.GetEntries(filter1).Length.ToString() );

I tried several profiling tools. First a memory profiler, for the question: what is the memory difference after we insert some objects and then create an index? Does replicated mode use a different format to store the objects, the binary format of the serialized object or just the raw object format?

You can load jvisualvm, take two snapshots and compare them. Try searching for Index in the comparison report and you will find the newly created objects. Basically, there is one instance of a class called SimpleMapIndex.
  
Then the second question: if there is an index map, when will a query use it, if at all? And when will the index get refreshed, say on object update or removal?

Answer: use BTrace to inspect the methods called inside SimpleMapIndex.
Here is my tracing script.

/* BTrace Script Template */

import java.lang.reflect.Field;
import java.util.Map;
import java.util.logging.LogRecord;

import com.sun.btrace.BTraceUtils;
import com.sun.btrace.annotations.*;
import com.tangosol.util.ValueExtractor;

import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class TracingScript {
    /* put your code here */

    @OnMethod(clazz = "com.tangosol.util.SimpleMapIndex", method = "insert", location = @Location(where = Where.BEFORE))
    public static void insert(@Self com.tangosol.util.SimpleMapIndex self,
            Map.Entry a) {
        println("insert Index");
        String s;

    }

    @OnMethod(clazz = "com.tangosol.util.SimpleMapIndex", method = "update", location = @Location(where = Where.BEFORE))
    public static void update(@Self com.tangosol.util.SimpleMapIndex self,
            Map.Entry a) {
        println("update Index");
    }

    @OnMethod(clazz = "com.tangosol.util.SimpleMapIndex", method = "delete", location = @Location(where = Where.BEFORE))
    public static void delete(@Self com.tangosol.util.SimpleMapIndex self,
            Map.Entry a) {
        println("Delete Index");

    }

    // Who is querying index content
    @OnMethod(clazz = "com.tangosol.util.SimpleMapIndex", method = "getIndexContents", location = @Location(value = Kind.RETURN))
    public static void getIndexContents(
            @Self com.tangosol.util.SimpleMapIndex self, @Return Map map) {
             Field msgField = field("com.tangosol.util.SimpleMapIndex", "m_extractor");

        println("getIndexContents");
        println("------------index used ---------------");
        Object o=get(msgField,self);
        println( str(o));
        printFields(o);
        Class cz=classForName("com.tangosol.util.extractor.AbstractCompositeExtractor");
        if(isInstance(cz,get(msgField,self) ))
        {
            println("++++++++++++++++++++++++Chained+++");
            //dump chanied fields;
            Field f2 = field("com.tangosol.util.extractor.AbstractCompositeExtractor", "m_aExtractor");
            Object [] rfs=(Object[])get(f2,o);
            println(BTraceUtils.strcat("Chain Count: ", str(rfs.length)));
            int i=0;
            if(i<rfs.length)
            {
            printFields(rfs[i]);
            i++;
            }
            if(i<rfs.length)
            {
            printFields(rfs[i]);
            i++;
            }
            if(i<rfs.length)
            {
            printFields(rfs[i]);
            i++;
            }
            if(i<rfs.length)
            {
            printFields(rfs[i]);
            i++;
            }

            println("++++++++++++++++++++++++Chained+++");
        }

        println("------------index used ---------------");
        jstack();
    }

}

You can just copy this script and save it to a local folder as TracingScript.java.

Run jps to get the cache server PID.

   btrace -cp "path of the coherence.jar" PID "path of the script"

Then try running the cache in different modes, i.e. replicated mode vs distributed mode.

The answer is interesting: in replicated mode, queries never pick up the index.

image

More Coherence Blogs:

Friday, July 9, 2010

Oracle Coherence, java.lang.IllegalArgumentException: Unsupported key or value

I defined a replicated cache and set the unit-calculator to BINARY, hoping to see object size vs units in the Coherence MBeans. (By default, size is the same as units.) When I put some data into the replicated cache, I got the following errors on the cluster JVM console. The cache extend client didn't get any error, but when you query the cluster, no objects are found. They are all gone.

2010-07-09 15:50:47.790/300.414 Oracle Coherence GE 3.5.3/465 <Error> (thread=ReplicatedCache, member=1):
java.lang.IllegalArgumentException: Unsupported key or value: Key=9, Value=ID 9PMName PM9PoAmount   135.0PoNumber  null
       at com.tangosol.net.cache.BinaryMemoryCalculator.calculateUnits(BinaryMemoryCalculator.java:43)
        at com.tangosol.net.cache.OldCache$Entry.calculateUnits(OldCache.java:2397)
        at com.tangosol.net.cache.OldCache$Entry.onAdd(OldCache.java:1990)
        at com.tangosol.util.SafeHashMap.put(SafeHashMap.java:244)
        at com.tangosol.net.cache.OldCache.put(OldCache.java:266)
        at com.tangosol.coherence.component.util.CacheHandler.onLeaseUpdate(CacheHandler.CDB:45)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache.performUpdate(ReplicatedCache.CDB:11)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache.onLeaseUpdateRequest(ReplicatedCache.CDB:22)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache$LeaseUpdateRequest.onReceived(ReplicatedCache.CDB:5)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache.onNotify(ReplicatedC
ache.CDB:3)
        at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
        at java.lang.Thread.run(Thread.java:619)

What does this error mean? It looks like something goes wrong in BinaryMemoryCalculator.calculateUnits.

What is the logic in this method? We can turn to JD-GUI (a Java decompiler, similar to .NET Reflector). Here is the decompiled code:

/*    */   public int calculateUnits(Object oKey, Object oValue)
/*    */   {
/* 35 */     if ((oKey instanceof Binary) && (oValue instanceof Binary))
/*    */     {
/* 37 */       return padMemorySize(SIZE_ENTRY + 2 * SIZE_BINARY + ((Binary)oKey).length() + ((Binary)oValue).length());
/*    */     }
/*    */
/* 43 */     throw new IllegalArgumentException("Unsupported key or value: Key=" + oKey + ", Value=" + oValue);
/*    */   }
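
To make the failure mode concrete, here is a minimal, self-contained sketch of the same check. It does not use coherence.jar: FakeBinary is a hypothetical stand-in for com.tangosol.util.Binary, the two overhead constants are made-up placeholders rather than Coherence's real values, and the padMemorySize step is left out.

```java
// Sketch only: mimics the instanceof check in the decompiled calculateUnits.
// FakeBinary is a hypothetical stand-in for com.tangosol.util.Binary.
final class FakeBinary {
    private final byte[] bytes;

    FakeBinary(byte[] bytes) {
        this.bytes = bytes;
    }

    int length() {
        return bytes.length;
    }
}

final class UnitCalculatorSketch {
    static final int SIZE_ENTRY = 32;   // assumed value, for illustration only
    static final int SIZE_BINARY = 16;  // assumed value, for illustration only

    // Accepts only serialized (FakeBinary) keys and values; anything else is rejected.
    static int calculateUnits(Object key, Object value) {
        if (key instanceof FakeBinary && value instanceof FakeBinary) {
            return SIZE_ENTRY + 2 * SIZE_BINARY
                    + ((FakeBinary) key).length() + ((FakeBinary) value).length();
        }
        throw new IllegalArgumentException(
                "Unsupported key or value: Key=" + key + ", Value=" + value);
    }

    public static void main(String[] args) {
        // Serialized key and value: a unit count is returned.
        System.out.println(calculateUnits(new FakeBinary(new byte[4]),
                                          new FakeBinary(new byte[38])));
        // Plain Java objects: the same exception as in the log above.
        try {
            calculateUnits(Integer.valueOf(9), "a deserialized object");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Any key or value that reaches the calculator as a plain Java object, such as an Integer key, takes the throw branch.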

Cache configuration:

  <replicated-scheme>
    <scheme-name>repl-default</scheme-name>
    <service-name>ReplicatedCache</service-name>
    <serializer>
      <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
    </serializer>
    <lease-granularity>member</lease-granularity>
    <backing-map-scheme>
      <local-scheme>
        <unit-calculator>BINARY</unit-calculator>
      </local-scheme>
    </backing-map-scheme>
    <autostart>true</autostart>
  </replicated-scheme>

So the error means that either the key or the value is not a Binary. How can it not be a Binary, when the system controls the internal storage?

How about dumping out the class of the key and the value?

BTrace is my friend here. Basically, I want to print the class of oKey and oValue before the method gets called.

Here is the BTrace script:

/* BTrace script: print the key, the value and their classes on entry to calculateUnits */

import com.sun.btrace.BTraceUtils;
import com.sun.btrace.annotations.*;
import com.tangosol.net.cache.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class TracingScript {
    // Fires before BinaryMemoryCalculator.calculateUnits(Object, Object) runs.
    // coherence.jar must be on the BTrace classpath (-cp) for the @Self type to resolve.
    @OnMethod(clazz="com.tangosol.net.cache.BinaryMemoryCalculator",
              method="calculateUnits",
              location=@Location(where=Where.BEFORE))
    public static void oncalculateUnits(@Self com.tangosol.net.cache.BinaryMemoryCalculator self,
                                        Object a, Object b) {
        // a is the key (oKey), b is the value (oValue)
        print("a  ");
        println(a);
        println(BTraceUtils.classOf(a));
        print("b  ");
        println(b);
        println(BTraceUtils.classOf(b));
    }
}

Then you can trace the cache server either by running btrace.bat pidofthejvm pathoftrace.java, or by using the BTrace plugin for jvisualvm.

As the error implies, the replicated cache does store the objects in their native format instead of the binary format:

 

a  98
class java.lang.Integer
b  POFLib.PurchaseOrder@12fc61a
class POFLib.PurchaseOrder
a  99
class java.lang.Integer
b  POFLib.PurchaseOrder@cef65
class POFLib.PurchaseOrder

If I change the cache scheme back to distributed, there is no error, and the objects are stored in Binary format.

a  com.tangosol.util.Binary@153b2cb
class com.tangosol.util.Binary
b  com.tangosol.util.Binary@1ff2e1b
class com.tangosol.util.Binary

C# code

// Body of a loop over i; rand is a System.Random created outside this excerpt.
INamedCache cache = CacheFactory.GetCache("repl-customer");
PurchaseOrder o = new PurchaseOrder();
o.ID = i;
o.PMName = "PM" + i;
o.PoAmount = rand.Next(50000);
cache.Add(i, o);
          

 

Conclusion:

  • If you use a replicated cache, the unit-calculator has to be FIXED.
  • Objects are stored in different formats: native for replicated mode, Binary for distributed mode.
  • If objects are stored in native format, will the memory footprint be larger?
  • Replicated mode doesn’t support indexes; check the blog here.
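
As a sketch of the first point, this is the same repl-default scheme shown earlier with only the unit-calculator value changed. It assumes nothing else in the configuration needs to change. With FIXED, every entry counts as one unit, so units equal size again and you lose the byte-level view.

```xml
<replicated-scheme>
  <scheme-name>repl-default</scheme-name>
  <service-name>ReplicatedCache</service-name>
  <serializer>
    <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
  </serializer>
  <lease-granularity>member</lease-granularity>
  <backing-map-scheme>
    <local-scheme>
      <!-- FIXED: each cached entry counts as one unit -->
      <unit-calculator>FIXED</unit-calculator>
    </local-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</replicated-scheme>
```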

Wednesday, June 23, 2010

Installing Oracle coherence .bat file as Windows Service

When you get the Oracle Coherence bits, there is a bootstrap file located at coherence\bin\cache-server.cmd. When you run it, a command prompt shows up, and you need to keep it open, otherwise the cluster service will shut down. So how do you make it run as a Windows service?

image 

Here are the steps to install it as a Windows service, so that the cluster service starts when the machine boots.

1. Get the parameters used to run the Java application. When you start the .cmd file, it launches a Java process with a given set of parameters. You need to get those parameters.

To get the parameters on Windows Vista, 7 or 2008, open Task Manager and turn on the Command Line column by selecting View -> Select Columns -> Command Line.

image

For Windows XP or 2003, you can get this by querying WMI.

Run WMIC at the command prompt, and key in: process where name="java.exe" get commandline

image

So for this example, the full command line breaks down as follows.

full command line: "c:\jdk1.6.0_20\bin\java"  -server -showversion ""-Xms512m -Xmx512m"" -cp "C:\tangosol-33\coherence\bin\\..\lib\coherence.jar" com.tangosol.net.DefaultCacheServer
JDK: "c:\jdk1.6.0_20\bin\java"
classpath: C:\tangosol-33\coherence\bin\\..\lib\coherence.jar
JVM parameters: -server -showversion ""-Xms512m -Xmx512m""
main class: com.tangosol.net.DefaultCacheServer
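
If you do this on many machines, the same breakdown can be done in code. The following is a minimal sketch, assuming the command line has already been split into tokens with the quotes removed. The class and field names are hypothetical and are not part of Coherence or javaservice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split an already-tokenized java command line into launcher,
// JVM options, classpath and main class. All names here are hypothetical.
final class JavaCommandLine {
    final String launcher;                              // path to the java executable
    final List<String> jvmOptions = new ArrayList<>();  // e.g. -server, -Xms512m
    String classpath = "";
    String mainClass = "";

    private JavaCommandLine(String launcher) {
        this.launcher = launcher;
    }

    static JavaCommandLine parse(List<String> tokens) {
        JavaCommandLine cmd = new JavaCommandLine(tokens.get(0));
        int i = 1;
        while (i < tokens.size()) {
            String token = tokens.get(i);
            if ((token.equals("-cp") || token.equals("-classpath")) && i + 1 < tokens.size()) {
                cmd.classpath = tokens.get(i + 1);  // the option's value is the next token
                i += 2;
            } else if (token.startsWith("-")) {
                cmd.jvmOptions.add(token);
                i++;
            } else {
                cmd.mainClass = token;              // first non-option token
                break;
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        JavaCommandLine cmd = parse(Arrays.asList(
                "c:\\jdk1.6.0_20\\bin\\java", "-server", "-showversion",
                "-Xms512m", "-Xmx512m",
                "-cp", "C:\\tangosol-33\\coherence\\bin\\\\..\\lib\\coherence.jar",
                "com.tangosol.net.DefaultCacheServer"));
        System.out.println("JDK            " + cmd.launcher);
        System.out.println("classpath      " + cmd.classpath);
        System.out.println("jvm parameters " + cmd.jvmOptions);
        System.out.println("main class     " + cmd.mainClass);
    }
}
```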

2. Download javaservice: http://forge.ow2.org/project/showfiles.php?group_id=137&release_id=1560
Please note, if you are using a 64-bit JVM, choose the amd64 zip version of javaservice. Then unzip the file and copy javaservice.exe to the coherence\bin folder. You may rename it to coherence.exe.

3. Install a Windows service by passing the parameters to coherence.exe.
  

C:\tangosol-33\coherence\bin>coherence.exe -install MyServiceName "C:\jdk1.6.0_20\jre\bin\server\jvm.dll" -server -showversion ""-Xms512m -Xmx512m"" -Djava.class.path="C:\tangosol-33\coherence\bin\\..\lib\coherence.jar" com.tangosol.net.DefaultCacheServer -err "c:\tangosol-33\coherence\bin\err.log" -start com.tangosol.net.DefaultCacheServer

Then you will be able to see a service named MyServiceName in the Services console.

image

4. Troubleshooting. You may get an error when starting the service. Always check the following list.

  • The Windows Event Log. If you mix up the 32-bit and 64-bit JVM versions, you may get an error like

    “LoadLibrary is not a valid Win32 application.” which means your JDK is 32-bit but javaservice is 64-bit.

  • The error log file that you specified in the -err parameter.
  • The service parameters, which are stored under HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\services\MyServiceName and can be tweaked there.

image

Just change those parameters directly, then restart the service and see whether the error disappears.

Hope it helps.

Some references:

javaservice: http://forge.ow2.org/project/showfiles.php?group_id=137&release_id=1560

Oracle coherence: http://www.oracle.com/technology/products/coherence/index.html

Oracle Coherence book: a great book from the core team

Monday, May 10, 2010

java.net.SocketException: Unrecognized Windows Sockets error: 0: Cannot bind

Oracle Coherence is a JVM-based clustering technology. Sometimes, when you start the JVMs in a Windows environment, you may get a strange error like this:

java.net.SocketException: Unrecognized Windows Sockets error: 0 : Cannot bind
        at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:108)
        at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
        at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
        at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:7)
        at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:27)
        at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
        at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:998)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(Defaul

You double-check your config file, and it is all correct. Then check whether you have enabled a network acceleration product, such as Microsoft ISA Client or Google Accelerator. These hook into the Windows socket provider. Disable them, and the problem is gone. Have fun.

Coherence user guide: http://coherence.oracle.com/display/COH35UG/Coherence+3.5+Home
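
One way to check whether the problem sits in the socket layer rather than in Coherence is to try a plain bind from a tiny Java program on the same machine. This is a minimal sketch using only java.net. BindCheck is a hypothetical name, and the port argument should be whatever port your cluster is configured to use (0 asks the OS for any free port).

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: try a plain UDP and TCP bind outside Coherence. BindCheck is a hypothetical helper.
final class BindCheck {
    static boolean canBindUdp(InetAddress addr, int port) {
        try (DatagramSocket socket = new DatagramSocket(new InetSocketAddress(addr, port))) {
            return socket.isBound();
        } catch (IOException e) {
            System.err.println("UDP bind failed: " + e);
            return false;
        }
    }

    static boolean canBindTcp(InetAddress addr, int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(addr, port));
            return socket.isBound();
        } catch (IOException e) {
            System.err.println("TCP bind failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        InetAddress addr = InetAddress.getLoopbackAddress();
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 0;  // 0 = any free port
        System.out.println("UDP bind ok: " + canBindUdp(addr, port));
        System.out.println("TCP bind ok: " + canBindTcp(addr, port));
    }
}
```

If even this plain bind fails, the cause is more likely something installed in the Windows socket stack than the Coherence configuration.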
 