Monday, October 19, 2009

Arbitrary keys & hierarchies, part 3. Evolution of Key

As you might remember, DataObjects.Net v3 used the Int64 type as a globally unique identifier. Why change an approach that worked? The main reason was the strong requirement to support the bottom-up development model. When a customer already has a database and wants to use an ORM with it, we can’t just say: “Hey, man, change all primary keys in all your tables to bigint (the Microsoft SQL Server analogue of Int64) in order to use our super-duper ORM!”
So if we want to support existing databases, we must support every kind of primary key that might be found there.

How could we do it?

First of all, let’s enumerate the possible logical structures of a primary key:

  • single value key
  • multiple value key

Also note that, physically, a primary key can contain fields of various types: int, long, Guid, string and so on. So we need a structure that can hold one or more fields of various types: something like List<object> or object[].

Since a primary key is immutable, Key must be immutable as well. So some kind of ReadOnlyList<object> wrapper must be applied on top of the underlying List<object>.

Thus, at this point the Key class contains a List<object> and exposes its values in a read-only manner.

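using System;
using System.Collections.Generic;

[Serializable]
public class Key
{
  private readonly List<object> values;

  // Safe way to expose values: the caller gets a copy
  public object[] GetValues()
  {
    return values.ToArray();
  }

  public Key(params object[] values)
  {
    // Copy the input so the caller's array cannot be used to mutate the key
    this.values = new List<object>(values);
  }
}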
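
The read-only wrapper idea mentioned above can also be sketched with the standard ReadOnlyCollection<object> class, which avoids copying the values on every access. This is only a minimal illustration: the ReadOnlyKey name and its Values property are hypothetical and are not part of DataObjects.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical variant of the Key sketch; not the DataObjects.Net class.
public sealed class ReadOnlyKey
{
  private readonly ReadOnlyCollection<object> values;

  public ReadOnlyKey(params object[] values)
  {
    // Copy the input first, then wrap the private copy in a read-only view.
    this.values = new ReadOnlyCollection<object>(new List<object>(values));
  }

  // IReadOnlyList<object> offers an indexer and Count, but no way to modify.
  public IReadOnlyList<object> Values
  {
    get { return values; }
  }
}
```

The wrapper is created once per key, whereas a GetValues() method based on ToArray() allocates a new array on every call.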

Going further.

Should two instances of Key with the same field structure and equal values be considered equal? It seems so. To meet this requirement, we must override the GetHashCode & Equals methods (providing a field-by-field value equality check) and implement the IEquatable<Key> interface.

[Serializable]
public class Key : IEquatable<Key>
{
  ...
  // Equality support methods 
  public bool Equals(Key other)
  public override bool Equals(object obj)
  public static bool operator ==(Key left, Key right)
  public static bool operator !=(Key left, Key right)
  public override int GetHashCode()

  ...
}
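
Here is one possible way to fill in those members, as a rough sketch rather than the actual DataObjects.Net code. The ValueKey name is hypothetical, the values are kept in a plain object[] for brevity, and the hash-combining constants (17 and 31) are an arbitrary but common choice.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Key sketch; not the DataObjects.Net class.
public sealed class ValueKey : IEquatable<ValueKey>
{
  private readonly object[] values;
  private readonly int hashCode; // safe to cache: the values never change

  public ValueKey(params object[] values)
  {
    this.values = (object[]) values.Clone();
    int hash = 17;
    foreach (var value in this.values)
      hash = unchecked(hash * 31 + (value == null ? 0 : value.GetHashCode()));
    hashCode = hash;
  }

  public bool Equals(ValueKey other)
  {
    if (ReferenceEquals(other, null)) return false;
    if (ReferenceEquals(other, this)) return true;
    // Cheap rejection by hash first, then the field-by-field comparison.
    return hashCode == other.hashCode && values.SequenceEqual(other.values);
  }

  public override bool Equals(object obj) { return Equals(obj as ValueKey); }

  public override int GetHashCode() { return hashCode; }

  public static bool operator ==(ValueKey left, ValueKey right)
  {
    return ReferenceEquals(left, null) ? ReferenceEquals(right, null) : left.Equals(right);
  }

  public static bool operator !=(ValueKey left, ValueKey right)
  {
    return !(left == right);
  }
}
```

Note that with this sketch new ValueKey(25) and new ValueKey(25L) are not equal: a boxed int never equals a boxed long, so the field types take part in the comparison as well as the values.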

The next step is an attempt to implement the Identity Map pattern, which is responsible for the correspondence between Keys and Entities. The implementation is simple and straightforward: we’ll use a Dictionary<Key, Entity> for it.

Imagine the following domain model:

We have 2 persistent types here: Dog & Cat. The structure of their identity fields is the same: one field of int type. If both classes are mapped to separate tables, each with a simple autoincrement identity field, then it is highly probable that identity values from the Dog & Cat tables will coincide. Say, we could have the following keys: dogKey = new Key(25) for a Dog instance and catKey = new Key(25) for a Cat instance, where 25 is the value of the identity field.

var identityMap = new Dictionary<Key, Entity>();

var dogKey = new Key(25);
var catKey = new Key(25);

var dog = Query<Dog>.Single(dogKey);
identityMap[dogKey] = dog;

Assert.IsNotNull(identityMap[dogKey]); // Passes
Assert.IsNull(identityMap[catKey]);    // Fails: the Dog instance is returned

Pay attention to the last line. Although we didn’t add any Cat instance to the identity map, the map behaves as if it has one: a lookup by catKey returns the entry stored under dogKey, because both keys are considered equal. So the problem is that keys with equal values must be equal when they belong to the same type, but must not be equal when they belong to different types. To solve it, we must distinguish keys made for different types, i.e. inject some type-dependent information into Key and take it into account in the Equals & GetHashCode implementations. The most obvious approach is to add a property of type Type.

[Serializable]
public class Key : IEquatable<Key>
{
  ...
  public Type Type { get; private set; }

  public Key(Type type, params object[] values)
}
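
To see the effect, here is a self-contained sketch of a type-aware key used in an identity map. All names in it (TypedKey, IdentityMapDemo, the empty Entity, Dog and Cat classes) are hypothetical placeholders for illustration and do not come from DataObjects.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Entity { }
public class Dog : Entity { }
public class Cat : Entity { }

// Hypothetical sketch of a type-aware key; not the DataObjects.Net Key class.
public sealed class TypedKey : IEquatable<TypedKey>
{
  private readonly object[] values;

  public Type Type { get; private set; }

  public TypedKey(Type type, params object[] values)
  {
    Type = type;
    this.values = (object[]) values.Clone();
  }

  public bool Equals(TypedKey other)
  {
    // The type is compared first, then the values field by field.
    return !ReferenceEquals(other, null)
      && Type == other.Type
      && values.SequenceEqual(other.values);
  }

  public override bool Equals(object obj) { return Equals(obj as TypedKey); }

  public override int GetHashCode()
  {
    // The type takes part in the hash, so Dog and Cat keys rarely collide.
    int hash = Type.GetHashCode();
    foreach (var value in values)
      hash = unchecked(hash * 31 + (value == null ? 0 : value.GetHashCode()));
    return hash;
  }
}

public static class IdentityMapDemo
{
  // Returns true when a Cat key with the same value does not find the Dog entry.
  public static bool CatKeyIsAbsent()
  {
    var identityMap = new Dictionary<TypedKey, Entity>();
    identityMap[new TypedKey(typeof(Dog), 25)] = new Dog();
    return !identityMap.ContainsKey(new TypedKey(typeof(Cat), 25));
  }
}
```

The demo uses ContainsKey on purpose: the Dictionary indexer throws KeyNotFoundException for a missing key instead of returning null.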

Now we have a good chance of building a fully functional identity map.

So what’s next? Let’s look at the result and calculate the cost of using Key.

  • Every Key instance has a reference to a List<object> where the identity field values are stored. The values are stored as objects, so creating a Key with 1 identity field creates at least 2 more objects in the managed heap: the List<object> itself (the container) and the identity field value (boxed, if it is a value type such as int). More identity fields in a Key => more small objects in the heap. Not good.
    Fortunately, we can easily replace the inefficient List<object> with a quite efficient and compact Tuple implementation with typed access to fields (and yes, Xtensive made its own Tuple implementation; I’m going to describe it in detail in a separate post).
    This approach scales well and can be used in a wide variety of scenarios: from the widespread one with a single identity field (of type int, long, Guid, etc.) to rare ones with a group of identity fields (we call such keys complex keys).
  • Every Key instance has a reference to a Type object. But before DataObjects.Net v4 can do anything with a Key instance, it must get the corresponding TypeInfo object. TypeInfo is a class from the Xtensive.Storage.Model namespace that fully describes a persistent type in a DataObjects.Net v4 domain. Resolving TypeInfo for a Type costs 1 lookup in a Dictionary<Type, TypeInfo>. To avoid these lookups (as they are going to be rather frequent), we decided to put a reference to TypeInfo directly into Key instead of a reference to Type.
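
The boxing cost from the first point is easy to observe. The sketch below is only an illustration: BoxingDemo and SingleValue<T> are hypothetical names, and SingleValue<T> is not Xtensive's Tuple, just the simplest possible holder with typed access.

```csharp
using System;

public static class BoxingDemo
{
  // Each conversion of an int to object allocates its own box on the heap,
  // so two boxes of the same value are equal but are different objects.
  public static bool BoxesAreDistinct()
  {
    object first = 25;
    object second = 25;
    return first.Equals(second) && !ReferenceEquals(first, second);
  }
}

// Hypothetical typed holder: the value is stored inside the holder itself,
// so no separate box is needed to keep or read an int.
public sealed class SingleValue<T>
{
  public readonly T Value;

  public SingleValue(T value)
  {
    Value = value;
  }
}
```

With a typed holder, a key with one int field needs a single heap object for its value storage instead of a container plus a box.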

Here is how the Key class is actually designed:

public class Key : IEquatable<Key>
{
  public Tuple Value { get; }
  public TypeInfo Type { get; }

  // Instance methods 
  public bool Equals(Key other)
  public override bool Equals(object obj)
  public static bool operator ==(Key left, Key right)
  public static bool operator !=(Key left, Key right)
  public override int GetHashCode()
  public override string ToString()

  // Static methods 
  public static Key Parse(string source) // + 1 overload
  public static Key Create() // + many overloads
}

In the next posts I’m going to describe other topics related to Key: the absence of a public constructor, serialization, creation patterns, key generators (key providers), identity field mappings, and other speed and resource utilization improvements.

Stay tuned.

Part 2. Hierarchies, Part 4. Working with keys
