Saturday, January 31, 2009

Inherited classes in Hibernate

A few days ago I refactored a Hibernate-based Java EE application. There was a table and a view on that table that included all of its columns, like this:
CREATE TABLE Person (
  Id NUMBER,
  Name VARCHAR(10),
  BirthDate DATE
);

CREATE VIEW PersonExtended AS
  SELECT p.*, YearsFromNow(p.BirthDate) AS Age FROM Person p;
Assuming the corresponding function exists, this view includes all columns from Person plus an additional column named Age. Before the refactoring, there were two corresponding entity classes. In the actual code the entities have fully annotated getters and corresponding setters, but for readability I'll use the most compact (and not recommended) format here:
//Person.java

@Entity
class Person {
  @Id long id;
  String name;
  Date birthDate;
}

//PersonExtended.java

@Entity
class PersonExtended {
  @Id long id;
  String name;
  Date birthDate;
  int age;
}

Friday, January 9, 2009

Python class slots

Today I came across the __slots__ feature of Python. It defines the list of possible attributes at class creation time, so no dictionary is kept for each instance. This can save memory if many such instances are stored in big lists. To use slots, a class should be defined like this:
class Point(object):
  __slots__=["x","y"]
The next example demonstrates the difference between a class with slots and a regular class.
class OldPoint(object):
  pass

p=OldPoint()
p.x=10
p.y=20   # these are OK
p.z=30   # this is OK as well - any attributes are allowed

p=Point()
p.x=10
p.y=20   # these are OK
p.z=30   # this causes AttributeError: 'Point' object has no attribute 'z'
Defining __slots__ affects not only the dictionary of the instances, but also the way they are serialized (or pickled, in Python terminology). Also, weak references to instances are not enabled by default; this can be overridden by adding '__weakref__' to __slots__.
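A short sketch of the missing dictionary and the weak reference behaviour, reusing the Point class from this post. WeakPoint and the variable names are added here only for illustration.

```python
import weakref


class Point(object):
    __slots__ = ["x", "y"]


class WeakPoint(object):
    # Listing '__weakref__' among the slots re-enables weak references.
    __slots__ = ["x", "y", "__weakref__"]


p = Point()
p.x = 10
p.y = 20

# Instances of a slotted class carry no per-instance dictionary.
has_dict = hasattr(p, "__dict__")

# Creating a weak reference fails unless '__weakref__' is a slot.
try:
    weakref.ref(p)
    weakref_ok = True
except TypeError:
    weakref_ok = False

wp = WeakPoint()
ref = weakref.ref(wp)   # allowed, because WeakPoint declares '__weakref__'
```

Here has_dict and weakref_ok both end up False, while ref() returns the WeakPoint instance.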

Tuesday, January 6, 2009

lj-cut on blogger

LiveJournal has a useful feature called lj-cut. It lets you show only part of a post on the main page and reveal the rest on a separate page. I was looking for a similar feature on Blogger, as my posts with code examples are quite lengthy.

Trivial resolution of Datastore performance

In addition to Model.put(), Datastore has db.put(). I did not notice that the latter can put several entities at once until Arachnid told me so. So in my code I changed this:
for cell in cells:
  cell.put()
To this:
db.put(cells)
That's all that was needed to fix the performance.
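For a very long list, one call may not be enough: batch calls were capped at a fixed number of entities (500 is assumed here and should be checked against the current documentation). Below is a minimal sketch of splitting the list into batches. The names chunks and put_in_batches are hypothetical helpers, and the put argument stands in for db.put so the sketch runs without the App Engine SDK.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_batches(items, put, batch_size=500):
    """Call `put` once per batch and return the number of calls made."""
    calls = 0
    for batch in chunks(items, batch_size):
        put(batch)      # in the real application this would be db.put(batch)
        calls += 1
    return calls


# Stand-in for db.put that only records what it was given.
written = []
calls = put_in_batches(list(range(1200)), written.extend)
```

With 1200 items and the assumed limit of 500, the stand-in is called three times instead of 1200.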

Sunday, January 4, 2009

Improved Datastore performance

It looks like the problem with Datastore performance was that the information was very fine-grained. I created the earlier test following Google's suggestion (look at the tip at the end of the page). So this time I ran the opposite test:
  • Instead of a single integer, each entity holds a text of 10,000 characters
  • Half of the records are written in transactions of 10 records each, and the other half one record at a time
The results show that the size of an entity had no effect, unlike the number of entities. So it's better to write a few large objects than many small ones.

Datastore performance

Something is strange with the performance of the App Engine Datastore. I tried to run the following code:

from google.appengine.ext import db
from time import time

print 'Content-Type: text/plain'
print ''

total_t=time()

class C(db.Model):
 i=db.IntegerProperty()

for i in range(10):
 t=time()
 for j in range(10):
  c=C(i=i)
  c.save()
 print time()-t

print "total time:", time()-total_t

Saturday, January 3, 2009

Querying for None in Datastore

I ran into a weird problem with the GAE Datastore when I tried to search for a None value. If I use GQL, the query works as expected:

from game.models import *
for c in Cell.gql("WHERE game=:g", g=None):
 print c

The above code prints the expected cells, which are not bound to any game. But I need to iterate through the cells of a certain board type, so instead of Cell.gql I want to start from board.cell_set and define a filter on game=None. The following code should give the same outcome as the previous one:

from game.models import *
for c in Cell.all().filter("game=", None):
 print c

But this time I get no results. Why?
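A possible explanation, offered as an assumption and not verified against the SDK source: the documented form of a filter string puts whitespace between the property name and the operator ("game ="), and without it the whole string "game=" may be taken as a property name. The sketch below is a hypothetical parser that mimics such behaviour. parse_filter and its regular expression are illustrations, not the SDK's code.

```python
import re

# Hypothetical approximation: a property name, then optionally
# whitespace followed by an operator.
FILTER_RE = re.compile(r"^\s*(\S+)(\s+(<=|>=|!=|=|<|>|in)\s*)?$", re.IGNORECASE)


def parse_filter(property_operator):
    """Split a filter string into (property, operator)."""
    match = FILTER_RE.match(property_operator)
    if match is None:
        raise ValueError("bad filter: %r" % property_operator)
    prop = match.group(1)
    operator = match.group(3) or "="    # the operator defaults to equality
    return prop, operator


with_space = parse_filter("game =")     # ('game', '=')
without_space = parse_filter("game=")   # ('game=', '=')
```

If that is the cause, the second query is filtering on a nonexistent property called "game=", and writing the filter as "game =" should make both queries return the same cells.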

Cached ReferenceProperty: now with round trip

One thing was really missing in CachedReferenceProperty: a cached round trip. Suppose we have the following one-to-many relationship:

class Master(db.Model):
  pass

class Detail(db.Model):
  master=CachedReferenceProperty(Master)

By cached round trip here I mean that when a master holds a cached collection of details, those details reference the same master, so going back and forth from master to details does not make any database hits.

To make this possible, I replaced the collection builder in _CachedReverseReferenceProperty from this:

  res=[c for c in query]

to this:

  resolved_name='_RESOLVED_'+self.__prop #WARNING: using internal
  res=[]
  for c in query:
    setattr(c, resolved_name, model_instance)
    res.append(c)

This is very ugly; I need an idea for how to avoid using the internal attribute. The whole source file is here.
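The round trip idea itself can be shown without the Datastore. The following is a simplified sketch with plain Python classes, not the CachedReferenceProperty code. FakeMaster, FakeDetail and load_details are hypothetical names, and load_details stands in for the database query.

```python
class FakeDetail(object):
    def __init__(self, name):
        self.name = name
        self.master = None      # back reference, filled in by the master


class FakeMaster(object):
    def __init__(self, load_details):
        self._load_details = load_details   # stand-in for the database query
        self._details = None                # cache, filled on first access
        self.queries = 0                    # counts simulated database hits

    @property
    def details(self):
        if self._details is None:
            self.queries += 1
            res = []
            for d in self._load_details():
                d.master = self             # point each detail back at this master
                res.append(d)
            self._details = res
        return self._details


master = FakeMaster(lambda: [FakeDetail("a"), FakeDetail("b")])
first = master.details[0]
same_master = first.master is master
same_detail = first.master.details[0] is first
```

Going from the master to a detail and back again lands on the same objects, and the simulated query runs only once.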