Showing posts with label solution. Show all posts
Showing posts with label solution. Show all posts

Tuesday, January 6, 2009

Trivial resolution of Datastore performance

In addition to Model.put() Datastore has db.put(). I did not notice the latter can put several entities at once until Arachnid told me so. So in my code I changed this:
for cell in cells:
  cell.put()
To this:
db.put(cells)
That's all what was needed to fix the performance.

Thursday, December 25, 2008

Cached ReferenceProperty

Piece of cake

Earlier I wrote about my wish to subclass ReferenceProperty so the collection would not be fetched every time I iterate though it. Well, it was so easy I can post the whole implementation here.
from google.appengine.ext import db

class CachedReferenceProperty(db.ReferenceProperty):

  def __property_config__(self, model_class, property_name):
    super(CachedReferenceProperty, self).__property_config__(model_class,
                                                       property_name)
    #Just carelessly override what super made
    setattr(self.reference_class,
            self.collection_name,
            _CachedReverseReferenceProperty(model_class, property_name,
                self.collection_name))

class _CachedReverseReferenceProperty(db._ReverseReferenceProperty):

    def __init__(self, model, prop, collection_name):
        super(_CachedReverseReferenceProperty, self).__init__(model, prop)
        self.__collection_name = collection_name

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self
        if self.__collection_name in model_instance.__dict__:# why does it get here at all?
            return model_instance.__dict__[self.__collection_name]

        query=super(_CachedReverseReferenceProperty, self).__get__(model_instance,
            model_class)
        #replace the attribute on the instance
        res=[c for c in query]
        model_instance.__dict__[self.__collection_name]=res
        return res

    def __delete__ (self, model_instance):
        if model_instance is not None:
            del model_instance.__dict__[self.__collection_name]
Having these classes now we can rewrite previous example as:
class Master(db.Model):
  pass

class Detail(db.Model):
  master=CachedReferenceProperty(Master)
Try to run the same cycle and you will see it executes instantly even with 100,000 iterations instead of 1000.

Is it a free cake?

Not exactly. Try this:
m=Master()
m.put()
d1=Detail(master=m)
d1.put()
print m.detail_set
d2=Detail(master=m)
d2.put()
print m.detail_set
The second time it returned a wrong result, which did not include d2. So we need a way to reset the cached value and fetch up-to-date values from the datastore. Fortunately, it's achieved easily:
del m.detail_set
print m.detail_set
This is why I implemented _CachedReverseReferenceProperty.__delete__. When m.__dict__ has no key'detail_set', m.detail_set is dispatched to type(m).__dict__('detail_set'), and there I call the base class to access the datastore. What surprised me is when I do have m.__dict__('detail_set'), m.detail_set is still dispatched to Master.__dict__('detail_set'). I don't understand why that happens, so I worked around this problem. Have to learn Python better to answer that question.