Sunday, December 28, 2008

Hello, google-app-engine-django

I'm going to try the Django patch for Appengine. These are my problems with plain Appengine and its built-in Django:

Documentation
It is not satisfactory: it covers everything, but without enough detail.
Debugging
I still cannot debug; I get opaque tracebacks every time I try.
Testing
Explained the problem here.
Flexible persistence and caching
Too much hacking necessary for trivial things, like cached one-to-many collections.

To be honest, I'm thinking of switching to the full Django stack with a normal database. I can find fewer and fewer reasons to stick with Appengine.

Friday, December 26, 2008

What does memcache store?

Now that I have learned how to store one-to-many collections with the AppEngine datastore, I want to use the Memcache API to eliminate DB hits between requests. A quick test shows it does not work out of the box:

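from google.appengine.api import memcache

class C(object):
    def __init__(self):
        self.field = 'field'

c = C()
c.attr = 'attr'
c.__dict__['dict'] = 'dict'
print dir(c)          # attributes before caching
memcache.add('c', c)
c = memcache.get('c')
print dir(c)          # attributes after the round trip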

I would expect this code to print the same list of attributes before and after the round trip, but it does not. Only c.field was saved, while c.attr and c.dict are missing. I need to investigate memcache internals to understand why.
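One thing worth ruling out before digging into the internals: the Python memcache API pickles non-string values, and memcache.add stores a value only when the key is not already present. The sketch below uses plain pickle and a hypothetical in-memory FakeCache (a stand-in, not the real memcache client) to show that a pickle round trip keeps every instance attribute, while an add-only write can hand back an older object left under the same key by a previous run. Whether that is what happened in my test is only a guess.

```python
import pickle


class C(object):
    def __init__(self):
        self.field = 'field'


class FakeCache(object):
    """Hypothetical in-memory stand-in for memcache; values are pickled."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Like memcache.add: store only if the key is not present yet.
        if key in self._data:
            return False
        self._data[key] = pickle.dumps(value)
        return True

    def set(self, key, value):
        # Like memcache.set: store unconditionally.
        self._data[key] = pickle.dumps(value)
        return True

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else pickle.loads(raw)


def instance_attrs(obj):
    """Sorted names of the attributes stored on the instance itself."""
    return sorted(vars(obj))


cache = FakeCache()

# Assumption: an earlier run already stored a plain C() under the same key.
cache.add('c', C())

c = C()
c.attr = 'attr'
c.__dict__['dict'] = 'dict'

added = cache.add('c', c)              # refused, the key already exists
stale = instance_attrs(cache.get('c'))

cache.set('c', c)                      # overwrites the old value
fresh = instance_attrs(cache.get('c'))
```

If this is the cause, switching the test to memcache.set (or deleting the key first) should bring back all three attributes.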

Thursday, December 25, 2008

Cached ReferenceProperty

Piece of cake

Earlier I wrote about my wish to subclass ReferenceProperty so the collection would not be fetched every time I iterate through it. Well, it was so easy that I can post the whole implementation here.
from google.appengine.ext import db

class CachedReferenceProperty(db.ReferenceProperty):

  def __property_config__(self, model_class, property_name):
    super(CachedReferenceProperty, self).__property_config__(model_class,
                                                       property_name)
    #Just carelessly override what super made
    setattr(self.reference_class,
            self.collection_name,
            _CachedReverseReferenceProperty(model_class, property_name,
                self.collection_name))

class _CachedReverseReferenceProperty(db._ReverseReferenceProperty):

    def __init__(self, model, prop, collection_name):
        super(_CachedReverseReferenceProperty, self).__init__(model, prop)
        self.__collection_name = collection_name

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self
        if self.__collection_name in model_instance.__dict__:# why does it get here at all?
            return model_instance.__dict__[self.__collection_name]

        query=super(_CachedReverseReferenceProperty, self).__get__(model_instance,
            model_class)
        #replace the attribute on the instance
        res=[c for c in query]
        model_instance.__dict__[self.__collection_name]=res
        return res

    def __delete__(self, model_instance):
        # Drop the cached list; pop() avoids a KeyError when nothing is cached yet
        if model_instance is not None:
            model_instance.__dict__.pop(self.__collection_name, None)
With these classes in place, the previous example can be rewritten as:
class Master(db.Model):
  pass

class Detail(db.Model):
  master=CachedReferenceProperty(Master)
Run the same loop and you will see that it executes instantly, even with 100,000 iterations instead of 1,000.

Is it a free cake?

Not exactly. Try this:
m=Master()
m.put()
d1=Detail(master=m)
d1.put()
print m.detail_set
d2=Detail(master=m)
d2.put()
print m.detail_set
The second time it returned a stale result, which did not include d2. So we need a way to reset the cached value and fetch up-to-date values from the datastore. Fortunately, that is easy to do:
del m.detail_set
print m.detail_set
This is why I implemented _CachedReverseReferenceProperty.__delete__. When m.__dict__ has no key 'detail_set', m.detail_set is dispatched to type(m).__dict__['detail_set'], and there I call the base class to access the datastore. What surprised me is that even when m.__dict__['detail_set'] exists, m.detail_set is still dispatched to Master.__dict__['detail_set']. I don't understand why that happens, so I worked around it. I have to learn Python better to answer that question.
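A likely explanation, sketched with plain Python classes rather than the App Engine ones: Python gives a descriptor that defines __set__ or __delete__ (a "data descriptor") priority over the instance __dict__, and _CachedReverseReferenceProperty defines __delete__. The class names below are hypothetical and only illustrate the lookup order.

```python
class NonDataDescriptor(object):
    """Defines only __get__, so an entry in the instance __dict__ wins."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return 'from descriptor'


class DataDescriptor(object):
    """Also defines __delete__, so it wins over the instance __dict__."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return 'from descriptor'

    def __delete__(self, instance):
        instance.__dict__.pop('data', None)


class Holder(object):
    plain = NonDataDescriptor()
    data = DataDescriptor()


h = Holder()
h.__dict__['plain'] = 'from instance'
h.__dict__['data'] = 'from instance'

plain_result = h.plain   # the instance __dict__ shadows a non-data descriptor
data_result = h.data     # a data descriptor is consulted first
```

Under this reading, the check against model_instance.__dict__ inside __get__ is not a workaround but the normal way for a data descriptor to cache per instance.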

Wednesday, December 24, 2008

Testing AppEngine django application with Nose

Here Shlomo gives a basic example of creating unit tests for an AppEngine application.

I tried to adopt this method in my project and got a strange traceback:

----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.5/site-packages/nose-0.10.4-py2.5.egg/nose/loader.py", line 364, in loadTestsFromName
    addr.filename, addr.module)
  File "/Library/Python/2.5/site-packages/nose-0.10.4-py2.5.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Library/Python/2.5/site-packages/nose-0.10.4-py2.5.egg/nose/importer.py", line 84, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
ImportError: Loaded module game.test_models not found in sys.modules

At first I thought the nosegae installation was broken. Then I found out that nose with --with-gae does not like it when a test module is outside the root directory of the project. So I moved test_models.py from the game package into the root directory, and it worked as expected.

AppEngine Datastore and memcache

I miss Hibernate collections. In the following code I access the collection a thousand times:

class Master(db.Model):
  pass

class Detail(db.Model):
  master=db.ReferenceProperty(Master)

m=Master()
m.put()
d=Detail(master=m)
d.put()

for i in range(1000):
  for tmp_d in m.detail_set:
    pass

The above code takes a few seconds to execute. The reason is that Datastore fetches the collection from storage every time it is iterated, whereas in Hibernate the collection would be fetched from the database only once per session. Oops, there are no sessions in Datastore. So the Datastore developers were right to fetch the collection every time: they cannot know when the details change.

This is the reason Master cannot be put in memcache effectively: it would be stored without the Details. Master.detail_set holds only the definition of the query needed to get the details. So I'm looking for a way to decorate ReferenceProperty to make one-to-many relations suitable for memcache, so that big object trees are read from Datastore once and are then quickly accessible.
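The difference between holding a query definition and holding its results can be sketched without App Engine at all. FakeQuery below is a hypothetical stand-in that counts how often it goes to "storage"; it is not the real db.Query.

```python
class FakeQuery(object):
    """Hypothetical stand-in for a datastore query: every iteration is a fetch."""

    def __init__(self, rows):
        self._rows = rows
        self.fetch_count = 0

    def __iter__(self):
        self.fetch_count += 1           # one simulated round trip per iteration
        return iter(list(self._rows))


query = FakeQuery(['d1'])

# Iterating the query itself: one fetch per pass, as with m.detail_set.
for _ in range(1000):
    for _detail in query:
        pass
fetches_lazy = query.fetch_count

# Materializing once: the list can be iterated (or cached) without new fetches.
query.fetch_count = 0
details = list(query)
for _ in range(1000):
    for _detail in details:
        pass
fetches_cached = query.fetch_count
```

The materialized list is what would have to travel with Master into memcache, since the query object itself carries no results.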

Saturday, December 20, 2008

Polymorphism in AppEngine Datastore Models - continued

Playing around with the Master and Detail classes from the previous post, I tried to subclass Master this time:
class MoreMaster(Master):
 mmp = db.StringProperty()

mm=MoreMaster(mp='f', mmp='g')
mm.put()

d4 = Detail(dp='h', master=mm)
d4.put()

for d in mm.detail_set:
 print d.master.mmp
It printed 'g', so d4.master was correctly resolved as an instance of the derived class. So the "many to one" relation supports polymorphism, but "one to many" does not. The reason is that db.Key already contains the name of the Model class (the kind). So when we do Master.get('LongLongKeyHere'), the Datastore is able to create the correct class.
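The idea of resolving the class from the key can be sketched as a small registry. Everything here (KIND_REGISTRY, register, put, load, the tuple-shaped key) is hypothetical and far simpler than the real db.Key; it only shows why a key that carries the kind name is enough to rebuild the right subclass.

```python
KIND_REGISTRY = {}


def register(cls):
    """Class decorator: remember each model class under its kind name."""
    KIND_REGISTRY[cls.__name__] = cls
    return cls


@register
class Master(object):
    def __init__(self, **props):
        self.props = props


@register
class MoreMaster(Master):
    pass


STORAGE = {}


def put(entity, entity_id):
    # The hypothetical key carries the kind name alongside the id.
    key = (type(entity).__name__, entity_id)
    STORAGE[key] = dict(entity.props)
    return key


def load(key):
    # Pick the class by the kind stored in the key, not by the caller's type.
    kind, _ = key
    return KIND_REGISTRY[kind](**STORAGE[key])


key = put(MoreMaster(mp='f', mmp='g'), 1)
loaded = load(key)
```

A reverse collection has no such key to start from: it queries by one kind name, which is where the one-to-many side loses the subclass information.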

Polymorphism in AppEngine Datastore Models

There is a problem with inherited classes in AppEngine

Let's suppose we have the following models:

class Master(db.Model):
  mp = db.StringProperty()

class Detail(db.Model):
  dp = db.StringProperty()
  master = db.ReferenceProperty(Master)
When these are declared, Datastore automatically adds a detail_set property to Master. So if we write
m=Master(mp='foo')
m.put()
d1=Detail(dp='bar', master=m)
d1.put()
d2=Detail(dp='zee', master=m)
d2.put()
then we have the m.detail_set property, which will fetch [d1, d2]. But if we define
class MoreDetail (Detail):
  mdp=db.StringProperty()

d3=MoreDetail (dp='org', mdp='jee', master=m)
d3.put()
then m.detail_set will fetch d3 as the third result but deserialize it as Detail instead of MoreDetail. Here is how I checked it:
>>> for d in m.detail_set.fetch(10):
...  print d.properties()
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
One of these objects should have an mdp property defined in MoreDetail, but that did not happen.

Thursday, December 18, 2008

JavaEdge 2008

Today I attended Java Edge, and I want to share my impressions while they are fresh in my memory.

Java for the cloud by Baruch Sadogursky

In the 19th century every manufacturer ran its own private electricity generator. Eventually people started to consume electricity provided by a few big companies. In the same way, the trend now will be to stop running our own servers and to use computing power and storage provided as a service. Today the leading services are Google AppEngine, Amazon S3 and EC2, GoGrid, AppNexus, FlexiScale. Soon there will be Microsoft Azure, and there are rumors about Yahoo Cloud.

Very important for today's crisis state of mind: fail cheap. That means that we don't spend a lot up front to buy servers, to pay for installations, etc.

Overall this area is very important.

OSGi and SpringSource application platform by Alef Arendsen from SpringSource

For most of the session Alef demonstrated trivial OSGi stuff. But there is one amazing feature that is unique to Spring: the ability to easily share service instances. That means two or more OSGi consumers get the same service instance from the container, so they can share a cache or state there. It is achieved very easily in Spring: declare the service bean with osgi:service and wire it to the consumer with osgi:reference. The service bean does not even have to be serializable, as no remoting happens behind the scenes. Voila, the beans are wired, with all the dynamic features, like automatically rewiring consumers to another provider when the current one goes down.

Smoothing your Java with DSLs by Dmitry Jemerov

I'm very glad I chose this session. Dmitry demonstrated how heavy, sure-to-be-copy-pasted code can be transformed into something that a non-programmer can read and maybe even write. What I liked most was how he built the session up, each time going one step further from the original code: first old-fashioned Java code, then fluent API patterns, then Groovy to make it even more readable, then JetBrains MPS to translate a text format into Java. And the computer-generated Java code looked exactly like the original code, thanks to powerful MPS declarations. Very impressive, Dmitry.

Taking control of your Maven build by Yossi Shaul

Yossi shared his experience with troubleshooting Maven builds and gave some really useful tips. It's impossible to summarize all the general recommendations briefly, so I'll just list some important Maven commands here:

mvn help:effective-pom
mvn help:effective-settings
mvn dependency:list
mvn dependency:tree
mvn dependency:analyze -DignoreNonCompile
mvn dependency:analyze-dep-mgt
mvn project-info-reports:dependency-convergence
mvn enforcer:enforce

The last one has to be accompanied by some restrictions in the parent pom, for example forbidding reliance on some dangerous defaults. See Yossi's presentation for the details.

Testing Web UI by Alex Waisburd

A good overview of web UI testing tools. The focus was on Selenium, with a demonstration of a simple way to run integration tests from the nightly build server. I had seen similar capabilities in Microsoft Team System and was really missing them in Java land. With Selenium it's possible to record the necessary user actions in Firefox, generate Java code from the recording, and replay it when needed. The Selenium Java API makes it possible to open URLs, perform user actions (clicking, typing text) on the returned page, wait until the page is changed by AJAX, and check that the resulting page contains the expected elements. Wow!

General impression: an interesting and well-organized event, better than last year's. Hope to be there again in 2009 :-)

Slides are available here: http://www.javaedge.net/sessions.html