Sunday, January 4, 2009

Improved Datastore performance

It looks like the problem with Datastore performance in my earlier test was that the data was very fine-grained. I had written that test following Google's suggestion (look at the tip at the end of the page). So this time I ran the opposite test:
  • Instead of holding a single integer, each entity holds a text of 10,000 characters
  • Half of the records are written in transactions of 10 records each, and the other half are written record by record
The results show that the size of an entity had no effect, unlike the number of entities. So it's better to write a few large objects than many small ones.
Also, this time I saw a huge difference between the real App Engine server and dev_appserver once the database held many records (the real server was much faster). Grouping a few records in a transaction also helped. This is the test code:
from google.appengine.ext import db
from time import time

print 'Content-Type: text/plain'
print ''

total_t=time()
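# Empty model, used only as the parent (entity group root) of the test entities.
class Root(db.Model):
    pass

# Test entity holding a single text payload.
class C(db.Model):
    i=db.TextProperty()

# Despite the name, this string is 10,000 characters long.
t1000="a"*10000

# Writes `amount` entities with one put() each; with root=None they get no parent.
def add_in_transaction(root, text, amount):
    for j in range(amount):
        c=C(parent=root, i=text)
        c.put()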

print "with transactions - big"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    db.run_in_transaction(add_in_transaction, root, t1000, 10)
    print time()-t
print "without transactions - big"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    add_in_transaction(None, t1000, 10)
    print time()-t
print "without transactions - small"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    add_in_transaction(None, "a", 10)
    print time()-t

print "total time:", time()-total_t
And this is the result (seconds per iteration):
with transactions - big
0.161096096039
0.154489994049
0.367100000381
0.152635812759
0.153033971786
without transactions - big
0.315757989883
0.359083890915
0.559228181839
0.360776901245
0.330877780914
without transactions - small
0.279601812363
0.541454076767
0.324053049088
0.311630964279
0.306309938431
total time: 4.67810916901
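To make the three groups easier to compare, here is a small sketch that averages the timings listed above. The numbers are copied from the output. The `timings`, `mean` and `means` names are only illustrative and are not part of the original test.

```python
# Timings copied from the output above. Each value is the time in seconds
# for one iteration: one root.put() plus 10 entity writes.
timings = {
    "with transactions - big": [
        0.161096096039, 0.154489994049, 0.367100000381,
        0.152635812759, 0.153033971786,
    ],
    "without transactions - big": [
        0.315757989883, 0.359083890915, 0.559228181839,
        0.360776901245, 0.330877780914,
    ],
    "without transactions - small": [
        0.279601812363, 0.541454076767, 0.324053049088,
        0.311630964279, 0.306309938431,
    ],
}


def mean(values):
    # float() keeps the division exact on Python 2 as well as Python 3.
    return sum(values) / float(len(values))


# Average time per iteration for each group.
means = dict((label, mean(values)) for label, values in timings.items())

for label in sorted(means):
    print("%s: %.3f s per iteration" % (label, means[label]))
```

On these five samples per group, the transactional writes average roughly 0.20 s per iteration, against roughly 0.39 s and 0.35 s for the two non-transactional groups. Five samples with one outlier each is a small basis, so treat the averages as indicative only.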
I think it's worth opening a bug against the App Engine documentation so that it mentions these performance considerations.

P.S. I changed the test a little to demonstrate that writing one character or 10K characters makes no difference.

2 comments:

Unknown said...

Are you aware that you can write multiple entities - even from different entity groups - in a single db.put() request, without a transaction? The same goes for db.get() and db.delete(). I think you'll find the number of roundtrips has a larger effect than the number of entities.

Andrew Skiba said...

No, I did not know that you can put a few entities without transactions. Will try to test that, but even without testing I'm pretty sure you are right - the number of round trips is what has the effect here.
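The round-trip argument from the comments can be illustrated without a datastore. The sketch below is not a measurement: `CountingPut` is a hypothetical stand-in that only counts calls, and `write_one_by_one` and `write_batched` are illustrative names. On App Engine one would presumably pass `db.put` from `google.appengine.ext.db` instead, which accepts either a single model instance or a list of them.

```python
def write_one_by_one(put, entities):
    """Issue one put call (one round trip) per entity."""
    for entity in entities:
        put(entity)


def write_batched(put, entities):
    """Issue a single put call for the whole list."""
    put(list(entities))


class CountingPut(object):
    """Hypothetical stand-in for db.put: stores nothing, only counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, models):
        self.calls += 1


# Placeholder values, not real db.Model instances.
entities = ["entity-%d" % n for n in range(10)]

single = CountingPut()
write_one_by_one(single, entities)

batched = CountingPut()
write_batched(batched, entities)

print("one by one: %d calls, batched: %d call" % (single.calls, batched.calls))
```

If the commenter is right that round trips dominate, the batched variant should cost about as much as a single put in the test above, but that still needs to be timed on the real server.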