Showing posts with label appengine. Show all posts

Thursday, May 7, 2009

Debugging AppEngine application on NetBeans

Earlier I explained how to open and compile a Java AppEngine application in NetBeans. Now let's see what it takes to debug it.
If you are familiar with the remote debugging mode of NetBeans, it's actually very easy to attach to a running AppEngine dev_appserver. But first we have to open a port to connect to. This is how it's done on Windows.
Edit the dev_appserver script of the AppEngine Java SDK. It's located in the appengine-java-sdk-1.2.0/bin folder. There are two versions of this script: one for Windows (dev_appserver.cmd) and one for Unix/Linux/OSX (dev_appserver.sh). There is also an appcfg script, which we will not change. Open the script corresponding to your operating system (File|Open File... in NetBeans). The Windows command script looks like this:
@java -cp "%~dp0\..\lib\appengine-tools-api.jar" ^
    com.google.appengine.tools.KickStart ^
       com.google.appengine.tools.development.DevAppServerMain %*
You need to edit this file to look like this:
@java -cp "%~dp0\..\lib\appengine-tools-api.jar" ^
    com.google.appengine.tools.KickStart ^
       --jvm_flag=-Xdebug ^
       --jvm_flag=-Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n ^
       com.google.appengine.tools.development.DevAppServerMain %*
This will open port 8000 so a remote debugger can attach. Now open a command prompt and change the current directory to appengine-java-sdk-1.2.0. Then run dev_appserver with your application, for example by typing bin\dev_appserver.cmd demos\guestbook\war. This runs dev_appserver as usual, but this time the debug port is open. The very first line of output should be something like: Listening for transport dt_socket at address: 8000. Now attach to this port from NetBeans. In the Debug menu select Attach Debugger.... This will open the following dialog box:

Fill in the values as shown in the screenshot and press OK. If the debugger attaches correctly, the stop and pause buttons in the toolbar and the corresponding items in the Debug menu should become enabled.
Let's set a breakpoint now. Press Ctrl-O and type "Greeting" to open the persistent class, then set a breakpoint in the getAuthor method. Now go to http://localhost:8080/guestbook.jsp in your browser, and NetBeans should stop at this breakpoint for every record in the guest book. Enjoy!
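If the debugger refuses to attach, it can help to first confirm that something is listening on the debug port at all. The following is a minimal sketch in Python, not part of the SDK; the function name debug_port_is_open is made up for illustration, and the default port 8000 is simply the address used in the script above. Note that the probe connects and disconnects without speaking the debugger protocol, so the JVM may log a failed handshake.

```python
import socket


def debug_port_is_open(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only checks that the port is reachable,
    it does not verify that a JVM debug agent is behind it.
    """
    try:
        # create_connection raises OSError when nothing is listening
        # or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling debug_port_is_open() while dev_appserver is running with the modified script should return True; if it returns False, the --jvm_flag lines were probably not picked up.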

Friday, April 17, 2009

AppEngine project on NetBeans

This is the beginning of a discussion, continued here.

Recently Google released an early look at AppEngine for Java. It includes an Eclipse plugin for developing with the AppEngine Java SDK. I wanted to check whether it is possible to develop an AppEngine Java application using NetBeans.

Environment

  1. Sun JDK 1.6.0_12
  2. NetBeans 6.7 M2 (pre-release)
  3. appengine-java-sdk-1.2.0

Opening project

Let's start with the basic AppEngine demo, Guestbook. It's located in the demos/guestbook directory of the AppEngine Java SDK.

Create project wizard

Start the wizard with File|New Project.

Step 1

Select options like this:

Step 2

Enter the location of the project in the edit box, or click browse:

The rest of the lines will be filled automatically.

Step 3

Leave the default options on the "Build and Run Actions" page.

Step 4

Don't change anything on the "Web Sources" page.

Step 5

Click "Next" on "Source Package Folders" page.

Step 6

Click "Add JAR/Folder" on the "Java Sources Classpath" page and add all jars located in war/WEB-INF/lib folder under guestbook root:

Click Finish to leave default settings on the last two pages.

Fixing classpath

The resulting project will look like this:

Two servlet files have errors because NetBeans has a limited ability to parse Ant build files. It could not extract the compile-time dependencies from build.xml, so we pointed it to the WEB-INF/lib libraries in Step 6. But one of the compile dependencies (the Servlet API jar) is located outside of the project tree, because this jar is supplied by the application server. In my pre-release version of NetBeans the UI is not able to use dependencies outside of the project tree, but this is easy to work around.

Edit project.xml

Press Ctrl-F2 or select Window|Files to switch to files panel. You see all files under your project root:

Click on the + sign to open nbproject, right-click on project.xml and select Edit. This opens the internal NetBeans project file, which contains all the settings we selected in the wizard. Find the line which contains the <classpath> element near the end of the file. Go to the end of the element's content and add the path to the Servlet API jar. This jar is located in the lib/shared folder under the Google AppEngine Java SDK folder; in my case the full path was C:\work\appengine-java-sdk-1.2.0\lib\shared\geronimo-servlet_2.5_spec-1.2.jar

Press Ctrl-S to save the project.xml file and return to the Projects pane (press Ctrl-1). Now NetBeans is happy and no errors are reported.

Running the application

You can run this application as you run any project in NetBeans. If this is the main project, simply press F6. When the project is running, you will see the following line in the Output window: The server is running at http://localhost:8080/. You can enter this URL in a browser and start using the Guestbook application. In this post you can find out how to debug an AppEngine Java web application using NetBeans.

Wednesday, February 4, 2009

AppEngine dev_appserver logging

For some weird reason I cannot debug on dev_appserver. My breakpoints are simply ignored. So I placed logging code in the troublesome places. By default, dev_appserver sets the root logger level to INFO. If run with the -d option, the level is DEBUG, and it logs the whole environment for every request. I tried to set the root level to WARNING or higher, but it was either ignored or made the logger totally silent. So the best option I found is to leave the root level at INFO and to use module loggers for application-specific DEBUG messages. In the __main__ function I added the following lines:
logger=logging.getLogger("my")
logger.setLevel(logging.DEBUG)
Every module has to get its own logger like this:
# module engine.py
import logging

logger=logging.getLogger("my.engine")
Because of the dot separator my.engine logger inherits the configuration of my logger, so DEBUG messages are printed on the console.
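The inheritance can be checked with the standard logging module alone. This is a small sketch that runs outside dev_appserver; the logger name "other.module" is made up for comparison, and the root level is set to INFO by hand to mimic the default described above.

```python
import logging

# Root stays at INFO, mimicking the dev_appserver default.
logging.getLogger().setLevel(logging.INFO)

# The application-wide parent logger is opened up to DEBUG.
logging.getLogger("my").setLevel(logging.DEBUG)

# These loggers set no level of their own, so each one takes the
# level of its nearest configured ancestor.
engine_logger = logging.getLogger("my.engine")      # child of "my"
other_logger = logging.getLogger("other.module")    # only root above it


def effective_levels():
    """Return the level names the two loggers actually use."""
    return (logging.getLevelName(engine_logger.getEffectiveLevel()),
            logging.getLevelName(other_logger.getEffectiveLevel()))
```

Here effective_levels() reports DEBUG for my.engine and INFO for other.module, which is why only loggers under the "my" prefix print DEBUG messages.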

I also did not find the correct way to add handlers to my logger, because if I change the logger initialization like this:

logger=logging.getLogger("my")
logger.setLevel(logging.DEBUG)
ch=logging.StreamHandler()
logger.addHandler(ch)
It adds a new handler for every request, so each message is printed many times. Of course, it's possible to remove the handler after the run_wsgi_app call, but it looked weird to add and immediately remove the handler every time. If you know a better way to configure logging with dev_appserver, please let me know.
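One possible workaround, sketched here with the standard logging module only: loggers are process-wide objects, so the handler can be attached only when the logger has none yet. The helper name get_app_logger is made up for illustration, and I have not verified this under dev_appserver itself.

```python
import logging


def get_app_logger(name="my"):
    """Return the named logger, attaching a console handler only once.

    Hypothetical helper: logging.getLogger returns the same object for
    the same name, so the handler list survives between calls.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        # First call in this process: add the handler exactly once.
        logger.addHandler(logging.StreamHandler())
    return logger
```

If the root logger already prints to the console, messages may still appear twice because they propagate upwards; setting logger.propagate = False on the "my" logger stops that.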

Tuesday, January 6, 2009

Trivial resolution of Datastore performance

In addition to Model.put() Datastore has db.put(). I did not notice the latter can put several entities at once until Arachnid told me so. So in my code I changed this:
for cell in cells:
  cell.put()
To this:
db.put(cells)
That's all that was needed to fix the performance.
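Why this helps can be illustrated without the SDK. The sketch below uses a made-up FakeDatastore class that only counts calls; it assumes, in line with the timings in the earlier posts, that the cost is dominated by the number of calls rather than by the amount of data.

```python
class FakeDatastore(object):
    """Hypothetical in-memory stand-in that only counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.stored = []

    def put(self, entities):
        # Accept a single entity or a list of entities.
        if not isinstance(entities, list):
            entities = [entities]
        self.round_trips += 1          # one call, one round trip
        self.stored.extend(entities)


def save_one_by_one(store, cells):
    """The original pattern: one put per entity."""
    for cell in cells:
        store.put(cell)


def save_in_batch(store, cells):
    """The fixed pattern: a single put for the whole list."""
    store.put(cells)
```

With 100 cells, save_one_by_one makes 100 round trips and save_in_batch makes one, while both store the same entities.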

Sunday, January 4, 2009

Improved Datastore performance

It looks like the problem with Datastore performance was that the data was too fine-grained. I created the earlier test following Google's suggestion (look at the tip at the end of the page). So this time I made the opposite test:
  • Instead of a single integer, each entity holds a text of 10,000 characters
  • Half of the records are written in transactions of 10 records each, and the other half record by record
The results show that the size of an entity had no effect, unlike the number of entities. So it's better to write a few large objects than many small ones.
Also, this time I saw a huge difference between the real appengine server and dev_appserver once the database held many records (the real server was much faster). Grouping a few records in a transaction also helped. This is the test code:
from google.appengine.ext import db
from time import time

print 'Content-Type: text/plain'
print ''

total_t=time()
class Root(db.Model):
    pass

class C(db.Model):
    i=db.TextProperty()

t1000="a"*10000

def add_in_transaction(root, text, amount):
    for j in range(amount):
        c=C(parent=root, i=text)
        c.put()

print "with transactions - big"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    db.run_in_transaction(add_in_transaction, root, t1000, 10)
    print time()-t
print "without transactions - big"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    add_in_transaction(None, t1000, 10)
    print time()-t
print "without transactions - small"
for i in range(5):
    t=time()
    root=Root()
    root.put()
    add_in_transaction(None, "a", 10)
    print time()-t

print "total time:", time()-total_t
And this is the result
with transactions - big
0.161096096039
0.154489994049
0.367100000381
0.152635812759
0.153033971786
without transactions - big
0.315757989883
0.359083890915
0.559228181839
0.360776901245
0.330877780914
without transactions - small
0.279601812363
0.541454076767
0.324053049088
0.311630964279
0.306309938431
total time: 4.67810916901
I think it's worth opening a bug against the appengine documentation so that it mentions these performance considerations.

P.S. I changed the test a little to demonstrate that writing one character or 10K characters makes no difference.

Datastore performance

Something is strange with the performance of the AppEngine Datastore. I tried to run the following code:

from google.appengine.ext import db
from time import time

print 'Content-Type: text/plain'
print ''

total_t=time()

class C(db.Model):
 i=db.IntegerProperty()

for i in range(10):
 t=time()
 for j in range(10):
  c=C(i=i)
  c.save()
 print time()-t

print "total time:", time()-total_t
As you can see, this is a complete Python module, not dependent on Django or anything else. Just add a corresponding mapping to app.yaml and you can try it yourself. The output of this code, which adds 100 records to the Datastore, is:
0.307200908661
0.279258012772
0.305376052856
0.310864925385
0.286242008209
0.283288002014
0.299383878708
0.286517858505
0.281584024429
0.268044948578
total time: 2.90873217583

I tried to add 200 records and got a time-out, as AppEngine does not allow long-running requests. I had pretty similar timings on the dev_appserver. This is very slow, and I cannot understand where the catch is.

Saturday, January 3, 2009

Querying for None in Datastore

I ran into a weird problem with the GAE Datastore when I tried to search for a None value. If I use GQL, the query works as expected:

from game.models import *
for c in Cell.gql("WHERE game=:g", g=None):
 print c

The above code prints the expected cells which are not bound to any game. But I need to iterate through cells of a certain board type, so instead of Cell.gql I start from board.cell_set and am trying to define a filter on game=None. The following code should give the same outcome as the previous one:

from game.models import *
for c in Cell.all().filter("game=", None):
 print c

But this time I get no results. Why?

Cached ReferenceProperty: now with round trip

One thing was really missing in a CachedReferenceProperty - cached round trip. Suppose we have the following one-to-many relationship:

class Master(db.Model):
  pass

class Detail(db.Model):
  master=CachedReferenceProperty(Master)

By cached round trip here I mean that when a master holds a cached collection of details, those details reference the same master, so going back and forth from master to details does not make any database hits.

To make it possible, I replaced the collection builder in _CachedReverseReferenceProperty from this:

  res=[c for c in query]

to this:

  res=[]
  for c in query:
    resolved_name='_RESOLVED_'+self.__prop #WARNING: using internal
    setattr(c, resolved_name, model_instance)
    res += [c]

Very ugly. I need an idea for how to avoid using the internal attribute. The whole source file is here.

Sunday, December 28, 2008

Hello, google-app-engine-django

I'm going to try the Django patch for Appengine. These are my problems with genuine Appengine and its built-in Django:

Documentation
It is not satisfactory: it covers everything, but not in enough detail.
Debugging
I still cannot debug; I get opaque tracebacks every time I try.
Testing
Explained the problem here.
Flexible persistence and caching
Too much hacking is necessary for trivial things, like cached one-to-many collections.

To be honest, I'm thinking of switching to the full Django stack with a normal database. I can find fewer and fewer reasons to stick with Appengine.

Friday, December 26, 2008

What does memcache store?

Now that I have learned how to store one-to-many collections with the AppEngine datastore, I want to use the Memcache API to eliminate DB hits between requests. A quick test shows it does not work out of the box:

class C(object):
 def __init__(self):
   self.field='field'

c=C()
c.attr='attr'
c.__dict__['dict']='dict'
print dir(c)
memcache.add('c', c)
c=memcache.get("c")
print dir(c)

I would expect this code to output the same list of public attributes before and after, but that is not what happens. We can see that only c.field was saved, while c.attr and c.dict are missing. I need to investigate memcache internals to understand why.
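A possible explanation, offered as a guess rather than a confirmed diagnosis: memcache.add() stores a value only if the key is not already present, so a 'c' left over from an earlier run of the test would be returned unchanged. Pickling by itself keeps every entry of an instance's __dict__. The sketch below imitates this with a made-up FakeCache class backed by a dict; plain dicts stand in for the instance state.

```python
import pickle


class FakeCache(object):
    """Hypothetical dict-backed stand-in for a memcache client."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # add() stores only when the key is absent.
        if key in self._data:
            return False
        self._data[key] = pickle.dumps(value)
        return True

    def set(self, key, value):
        # set() always overwrites.
        self._data[key] = pickle.dumps(value)
        return True

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else pickle.loads(raw)


def run_demo():
    cache = FakeCache()
    cache.add("c", {"field": "field"})        # left over from an earlier run
    state = {"field": "field", "attr": "attr", "dict": "dict"}
    added = cache.add("c", state)             # key exists, nothing is stored
    after_add = cache.get("c")
    cache.set("c", state)                     # overwrites the stale value
    after_set = cache.get("c")
    return added, sorted(after_add), sorted(after_set)
```

In this sketch the second add() is refused and the old value with only 'field' comes back, while set() returns all three keys. If that is what happened in the test, using memcache.set() or deleting the key first should make the missing attributes reappear.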

Thursday, December 25, 2008

Cached ReferenceProperty

Piece of cake

Earlier I wrote about my wish to subclass ReferenceProperty so that the collection would not be fetched every time I iterate through it. Well, it was so easy that I can post the whole implementation here.
from google.appengine.ext import db

class CachedReferenceProperty(db.ReferenceProperty):

  def __property_config__(self, model_class, property_name):
    super(CachedReferenceProperty, self).__property_config__(model_class,
                                                       property_name)
    #Just carelessly override what super made
    setattr(self.reference_class,
            self.collection_name,
            _CachedReverseReferenceProperty(model_class, property_name,
                self.collection_name))

class _CachedReverseReferenceProperty(db._ReverseReferenceProperty):

    def __init__(self, model, prop, collection_name):
        super(_CachedReverseReferenceProperty, self).__init__(model, prop)
        self.__collection_name = collection_name

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self
        if self.__collection_name in model_instance.__dict__:# why does it get here at all?
            return model_instance.__dict__[self.__collection_name]

        query=super(_CachedReverseReferenceProperty, self).__get__(model_instance,
            model_class)
        #replace the attribute on the instance
        res=[c for c in query]
        model_instance.__dict__[self.__collection_name]=res
        return res

    def __delete__ (self, model_instance):
        if model_instance is not None:
            del model_instance.__dict__[self.__collection_name]
With these classes we can now rewrite the previous example as:
class Master(db.Model):
  pass

class Detail(db.Model):
  master=CachedReferenceProperty(Master)
Try to run the same loop and you will see it executes instantly, even with 100,000 iterations instead of 1000.

Is it a free cake?

Not exactly. Try this:
m=Master()
m.put()
d1=Detail(master=m)
d1.put()
print m.detail_set
d2=Detail(master=m)
d2.put()
print m.detail_set
The second time it returned a wrong result, which did not include d2. So we need a way to reset the cached value and fetch up-to-date values from the datastore. Fortunately, it's achieved easily:
del m.detail_set
print m.detail_set
This is why I implemented _CachedReverseReferenceProperty.__delete__. When m.__dict__ has no key 'detail_set', m.detail_set is dispatched to type(m).__dict__['detail_set'], and there I call the base class to access the datastore. What surprised me is that even when m.__dict__['detail_set'] exists, m.detail_set is still dispatched to Master.__dict__['detail_set']. I don't understand why that happens, so I worked around the problem. I have to learn Python better to answer that question.
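Python's descriptor rules offer a likely explanation. A descriptor that defines __set__ or __delete__ is a data descriptor, and data descriptors on the class take precedence over the instance __dict__; only descriptors with __get__ alone are shadowed by an instance attribute. Since _CachedReverseReferenceProperty defines __delete__, it would always win the lookup. The class names in this plain-Python sketch are made up for illustration.

```python
class NonDataDescriptor(object):
    """Defines only __get__, so an instance attribute can shadow it."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return "from descriptor"


class DataDescriptor(NonDataDescriptor):
    """Adding __delete__ turns it into a data descriptor."""

    def __delete__(self, instance):
        instance.__dict__.pop("value", None)


class WithNonData(object):
    value = NonDataDescriptor()


class WithData(object):
    value = DataDescriptor()


def lookup(cls):
    """Put 'value' into the instance dict, then read obj.value."""
    obj = cls()
    obj.__dict__["value"] = "from instance dict"
    return obj.value
```

Here lookup(WithNonData) reads the instance dict, while lookup(WithData) still goes through the descriptor's __get__, which matches the behaviour described above.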

Wednesday, December 24, 2008

AppEngine Datastore and memcache

I miss Hibernate collections. In the following code I access the collection a thousand times:

class Master(db.Model):
  pass

class Detail(db.Model):
  master=db.ReferenceProperty(Master)

m=Master()
m.put()
d=Detail(master=m)
d.put()

for i in range(1000):
  for tmp_d in m.detail_set:
    pass

The above code takes a few seconds to execute. The reason is that Datastore fetches the collection from storage every time, whereas in Hibernate the collection would be fetched from the database only once until the end of the session. Oops, no sessions with Datastore. So the Datastore developers were right when they opted to fetch the collection every time: they don't know when the details change.

This is the reason Master cannot be put in memcache effectively: it would be stored without the Details. Master.detail_set holds only the definition of the query needed to get the details. So I'm thinking of a way to decorate ReferenceProperty to make one-to-many relations suitable for memcache, so that big object trees are read from Datastore once and are then quickly accessible.

Saturday, December 20, 2008

Polymorphism in AppEngine Datastore Models

There is a problem with inherited classes in AppEngine.

Let's suppose we have the following models:

class Master(db.Model):
  mp = db.StringProperty()

class Detail(db.Model):
  dp = db.StringProperty()
  master = db.ReferenceProperty(Master)
When these are declared, Datastore automatically adds a detail_set property to Master. So if we do
m=Master(mp='foo')
m.put()
d1=Detail(dp='bar', master=m)
d1.put()
d2=Detail(dp='zee', master=m)
d2.put()
then we have the m.detail_set property, which will fetch [d1, d2]. But if we define
class MoreDetail (Detail):
  mdp=db.StringProperty()

d3=MoreDetail (dp='org', mdp='jee', master=m)
d3.put()
then m.detail_set will also fetch d3, but deserialize it as Detail instead of MoreDetail. Here is how I checked it:
>>> for d in m.detail_set.fetch(10):
...  print d.properties()
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
{'master': <ReferenceProperty object at 0x018B8330>, 'dp': <StringProperty object at 0x023A8C10>}
One of these objects should have an mdp property defined in MoreDetail, but that did not happen.