[GAE] iterate through all entities and process them.
When I started coding on Google App Engine, I had a strong need to iterate through and
process all my entities (in the background). I was caught by two things. The 30-second
request limit makes this a little tough, and the lack of a "Task Queue API" made this task
almost impossible.
The easiest way to get rid of DeadlineExceededError (DEE) is to execute your routine
using remote_api_shell. There is no such constraint under the remote API shell. However,
the drawback is poor performance when your code involves entity queries or updates.
The release of the Task Queue API (GAE 1.2.3) was a big step for GAE. It makes life easier.
Recently, I found a neat solution to my problem: combine two articles on GAE with some
First, the mapper class (in article 1) abstracts the concept of iterating through entities.
It uses __key__ as the default sort order, which is not suitable in my case, so I added a
property to make it configurable. (When I googled, I found there is already a document that
does exactly what I did. Pretty cool.)
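
To make the idea concrete, here is a minimal sketch of a mapper with a configurable sort
property. It is not the code from article 1: the datastore is replaced by an in-memory list
of dicts so the example runs anywhere, and the names ORDER_BY, BATCH_SIZE and fetch_batch
are my own placeholders. It also assumes the values of the ordering property are unique,
otherwise rows sharing a value at a batch boundary would be skipped.

```python
class Mapper(object):
    # Property used to order and resume the scan. "__key__" mirrors the
    # default described above; subclasses override it (hypothetical name).
    ORDER_BY = "__key__"
    BATCH_SIZE = 2

    def __init__(self, rows):
        # In-memory stand-in for a datastore kind: a list of dicts.
        self.rows = rows

    def fetch_batch(self, start_after):
        """Return up to BATCH_SIZE rows whose ORDER_BY value is > start_after."""
        ordered = sorted(self.rows, key=lambda r: r[self.ORDER_BY])
        if start_after is not None:
            # Assumes ORDER_BY values are unique (see the note above).
            ordered = [r for r in ordered if r[self.ORDER_BY] > start_after]
        return ordered[:self.BATCH_SIZE]

    def map(self, row):
        """Process one row; subclasses must override this."""
        raise NotImplementedError

    def run(self):
        start_after = None
        while True:
            batch = self.fetch_batch(start_after)
            if not batch:
                break
            for row in batch:
                self.map(row)
            # Remember where this batch ended so the next one starts after it.
            start_after = batch[-1][self.ORDER_BY]


class CollectNames(Mapper):
    ORDER_BY = "created"  # sort by a property other than the key

    def __init__(self, rows):
        Mapper.__init__(self, rows)
        self.seen = []

    def map(self, row):
        self.seen.append(row["name"])


rows = [
    {"__key__": 1, "created": 30, "name": "c"},
    {"__key__": 2, "created": 10, "name": "a"},
    {"__key__": 3, "created": 20, "name": "b"},
]
mapper = CollectNames(rows)
mapper.run()
```

With ORDER_BY set to "created", the rows are visited in creation order rather than key order.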
Once you have a way to iterate, you have to worry about DEE. The original post catches the
DEE and pushes the task back onto the task queue (by calling deferred.defer), which in my
experience is totally unnecessary and might duplicate the task. When a DEE is raised, we
have less than a second to respond, so intercepting the DEE and re-enqueueing the task is
not a good idea: you will most likely run out of time and trigger a HDEE (hard
DeadlineExceededError).
I checked the documentation on how the task queue handles a task that throws an exception.
It turns out that GAE takes care of everything. According to the Task Queue API
documentation, if the request handler returns a status code outside the range 200~299,
GAE will retry the task at least once a day.
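
The retry rule can be pictured with a toy model. This is only an illustration of the
"anything outside 200~299 counts as a failure" rule, not how GAE is implemented: the real
queue schedules retries over time, while the hypothetical run_with_retries helper below
simply calls the handler again in a loop.

```python
def should_retry(status_code):
    """A task counts as failed unless the handler answers with a 2xx status."""
    return not (200 <= status_code <= 299)


def run_with_retries(handler, max_attempts=5):
    """Call handler() until it returns a 2xx status; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        if not should_retry(handler()):
            return attempt
    raise RuntimeError("task still failing after %d attempts" % max_attempts)


# A fake handler that fails twice (500) and then succeeds (200).
responses = [500, 500, 200]


def flaky_handler():
    return responses.pop(0)


attempts = run_with_retries(flaky_handler)
```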
Leveraging this feature, I think the best approach is to save the task state and read it
back when GAE retries the task, so that the task can continue from where it stopped.
So, the idea is pretty clear now.
1. Prepare your query
2. Override the map(entity) function to process every entity
3. Write code to serialize/unserialize state to/from memcache. ( start_key, to_put,
When the task starts to run, check whether any unfinished work is left. If there is, do it
first; if not, continue the task with the given parameters. When a DEE occurs, serialize
the state.
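
Since I have not posted the real code, here is a rough, self-contained sketch of the flow
described above. Everything in it is a stand-in so that it runs outside App Engine: a
plain dict plays the role of memcache, a local exception class plays the role of
DeadlineExceededError, a row budget fakes the deadline, and the names ResumableTask,
STATE_CACHE and drive are made up for illustration. The saved state is reduced to the
resume position only.

```python
class DeadlineExceededError(Exception):
    """Local stand-in for the runtime's DeadlineExceededError."""


STATE_CACHE = {}  # stand-in for memcache: task name -> saved state


class ResumableTask(object):
    def __init__(self, name, rows, budget):
        self.name = name
        self.rows = sorted(rows)
        self.budget = budget   # rows allowed per run before the fake deadline
        self.processed = []

    def map(self, row):
        self.processed.append(row)

    def run(self):
        # Pick up unfinished work if a previous run was cut short.
        state = STATE_CACHE.get(self.name)
        start_after = state["start_after"] if state else None
        used = 0
        try:
            for row in self.rows:
                if start_after is not None and row <= start_after:
                    continue  # already handled in an earlier run
                if used >= self.budget:
                    raise DeadlineExceededError()
                self.map(row)
                start_after = row
                used += 1
        except DeadlineExceededError:
            # Save the position, then let the error propagate so the request
            # fails and the queue retries the task. No re-enqueueing here.
            STATE_CACHE[self.name] = {"start_after": start_after}
            raise
        # Finished cleanly: drop any saved state.
        STATE_CACHE.pop(self.name, None)


def drive(task, max_attempts=10):
    """Toy replacement for the queue: rerun the task until it completes."""
    for attempt in range(1, max_attempts + 1):
        try:
            task.run()
            return attempt
        except DeadlineExceededError:
            continue
    raise RuntimeError("task did not finish in %d attempts" % max_attempts)


task = ResumableTask("demo", [5, 1, 4, 2, 3], budget=2)
attempts = drive(task)
```

With five rows and a budget of two rows per run, the task needs three runs, and each row
is processed exactly once.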
I don’t have time to structure the code to post. If anyone is interested, drop me a mail.