Hacks

    Optimistic locking is useful in situations where you might ordinarily use a SELECT FOR UPDATE (or in SQLite, BEGIN IMMEDIATE). For example, you might fetch a user record from the database, make some modifications, then save the modified user record. Typically this scenario would require us to lock the user record for the duration of the transaction, from the moment we select it, to the moment we save our changes.

    In optimistic locking, on the other hand, we do not acquire any lock and instead rely on an internal version column in the row we’re modifying. At read time, we see what version the row is currently at, and on save, we ensure that the update takes place only if the version is the same as the one we initially read. If the version is higher, then some other process must have snuck in and changed the row – saving our modified version could result in the loss of important changes.

    It’s quite simple to implement optimistic locking in Peewee. Here is a base class that you can use as a starting point:

    Here’s an example of how this works. Let’s assume we have the following model definition. Note that there’s a unique constraint on the username – this is important as it provides a way to prevent double-inserts.

    class User(BaseVersionedModel):
        username = CharField(unique=True)
        favorite_animal = CharField()

    Example:

    >>> u = User(username='charlie', favorite_animal='cat')
    >>> u.save_optimistic()
    True
    >>> u.version
    1
    >>> u.save_optimistic()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "x.py", line 18, in save_optimistic
        raise ValueError('No changes have been made.')
    ValueError: No changes have been made.
    >>> u.favorite_animal = 'kitten'
    >>> u.save_optimistic()
    True

    # Simulate a separate thread coming in and updating the model.
    >>> u2 = User.get(User.username == 'charlie')
    >>> u2.favorite_animal = 'macaw'
    >>> u2.save_optimistic()
    True

    # Now, attempt to change and re-save the original instance:
    >>> u.favorite_animal = 'little parrot'
    >>> u.save_optimistic()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "x.py", line 30, in save_optimistic
        raise ConflictDetectedException()
    ConflictDetectedException: current version is out of sync

    These examples describe several ways to query the single top item per group. For a thorough discussion of various techniques, check out my blog post Querying the top item by group with Peewee ORM. If you are interested in the more general problem of querying the top N items, see the section below.

    In these examples we will use the User and Tweet models to find each user and their most-recent tweet.

    The most efficient method I found in my testing uses the MAX() aggregate function.

    # When referencing a table multiple times, we'll call Model.alias() to create
    # a secondary reference to the table.
    TweetAlias = Tweet.alias()

    # Create a subquery that will calculate the maximum Tweet create_date for each
    # user.
    subquery = (TweetAlias
                .select(
                    TweetAlias.user,
                    fn.MAX(TweetAlias.create_date).alias('max_ts'))
                .group_by(TweetAlias.user)
                .alias('tweet_max_subquery'))

    # Query for tweets and join using the subquery to match the tweet's user
    # and create_date.
    query = (Tweet
             .select(Tweet, User)
             .join(User)
             .switch(Tweet)
             .join(subquery, on=(
                 (Tweet.create_date == subquery.c.max_ts) &
                 (Tweet.user == subquery.c.user_id))))

    SQLite and MySQL are a bit more lax and permit grouping by a subset of the columns that are selected. This means we can do away with the subquery and express it quite concisely:

    query = (Tweet
             .select(Tweet, User)
             .join(User)
             .group_by(Tweet.user)
             .having(Tweet.create_date == fn.MAX(Tweet.create_date)))

    These examples describe several ways to query the top N items per group reasonably efficiently. For a thorough discussion of various techniques, check out my blog post Querying the top N objects per group with Peewee ORM.

    In these examples we will use the User and Tweet models to find each user and their three most-recent tweets.

    Lateral joins are a neat Postgres feature that allow reasonably efficient correlated subqueries. They are often described as SQL for-each loops.

    The desired SQL is:
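    Sketched here to match the peewee query that follows (table and subquery aliases are illustrative):

```sql
SELECT * FROM
    (SELECT id, username FROM user) AS uq
    LEFT JOIN LATERAL
    (SELECT message, create_date
     FROM tweet
     WHERE (user_id = uq.id)
     ORDER BY create_date DESC LIMIT 3) AS pq ON true
```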

    To accomplish this with peewee we’ll need to express the lateral join as a Clause, which gives us greater flexibility than the join() method.

    # We'll reference `Tweet` twice, so keep an alias handy.
    TweetAlias = Tweet.alias()

    # The "outer loop" will be iterating over the users whose
    # tweets we are trying to find.
    user_query = User.select(User.id, User.username).alias('uq')

    # The inner loop will select tweets and is correlated to the
    # outer loop via the WHERE clause. Note that we are using a
    # LIMIT clause.
    tweet_query = (TweetAlias
                   .select(TweetAlias.message, TweetAlias.create_date)
                   .where(TweetAlias.user == user_query.c.id)
                   .order_by(TweetAlias.create_date.desc())
                   .limit(3)
                   .alias('pq'))

    # Now we join the outer and inner queries using the LEFT LATERAL
    # JOIN. The join predicate is *ON TRUE*, since we're effectively
    # joining in the tweet subquery's WHERE clause.
    join_clause = Clause(
        user_query,
        SQL('LEFT JOIN LATERAL'),
        tweet_query,
        SQL('ON %s', True))

    # Finally, we'll wrap these up and SELECT from the result.
    query = (Tweet
             .select(SQL('*'))
             .from_(join_clause))

    Window functions, which are supported by peewee, provide scalable, efficient performance.

    The desired SQL is:

    SELECT subq.message, subq.username
    FROM (
        SELECT
            t2.message,
            t3.username,
            RANK() OVER (
                PARTITION BY t2.user_id
                ORDER BY t2.create_date DESC
            ) AS rnk
        FROM tweet AS t2
        INNER JOIN user AS t3 ON (t2.user_id = t3.id)
    ) AS subq
    WHERE (subq.rnk <= 3)

    TweetAlias = Tweet.alias()

    # The subquery will select the relevant data from the Tweet and
    # User table, as well as ranking the tweets by user from newest
    # to oldest.
    subquery = (TweetAlias
                .select(
                    TweetAlias.message,
                    User.username,
                    fn.RANK().over(
                        partition_by=[TweetAlias.user],
                        order_by=[TweetAlias.create_date.desc()]).alias('rnk'))
                .join(User, on=(TweetAlias.user == User.id))
                .alias('subq'))

    # Since we can't filter on the rank, we are wrapping it in a query
    # and performing the filtering in the outer query.
    query = (Tweet
             .select(subquery.c.message, subquery.c.username)
             .from_(subquery)
             .where(subquery.c.rnk <= 3))

    If you’re not using Postgres, then unfortunately you’re left with options that exhibit less-than-ideal performance. For a more complete overview of common methods, check out this blog post. Below I will summarize the approaches and the corresponding SQL.

    Using COUNT, we can get all tweets where there exist fewer than N tweets with more recent timestamps:

    TweetAlias = Tweet.alias()

    # Create a correlated subquery that calculates the number of
    # tweets with a higher (newer) timestamp than the tweet we're
    # looking at in the outer query.
    subquery = (TweetAlias
                .select(fn.COUNT(TweetAlias.id))
                .where(
                    (TweetAlias.create_date >= Tweet.create_date) &
                    (TweetAlias.user == Tweet.user)))

    # Wrap the subquery and filter on the count.
    query = (Tweet
             .select(Tweet, User)
             .join(User)
             .where(subquery <= 3))

    We can achieve similar results by doing a self-join and performing the filtering in the HAVING clause:

    The last example uses a LIMIT clause in a correlated subquery.

    TweetAlias = Tweet.alias()

    # The subquery here will calculate, for the user who created the
    # tweet in the outer loop, the three newest tweets. The expression
    # will evaluate to `True` if the outer-loop tweet is in the set of
    # tweets represented by the inner query.
    query = (Tweet
             .select(Tweet, User)
             .join(User)
             .where(Tweet.id << (
                 TweetAlias
                 .select(TweetAlias.id)
                 .where(TweetAlias.user == Tweet.user)
                 .order_by(TweetAlias.create_date.desc())
                 .limit(3))))

    SQLite is very easy to extend with custom functions written in Python that are then callable from your SQL statements. By using the SqliteExtDatabase and the func() decorator, you can very easily define your own functions.

    Here is an example function that generates a hashed version of a user-supplied password. We can also use this to implement login functionality for matching a user and password.

    from hashlib import sha1
    from random import random

    from playhouse.sqlite_ext import SqliteExtDatabase

    db = SqliteExtDatabase('my-blog.db')

    def get_hexdigest(salt, raw_password):
        data = salt + raw_password
        return sha1(data.encode('utf8')).hexdigest()

    @db.func()
    def make_password(raw_password):
        salt = get_hexdigest(str(random()), str(random()))[:5]
        hsh = get_hexdigest(salt, raw_password)
        return '%s$%s' % (salt, hsh)

    @db.func()
    def check_password(raw_password, enc_password):
        salt, hsh = enc_password.split('$', 1)
        return hsh == get_hexdigest(salt, raw_password)

    Here is how you can use the function to add a new user, storing a hashed password:

    query = User.insert(
        username='charlie',
        password=fn.make_password('testing')).execute()

    If we retrieve the user from the database, the password that’s stored is hashed and salted:

    >>> user = User.get(User.username == 'charlie')