Web Farms and Caching

    To generate a user's profile page, we would need more than one SQL query (friends, album names and picture counts, profile information, last status, etc.).
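    As a rough sketch of what "more than one query" means here, the snippet below assembles a profile page from several queries. The table and column names, and the run_query helper, are all illustrative assumptions, not a real schema:

```python
def build_profile_page(user_id, run_query):
    """Assemble a profile page; each call below stands for one SQL round trip.
    run_query is a stand-in for a real database call (illustrative only)."""
    return {
        "info":    run_query("SELECT name, bio FROM users WHERE id = %s", user_id),
        "friends": run_query("SELECT friend_id FROM friends WHERE user_id = %s", user_id),
        "albums":  run_query("SELECT name, picture_count FROM albums WHERE owner_id = %s", user_id),
        "status":  run_query("SELECT text FROM statuses WHERE user_id = %s "
                             "ORDER BY created DESC LIMIT 1", user_id),
    }

# With a fake run_query we can count the round trips one page view costs:
calls = []
def fake_query(sql, *params):
    calls.append(sql)
    return None

build_profile_page(42, fake_query)
print(len(calls))  # → 4
```

    Four queries per page view is cheap for one user, but it multiplies quickly under load, which is what motivates caching below.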

    As long as a user doesn't update her profile, the information shown on her page is almost static. Thus, a snapshot of the profile page could be cached for 5 minutes, 1 hour, etc.
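    A minimal sketch of such a snapshot cache, assuming an in-process dictionary with a time-to-live (the class and key names are made up for illustration):

```python
import time

class TtlCache:
    """A minimal local snapshot cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.store[key]  # snapshot is stale, drop it
            return None
        return value

# Cache a rendered profile snapshot for 5 minutes:
cache = TtlCache(ttl_seconds=300)
cache.set("profile:42", "<rendered profile html>")
print(cache.get("profile:42"))  # → <rendered profile html>
```

    Within the TTL window, repeated page views are served from memory; only after expiry does the server rebuild the snapshot from the database.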

    At a certain point, all of these servers might have cached a very important person's (VIP's) profile in their local caches. When the VIP changes her profile, every server must refresh its local copy, and this would happen within a few seconds. We now have a problem of load per server instead of load per user.

    Actually, once one of these servers has loaded the VIP's profile from the SQL database and cached it, the other servers could reuse the same information without hitting the database. But since each server stores its cached data in its own local memory, that information is not easily accessible to the other servers.

    Let's call this shared memory the distributed cache. If every server looks at this common memory before trying the database, we avoid the load-per-server problem.
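    This check-cache-first, fall-back-to-database flow is often called the cache-aside pattern. A sketch, using a plain dictionary as a stand-in for the shared cache and a counter as a stand-in for real SQL queries (all names are illustrative):

```python
def get_profile(user_id, cache, load_from_db):
    """Cache-aside: every server checks the shared cache before the database."""
    key = "profile:%d" % user_id
    profile = cache.get(key)
    if profile is not None:
        return profile                    # cache hit: the database is not touched
    profile = load_from_db(user_id)       # cache miss: only now do we query the DB
    cache.set(key, profile)               # store it so the other servers find it
    return profile

# A dict stands in for the shared distributed cache; db_hits counts queries.
class DictCache:
    def __init__(self):
        self.d = {}
    def get(self, key):
        return self.d.get(key)
    def set(self, key, value):
        self.d[key] = value

db_hits = []
def load_from_db(user_id):
    db_hits.append(user_id)
    return {"id": user_id, "name": "vip"}

cache = DictCache()
get_profile(7, cache, load_from_db)  # first server: miss, one DB query
get_profile(7, cache, load_from_db)  # every later server: hit, no DB query
print(len(db_hits))  # → 1
```

    With the shared cache in place, the VIP profile is loaded from the database once, no matter how many web servers are serving it.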

    You can find many variations of distributed cache systems, including Memcached, Couchbase, and Redis. They are also called NoSQL databases. You can think of them simply as a remote dictionary: they store key/value pairs in memory and let you access them as fast as possible.

    When the cached data grows too large, the memory of a single machine might not be enough to store all the key/value pairs. In this case, servers like Memcached distribute the data across a cluster. This could be done by the first letter of the keys: one server holds the pairs starting with A, another those starting with B, and so on. In practice, they use a hash of the key for this purpose.
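    The hashing idea can be sketched in a few lines. The node names are made up, and modulo hashing is the simplest form; real Memcached clients typically use consistent hashing so that adding a node remaps only a fraction of the keys:

```python
import hashlib

SERVERS = ["cache-a", "cache-b", "cache-c"]  # hypothetical cache nodes

def pick_server(key, servers=SERVERS):
    """Route a key to one node by hashing it (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same key always lands on the same node, so every web server
# agrees on where to read and write "profile:42":
print(pick_server("profile:42") == pick_server("profile:42"))  # → True
```

    Because all clients compute the same hash, no coordination is needed: each one independently picks the same node for a given key.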