Architecture

    Puma is a threaded web server, processing requests across a TCP or UNIX socket.

    This section discusses clustered mode. Single mode is analogous to having a single worker process.

    Connection pipeline

    • Upon startup, Puma listens on a TCP or UNIX socket.
      • The socket’s own backlog (the operating system’s queue of connections that have not yet been accepted) is distinct from the “backlog” of work reported by the control server stats. The latter is the number of connections in a worker process’s “todo” set waiting for a worker thread.
    • By default, a single, separate thread receives HTTP requests across the socket and buffers them.
      • When at least one worker thread is available for work, a connection is accepted and placed in this thread’s request buffer.
      • This thread waits for the entire HTTP request to be received over the connection. Once the request is complete, the connection is moved to the “todo” set.
    • Worker threads pop work off the “todo” set for processing.
      • The thread processes the request via the Rack application (which generates the HTTP response).
      • The thread writes the response to the connection.
      • Finally, the thread becomes available to process another connection in the “todo” set.
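
    The steps above can be sketched as a small in-memory simulation. This is an illustration only, not Puma's code: FakeConnection and run_pipeline are hypothetical names, request chunks are plain strings instead of socket reads, and the single receiving thread handles connections one at a time, whereas a real server would multiplex them. The check that a worker thread is free before accepting a connection is not modeled.

```ruby
# Hypothetical stand-in for a client connection whose request arrives in chunks.
FakeConnection = Struct.new(:chunks, :buffer, :response) do
  # Move the next chunk into the buffer; returns false once nothing is left.
  def read_chunk
    chunk = chunks.shift
    return false unless chunk
    buffer << chunk
    true
  end

  # A body-less request is complete once the blank line ending the headers arrives.
  def complete?
    buffer.include?("\r\n\r\n")
  end
end

# Illustrative only: run_pipeline is not a Puma method.
def run_pipeline(connections, app:, worker_count: 2)
  accepted = Queue.new # connections held in the request buffer
  todo = Queue.new     # the "todo" set that worker threads pop from

  # The single receiving thread: waits for each request to arrive in full.
  receiver = Thread.new do
    while (conn = accepted.pop)
      loop { break if conn.complete? || !conn.read_chunk }
      todo << conn if conn.complete?
    end
    todo.close # no more work will arrive
  end

  workers = Array.new(worker_count) do
    Thread.new do
      while (conn = todo.pop) # nil once the queue is closed and drained
        _verb, path = conn.buffer.split(" ", 3)
        status, _headers, body = app.call({ "PATH_INFO" => path })
        payload = +""
        body.each { |part| payload << part }
        # The reason phrase is hardcoded to keep the sketch short.
        conn.response = "HTTP/1.1 #{status} OK\r\n\r\n#{payload}"
      end
    end
  end

  connections.each { |conn| accepted << conn }
  accepted.close
  [receiver, *workers].each(&:join)
  connections
end

# Usage: a minimal Rack-style app and one request split across three chunks.
app = ->(env) { [200, { "content-type" => "text/plain" }, ["hello ", env["PATH_INFO"]]] }
slow = FakeConnection.new(["GET /a HT", "TP/1.1\r\nHost: x\r\n", "\r\n"], +"", nil)
run_pipeline([slow], app: app)
```

    Because the worker threads only ever see connections whose request is already complete, a slow client in this sketch occupies the receiving thread rather than a worker thread.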

    If request buffering is disabled, this buffer is not used for connections while waiting for the request to arrive.
    In this mode, when a connection is accepted, it is added to the “todo” set immediately, and a worker thread synchronously does any waiting necessary to read the HTTP request from the socket.
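
    A minimal sketch of this unbuffered mode, under the same caveats as any simulation: ChunkedClient and serve_without_buffering are hypothetical names, not part of Puma, and string chunks stand in for blocking socket reads. The point it illustrates is that the reading happens on the worker thread itself.

```ruby
# Hypothetical stand-in for a client whose request arrives in pieces.
class ChunkedClient
  attr_reader :buffer, :read_by
  attr_accessor :response

  def initialize(chunks)
    @chunks = chunks.dup
    @buffer = +""
    @read_by = nil
  end

  # Blocking-style read: consume chunks until the headers end or input runs out.
  # Records which thread did the reading.
  def read_request
    @read_by = Thread.current
    @buffer << @chunks.shift until @buffer.include?("\r\n\r\n") || @chunks.empty?
    @buffer.include?("\r\n\r\n")
  end
end

# Illustrative only: serve_without_buffering is not a Puma method.
def serve_without_buffering(clients, app:, worker_count: 2)
  todo = Queue.new
  clients.each { |client| todo << client } # accepted connections go straight to "todo"
  todo.close

  workers = Array.new(worker_count) do
    Thread.new do
      while (client = todo.pop) # nil once the queue is closed and drained
        next unless client.read_request # the worker thread itself waits for the request
        _verb, path = client.buffer.split(" ", 3)
        status, _headers, body = app.call({ "PATH_INFO" => path })
        payload = +""
        body.each { |part| payload << part }
        # The reason phrase is hardcoded to keep the sketch short.
        client.response = "HTTP/1.1 #{status} OK\r\n\r\n#{payload}"
      end
    end
  end
  workers.each(&:join)
  workers
end
```

    The trade-off visible here is that a client that sends its request slowly ties up a worker thread for the whole read, instead of waiting in a separate buffer.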