Spiders Contracts

    Testing spiders can get particularly annoying, and while nothing prevents you from writing unit tests, the task gets cumbersome quickly. Scrapy offers an integrated way of testing your spiders by means of contracts.

    This allows you to test each callback of your spider by hardcoding a sample URL and checking various constraints on how the callback processes the response. Each contract is prefixed with an @ and included in the docstring. See the following example:
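    A minimal sketch of such a docstring, assuming a callback named parse and an illustrative sample URL. The class below deliberately does not import Scrapy so that the snippet runs on its own; in a project it would subclass scrapy.Spider. The contract_lines helper is hypothetical and only shows which docstring lines count as contracts; it is not how Scrapy itself reads them.

```python
import inspect


class ExampleSpider:  # in a real project: class ExampleSpider(scrapy.Spider)
    name = "example"

    def parse(self, response):
        """
        This function parses a sample response. Some contracts are mingled
        with this docstring.

        @url http://www.example.com/s?field-keywords=selfish+gene
        @returns items 1 16
        @returns requests 0 0
        @scrapes Title Author Year Price
        """


def contract_lines(callback):
    """Hypothetical helper: return the '@name args' lines of a docstring."""
    doc = inspect.getdoc(callback) or ""
    stripped = (line.strip() for line in doc.splitlines())
    return [line for line in stripped if line.startswith("@")]


# Print each contract line found in the callback's docstring.
for line in contract_lines(ExampleSpider.parse):
    print(line)
```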

    This callback is tested using three built-in contracts:

    • class scrapy.contracts.default.UrlContract
      This contract (@url) sets the sample URL used when checking other contract conditions for this spider. This contract is mandatory. All callbacks lacking this contract are ignored when running the checks:

          @url url

    • class scrapy.contracts.default.CallbackKeywordArgumentsContract
      This contract (@cb_kwargs) sets the cb_kwargs attribute for the sample request. It must be a valid JSON dictionary:

          @cb_kwargs {"arg1": "value1", "arg2": "value2", ...}

    • class scrapy.contracts.default.ReturnsContract
      This contract (@returns) sets lower and upper bounds for the items and requests returned by the spider. The upper bound is optional:

          @returns item(s)|request(s) [min [max]]

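    • class scrapy.contracts.default.ScrapesContract
      This contract (@scrapes) checks that all the items returned by the callback have the specified fields:

          @scrapes field_1 field_2 ...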
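    To make the argument rules above concrete, here is a small sketch of how a @cb_kwargs value and @returns bounds could be interpreted. Both helpers (parse_cb_kwargs and within_bounds) are hypothetical illustrations written for this page, not Scrapy's own code, and the sample values are made up.

```python
import json


def parse_cb_kwargs(text):
    """Decode the argument of an @cb_kwargs line; it must be a JSON dictionary."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError("@cb_kwargs expects a JSON dictionary")
    return value


def within_bounds(count, minimum, maximum=None):
    """Check a result count against '@returns ... min [max]' style bounds."""
    if count < minimum:
        return False
    # The upper bound is optional: None means "no upper limit".
    return maximum is None or count <= maximum


kwargs = parse_cb_kwargs('{"arg1": "value1", "arg2": "value2"}')
print(kwargs["arg1"])            # value1
print(within_bounds(5, 1, 16))   # True
print(within_bounds(3, 0, 0))    # False
```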

    Use the check command (scrapy check) to run the contract checks.

    If you find you need more power than the built-in Scrapy contracts, you can create and load your own contracts in the project by using the SPIDER_CONTRACTS setting:

        SPIDER_CONTRACTS = {
            'myproject.contracts.ResponseCheck': 10,
            'myproject.contracts.ItemValidate': 10,
        }

    Each contract must inherit from Contract and can override three methods:

    • class scrapy.contracts.Contract(method, *args)

    Parameters:

    • method (function) – callback function to which the contract is associated
    • args (list) – list of arguments passed into the docstring (whitespace-separated)
    • adjust_request_args(args)
      This receives a dict as an argument containing default arguments for the request object. Request is used by default, but this can be changed with the request_cls attribute. If multiple contracts in the chain have this attribute defined, the last one is used.

      Must return the same or a modified version of it.

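    • pre_process(response)
      This allows hooking in various checks on the response received from the sample request, before it is passed to the callback.

    • post_process(output)
      This allows processing the output of the callback. Iterators are converted to lists before being passed to this hook.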

    Raise ContractFail from pre_process or post_process if expectations are not met:

    • class scrapy.exceptions.ContractFail
      Error raised in case of a failing contract

    Here is a demo contract which checks the presence of a custom header in the response received:
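    A minimal sketch of such a contract, assuming the header name is passed as the docstring argument (for example @has_header X-CustomHeader). The second class, NonEmptyContract, is a hypothetical addition that exercises the other two hooks; its name and the X-Contract-Check header are invented for illustration. The except branch only defines stand-ins so that the snippet can be run without Scrapy installed; it is not Scrapy code.

```python
try:
    from scrapy.contracts import Contract
    from scrapy.exceptions import ContractFail
except ImportError:
    # Stand-ins (not Scrapy code) so the sketch runs without Scrapy installed.
    class ContractFail(AssertionError):
        pass

    class Contract:
        def __init__(self, method, *args):
            self.method = method
            self.args = args


class HasHeaderContract(Contract):
    """Demo contract which checks the presence of a custom header.

    @has_header X-CustomHeader
    """

    name = "has_header"

    def pre_process(self, response):
        # self.args holds the whitespace-separated docstring arguments.
        for header in self.args:
            if header not in response.headers:
                raise ContractFail(f"{header} not present")


class NonEmptyContract(Contract):
    """Hypothetical contract: the callback must return at least one result.

    @non_empty
    """

    name = "non_empty"

    def adjust_request_args(self, args):
        # Add a marker header to the sample request, then hand the dict back.
        args.setdefault("headers", {})["X-Contract-Check"] = "1"
        return args

    def post_process(self, output):
        # The callback output arrives as a list, so emptiness is easy to test.
        if not output:
            raise ContractFail("callback returned nothing")
```

    To enable contracts like these, they would be listed in SPIDER_CONTRACTS under whatever module path they live in within your project.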

    Detecting check runs

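    When scrapy check is running, the SCRAPY_CHECK environment variable is set to the string "true". You can use os.environ to change your spiders or your settings when scrapy check is used:

        import os

        import scrapy


        class ExampleSpider(scrapy.Spider):
            name = 'example'

            def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                if os.environ.get('SCRAPY_CHECK'):
                    pass  # Do some scraper adjustments when a check is running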