Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel.

  • Alerts are sent when a SQL condition is reached
  • Reports are sent on a schedule

Alerts and reports are disabled by default. To turn them on, you need to do some setup, described here.

Commons

In your superset_config.py
  • The ALERT_REPORTS feature flag must be set to True.
  • CELERYBEAT_SCHEDULE in CeleryConfig must contain a schedule for reports.scheduler.
  • At least one of the following must be configured, depending on what you want to use:
    • emails: SMTP_* settings
    • Slack messages: SLACK_API_TOKEN
In your Dockerfile
  • You must install a headless browser, for taking screenshots of the charts and dashboards. Only Firefox and Chrome are currently supported.
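Taken together, a minimal superset_config.py for this feature might look like the following sketch. The Redis host, the every-minute schedule, and the SMTP/Slack values are placeholders you must adapt to your deployment:

```python
from celery.schedules import crontab

FEATURE_FLAGS = {"ALERT_REPORTS": True}

REDIS_HOST = "redis"  # placeholder: your broker's hostname
REDIS_PORT = "6379"

class CeleryConfig:
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    CELERYBEAT_SCHEDULE = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),  # check every minute
        },
    }

CELERY_CONFIG = CeleryConfig

# Configure at least one notification channel:
SLACK_API_TOKEN = "xoxb-..."        # placeholder Slack bot token
SMTP_HOST = "smtp.example.com"      # placeholder SMTP settings
SMTP_MAIL_FROM = "superset@example.com"
```

See the Detailed Config section below for the full set of fields.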

Note: all the required components (headless browser, Redis, Postgres DB, celery worker and celery beat) are present in the Docker image if you are following the docker-compose setup. All you need to do is add the required config (see Detailed Config). Set ALERT_REPORTS_NOTIFICATION_DRY_RUN to False in superset_config.py to disable dry-run mode and start receiving email/Slack notifications.

Slack integration

To send alerts and reports to Slack channels, you need to create a new Slack Application on your workspace.

  1. Connect to your Slack workspace, then head to https://api.slack.com/apps.
  2. Create a new app.
  3. Go to “OAuth & Permissions” section, and give the following scopes to your app:
    • incoming-webhook
    • files:write
    • chat:write
  4. At the top of the “OAuth & Permissions” section, click “install to workspace”.
  5. Select a default channel for your app and continue. (You can post to any channel by inviting your Superset app into that channel).
  6. The app should now be installed in your workspace, and a “Bot User OAuth Access Token” should have been created. Copy that token in the SLACK_API_TOKEN variable of your superset_config.py.
  7. Restart the service (or run superset init) to pull in the new configuration.

Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading ‘#’, e.g. use alerts instead of #alerts.

Kubernetes specific

  • You must have a celery beat pod running. If you’re using the chart included in the GitHub repository under helm/superset, you need to put supersetCeleryBeat.enabled = true in your values override.
  • See the dedicated Kubernetes installation documentation for more generic details.

Docker-compose specific

You must have in your docker-compose.yaml:
  • a Redis message broker
  • a PostgreSQL DB instead of SQLite
  • one or more celery workers
  • a single celery beat

Detailed config

The following configurations need to be added to the superset_config.py file. This file is loaded when the image runs, and any configurations in it will override the default configurations found in the config.py.

You can find documentation about each field in the default config.py in the GitHub repository.

You need to replace default values with your custom Redis, Slack and/or SMTP config.

In the CeleryConfig, only CELERYBEAT_SCHEDULE is relevant to this feature; the rest of the CeleryConfig can be changed to fit your needs.

Using Firefox

FROM apache/superset:1.0.1

USER root

RUN apt-get update && \
    apt-get install --no-install-recommends -y firefox-esr

ENV GECKODRIVER_VERSION=0.29.0
RUN wget -q https://github.com/mozilla/geckodriver/releases/download/v${GECKODRIVER_VERSION}/geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz && \
    tar -x geckodriver -zf geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz -O > /usr/bin/geckodriver && \
    chmod 755 /usr/bin/geckodriver && \
    rm geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz

RUN pip install --no-cache gevent psycopg2 redis

USER superset

Using Chrome

Don’t forget to set WEBDRIVER_TYPE and WEBDRIVER_OPTION_ARGS in your config if you use Chrome.
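For example, a Chrome setup in superset_config.py might look like the following sketch; the option arguments are common headless-Chrome flags, not mandatory values:

```python
WEBDRIVER_TYPE = "chrome"
WEBDRIVER_OPTION_ARGS = [
    "--headless",             # run without a display
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-extensions",
]
```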

Summary of steps to turn on alerts and reporting:

Using the templates below,

  1. Create a new directory and create the Dockerfile
  2. Build the extended image using the Dockerfile
  3. Create the docker-compose.yaml file in the same directory
  4. Create a new subdirectory called config
  5. Create the superset_config.py file in the config subdirectory
  6. Run the image using docker-compose up in the same directory as the docker-compose.yaml file
  7. In a new terminal window, upgrade the DB by running docker exec -it superset-1.0.1-extended superset db upgrade
  8. Then run docker exec -it superset-1.0.1-extended superset init
  9. Then setup your admin user if need be, docker exec -it superset-1.0.1-extended superset fab create-admin
  10. Finally, restart the running instance - CTRL-C, then docker-compose up

(Note: v1.0.1 is current at the time of writing; you can change the version number to the latest if a newer version is available.)

The docker compose file lists the services that will be used when running the image. The specific services needed for alerts and reporting are outlined below.

Redis message broker

To ferry requests between the celery worker and the Superset instance, we use a message broker. This template uses Redis.

Replacing SQLite with Postgres

While it might be possible to use SQLite for alerts and reporting, it is highly recommended to use a more production-ready DB for Superset in general. Our template uses Postgres.
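Assuming the Postgres service from the template below (service name postgres, with database, user and password all set to superset), pointing Superset at it is a one-line change in superset_config.py:

```python
# Hypothetical connection string matching the docker-compose template;
# change the credentials and host if yours differ.
SQLALCHEMY_DATABASE_URI = (
    "postgresql+psycopg2://superset:superset@postgres:5432/superset"
)
```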

Celery worker

The worker will process the tasks that need to be performed when an alert or report is fired.

Celery beat

The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.

Full docker-compose.yaml configuration

The Redis, Postgres, Celery worker and Celery beat services are defined in the template:

Config for docker-compose.yaml:

version: '3.6'
services:
  redis:
    image: redis:6.0.9-buster
    restart: on-failure
    volumes:
      - redis:/data
  postgres:
    image: postgres
    restart: on-failure
    environment:
      POSTGRES_DB: superset
      POSTGRES_PASSWORD: superset
      POSTGRES_USER: superset
    volumes:
      - db:/var/lib/postgresql/data
  worker:
    image: superset-1.0.1-extended
    restart: on-failure
    healthcheck:
      disable: true
    depends_on:
      - superset
      - postgres
      - redis
    command: "celery --app=superset.tasks.celery_app:app worker --pool=gevent --concurrency=500"
    volumes:
      - ./config/:/app/pythonpath/
  beat:
    image: superset-1.0.1-extended
    restart: on-failure
    healthcheck:
      disable: true
    depends_on:
      - superset
      - postgres
      - redis
    command: "celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule"
    volumes:
      - ./config/:/app/pythonpath/
  superset:
    image: superset-1.0.1-extended
    restart: on-failure
    environment:
      - SUPERSET_PORT=8088
    ports:
      - "8088:8088"
    depends_on:
      - postgres
      - redis
    command: 'gunicorn --bind 0.0.0.0:8088 --access-logfile - --error-logfile - --workers 5 --worker-class gthread --threads 4 --timeout 200 --limit-request-line 4094 --limit-request-field_size 8190 "superset.app:create_app()"'
    volumes:
      - ./config/:/app/pythonpath/
volumes:
  db:
    external: true
  redis:
    external: false

Summary

With the extended image created using the Dockerfile, that image running via docker-compose.yaml, and the required configurations in superset_config.py, you should now have alerts and reporting working correctly.

  • The above templates also work in a Docker swarm environment; you would just need to add a deploy: section to the Superset, Redis and Postgres services, along with your specific configs for your swarm.

Old Reports feature

Scheduling and Emailing Reports

Email reports allow users to schedule email reports for:

  • chart and dashboard visualization (attachment or inline)
  • chart data (CSV attachment or inline table)

Enable email reports in your superset_config.py file:
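In Superset 1.x the flag in question is ENABLE_SCHEDULED_EMAIL_REPORTS:

```python
ENABLE_SCHEDULED_EMAIL_REPORTS = True
```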

This flag enables some permissions that are stored in your database, so you’ll want to run superset init again if you are running this in a dev environment. Now you will find two new items in the navigation bar that allow you to schedule email reports:

  • Manage > Dashboard Emails
  • Manage > Chart Email Schedules

Schedules are defined in crontab format, and each schedule can have a list of recipients (all of them can receive a single mail, or separate mails). For audit purposes, all outgoing mails can have a mandatory BCC.

In order for schedules to get picked up, you need to configure a celery worker and a celery beat (see the “Celery Tasks” section above). Your celery configuration also needs an email_reports.schedule_hourly entry in CELERYBEAT_SCHEDULE.
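That entry might look like the following sketch; the once-an-hour crontab is a conventional choice for this task, not a required value:

```python
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    "email_reports.schedule_hourly": {
        "task": "email_reports.schedule_hourly",
        "schedule": crontab(minute=1, hour="*"),  # once an hour, at minute 1
    },
}
```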

To send emails you need to configure SMTP settings in your superset_config.py configuration file.

EMAIL_NOTIFICATIONS = True
SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "smtp_username"
SMTP_PORT = 25
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")
SMTP_MAIL_FROM = "insights@komoot.com"

To render dashboards, you need to install a local browser on your Superset instance (see the headless browser installation steps in the Dockerfile above).

You’ll need to adjust the WEBDRIVER_TYPE accordingly in your configuration. You also need to specify on behalf of which username to render the dashboards. In general dashboards and charts are not accessible to unauthorized requests, that is why the worker needs to take over credentials of an existing user to take a snapshot.
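A minimal sketch of those two settings, assuming Firefox and an existing admin user:

```python
WEBDRIVER_TYPE = "firefox"
# Reports are rendered with this user's permissions; "admin" is a
# placeholder for an existing Superset username.
EMAIL_REPORTS_USER = "admin"
```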

Important notes

  • Be mindful of the concurrency setting for celery (e.g. -c 4). Selenium/webdriver instances can consume a lot of CPU / memory on your servers.
  • In some cases, if you notice a lot of leaked geckodriver processes, try running your celery processes with celery worker --pool=prefork --max-tasks-per-child=128 ...
  • It is recommended to run separate workers for the sql_lab and email_reports tasks. This can be done using the queue field in CELERY_ANNOTATIONS.

Schedule Reports

You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding extra metadata to saved queries, which are then picked up by an external scheduler (like Apache Airflow).

To allow scheduled queries, add the following to your configuration file:

SCHEDULED_QUERIES = {
    # This information is collected when the user clicks "Schedule query",
    # and saved into the `extra` field of saved queries.
    # See: https://github.com/mozilla-services/react-jsonschema-form
    'JSONSCHEMA': {
        'title': 'Schedule',
        'description': (
            'In order to schedule a query, you need to specify when it '
            'should start running, when it should stop running, and how '
            'often it should run. You can also optionally specify '
            'dependencies that should be met before the query is '
            'executed. Please read the documentation for best practices '
            'and more information on how to specify dependencies.'
        ),
        'type': 'object',
        'properties': {
            'output_table': {
                'type': 'string',
                'title': 'Output table name',
            },
            'start_date': {
                'type': 'string',
                'title': 'Start date',
                # date-time is parsed using the chrono library, see
                # https://www.npmjs.com/package/chrono-node#usage
                'format': 'date-time',
                'default': 'tomorrow at 9am',
            },
            'end_date': {
                'type': 'string',
                'title': 'End date',
                # date-time is parsed using the chrono library, see
                # https://www.npmjs.com/package/chrono-node#usage
                'format': 'date-time',
                'default': '9am in 30 days',
            },
            'schedule_interval': {
                'type': 'string',
                'title': 'Schedule interval',
            },
            'dependencies': {
                'type': 'array',
                'title': 'Dependencies',
                'items': {
                    'type': 'string',
                },
            },
        },
    },
    'UISCHEMA': {
        'schedule_interval': {
            'ui:placeholder': '@daily, @weekly, etc.',
        },
        'dependencies': {
            'ui:help': (
                'Check the documentation for the correct format when '
                'defining dependencies.'
            ),
        },
    },
    'VALIDATION': [
        # ensure that start_date <= end_date
        {
            'name': 'less_equal',
            'arguments': ['start_date', 'end_date'],
            'message': 'End date cannot be before start date',
            # this is where the error message is shown
            'container': 'end_date',
        },
    ],
    # link to the scheduler; this example links to an Airflow pipeline
    # that uses the query id and the output table as its name
    'linkback': (
        'https://airflow.example.com/admin/airflow/tree?'
        'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
    ),
}

This information can then be retrieved from the endpoint /savedqueryviewapi/api/read and used to schedule the queries that have schedule_info in their JSON metadata. For schedulers other than Airflow, additional fields can easily be added to the configuration file above.
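For illustration, an external scheduler could filter that endpoint's response for queries carrying schedule_info. The function below is a hypothetical sketch; the payload shape (a result list whose rows carry an extra_json field) is an assumption, not a documented contract:

```python
import json

def extract_schedules(payload):
    """Collect scheduling metadata from a saved-query API response.

    `payload` is assumed to be the decoded JSON body returned by
    /savedqueryviewapi/api/read; only queries whose `extra` field
    carries `schedule_info` are returned.
    """
    schedules = []
    for row in payload.get("result", []):
        # `extra_json` holds the serialized `extra` field of the saved query
        extra = json.loads(row.get("extra_json") or "{}")
        info = extra.get("schedule_info")
        if info:
            schedules.append({"id": row["id"], **info})
    return schedules
```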