Sanitize untrusted HTML (to prevent XSS)
Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.
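A minimal sketch of that call, assuming a jsoup release that still ships `org.jsoup.safety.Whitelist` (newer releases rename the class to `Safelist`, with the same method names). The class name `SanitizeExample`, the `sanitize` helper, and the `stealCookies()` handler are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class SanitizeExample {
    // Hypothetical helper: cleans untrusted HTML with the canned "basic" whitelist.
    // On newer jsoup releases, swap Whitelist for Safelist.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Whitelist.basic());
    }

    public static void main(String[] args) {
        // Untrusted input carrying an inline event handler.
        String unsafe =
            "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";

        // The onclick attribute is not on the whitelist, so it is dropped;
        // the basic whitelist also enforces rel="nofollow" on links.
        System.out.println(sanitize(unsafe));
    }
}
```

The cleaned string is a body fragment, ready to embed in your own page template.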
A better solution may be to use a rich text WYSIWYG editor (like TinyMCE). Such editors output HTML and let the user work visually. However, their validation is done on the client side: you need to apply server-side validation to clean up the input and ensure the HTML is safe to place on your site. Otherwise, an attacker can bypass the client-side JavaScript validation and inject unsafe HTML directly into your site.
The jsoup cleaner does not use regular expressions, which are inappropriate for this task.
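To see why regular expressions fall short, here is a small sketch of a hypothetical regex-based filter (not part of jsoup) and one well-known way around it. The class name, pattern, and attack string are all illustrative assumptions.

```java
import java.util.regex.Pattern;

public class NaiveRegexFilter {
    // Hypothetical filter: removes opening and closing script tags in one pass.
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("</?script[^>]*>", Pattern.CASE_INSENSITIVE);

    static String stripScriptTags(String html) {
        return SCRIPT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The attacker nests one tag inside another. Removing the inner
        // match joins the surrounding fragments back into a script tag.
        String attack = "<scr<script>ipt>alert(1)</scr</script>ipt>";

        // prints: <script>alert(1)</script>
        System.out.println(stripScriptTags(attack));
    }
}
```

Patching this one case (for example by looping until nothing matches) still leaves event-handler attributes, `javascript:` URLs, and malformed markup that browsers repair on their own. A parser-based whitelist sidesteps the problem by only emitting what it explicitly recognises.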
The cleaner is useful not only for avoiding XSS, but also for limiting the range of elements the user can provide: you may be OK with textual elements such as strong, but not structural elements such as div.
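As a sketch of restricting the element range, the following builds a custom whitelist that starts empty and allows only a couple of inline formatting tags. It assumes the same `Whitelist` class as the rest of this page (`Safelist` on newer jsoup releases); the class name, helper name, and the choice of `strong` and `em` are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class RestrictElementsExample {
    // Hypothetical helper: permits only inline emphasis, nothing structural.
    static String inlineFormattingOnly(String untrustedHtml) {
        // Start from an empty whitelist and add just the tags we accept.
        Whitelist allowed = Whitelist.none().addTags("strong", "em");
        return Jsoup.clean(untrustedHtml, allowed);
    }

    public static void main(String[] args) {
        String input = "<div><strong>Hello</strong> world</div>";

        // The div wrapper is removed; its text and the strong element remain.
        System.out.println(inlineFormattingOnly(input));
    }
}
```

Modify the canned whitelists with care: each tag or attribute you add widens what an attacker can submit.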
- See the XSS filter evasion guide, as an example of how regular-expression filters don't work, and why a safe whitelist parser-based sanitizer is the correct approach.
- See the Whitelist reference for the different canned options, and to create a custom whitelist.
- The nofollow link attribute