Some source systems tend to store HTML code in their back-end database, so they can more easily render it in the web UI, and allow the end-user to provide some rich-text experience.

However, when reporting on some of these fields (e.g. comments, notes, …) that are added to significant transactions, these could be difficult to handle, as the content is not pure text, but contains all sorts of HTML tags, paragraph indicators, style elements, etc…

Render HTML

One option is to just also render this HTML code in the PowerBI report, using a visual called HTML Content.

It provides many options to style the PowerBI report, without doing any parsing or otherwise. It’s like having a small web-browser in the report.

Strip HTML

A second option, to get only the text portion out of that HTML nudget, is to do a bit of advanced processing in PowerBI Query editor. (Probably you could also find a place for this transformation upstream in the ETL, so you can store it neatly in the data warehouse, but since many PowerBI projects don’t necessarily have a warehouse backend, I’m showing how to do this in the Query Editor.

    ...
    #"CleanHTML" = Table.AddColumn(#"Extracted Date", "Comments", each Html.Table([Reviewer_Comment], {{"ExtractedText",":root"}})),
    #"Expanded CleanedComment" = Table.ExpandTableColumn(#"CleanHTML", "Comments", {"ExtractedText"}, {"Comments"}),
    #"Changed Comment Type" = Table.TransformColumnTypes(#"Expanded CleanedComment",{{"Comments", type text}})
    ...

This results in a additional field Comments, which has been stripped from it’s HTML tags and can be readable immediately in the report.