use utf-8 throughout htmldocck
This commit improves compatibility with Python 3, which already uses
Unicode throughout.
It also fixes a subtle incompatibility stemming from the use of
`entitydefs`, which contains replacement text _encoded in latin-1_ for
HTML entities. When using Python 3, this would cause `0xa0` to be
incorrectly added to the element tree.
This meant that there was a rustdoc test that would pass under Python 2
but fail under Python 3, due to an incorrect regex match against the
non-breaking space character. This commit triggers that failure in both
versions, and also fixes it.