ref: c8cf0cee47914699e3e7c10a7b7501af94b53b4e
dir: /sys/man/1/uhtml/
.TH UHTML 1 .SH NAME uhtml \- convert foreign character set HTML file to unicode .SH SYNOPSIS .B uhtml [ .B -p ] [ .B -c .I charset ] [ .I file ] .SH DESCRIPTION HTML comes in various character set encodings and has special forms to encode characters. To make it easier to process html, uhtml is used to normalize it to a unicode only form. .LP Uhtml detects the character set of the html input .I file and calls .IR tcs (1) to convert it to utf replacing html-entity forms by ther unicode character representations except for .B lt .B gt .B amp .B quot and .B apos . The converted html is written to standard output. If no .I file was given, it is read from standard input. If the .B -p option is given, the detected character set is printed and the program exits without conversion. In case character set detection fails, the default (utf) is assumed. This default can be changed with the .B -c option. .SH SOURCE .B /sys/src/cmd/uhtml.c .SH SEE ALSO .IR tcs (1)