htmlcxx is a simple, non-validating HTML parser library for C++. It lets you reproduce the original HTML document in full, character by character, from the parse tree. It also has an intuitive API for traversing that tree.