Canonicalization Algorithm
Step 1. Encode the document in UTF-8.
Step 2. Normalize line breaks: replace each carriage return/linefeed pair (#xD #xA) and each lone carriage return (#xD) with a single linefeed (#xA).
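Steps 1 and 2 both work on the raw input before parsing. The following is a minimal Python sketch. The helper name `preprocess` is hypothetical, and it assumes the caller already knows the source encoding. A real processor would detect the encoding from a byte order mark or the encoding declaration.

```python
import re


def preprocess(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Sketch of Steps 1-2: transcode to UTF-8 and normalize line breaks.

    `source_encoding` is supplied by the caller (an assumption of this sketch).
    """
    text = raw.decode(source_encoding)
    # A CR LF pair or a lone CR each becomes a single LF (#xA).
    text = re.sub(r"\r\n?", "\n", text)
    return text.encode("utf-8")


print(preprocess("caf\u00e9\r\nline\rend".encode("latin-1"), "latin-1"))
# b'caf\xc3\xa9\nline\nend'
```

The sketch decodes before it normalizes line breaks. This ordering matters because in encodings such as UTF-16 a carriage return is not a single 0x0D byte, so line breaks have to be handled as characters, not as bytes.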
Step 3. Normalize attribute values, as a validating XML processor would:
- replace character and entity references by their replacement text
- replace each tab, carriage return, and linefeed with a space (#x20)
- for attributes whose declared type is not CDATA, also trim leading and trailing spaces and collapse each run of spaces to a single space
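Here is a sketch of the Step 3 rules in Python. `normalize_attribute` is a hypothetical helper. It assumes the caller knows the attribute's declared type, which normally comes from the DTD, and that only the five predefined entities occur. The sketch keeps one subtlety of XML 1.0 attribute-value normalization: a character reference such as &#xA; contributes its character directly, so that character is not turned into a space.

```python
import re

# Only the five predefined entities are known; DTD-declared entities would
# have to be added here (this sketch raises KeyError for them).
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

TOKEN = re.compile(
    r"&#x(?P<hex>[0-9A-Fa-f]+);"      # hexadecimal character reference
    r"|&#(?P<dec>[0-9]+);"            # decimal character reference
    r"|&(?P<ent>[A-Za-z_][\w.-]*);"   # entity reference
    r"|(?P<ws>[\t\r\n])"              # literal tab, CR or LF
    r"|(?P<ch>.)",                    # any other character
    re.S,
)


def normalize_attribute(raw: str, is_cdata: bool = True) -> str:
    out = []
    for m in TOKEN.finditer(raw):
        kind = m.lastgroup            # exactly one alternative matched
        text = m.group(kind)
        if kind == "hex":
            out.append(chr(int(text, 16)))   # referenced char kept as-is
        elif kind == "dec":
            out.append(chr(int(text)))
        elif kind == "ent":
            out.append(PREDEFINED[text])
        elif kind == "ws":
            out.append(" ")                  # each literal one becomes a space
        else:
            out.append(text)
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA types: collapse runs of spaces, then trim the ends.
        value = re.sub(" +", " ", value).strip(" ")
    return value
```

For example, `normalize_attribute("a\tb\r\nc")` gives "a b  c", because each literal white space character becomes its own space. `normalize_attribute(" x \n y ", is_cdata=False)` gives "x y".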
Step 4. Replace character references and parsed entity references in element content by their replacement text.
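In practice, the XML parser performs Step 4. Python's standard `xml.etree.ElementTree` shows the effect. An internal entity declared in the DOCTYPE, a numeric character reference, and the predefined entities all arrive in the parsed text already replaced. This is only an illustration, not a canonicalizer:

```python
import xml.etree.ElementTree as ET

doc = (
    '<!DOCTYPE note [<!ENTITY co "ACME Corp">]>'
    "<note>&co; &#169; &lt;2024&gt;</note>"
)
root = ET.fromstring(doc)
print(root.text)  # ACME Corp © <2024>
```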
Step 5. Replace CDATA sections by their content.
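The parser also resolves CDATA sections. Their content becomes ordinary text, and when that text is serialized again, the special characters are escaped (Step 11) instead of being wrapped back in CDATA. The following is a short illustration with `xml.etree.ElementTree`. ElementTree is used here only to show the effect and is not itself a C14N serializer.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<expr><![CDATA[a < b && c]]></expr>")
print(root.text)                              # a < b && c
print(ET.tostring(root, encoding="unicode"))  # <expr>a &lt; b &amp;&amp; c</expr>
```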
Step 6. Delete the XML declaration and the DOCTYPE declaration.
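If Step 6 is applied to the document text rather than to a parsed tree, a regular-expression sketch like the one below handles common cases. The helper name and patterns are illustrative assumptions. In particular, the DOCTYPE pattern assumes the internal subset contains no "]" followed by ">" inside a comment or quoted literal.

```python
import re

# The XML declaration may only appear at the very start of the document.
XML_DECL = re.compile(r"\A<\?xml\s[^?]*\?>\s*")
# External ID part, then an optional [...] internal subset, then ">".
DOCTYPE = re.compile(r"<!DOCTYPE\s[^\[>]*(?:\[.*?\]\s*)?>\s*", re.S)


def strip_declarations(text: str) -> str:
    text = XML_DECL.sub("", text, count=1)
    return DOCTYPE.sub("", text, count=1)


doc = '<?xml version="1.0"?>\n<!DOCTYPE doc [\n  <!ATTLIST doc a CDATA "x">\n]>\n<doc/>'
print(strip_declarations(doc))  # <doc/>
```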
Step 7. Convert empty-element tags to start-tag/end-tag pairs: <br/> becomes <br></br>.
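A regular-expression sketch of Step 7 follows. The helper is hypothetical, and element names are simplified to ASCII. The pattern matches each quoted attribute value as a unit, so a "/" or ">" inside a value does not end the tag early. The sketch does not skip comments or processing instructions, so it assumes no "<name/>" text appears inside them.

```python
import re

# Group 1: element name. Group 2: attributes (quoted values taken whole),
# matched lazily so trailing white space before "/>" is dropped.
EMPTY_TAG = re.compile(
    r"""<([A-Za-z_][\w.:-]*)((?:[^<>"'/]|"[^"]*"|'[^']*')*?)\s*/>"""
)


def expand_empty_tags(text: str) -> str:
    return EMPTY_TAG.sub(r"<\1\2></\1>", text)


print(expand_empty_tags('<a><br /><img src="x/y"/></a>'))
# <a><br></br><img src="x/y"></img></a>
```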
Step 8. Delete blank lines before and after the root element.
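Step 8 can be implemented as below, sketched in Python under two stated assumptions. First, the root start tag is the first "<" not followed by "?", "!", or "/". Second, Step 7 has already run, so the root ends with an explicit end tag. Blank lines inside the root element are content and are left alone. Non-blank lines before and after the root are kept and joined to it with single linefeeds. This approximates Canonical XML 1.0, which separates a comment or processing instruction outside the root from the root with one linefeed. The helper name is hypothetical.

```python
import re


def drop_outer_blank_lines(text: str) -> str:
    # Assumption: the root start tag is the first "<" not followed by ?, ! or /.
    m = re.search(r"<([^?!/\s>][^\s/>]*)", text)
    name, start = m.group(1), m.start()
    # Assumption: Step 7 has run, so the root closes with an explicit </name>.
    end = text.index(">", text.rindex("</" + name)) + 1
    prolog, root, epilog = text[:start], text[start:end], text[end:]

    def nonblank(part: str) -> list:
        return [line for line in part.splitlines() if line.strip()]

    # Blank lines inside the root are content and are kept as-is.
    return "\n".join(nonblank(prolog) + [root] + nonblank(epilog))


print(repr(drop_outer_blank_lines("\n\n<!-- c -->\n\n<doc>\n\n<a></a>\n</doc>\n\n")))
# '<!-- c -->\n<doc>\n\n<a></a>\n</doc>'
```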
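Step 9. Normalize white space inside start and end tags: separate the element name and each attribute by a single space, and remove white space around "=" and before the closing ">":
- Example: <cost currency = "USD"> is normalized to <cost currency="USD">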
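A regular-expression sketch of Step 9 follows. The helper is hypothetical, names are ASCII only, and attribute values must be quoted, as XML requires. The sketch rebuilds start tags from their attribute list and drops white space before ">" in end tags. It leaves empty-element tags such as "<x/>" untouched, since Step 7 expands them.

```python
import re

ATTR = re.compile(r"""([^\s=<>]+)\s*=\s*("[^"]*"|'[^']*')""")
START_TAG = re.compile(
    r"""<([A-Za-z_][\w.:-]*)((?:\s+[^\s=<>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>"""
)
END_TAG = re.compile(r"</([A-Za-z_][\w.:-]*)\s+>")


def normalize_tag_whitespace(text: str) -> str:
    def fix_start(m):
        # One space before each attribute, none around "=" or before ">".
        attrs = "".join(f" {n}={v}" for n, v in ATTR.findall(m.group(2)))
        return f"<{m.group(1)}{attrs}>"

    text = START_TAG.sub(fix_start, text)
    return END_TAG.sub(r"</\1>", text)


print(normalize_tag_whitespace('<cost   currency = "USD"  >12</cost >'))
# <cost currency="USD">12</cost>
```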
Step 10. Change all attribute value delimiters to double quote marks.
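When Step 10 re-quotes a single-quoted value, any double quote inside that value must become &quot;. A single-quoted value cannot contain a literal single quote, so no other character needs attention. The following sketch operates on a single start tag. The helper and pattern are illustrative assumptions.

```python
import re

# Each match consumes a whole name="..." or name='...' pair, so the scan never
# looks inside a value that is already double-quoted.
ATTR = re.compile(r"""([^\s=<>]+)\s*=\s*("[^"]*"|'[^']*')""")


def _requote(m) -> str:
    name, value = m.group(1), m.group(2)
    if value.startswith("'"):
        value = '"' + value[1:-1].replace('"', "&quot;") + '"'
    return f"{name}={value}"


def double_quote_attributes(start_tag: str) -> str:
    return ATTR.sub(_requote, start_tag)


print(double_quote_attributes("""<a b='say "hi"' c="it's">"""))
# <a b="say &quot;hi&quot;" c="it's">
```

As a side effect, the sketch also removes spaces around "=", which agrees with Step 9.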
Step 11. Replace special characters in attribute values and element content with character or entity references (for example, & becomes &amp; and < becomes &lt;).
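Step 11 can be sketched with two lookup tables that follow Canonical XML 1.0. Text content escapes &, <, > and the carriage return. Attribute values escape &, <, the double quote, tab, linefeed and carriage return, using uppercase hexadecimal character references without leading zeros. The helper names are hypothetical.

```python
TEXT_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", "\r": "&#xD;"}
ATTR_ESCAPES = {
    "&": "&amp;", "<": "&lt;", '"': "&quot;",
    "\t": "&#x9;", "\n": "&#xA;", "\r": "&#xD;",
}


def escape_text(s: str) -> str:
    return "".join(TEXT_ESCAPES.get(c, c) for c in s)


def escape_attr(s: str) -> str:
    # Note: ">" is not escaped inside attribute values.
    return "".join(ATTR_ESCAPES.get(c, c) for c in s)


print(escape_text("a<b && c>d"))  # a&lt;b &amp;&amp; c&gt;d
print(escape_attr('say "x"\n'))   # say &quot;x&quot;&#xA;
```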
Step 12. Add default attributes, as declared in the DTD, to each element that does not already specify them.
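Default values come from ATTLIST declarations in the DTD. If the steps run in the order listed, those defaults must be collected before Step 6 deletes the DOCTYPE. This sketch assumes the defaults have already been collected into a simple mapping from element name to {attribute: default}. The `DEFAULTS` table and the helper are hypothetical, and namespaced names are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, as if read from: <!ATTLIST cost currency CDATA "USD">
DEFAULTS = {"cost": {"currency": "USD"}}


def add_default_attributes(root: ET.Element, defaults: dict) -> None:
    for elem in root.iter():  # visits the root and all of its descendants
        for name, value in defaults.get(elem.tag, {}).items():
            elem.attrib.setdefault(name, value)  # never overrides a given value


root = ET.fromstring('<order><cost>5</cost><cost currency="EUR">4</cost></order>')
add_default_attributes(root, DEFAULTS)
print(ET.tostring(root, encoding="unicode"))
# <order><cost currency="USD">5</cost><cost currency="EUR">4</cost></order>
```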
Step 13. Remove redundant namespace declarations, that is, declarations that repeat a binding already in scope from an ancestor element.
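ElementTree does not keep xmlns attributes, so this sketch uses a small hypothetical `Node` model. A declaration is treated as redundant when an ancestor already binds the same prefix to the same URI. At the root, only the empty default namespace is treated as in scope, which makes a root-level xmlns="" redundant as well.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical element model: name, own xmlns declarations, children."""
    name: str
    ns: dict[str, str] = field(default_factory=dict)  # prefix ("" = default) -> URI
    children: list[Node] = field(default_factory=list)


def drop_redundant_ns(node: Node, in_scope: dict[str, str] | None = None) -> None:
    if in_scope is None:
        in_scope = {"": ""}  # at the root, the default namespace is empty
    # Keep only declarations that change the binding inherited from ancestors.
    node.ns = {p: uri for p, uri in node.ns.items() if in_scope.get(p) != uri}
    scope = {**in_scope, **node.ns}
    for child in node.children:
        drop_redundant_ns(child, scope)


root = Node("a", {"x": "urn:x"},
            [Node("b", {"x": "urn:x", "y": "urn:y"}, [Node("c", {"y": "urn:z"})])])
drop_redundant_ns(root)
print(root.children[0].ns)  # {'y': 'urn:y'}
```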
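Step 14. Sort the namespace declarations and attributes of each element: namespace declarations come first, ordered by prefix (the default namespace first); attributes follow, ordered by namespace URI and then by local name (attributes with no namespace first).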
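Here is a sketch of the Step 14 ordering. It assumes each attribute has already been resolved to its namespace URI and local name, and the tuple layout is an illustrative assumption. Namespace declarations come first, ordered by prefix. Attributes follow, ordered by namespace URI and then by local name. Both orderings compare strings by Unicode code point, which is how Python compares strings.

```python
def canonical_order(ns_decls: dict, attrs: list) -> list:
    """Return (name, value) pairs in canonical order.

    ns_decls: {prefix: uri}, with "" as the prefix of the default namespace.
    attrs: (namespace_uri, local_name, qname, value) tuples; uri "" if unqualified.
    """
    # Prefix "" sorts first, so a default namespace declaration leads.
    ns_part = [("xmlns:" + p if p else "xmlns", uri) for p, uri in sorted(ns_decls.items())]
    # URI is the primary key and local name the secondary key.
    ordered = sorted(attrs, key=lambda a: (a[0], a[1]))
    attr_part = [(qname, value) for _uri, _local, qname, value in ordered]
    return ns_part + attr_part


print(canonical_order(
    {"b": "urn:b", "": "urn:d"},
    [("urn:z", "a", "z:a", "1"), ("", "id", "id", "2"), ("urn:b", "c", "b:c", "3")],
))
# [('xmlns', 'urn:d'), ('xmlns:b', 'urn:b'), ('id', '2'), ('b:c', '3'), ('z:a', '1')]
```

The sketch orders attributes by namespace URI, not by prefix. If the prefix b were bound to urn:zz instead, b:c would sort after z:a.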