tree-features: e159f9b5b0b5e776ce273481a27fa466b04d31a2
Generic feature extraction from XML documents.

For machine learning problems, we often need every input to have the same, fixed
size. When we have a recursive structure, like a tree, we can fold over the
structure to obtain a single value.

This is a very basic implementation of this idea: we take arbitrary XML documents,
which are tree-structured, and assign each element a value based on the MD5 hash of
its name and attributes concatenated together. We fold sub-trees together using
bitwise circular convolution to obtain a value for the whole tree.
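A minimal sketch of the per-element hashing and the fold, not the code from this repository. The names `element_value`, `fold_tree` and `combine` are hypothetical, and the way attributes are sorted and serialised as `key=value` is an assumption. The combining step is left as a parameter; XOR is used in the usage lines only as a stand-in so the sketch runs on its own.

```python
import hashlib
import xml.etree.ElementTree as ET
from functools import reduce


def element_value(elem):
    """MD5 of the element's name and attributes, as a 128-bit integer."""
    # Assumption: attributes are sorted so their order in the source
    # document does not change the hash.
    parts = [elem.tag] + [f"{k}={v}" for k, v in sorted(elem.attrib.items())]
    digest = hashlib.md5("".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def fold_tree(elem, combine):
    """Fold an element's value with the folded values of its children."""
    children = (fold_tree(child, combine) for child in elem)
    return reduce(combine, children, element_value(elem))


# Usage: every document, whatever its shape, maps to one 128-bit value.
doc = ET.fromstring('<a x="1"><b/><c y="2"/></a>')
value = fold_tree(doc, lambda p, q: p ^ q)  # XOR is a placeholder only
print(f"{value:032x}")
```

Any fixed-width binary operation can be passed as `combine`; the choice determines how much of the tree's shape survives in the final value.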

Circular convolution is linear in each argument, so it can't preserve as much
information as, for example, an auto-encoder, but it is reasonably fast, requires no
learning and, as used here, is largely non-commutative and non-associative, so
sub-trees should be distinguishable to a certain extent.
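To make the operation concrete, here is one possible reading of "bitwise circular convolution": treat each value as a vector of bits and convolve over GF(2), so that AND plays the role of multiplication and XOR the role of addition. This interpretation, the 128-bit width and the names `rotl` and `circ_conv` are assumptions for illustration, not taken from the implementation.

```python
N = 128  # assumed width: the size of an MD5 digest in bits


def rotl(x, r, n=N):
    """Rotate the n-bit integer x left by r positions."""
    r %= n
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def circ_conv(a, b, n=N):
    """Circular convolution of two n-bit vectors over GF(2).

    Output bit k is the XOR over i of a[i] AND b[(k - i) mod n], which is
    the same as XOR-ing together a rotated copy of b for every set bit of a.
    """
    out = 0
    for i in range(n):
        if (a >> i) & 1:
            out ^= rotl(b, i, n)
    return out


# Small 4-bit example: 0011 convolved with 0101 is 0101 XOR 1010.
print(format(circ_conv(0b0011, 0b0101, 4), "04b"))  # prints 1111

# Linearity in the second argument: conv(a, b ^ c) == conv(a, b) ^ conv(a, c)
a, b, c = 0x1234, 0xBEEF, 0x0F0F
print(circ_conv(a, b ^ c) == circ_conv(a, b) ^ circ_conv(a, c))  # prints True
```

The output always fits in the same number of bits as the inputs, which is what keeps the folded value a fixed size. Note that this plain form gives the same result when its two arguments are swapped, so on its own it does not reproduce the order sensitivity described above; that would have to come from how the fold applies the operation to parents and children.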