Hierarchical Storage Of Data In Databases
Hello everybody,
today I want to write about hierarchical storage of information in databases.
As usually for storing hierarchy you'll have a choice: fast reading or fast writing. Fast reading as usually related with nested sets, and fast writing is related with adjacency. Also you can consider some kind of combination of both methods.
Following urls give good generalization of what you can have as good generalization
-
One more Nested Intervals vs. Adjacency List comparison: really cool comparison
-
Models for hierarchical data with SQL and PHP: Bill Karwin gives good comparison as well in slides
-
Representing hierarchies in MySQL: nice watching of Nested Set
-
Hierarchical data in RDBMSs: encyclopedic and that's why hard for digesting but quite complete
So for storing hierarchical data following features are fitting:
-
-
Columns: ID, ParentID
-
Easy to do.
-
fast node moves, inserts, and deletes.
-
long to find level (can store as a computed column), ancestry & descendants (Bridge Table combined with level column can solve), path (Lineage Column can solve).
-
Use Common Table Expressions in those databases that support them to traverse.
-
-
Nested Set (a.k.a Modified Preorder Tree Traversal)
-
Popularized by Joe Celko in numerous articles and his book Trees and Hierarchies in SQL for Smarties
-
Columns: Left, Right
-
Cheap level, ancestry, descendants
-
Compared to Adjacency List, moves, inserts, deletes more expensive.
-
Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
-
-
-
Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
-
-
Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach)
-
Columns: ancestor, descendant
-
Stands apart from table it describes.
-
Can include some nodes in more than one hierarchy.
-
Cheap ancestry and descendants (albeit not in what order)
-
For complete knowledge of a hierarchy needs to be combined with another option.
-
-
-
A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
-
Expensive move and delete
-
Cheap ancestry and descendants
-
Good Use: threaded discussion - forums / blog comments
-
-
Lineage Column (a.k.a. Materialized Path, Path Enumeration)
-
Column: lineage (e.g. /parent/child/grandchild/etc...)
-
Limit to how deep the hierarchy can be.
-
Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
-
Ancestry tricky (database specific queries)
-
-
-
Columns: one for each lineage level, refers to all the parents up to the root, levels down from the items level are set to NULL
-
Limit to how deep the hierarchy can be
-
Cheap ancestors, descendants, level
-
Cheap insert, delete, move of the leaves
-
Expensive insert, delete, move of the internal nodes
-