Sunday, April 16, 2006

Data in the 3rd dimension

If there's an activity in computer programming that separates the men from the boys, it's recursion and hierarchies. Recursion is such a simple and elegant thing, but so mind-bending that it takes a while to feel comfortable with it. Functional recursion, a method which calls itself, is something that I find easy to understand. If a method is a request for an action, that action makes a request for itself. It's not so hard to follow. Recursion is a natural thing when you're talking about methods in a class.

However, recursion gets goofy is when you force it to go where it doesn't belong. That's right, I'm talking about relational databases. A table of data is inherently not a good structure for storing hierarchical data. Sure, it's easy enough to make a table join to itself. But, in order to pull out the data, you've got to know ahead of time how many joins to use. And with each subsequent join, performance degrades. That, and making the queries to get information about a tree of data can get nasty. Until now, I thought that this was the only means of organizing hierarchical data in a relational database. Fortunately, I was wrong.

Tonight I set out to figure out how else I might be able to represent a hierarchical data set in a relational database. To my delight, I found an answer that I like. Nested sets can be used to store hierarchical data. Although, it's slightly confusing, especially when you talk about adding and deleting new nodes, and double especially confusing when you start thinking about sorting those nodes. I've got an idea for a project where this kind of stuff is going to be essential. I'm excited to find that someone has done the research to figure out the right way to do this stuff.

No comments: