On Microsoft's Channel 9 network, there is an interesting podcast called 'Just Enough Architecture', where the interviewee offers some good recommendations on balancing how much architecture you need against just getting on and writing software that does something useful.
The same debate could be applied to taxonomy, specifically the use of metadata properties to classify content.
For some reason, most companies who decide they want to improve how content is classified seem to want extreme taxonomy, swinging from not-enough taxonomy to too-much. The mantra may sound somewhat familiar:
One taxonomy to rule them all, one taxonomy to find them, one taxonomy to bring them all and, in the records management store, define them
Often starting with none at all (i.e. content is organised informally and inconsistently using folders), the desire is to create a single corporate taxonomy to classify everything (using a hierarchical structure of metadata terms). An inordinate amount of time is then spent defining and agreeing the perfect taxonomy (for some reason, many seem to settle on about 10,000 terms). Several months later, heads are being scratched as people try to figure out just how they are going to implement the taxonomy. Do they classify existing content or only apply it to new stuff? Do they have specific roles dedicated to classifying the content, rely on the content owners to do it, or look at automated classification tools? Do they put rules in place to force people to classify content and store it in specific locations that are 'taxonomy-aware'? How do they stop people bypassing the system, those who figure they can still get their work done by switching to a wiki or a Groove workspace or a MySpace site or a Twitter conversation? How do they validate the taxonomy and check that people are classifying correctly? What do they do about people who aren't classifying correctly, because they don't understand the hierarchy or attach different meanings to the terms in use? What started out as a simple idea to improve the findability of information becomes a huge burden to maintain with questionable benefits, given there are so many opportunities for classification to go wrong.
This dilemma reveals two flaws that make implementing a taxonomy so difficult. The first is the desire to treat taxonomy as a discrete project rather than an organic one. Collaboration and knowledge management projects often share this fate. Making taxonomy a discrete project usually means tackling it all in one go from a technology perspective and then handing it over to the business to run 'as is' for evermore (i.e. until the next technology upgrade). Such projects end up looking like that old cliché - attempting to eat an elephant whole. The project team tries to create a perfect design that will deliver all identified requirements (and the business, knowing this could be their one chance for improved tools, delivers a loooooong list of requirements), implements a solution and then moves on to the next project. As the solution is used, the business finds flaws in its requirements or discovers new ways of working enabled by the technology, but it is too late to get the solution changed. The project is closed, the budget spent.
An alternative approach is to treat taxonomy as an organic project or, for those who prefer corporate-speak, a continuous-improvement programme. Instead of planning to create and deploy the perfect taxonomy, concentrate on 'just enough taxonomy'. A good starting point is to find out why taxonomy is needed in the first place. If it is to make it easier for people to find information, first document the specific problems being experienced. Solve those problems as simply as possible, test the solutions and gather feedback. If successful, people will raise the bar on what they consider good findability, generating new demands for IT to solve, and so the cycle continues.
The following is a simple example using a fictitious company.
Current situation: Most information is stored in folders on file shares and shared via email. There is an intranet that is primarily static content published by a few authors. The IT department has been authorised to deploy Microsoft Office SharePoint Server 2007 (MOSS).
General problem: Nobody can find what they are looking for (resist temptation to sing U2 song at this moment...)
Specific problems: Difficult to find information from recently completed projects that could be re-used in future projects; Difficult to differentiate between high quality re-usable project information versus low quality or irrelevant project information; Difficult to find all available documents for a specific customer (contracts, internal notes, project files)
Possible solution: Deploy a search engine to index all file folders and the intranet. Move all project information to a central location. Within the search engine, create a scope (or collection) for the project information location. Users will then be able to perform search queries that return only project information in the results. Using 'date modified' as the sorting order will locate information from the most recent projects. Create a central location for storing top-rated 'best practice' project information. Set up a team of subject matter experts to work with project teams and promote documents as 'best practice'. The Best Practices store can be given high visibility throughout the intranet and promoted as high relevance for search queries.
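For the more technically minded, the idea behind the scope, the 'date modified' sort and the best practice promotion can be sketched in a few lines of Python. This is only an illustration, not how MOSS or any real search engine does it: the Document class, the search function, the folder paths and the sample documents are all made up for the example, and putting promoted documents ahead of the date sort is just one assumed way of combining the two ideas.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    path: str                    # where the file lives (hypothetical paths)
    text: str                    # the indexed full text
    modified: date               # 'date modified' taken from the file
    best_practice: bool = False  # set by the subject matter experts


def search(index, query, scope=None):
    """Return documents containing `query`, optionally limited to a scope.

    Promoted 'best practice' documents come first; within each group the
    most recently modified documents lead.
    """
    term = query.lower()
    hits = [
        doc for doc in index
        if term in doc.text.lower()
        and (scope is None or doc.path.startswith(scope))
    ]
    # Sorting on (flag, date) in reverse puts True before False, newest first.
    return sorted(hits, key=lambda d: (d.best_practice, d.modified), reverse=True)


# Invented sample content for the fictitious company.
index = [
    Document("/projects/alpha/plan.doc", "Migration plan for Alpha", date(2008, 3, 1)),
    Document("/projects/beta/plan.doc", "Migration plan for Beta", date(2008, 6, 15)),
    Document("/projects/best-practice/migration.doc", "Migration plan template",
             date(2007, 11, 20), best_practice=True),
    Document("/hr/policies/leave.doc", "Leave policy and migration of records",
             date(2008, 7, 1)),
]

results = search(index, "migration", scope="/projects/")
paths = [doc.path for doc in results]
```

Here the query is limited to the project scope, so the HR document is left out even though it mentions 'migration', the promoted template comes first, and the remaining project documents follow with the most recently modified at the top.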
Now that is a very brief answer outlining one possible solution. But the solution is relatively simple to implement and should offer immediate (and measurable) improvements based on feedback regarding the problems people are experiencing. There were two red herrings in the requirements that could have resulted in a very different, more complex, solution: 1. That MOSS was going to be the technology; and 2. The need to find documents for a specific customer. When you have chosen a technology, there is always the temptation to widen the project scope. MOSS has all sorts of features that can help improve information management and the starting point is often to replace an old crusty static intranet. But the highlighted problems did not mention any concerns about the intranet. That's not to say those concerns do not exist, but they are a different problem and not the priority for this project. The second red herring is a classic. When people want to be able to find information based on certain parameters, such as all documents connected to a specific customer, there is the temptation to implement a corporate-wide taxonomy and start classifying all content, starting with the metadata property 'customer name'. But documents about a specific customer will likely contain the customer's name. In this scenario, the simplest solution is to create a central index and provide the ability for users to search for documents containing a given customer's name. If that fails to improve the situation then you may need to consider more drastic measures.
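To show why a 'customer name' metadata property may not be needed at the outset, here is a rough Python sketch of a central full-text index being queried for a customer's name. Everything in it is assumed for illustration: the build_index and find_customer functions, the file paths and the customer names are invented, and a real search engine would do far more (ranking, phrase matching, security trimming).

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of paths whose text contains it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def find_customer(index, customer_name):
    """Return paths of documents containing every word of the customer's name."""
    words = re.findall(r"[a-z0-9]+", customer_name.lower())
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Invented documents of different types, none tagged with any metadata.
documents = {
    "/contracts/2008-014.doc": "Service contract between us and Northwind Traders.",
    "/notes/account-review.txt": "Call with Northwind Traders about renewal.",
    "/projects/alpha/plan.doc": "Migration plan for Fabrikam.",
}

index = build_index(documents)
northwind = find_customer(index, "Northwind Traders")
```

The contract and the internal note both come back for 'Northwind Traders' without anybody having classified them. The sketch also hints at where this simple approach runs out: it only checks that the words appear somewhere in the document, and a document that abbreviates or misspells the customer's name would be missed, which is the point at which a metadata property starts to earn its keep.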
Rejecting the large-scale information management project in favour of small chunks of continuous 'just enough' improvement is not an easy approach to take. The idea of having a centralised, classified and managed store of content, where you can look up information based on any parameter and receive perfect results, continues to be an attractive one with lots of benefits to the business - both value-oriented (i.e. helping people discover information to do their job) and cost-oriented (i.e. managing what people do with information - compliance checks and the like). But a perfectly classified store of content is a utopia. Trying to achieve it can result in creating systems that are harder to use and difficult to maintain when the goal is supposed to be to make them easier.
I mentioned that the common approach to implementing taxonomy has two flaws. The first has been discussed here - how to create just enough taxonomy. The second flaw is the desire to create a single universal taxonomy that can be applied to everything. I'll tackle that challenge in a separate post (a.k.a. this post is already too long...)
Reference: Just Enough Architecture (MSDN Channel 9). Highly recommended. There are plenty of similarities between software architecture and information architecture (of which taxonomy is a subset). Don't be put off by the techie speak. It debates the pros and cons of formal processes and informal uses, and includes some great non-technical examples of how to find a balance.