Data Mining

First it was tubes, now it’s pipes. I was watching a video from the New Media and Social Memory conference, and I still feel as perplexed as ever about “folksonomies.” It seems natural that pipes (which obviously have walls as a matter of course) would reemerge as a useful construct to deal with the multiple types and pathways of information on the net. I think the difficulty of this conceptualization is the huge gap between thinking of pipes as vectors (a direction which information can be made to follow) and as physical “pipes” subject to the limitations of physicality, with notions such as “bandwidth.” To consider a pipe without thinking of some limitations (as half-baked as Ted Stevens’s argument was) means that the meaning of “pipe” is akin to the concept of “vector” or “path” instead. I suppose I’d be much happier if Yahoo elected to call their approach “pathways” rather than pipes. But vector is really best of all, because a vector passing through any information cloud is sure to encounter information that has been mislabeled, or ultimately doesn’t fit. It’s the byproduct of aiming to universalize information.

As I was researching, trying to remember what was different about the programming concept of “pipe” and the physical one, I was sucked into a weird time loop. I haven’t programmed anything in decades: I started working with the 6502 processor in the early 1980s and then stopped completely about 1986. The 6502 is credited as having the first “instruction pipe,” but it dawned on me reading the Wikipedia entry that it was actually more of a cache holding a single instruction ready to go. The Yahoo effort and Apple’s Automator (another “pipe” technology) are great if you already know what you’re doing. But the real hazard of these sorts of vector approaches is that in order to be effective you must limit the array of available operations to a carefully controlled, universal vocabulary (or instruction set). This is not the same as tying the tubes (as Ted Stevens would have it) but rather a matter of charting only predetermined destinations: the path to the CPU, to another process, or to executing a complex query. In short, walking only in the ruts.

I digress. Back to the folksonomy thing.

Marisa Olson of rhizome.org spoke of the difficulties of metadata: “It’s not easy to distinguish between types, genres, and keywords.” She was well aware of the need for a controlled vocabulary for gross-level classifications, and that even these controlled vocabularies shift in response to time and circumstance. For example, the vocabulary used by Rhizome to sort artworks would be useless to the Getty museum, and vice versa. Out of these local contexts, vocabularies have emerged to describe artworks in meaningful ways.

The argument for folksonomies in the electronic environment is that machine pipelines will accelerate the emergence of the tags or keywords of the broadest general utility, stabilizing and promoting them to more universal usage. The overall need for a stable vocabulary was exemplified by Michael Katchen’s tale of searching Flickr for “beagle” and finding (as the top result) images of people killing and eating them. Later, the result could not be replicated by any combination of keywords. I suspect the images were probably flagged and pulled by beagle lovers, but this momentary vector of intersection between beagles and butchery is one of the real hazards of an open and universal information space.

Olson promotes a sort of two-tier approach (not unlike Ted Stevens’s commercial/private vision of the Internet) where professionals control one level of vocabulary and private citizens can contribute (in a severely constrained and populist way) to that “universalized” keyword system. I’m still incredibly underwhelmed by folksonomies. I liked The Tubes back in the day, but I’m pretty committed to forging my own ruts without extensive peer pressure. I suppose that’s the real hair-splitting involved: is the experience of art a public or private thing? If it’s public, the vocabulary is essential; otherwise not. Or, perhaps a better distinction might be this: if it’s commercial, then a shared vocabulary would allow the purveyors to move the maximum number of units of “art.” Personally, I don’t think that’s the point.