W.J.T. Mitchell, one of the most prolific writers on the subject of image/text combinations, persists in referring to these works as a “composite art.” What happens when texts and images are combined in order to communicate? Do they form a composite? The concept of a composite rests on the assumption that images and texts represent different modes of communication, separable and insoluble, an assumption that Mitchell ultimately rejects. The distinctions between image and text are cultural, constructed, and solvent over time. However, the tension between these elements persists.
This tension has often been analyzed as a difference in semiotic modes. Texts are symbolic in nature, while images are sorted into the overlapping categories of icon and index. However, the slippage between these characteristics renders the discussion of semiotic modes problematic. In the case of photographs, the discussion is sometimes assumed to be simpler. Photographs, because they have a specific “real world” referent, are taken to be “natural” signs, less arbitrary and contingent than their linguistic equivalents. Even allowing this gross oversimplification, the difficulty of melding images with adjoining texts demonstrates their semantic difference.
Images, like words, are mediated by the conditions of their creation. Explorations of visual semantics are complicated by the lack of consistency in syntactic markers. The most basic level of integration of image and text is the caption: a short text conjoined to the image that performs a semantic function. The syntax and semantic function of the caption have shifted over time, suggesting that there can be no synchronic inquiry into how captioning works. The syntax of captions, and the implications of captioning for semantic function, can only be explained in cultural terms; the inquiry must be diachronic and historical.
However, such historical inquiries are often tinged with a “hermeneutics of suspicion,” an overriding impulse to see image/text as conveying a hidden, deeply coded semantic message of cultural imperialism. John Tagg, Maren Stange, and others have offered this type of reading of the documentary photography of the 1930s, while other inquiries into image/texts by Foucault and Ian Walker have looked at the surrealist use of realism to subvert such cultural imperialism. What seems lacking, at the practical level, is a pragmatics of the discourse of word and image that does not rely on decoding a hidden message, but rather confronts the actual semantic (not semiotic) function, in context, of image/text combinations.
Do such works represent a division of semantic labor? Do they necessarily result in a stratification of signification, a composite composed of two different ways of knowing? I am not so sure of this as W.J.T. Mitchell. If images and text are like oil and water (in cultural use, not essential character), have there not been some brief moments (such as the works of William Blake) where the combination has been “shaken up” to create an emulsion, which only stratifies as it stands still across time?