We describe the world by using words. Thus, when we say “beach”, everybody will understand; and when we say that a specific beach has, e.g., mountains behind it and the sea to the east, everybody will build their own mental view of the scenario described. Such mental views may differ substantially from one individual to another, because of differences in users’ knowledge and experience and because of the context. Consequently, the seemingly straightforward activity of searching for media that match one’s mental idea turns out to be hard, if not impossible, to implement on a computer, unless strong assumptions about the application domain are made. Image recognition is still largely an unsolved problem, and tagging media (e.g., photos) is still largely a manual process. The “semantic gap” between our conceptualizations of the world, expressed using language, and our experience of the world, whose most direct representations are photos and media in general, is far beyond the reach of current media understanding systems. Thus, content-based media search is still very much example-driven (e.g., find photos similar to a given one on the basis of a set of features).
On the other hand, our life is a constellation of events which, one after the other, pace our everyday activities and index our memories. Events such as a birthday, a marriage, a summer vacation, or a car accident are the lens through which we see and memorize our own personal experiences. In turn, global events, such as world sport championships, global crises and natural disasters (e.g., the 2004 tsunami, climate change, or the world recession) or, on a smaller scale, a local festival or a soccer match, build collective experiences that allow us to share personal experiences as part of a more social phenomenon that we could call “collective events”. When describing events, we ground our common and abstract understanding of the world, and the language that we use to describe it, in our experience. The generic notion of “beach” is then associated with a specific time and place, which is frozen in the photo or movie that we have taken.

The first key intuition of GLOCAL is to use events as the primary means for organizing and indexing media, e.g., photos, videos, journal articles, and email exchanges. Instead of starting from media and seeing, a posteriori, how we can meaningfully understand their contents (e.g., by tagging them), we organize a priori our knowledge in terms of events and use media to populate them with memories, thus providing their experiential dimension.


Events have a local dimension, which allows for a local mapping between words, as they appear in tags, and the local experience of the peers involved, as represented in the media, thus making media understanding easier and doable with human intervention. At the same time, events also have a global dimension in at least two ways. First, the same event can be understood in much the same way by the peers involved in it; it therefore provides a common ground for social sharing about that event. In addition, collective events can be built from the bottom up, starting from users’ experiences. Second, events of the same kind (e.g., marriages) all share similar structures (e.g., formal ceremony, eating with friends, etc.) and provide a common way of indexing media that leads to social sharing across different events. Networked communities provide a way to share, together with media, organization and indexing information (what we call networked events), so that local users can benefit from the common (global) understanding of the world and, vice versa, the community can benefit from individual (local) experience.
Events provide the common framework inside which the local experience-driven contextual information can be not only codified but also shared and reduced to a common denominator. Thus, for instance, the photo of a person on a beach taken on vacation will be contextualized to the specific time (night or day? which season?), to the specific location (which part of the world?) and to the specific event (summer vacation or some more specific subevent). These and other attributes of events, to be defined during the project, provide a natural means by which the contextual properties of media (which make it hard to generalize image recognition) can be uniformly clustered and thus shared within communities.
Furthermore, it is reasonable to assume that users will support the generation of this contextual information, thus allowing for interactive semi-automatic tagging: when taking a photo, for example, users already think in terms of the current location, the event, and the micro-events that they want to capture.
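The contextualization described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an event carries a kind, a location, and a time span, and that a media item is attached to the first event whose span and location match. The names `Event`, `Media`, and `assign_to_events` are hypothetical; the actual event attributes are to be defined during the project.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the attribute set is an assumption."""
    name: str
    kind: str        # e.g. "summer vacation"
    location: str
    start: datetime
    end: datetime


@dataclass
class Media:
    """Hypothetical media item with the context captured at creation time."""
    path: str
    taken_at: datetime
    location: str
    tags: set = field(default_factory=set)


def assign_to_events(media_items, events):
    """Group media paths under the first event matching time span and location.

    Items that match no event are collected under the key None.
    """
    grouped = defaultdict(list)
    for item in media_items:
        for event in events:
            in_span = event.start <= item.taken_at <= event.end
            if in_span and event.location == item.location:
                grouped[event.name].append(item.path)
                break
        else:
            # No event matched: leave the item for manual tagging.
            grouped[None].append(item.path)
    return dict(grouped)
```

In such a scheme the user only confirms or corrects the proposed event, which is one possible reading of interactive semi-automatic tagging; exact string equality on locations is a simplification.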

The second key intuition of GLOCAL is to exploit the local and global (GLOCAL) knowledge about events and related contents to locally index media and experiences and later to globally share them within networked communities, thus lessening the impact of the semantic gap.


The idea is to define suitable models that describe and share the semantics at various levels, so that each user can organize their personal experience into events, share the description of events with other peers in a networked community, and make the local system evolve over time, benefiting from and enriching universal knowledge constructed over time via agreements.
On the basis of the above concepts, GLOCAL introduces a new approach to media search that we call GLOCAL search. The key ingredients of GLOCAL search can be summarized as follows:

  • A common indexing schema based on models of events designed a priori;
  • A local indexing and tagging methodology and algorithms used to populate event descriptions with media;
  • Global sharing and search algorithms based on the common event-driven understanding of user experience inside the same event and across events;
  • A new type of search query, which can contain any subset of the following three components:
    • A set of keywords (thus reducing, when used in isolation, to standard keyword search)
    • One or more example media, e.g., photos (thus reducing, when used in isolation, to standard media search). The intended meaning is that the media provide the experiential dimension to the set of keywords, thus making it possible to localize search
    • One or more event-dependent contextual parameters (e.g., a location or a person or a property of a person), which make it possible to fully exploit the explicit contextualization of an experience, as well as the words used to describe it.
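
The three-part query described above can be sketched as a small data structure. This is an illustrative assumption, not the project's query format: `GlocalQuery`, `matches`, and the `similar` callback are hypothetical names, indexed items are modeled as plain dictionaries with a `tags` set and an `event` dictionary, and media similarity is delegated to a caller-supplied function because no feature extractor is specified in the text.

```python
from dataclasses import dataclass, field


@dataclass
class GlocalQuery:
    """Hypothetical query holding any subset of the three components."""
    keywords: frozenset = frozenset()
    example_media: tuple = ()
    context: dict = field(default_factory=dict)

    def mode(self):
        """Name the components in use, e.g. 'keyword' or 'keyword+context'."""
        parts = []
        if self.keywords:
            parts.append("keyword")
        if self.example_media:
            parts.append("example")
        if self.context:
            parts.append("context")
        return "+".join(parts) or "empty"


def matches(query, item, similar=None):
    """Return True if an indexed item satisfies every component present.

    item: dict with 'tags' (set of str) and 'event' (dict of attributes).
    similar: optional callable (example, item) -> bool, supplied by the caller.
    """
    # Keyword component: all query keywords must appear among the tags.
    if query.keywords and not query.keywords <= item["tags"]:
        return False
    # Example component: at least one example must be judged similar.
    if query.example_media and similar is not None:
        if not any(similar(example, item) for example in query.example_media):
            return False
    # Context component: every contextual parameter must agree with the event.
    return all(item["event"].get(k) == v for k, v in query.context.items())
```

With only `keywords` set, `matches` behaves like a plain keyword filter, mirroring the reduction to standard keyword search noted in the list; adding `context` narrows the result to a given event attribute such as a location.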

Besides offering a set of powerful platform-specific tools and applications, GLOCAL will provide an effective framework in which future technologies in different related fields (such as new media, content analysis tools, networked community tools, etc.) can be progressively integrated and exploited. The new paradigm will be instantiated and demonstrated in different application domains through its implementation in three scenarios, two dealing with professional user communities and one focused on common users, based on a large set of IPR-licensed files (images, video, text, audio) to be tested and shared throughout the project. These scenarios have been selected so as to make possible a full validation and exploitation of project results, thanks to the variety of contents to be handled and the diversity of user experiences to be dealt with.


Scientific and technological objectives

The vision inspiring the GLOCAL project is that matching content and concept must be done taking context into account. Context encompasses several different dimensions: nomenclature (the universal sphere, where concepts and their relationships are represented by a “dictionary knowledge”), domain (where concepts are biased by a specific application scenario), personal experience (the subjective sphere, where contents take a private meaning according to user events and semantics), and community experience (the social sphere, where groups of users can share events and semantics). The integration of all these dimensions of context provides a large amount of information that can be used to enrich the semantics of data and to enable both personalization of global knowledge and sharing of local experience. We consider events the primary means by which the local event context can be exploited towards this goal.

To achieve this goal, GLOCAL will have to pursue a number of key scientific and technological objectives, summarized in the following points:

  1. to support a user in managing media according to context and events, to facilitate subsequent access and sharing;
  2. to capture the media semantics in a natural way, without the need for complex user-system and/or user-community interactions;
  3. to allow users to share consistent event schemas, built upon local and global interaction, to better access the requested media;
  4. to allow networked communities to exchange media-populated events in a seamless and contextual way;
  5. to provide advanced indexing and retrieval tools fully exploiting the contextual information;
  6. to customize the developed instruments to fulfill specific application-oriented requirements in different user scenarios.

