DC 2010 Artist paper

DC 2010

This page is an archive of a collective writing for the DC 2010 conference


DC 2010 Conference
Dublin Core Metadata Initiative
Pittsburgh,
20-22 October 2010.
Title
Metadata for semantic wikis networks
Abstract
(To be written at the end of the process.)
Authors
Jacques Ducloy(i), Thierry Daunois(ii), Muriel Foulonneau(iii), Alice Hermann(iv), Jean-Charles Lamirel(v), Stéphane Sire(vi) and Christine Vanoirbeek(vi).

Introduction

Since March 25, 1995, when Ward Cunningham launched WikiWikiWeb, a web site devoted to software development, wikis have played an increasing role in the field of scientific and technical information. At present, most of the wikis we have found in research organizations are quite monolithic. What happens when an editorial collection of scientific information is distributed across a network of semantic wikis? This article aims to identify several metadata issues we faced when starting the Wicri network.

WICRI is an acronym that stands for "WIkis for Communities in Research and Innovation". At present, Wicri is a demonstrator containing about sixty wikis, some organized on a regional basis and the others around a few topics. However, the knowledge architecture we must design is much the same as would be required for several thousand wikis.

When a research working group launches a small wiki on a well-identified topic, metadata is not perceived as playing an important role. This perception evolves with the size and complexity of the application. The first generation of metadata is based mainly on the use of categories, which resembles a traditional indexing practice.

For instance, Wikipedia has reached 3 million articles, and the need for metadata has become ubiquitous. Specifically, Wikipedia Statistics for January 2010 [1] gives 259,000 templates and 552,000 categories. Nevertheless, the global architecture hosted by the Wikimedia Foundation is quite monolithic: a multilingual family structured around the largest wiki, supplemented by several specialized wikis.

Now, generalizing the approach we are experimenting with in Wicri, we must consider a network of hundreds or thousands of wikis. Metadata therefore plays a crucial and increasing role. Semantic wikis introduce a new generation of metadata, allowing knowledge modelling in an RDF framework, which is worth considering.

In this paper, we first introduce the Wicri network; then we review several existing solutions. Directions to explore in the future are discussed from two points of view: that of a contributor who has to produce metadata, and that of the computer scientist developing new services.

Note
This article is being written using a collaborative practice (in the same way as we did for DC 2006)[d1]. It will be published in two versions: a traditional one on the web site of the conference, and a wicrified[2] one on the Artist wiki.

Introducing Wicri

Wicri, a network of wikis for research and innovation

The Wicri network has been created in the framework of Mission Ticri (Technologies dealing with Information and Communication for Communities involved in Research and Innovation). This initiative was launched by the Lorraine representative of the Ministry in charge of research affairs. Ticri aims at disseminating the main results of research communities in order to promote partnerships between innovation actors, to encourage outreach, and to develop technology transfer in a multidisciplinary context.

Wikipedia has demonstrated the value of the wiki approach for building and disseminating common knowledge on a very large scale. Wikipedia thus brings us a first answer (and we are using this medium), but it is not sufficient as a global response. For instance, we would like to publish some results of research activities with very constrained ways of modifying the original text, i.e. limited to adding links to articles explaining a particular topic, or to a discussion area. For instance, a periodical (AMETIST) will soon be published in the network. Another point is that Wikipedia's contributors must present information attested by external references. Authors can be anonymous as long as their bibliographic references are significant and link to explicitly named people. But in research fields, the academic communities are the ones producing the knowledge that Wikipedia could use. In many cases, knowledge is in progress and many assumptions are still hypotheses. For these reasons we think that the authors must be clearly identified; anonymous contributions are therefore forbidden.

As a result, such a wiki infrastructure must be driven by institutional entities in order to manage registration processes. The institutions must therefore find an advantage in investing in the wiki approach, and visibility becomes a strong parameter. The network approach allows each partner to promote its own wiki site, and its own visibility.

As a first step, we built a small demonstrator with several institutional wikis. Its limits appeared almost immediately: if several organizations are working on the same topic, this topic must be developed on a thematic wiki. We therefore quickly introduced several wikis with a thematic or regional design.

A small team, mainly 3 people in the same office, has operated the demonstrator. As soon as more than one person was involved, several coherency problems arose and an effective handling of metadata was introduced.

[Note: a sentence opening toward other disciplines (education) is missing here, to indicate the generic nature of our reflection.]

The current Wicri network

The wiki types

The wiki network accepts two main types of wikis.

  • Institutional wikis: an institutional wiki is handled by an organization. In this paper, we will often use a naming scheme with two parts: region, then acronym. For instance, Lorraine/sge stands for the research cluster SGE (environmental sciences and engineering, Sciences et Génie de l'Environnement in French) in the Lorraine area. For wikis related to scientific working groups, the first part is a code identifying the global theme; for instance, ICT/Artist is the wiki of the Artist WG, dealing with Information and Communication Technology.
Being directly related to an identified organization, these wikis may have specific rules, differing from the common rules of the Wicri web. For example, an institutional wiki could be open to anonymous contributions or, on the contrary, strictly limited. The editorial line can also differ strongly from Wicri's.
  • Common wikis: the design of a common wiki is set up by the global Wicri community. Whether or not it is managed by an organization, it fully shares the common rules and is moderated by independent scientific committees. In this paper we use a naming scheme with Wicri as the first part, as in Wicri/Lorraine or Wicri/Water.
Fig 1. CRIS on wiki

A first set of common wikis is designed on a regional framework, such as Wicri/Lorraine or Wicri/Alsace. A main objective is to obtain a highly detailed and understandable CRIS (Current Research Information System). This approach resembles those of Jeffery [j2] and Erbach [e2], who would like to merge organization-related items (CRIS) with open archives in order to produce an e-Science infrastructure[j1]. Wicri adds a wiki, with its editorial facilities, to provide a readable summary.

Another set of common wikis is devoted to thematic fields. At this time, one of them, Wicri/Ticri, is related to Information Science & Technology (a DCMI portal is included). Another part deals with the environment and contains 4 wiki families: Wicri/Water, Wicri/Woods, Wicri/Biomass and Wicri/UrbanSoils. They are also organized with information system items (such as program committees) and editorial contents (scientific articles, scientific surveys).

A few wikis have been designed for the global coherency of the network. The most visible is Wicri/Wicri, which gives a global view of the network: all topics must appear there and link to more detailed pages or desks in other wikis.

Another, Wicri/Media, is an image repository (it plays the same role as Commons in the Wikipedia family). It can also host PDF documents, but we are looking for a better solution, using Fedora for instance.

Finally, for metadata handling, a wiki named Wicri/Base contains templates and semantic items that can be used in all other wikis.

Fig 2. The current Wicri network (a subset)

Wikis and families of wikis

In a multilingual approach, most wikis are really families of wikis (i.e. a set of wikis, one for each language, connected by interwiki links). In this paper, we use the notation Wicri/Water(fr) to define the French component of the family, and Wicri/Water(en) for the English one. Wicri/Water(priv) refers to the "private" wiki of a family, where registration is required not only to contribute, but also to read what is written.
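
The notation used in this paper (Lorraine/sge, ICT/Artist, Wicri/Water(fr), Wicri/Water(priv)) is regular enough to be parsed mechanically. The following is only a minimal sketch: the names WikiName, parse_wiki_name and is_common_wiki are hypothetical and do not belong to any Wicri tool, and the sketch assumes that the optional suffix in parentheses is a lower-case language code or "priv".

```python
import re
from typing import NamedTuple, Optional


class WikiName(NamedTuple):
    """One wiki name in the paper's notation (hypothetical helper type)."""
    prefix: str             # "Wicri", a region ("Lorraine") or a theme code ("ICT")
    name: str               # e.g. "Water", "sge", "Artist"
    variant: Optional[str]  # "fr", "en", "priv", or None when no suffix is given


# prefix "/" name, optionally followed by "(variant)"
_PATTERN = re.compile(r"^([^/()]+)/([^/()]+?)(?:\(([a-z]+)\))?$")


def parse_wiki_name(text: str) -> WikiName:
    """Split a name such as 'Wicri/Water(fr)' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a wiki name in the paper's notation: {text!r}")
    prefix, name, variant = match.groups()
    return WikiName(prefix, name, variant)


def is_common_wiki(wiki: WikiName) -> bool:
    """Common wikis use 'Wicri' as the first part of their name."""
    return wiki.prefix == "Wicri"
```

For example, parse_wiki_name("Wicri/Water(fr)") yields the prefix "Wicri", the name "Water" and the variant "fr", while parse_wiki_name("Lorraine/sge") has no variant and is not a common wiki.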

Network coherency versus differentiated contents

Much information has to be developed several times, on different wikis. For instance, each research project with several partners must be cited and commented on in the regional wiki of each partner, as well as in all relevant thematic wikis.

Even in the initial phase of the Wicri project, we have encountered a significant number of such cases. Here are 3 strongly differentiated ones: a city description, a scientific paper, and a call for papers.

  • The city of Pittsburgh, where the 2010 DC conference will be held, appears on at least 3 wikis. On Wicri/Ticri, Pittsburgh is directly connected to DC 2010 and the corresponding page covers the main activities related to information science in this geographic area[3]. On Wicri/Water, we describe the confluence of the Allegheny and Monongahela rivers, which forms the Ohio[4]. On Wicri/Wicri, we present general facts about the city and provide commented links to the other pages[5]. These 3 pages relate to the same topic, but display clearly distinct contents.
  • Carl Lagoze has written an article which is becoming very popular in the French-speaking area: Qu’est-ce qu’une bibliothèque numérique, au juste ? / What Is a Digital Library anymore, anyway? []. In ICT/Artist, the paper is integrated in the portal of the Ametist journal, in which it was first translated[6]. A copy has been made in Wicri/Ticri, as it was considered a reference paper for a wiki dealing with digital libraries[7]. Its anchors and links sometimes differ from those on ICT/Artist. Finally, the first part (and only the first part) has been introduced on Wicri/Wicri[8]. New proposal: Finally, this paper contains an interesting introduction that could reach a very large audience. Thus this first part (and only this first part) has been introduced on Wicri/Wicri[9]. Third proposal: Since this paper's introduction is of general interest and could reach a very large audience, this part of the article is also displayed on Wicri/Wicri[10].
Wikitext on Wicri/Ticri:

==Program Committee==
... 
* [[Has PC member::Paul Dupont]], Nancy (Fr)
* [[Has PC member::John Smith]], London (UK)
...
==Org. Committee==
* [[wicri-lor.fr:Jean Durand|Jean Durand]]
...

Program Committee

...

Org. Committee

Wikitext on Wicri/Lorraine:

==Program Committee==
... 
* [[Has PC member::Paul Dupont]], Nancy (Fr)
* [[ticri.en:John Smith|John Smith]], London (UK)
...
==Org. Committee==
...
* [[Has OC member::Jean Durand]]
...

Fig. 3. A part of a page relative to a conference happening in Nancy
  • Finally, we present a situation involving a conference that is held in Lorraine and deals with ICT. The call for papers is duplicated on two wikis, Wicri/Lorraine and Wicri/Ticri. Figure 3 shows different ways of managing the relation between this event and the committee members. The event model of semanticweb.org is used, with the properties Has PC member and Has OC member. Paul Dupont, who works in Lorraine, is always qualified with the property Has PC member. On Wicri/Lorraine, John Smith is only linked to Wicri/Ticri through an interwiki mechanism ([[ticri.en:John Smith]]), because he has no author page on Wicri/Lorraine (and, at present, SMW does not provide semantic links between different wikis).
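
The difference between the two kinds of links in Figure 3 can be made visible with a small script. This is a sketch under stated assumptions, not an SMW or MediaWiki API: the regular expressions only cover the simple forms [[Property::Value]] and [[prefix:Page|label]] shown in the figure, the function names are hypothetical, and the set of interwiki prefixes must be supplied by the caller because an ordinary namespace link has the same shape.

```python
import re

# [[Property::Value]] or [[Property::Value|label]]
SEMANTIC = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# [[prefix:Page]] or [[prefix:Page|label]], but not [[Property::Value]]
INTERWIKI = re.compile(r"\[\[([^\[\]:|]+):(?!:)([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def semantic_annotations(wikitext):
    """Return (property, value) pairs found in the wikitext."""
    return [(prop.strip(), value.strip())
            for prop, value in SEMANTIC.findall(wikitext)]


def interwiki_links(wikitext, prefixes):
    """Return (prefix, page) pairs whose prefix is a known interwiki prefix."""
    return [(prefix, page.strip())
            for prefix, page in INTERWIKI.findall(wikitext)
            if prefix in prefixes]


# The Wicri/Lorraine variant of the page, as in Figure 3.
LORRAINE_PAGE = """==Program Committee==
* [[Has PC member::Paul Dupont]], Nancy (Fr)
* [[ticri.en:John Smith|John Smith]], London (UK)
==Org. Committee==
* [[Has OC member::Jean Durand]]
"""
```

On LORRAINE_PAGE, semantic_annotations finds Paul Dupont and Jean Durand with their properties, whereas John Smith only shows up in interwiki_links(LORRAINE_PAGE, {"ticri.en", "wicri-lor.fr"}): his committee role is invisible to any semantic query run on that wiki.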


Fig. 4. Metadata coherency in wikis network (W1...) vs repositories (R1...)

All these pages are mainly written by human contributors, not by computers. Computers can help in various ways but, in the end, pages are made by contributors. In a repository-based network, using OAI-PMH for example, coherency is ensured by computer protocols that share controlled metadata. In a wiki network, a contributor can write on many wikis and interact with metadata. Metadata thus plays a crucial role not only in programming activities but also in the authoring process.

Issues about networks of semantic wikis

This section discusses the technical choices made in the initial design of Wicri. The Wicri project aims at setting up an operational set of services. At present, it is a demonstrator that is becoming a digital infrastructure. So, even though we are close to research projects, we have implemented some pragmatic solutions. [Note: What I meant is this: we are in the context of setting up an operational service, and we must find pragmatic solutions that are not necessarily the best from a research point of view. Jacques Ducloy 12:26, 22 February 2010 (UTC)]

[Note: Don't apologize. I suggest building on the following assertion, "Researchers need to stop thinking of themselves as researchers and start thinking of themselves as implementors.", from a post by Zack Rosen, "RDF Semantic web research isn't working". He points out precisely that Semantic Web researchers are not pragmatic and do not fit into existing environments. I can write this up later [1]. Muriel Foulonneau 16:38, 24 February 2010 (UTC)] Proposal: At this point in the project, we cannot address research needs and promote pragmatic solutions at the same time. Consequently, we first try to deal with the reality of Wicri's web environment, following Zack Rosen's advice: "Researchers need to stop thinking of themselves as researchers and start thinking of themselves as implementors."[11]


Wikis for scientists who approach the world

The first choice we faced when starting the Wicri project was the wiki engine. A priority for this project is to allow as many researchers as possible to disseminate their results to as many potentially involved actors as possible. We have therefore chosen to be fully compatible with Wikipedia, and to use MediaWiki[12] as the wiki engine of the Wicri network. This CMS[13] is used by Wikipedia and is becoming very popular.

This option has several consequences. The first is the need to supplement the functionality of MediaWiki with the PHP extensions and templates that are commonly used in Wikipedia, so that an occasional contributor is not disoriented when moving from Wikipedia to the Wicri network. The function of Wicri/Base is to manage the collection of required templates (and also semantic items) used throughout the network.

The second consequence is the use of the local language (French, for example) to express and manipulate metadata in a given wiki.

Semantic wikis for scientific objects

Scientists and engineers are used to working with many technical objects, such as formulas, drawings, 3D images and knowledge items, and not only with texts. This section raises some issues about the way in which a wiki can address these requirements.

In other words, Wicri's pages must carry many texts that contain scientific results described with scientific objects. MediaWiki offers a first set of features dealing with formulas or drawings. Some of them are very easy to install, for instance "imagemap"[14]. However, our experience already shows some difficulties, stressing the need for technological support. For instance, using the LaTeX extensions requires installing LaTeX on the operating system that hosts the wiki.

MediaWiki supports SVG (Scalable Vector Graphics) rather poorly. A contributor can upload an SVG image, but this image is then converted into PNG format. So, at present, it appears difficult to manage interactions between text and images with the basic SVG facility. Life sciences give a good example of the kind of interaction that could be needed in a scientific area. The Proteopedia project [h1] carries 3D images of molecular items such as proteins, RNA, DNA and other macromolecules[15]. The contributor can set up several kinds of interaction by using green links in the wiki text. These links interact with a Java applet (Jmol).

Obviously, as the mission of science is to build knowledge, semantic tools are ubiquitous. At a first level, Wikipedia has implemented several sets of taxonomies. The taxonomy of living species is a fairly complete example. It is implemented with a tree of categories and related templates (taxobox). This taxonomy is distributed over each language version, over Wikimedia Commons (images), and over Wikispecies. A comparison between these different wikis shows a multipurpose use of 3 classification schemes[16].

Finally, Semantic MediaWiki allows contributors to enter the Semantic Web with an RDF approach. [Note: linking sentence to be revised, see the following one. Jacques Ducloy 06:46, 26 February 2010 (UTC)] Wicri is in the process of adopting Semantic MediaWiki[17], an extension that enables wiki users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways[k2].

So far, we have found several wikis that mostly handle metadata dealing with the organization of science. For instance, semanticweb.org and openresearch.org provide a semantic metadata model built around scientific events. We began by adapting this model and encountered several difficulties, due to the variety of situations in the various scientific communities and to the translation exercise (into French).

We have also found some works whose purpose is to build or curate an ontology; but we would prefer to use ontologies in order to handle scientific data, objects or information in a wiki.

Semantic MediaWiki is not a universal solution for using wikis in scientific fields. For instance, SWiM [l1], a semantic wiki for mathematical knowledge management[18], handles mathematical formulas better than the LaTeX extension used with SMW. In the future, Wicri must integrate several kinds of wiki engines. Once again, coherency will be achieved through strong handling of metadata.

Networks and Distributed Wiki Applications

A major issue for a network of wikis is replication management. In the Wicri network, a given piece of data can appear on many pages of many wikis. What happens when this information must be modified? In the framework of Wicri, in order to examine our further development strategies, we have identified 5 classes of replication cases.

  1. Wiki replication. A given wiki, in its entirety, could be duplicated in a peer-to-peer network, distributed over several sites with a distributed replication mechanism [o1]. This feature is useful for technical reasons (a strategic wiki such as Wicri/Wicri) or sometimes for political ones (a wiki that could bring visibility to several institutions). It concerns neither editorial replication nor metadata.
  2. Page replication. A set of pages is replicated on several wikis. This kind of facility is becoming available [c1], and could be very useful for invariant pages, such as templates related to semantic models. With the same kind of P2P mechanism as in the previous case, any change on any wiki is distributed to the other wikis. With the DSMW (Distributed Semantic MediaWiki) extension[19], this mechanism is driven by metadata (semantic properties).
  3. Paragraph replication. Until now, we have not found an extension of SMW able to extend the previous mechanism to the paragraph level. This need is quite ubiquitous in the Wicri network. In simple cases, a workaround consisting in creating a page template for each paragraph might work. In most cases, it could not be used by a human contributor. For instance, we cannot ask an author to explicitly create one page for each bibliographic reference.
    From a metadata point of view, this case is interesting because it requires the simultaneous use of two mechanisms: data-centric and document-centric. Identifying the pages to be replicated can be done with properties, in a pure data-centric RDF approach. Identifying a paragraph requires an explicit document-centric structure (XML).
  4. Page or paragraph replication with transformation. In many cases, the previous mechanisms cannot be applied because the paragraph must be transformed while being replicated. For instance, for editorial reasons, the requirements for handling organization committees can differ between a regional wiki (with semantic links for local members) and a thematic wiki (no links). We have not yet taken the analysis further, but we are again in a document-centric approach.
  5. Replication of a subset comprising several pages.
    Fig 4. Interlinks between geographic items
    Finally, we have met repetitive cases that involve a set of pages. This case can be illustrated by geographic items such as countries, regions, towns, etc. When a new city appears on a given wiki, the contributor should theoretically maintain the connectivity of the networked hypertext. Figure 4 gives an example with the city of Nancy in an institutional wiki (Artist). This page must be linked to the page Lorraine, which must be created if necessary, and so on. The page Nancy on this wiki must be interlinked with the same page on Wicri/Ticri, etc.
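
Case 4 (replication with transformation) can be illustrated with the committee example: a paragraph written for a regional wiki with semantic links is copied to a thematic wiki where no links are wanted. The sketch below is only an illustration of such a transformation on the wikitext itself; strip_semantic_links is a hypothetical name, and no such feature is claimed to exist in SMW or DSMW.

```python
import re

# [[Property::Value]] or [[Property::Value|label]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def strip_semantic_links(wikitext: str) -> str:
    """Replace each semantic annotation by its label, or by its value.

    Ordinary and interwiki links (single colon) are left untouched.
    """
    def plain(match):
        value, label = match.group(2), match.group(3)
        return label if label else value

    return ANNOTATION.sub(plain, wikitext)


regional = "* [[Has PC member::Paul Dupont]], Nancy (Fr)"
thematic = strip_semantic_links(regional)
# thematic is now "* Paul Dupont, Nancy (Fr)"
```

The transformation works on the document (the paragraph text), while the decision to apply it would depend on data (which wiki is the target), which is one concrete form of the document-centric versus data-centric tension described above.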

XML and RDF handling

At present, as Wicri is still in its starting phase, all the replications presented in the previous section are performed manually. Thus, for a scientist with little practice, it is not easy to enter the right metadata, categories or set of required links. We have introduced the role of "semantic administrator" (who is also an "interoperability administrator") to improve, a posteriori, the semantic quality of the network.

We are beginning to design tools to help "semantic administrators" validate contributors' proposals. Our feeling is that this task will not be too complex. In contrast, providing consistent assistance to the contributor constitutes a research issue.

We have identified two kinds of difficulties:

  • at the technical design level, two formalisms, "document-centric XML" and "data-centric RDF", must live together...
  • regarding contributor practice, ...

[Outline notes] We have identified 2 difficulties:

  • the coexistence of RDF (semantic wikis) and XML technologies
    • some references on the subject, relying in particular on replication control, the coexistence of structured and unstructured information, and complex scientific objects
    • we come back in particular to the 3D graphics examples
  • the fact that contributing researchers, in their editorial activity, have access to a formalism that traditionally belongs to specialists

=> we must be pragmatic

  • as a first step, we will help the user to work correctly
  • we will develop tools intended for users
    • we discuss the modelling of the network in Wicri/Base (forward reference)

Metadata for contributors

In most content management systems designed "before blogs and wikis", a clear barrier exists between editing content, programming and managing metadata. On a wiki, all these activities can be handled by any actor, on any page, at any time. Almost any contributor may have to create new metadata. We have to give contributors a robust environment in which to explore, define and comment on such activity.

This section introduces the need for a new wiki for designing metadata items, together with its contents and its organization.

Introducing Wicri/metadata

Here is a common situation in the life of a researcher: writing a call for papers. The first sentence looks like: DCMI is pleased to announce that DC-2010 will take place in Pittsburgh. How should it be written in a semantic wiki with suitable properties?

According to the semantic wiki user manual, introducing a new property in a wiki seems very easy. The researcher just has to write something like this:

[[organizer::DCMI]] is pleased to announce that DC-2010 will take place in [[place::Pittsburgh]]

As soon as the researcher presses the "Save page" button, the relations and, if needed, the properties are created. The real problem is therefore not syntax but semantics: how should the properties be chosen and named? For instance, for the role of DCMI in the DC conference, we could write: organizer, has organizer, has global organizer, has local organizer, DC:contributor, dc:contributor, has dc:contributor, funded by, organized by, etc.

A look at semanticweb.org illustrates this difficulty[20]. This wiki contains 773 pages in the "Property" namespace, of which 768 are real properties (5 are redirects). 277 pages are classified as "wanted properties" (without an explicit page). Looking for DC:creator, we found several variants. The preferred term is "Has author" (frequency 99). The most used term is "Author" (1058). The expression "Written by" appears 35 times. Finally, "Author of", "Content author", and "Creator" each appear once.

Thus the following aspects have to be addressed:

  1. How can one know whether a property dealing with this situation exists in the semantic model of the wiki? The problem that we have pointed out for semanticweb.org is spread over the whole wiki network.
  2. How should a name be chosen for a new property, consistently with the existing ones?
  3. In a multilingual family of wikis, how should a metadata item be translated?
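
For the first two questions, even a very simple lookup could warn a contributor that a property with a similar name already exists. The sketch below is an assumption-laden illustration, not a description of an existing Wicri or SMW service: the function names, the "has"/"is" prefix rule and the 0.6 similarity cutoff are all hypothetical choices, and the list of existing properties is supplied by hand. It relies on difflib.get_close_matches from the Python standard library.

```python
import difflib


def normalize(name: str) -> str:
    """Lower-case a property name and drop a leading 'has ' or 'is '."""
    key = name.lower().replace("_", " ").strip()
    for prefix in ("has ", "is "):
        if key.startswith(prefix):
            key = key[len(prefix):]
    return key


def similar_properties(candidate, existing, cutoff=0.6):
    """Return existing property names that resemble the candidate name.

    Names sharing the same normalized form are grouped together; groups are
    returned from the most similar to the least similar.
    """
    groups = {}
    for prop in existing:
        groups.setdefault(normalize(prop), []).append(prop)
    close = difflib.get_close_matches(
        normalize(candidate), list(groups), n=5, cutoff=cutoff)
    return [prop for key in close for prop in groups[key]]


# Variants reported above for semanticweb.org, plus one unrelated property.
EXISTING = ["Has author", "Author", "Written by", "Has PC member"]
suggestions = similar_properties("has Author", EXISTING)
```

Here the first two suggestions are "Has author" and "Author". A string comparison of this kind cannot relate "Written by" to "Has author"; that link has to be stated explicitly by a person, which is one motivation for the documentation wiki proposed below.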

We propose to set up a wiki with an encyclopedic philosophy, dealing with metadata. There are several examples of wikis dedicated to metadata on the web. For instance, there are several wikis on the DCMI site [e1]. But generally, these wikis are intended for specialists and are often related to a particular schema. Here, we want to be understandable by a non-specialist[21] who has to deal with many topics at the same time.

Main lines for Wicri/metadata

Metadata are related to a model (possibly expressed through an ontology in a semantic wiki) that represents the structure of the wiki and the properties of wiki resources. Each wiki can be created with a different domain model (e.g. conference resources in the case of YYY, terminology resources in the case of the World Reference Base for soil resources). Moreover, some concepts may be expressed in different languages. As a result, different wikis may represent close or similar concepts using different models, which limits navigation across the wiki network through the provision of automatically related resources (dereferenceable entities).

A specific wiki, called Wicri/Base [2], was created to provide common tools for the Wicri community, including presentation models or templates, particular metadata sets (e.g. Infobox laboratory) and metadata elements (e.g. Attribut:A pour ville, adapted from [3]).

Representing research resources

The wiki network is composed of scientific communication resources. It is in itself a Current Research Information System (CRIS). The representation of resources is bounded by the general domain of research, including concepts that belong to CRIS; Knowledge Organization Systems used in the different research domains or created ad hoc; bibliographic formats such as MARC or the DCMI Scholarly Works Application Profile; formatting models for text (TEI, DocBook) and for survey datasets (DDI); and educational formats such as LOM and the IMS-QTI application profile for assessment resources. Additionally, more general resources are necessary to describe persons (e.g. FOAF) or Knowledge Organization Systems (e.g. SKOS).

Interoperability issues

It is possible to import complete ontologies or vocabularies, such as FOAF. Resources from ontology repositories such as Semanticweb.org [4] or Ontologypattern [5] can also be used. However, this guarantees neither interoperability between the metadata used within the wiki network nor interoperability with metadata used in other systems.

For example, the wiki of the WRB terminology [6] is built on one model, whereas Agrowiki uses a different one [7]. A conference on the OpenResearch.org platform uses a specific model [8], whereas on Wicri it uses another one [9].

This problem also exists inside the wiki network since it is possible to create a wiki without using the resources of the Wicri base or to modify and adapt those resources. Therefore, there is a need to define relations between concepts from the different wiki models.

The wiki as a metadata registry?

Wicri has chosen to define redirects (i.e. owl:sameAs relations) to concepts from ontology repositories. However, strict equivalence between two concepts is of limited use. Ontology mapping requires richer relations to be encoded, such as the SKOS mapping properties [10] skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch. Moreover, collaborative ontology mapping mechanisms [Georgian reference] should be available to the network, so that any contributor who creates a new metadata concept or identifies a relation between metadata concepts is able to enrich the system.
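
To make the difference with a plain redirect concrete, a mapping can be written as an RDF triple whose predicate is one of the five SKOS mapping properties. The sketch below emits such triples in N-Triples syntax. It is only an illustration: mapping_triple is a hypothetical helper, and the two example.org URIs are placeholders standing for a Wicri property and a property of another wiki; only the SKOS namespace URI is real.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five SKOS mapping properties named in the text.
MAPPING_RELATIONS = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triple(subject: str, relation: str, obj: str) -> str:
    """Return one N-Triples line stating a SKOS mapping between two concepts."""
    if relation not in MAPPING_RELATIONS:
        raise ValueError(f"not a SKOS mapping property: {relation!r}")
    return f"<{subject}> <{SKOS}{relation}> <{obj}> ."


# Placeholder URIs (assumptions): a French-language Wicri property and a
# roughly equivalent property defined elsewhere.
line = mapping_triple(
    "http://example.org/wicri/A_pour_ville",
    "closeMatch",
    "http://example.org/other/Has_location_city",
)
```

Choosing closeMatch rather than exactMatch records that the two properties are similar without claiming they are interchangeable, a nuance that a redirect cannot express.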

This should result in a wiki-based metadata registry for the Wicri network, with some specific features. The wiki architecture allows a mix of structured and unstructured content to be expressed. Scientific concepts are not defined only through traditional definitions, but also through scientific literature, guidelines, etc. This is particularly important in a multilingual context, as we observed in the Wicri network as well as in other collaborative scientific platforms (Mizohata, Jadoul, 2010?). A review of the concepts used to describe e-assessment resources in the field of education (Sarre et al., 2010) demonstrates that many concepts proposed as metadata for this domain are not fully specified. There are metadata schemas, as well as concepts only defined in journal articles, guidelines, …

In addition, semantic wikis include some intelligence which can be useful to make inferences on the relations or potential relations between the concepts used in the network. The wiki network is not only a CRIS, it also makes research content and scientific communication a building block of the semantic Web by providing dereferenceable resources and reasoning mechanisms through a decentralized and collaborative environment.

Old version

[Note: Muriel's contribution gives this section a different structure. What follows, up to "Metadata for computers", is temporarily kept in order to check that nothing has been forgotten in the final version. Jacques Ducloy 11:05, 5 March 2010 (UTC)]

Short paragraph to highlight 2 fundamental choices:

  • metadata and vocabularies of a generic (universal?) nature
  • metadata in the language of the wiki

Metadata sources

  • We give the main sources on which to rely
  • since we are addressing contributors who proceed by imitation, sites such as openresearch.org must be treated first

Main sources of metadata:

  • the foundations: DCMI
  • CRIS:
    • The skeleton of Wicri is a Current Research Information System
    • generic semantic systems dealing with research: for example Openresearch.org
  • bibliographic formats, mainly those that operate in a multilingual context (for instance Unimarc)
  • Text formatting????: TEI...
  • LOM...
  • Authority files.
  • General vocabularies (Eurovoc + specific features by scientific community, e.g. HAL...)
  • warning: copyright on metadata systems ()
  • also mention the need for openness toward generic vocabularies

Architecture and metadata for Wicri/metadata

We would now like to design a wiki-based web site that helps a contributor find the best metadata items in a given situation. Once again, we are at an early stage and we will have to experiment with several strategies.

  • We give some initial information on how data is organized on the wiki, still from the contributor's point of view
  • multilingual aspects
  • choice of a naming style for metadata
  • we make a link with the next section on the computing aspects

Metadata for computers

We want to use metadata to help the contributor and to open up new facilities. This section addresses two types of problems:

  • the internal coherency of the wiki network
  • the possibilities for interaction with the rest of the world

For each of them we have two subsections, hence 2 times 2 in all:

  • Our current analysis of the problem and our short-term strategies
  • A more forward-looking vision

Handling Wicri network coherency

2 points:

  • coherency of the network
  • extending the features available at the wiki level to the network, though I am not sure there is much metadata involved on this point. Jacques Ducloy 08:55, 19 February 2010 (UTC)

 <wicri>
  <wiki prefix="wicri.fr" 
        type="public" 
        server="http://maquettewicri.loria.fr" 
        path="/fr.wicri/index.php5?">
     <title>Wicri (fr)</title>
     <article title="$1"/>
     <log title="Special:Connexion"/>
     <recentChanges title="Special:Modifications_r%C3%A9centes"/>
  </wiki>
  <wiki prefix="wicri.en" 
        type="public"        
        server="http://maquettewicri.loria.fr" 
        path="/en.wicri/index.php5?">
     <title>Wicri (en)</title>
     <article title="$1"/>
     <log title="Special:UserLogin"/>
     <recentChanges title="Special:RecentChanges"/>
  </wiki>
 </wicri>
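To show how such a registry could be used to keep links between wikis coherent, here is a minimal sketch that reads the two entries above and builds page URLs from them. It assumes that the `title` attribute is passed as a `title=` query parameter and that `$1` stands for the URL-encoded page name; the `load_registry` and `page_url` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The two entries from the listing above. The closing </wicri> tag is
# included so that the fragment parses as a standalone document.
REGISTRY = """
<wicri>
  <wiki prefix="wicri.fr" type="public"
        server="http://maquettewicri.loria.fr"
        path="/fr.wicri/index.php5?">
     <title>Wicri (fr)</title>
     <article title="$1"/>
     <log title="Special:Connexion"/>
     <recentChanges title="Special:Modifications_r%C3%A9centes"/>
  </wiki>
  <wiki prefix="wicri.en" type="public"
        server="http://maquettewicri.loria.fr"
        path="/en.wicri/index.php5?">
     <title>Wicri (en)</title>
     <article title="$1"/>
     <log title="Special:UserLogin"/>
     <recentChanges title="Special:RecentChanges"/>
  </wiki>
</wicri>
"""


def load_registry(xml_text):
    """Return a dict mapping each wiki prefix to its <wiki> element."""
    root = ET.fromstring(xml_text)
    return {wiki.get("prefix"): wiki for wiki in root.findall("wiki")}


def page_url(wiki, kind, page=""):
    """Build the URL for a child element such as 'article' or 'recentChanges'.

    Assumption: '$1' in the title attribute is replaced by the page name,
    with spaces turned into underscores and the result URL-encoded.
    """
    template = wiki.find(kind).get("title")
    title = template.replace("$1", quote(page.replace(" ", "_")))
    return wiki.get("server") + wiki.get("path") + "title=" + title


registry = load_registry(REGISTRY)
pittsburgh_fr = page_url(registry["wicri.fr"], "article", "Pittsburgh")
changes_en = page_url(registry["wicri.en"], "recentChanges")
```

With these assumptions, `pittsburgh_fr` matches the form of the page addresses listed in the Notes, so the same page name can be resolved on any wiki of the network from its prefix alone.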

Human machine interface for contributors

We need to make the contributor's task easier, especially during the learning phase, since contributors must work in a world where structured and unstructured information coexist. The literature offers partial answers: RDF-based ones for data-centric approaches (in the DBMS sense) and XML-based ones for building a human-readable document. Can the two approaches be reconciled? Is there anything that can be done in the short term?

One paragraph -> EPFL

Interoperability with semantic web & Digital Libraries

  • example: exporting / importing an ontology with systems that are not Wicri
  • Natalya F. Noy and Tania Tudorache : Collaborative Ontology Development on the (Semantic) Web
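As a rough illustration of the export case, the sketch below serializes multilingual concept labels as N-Triples, a line-based RDF syntax that non-Wicri systems can read. The `http://example.org/wicri/concept/` namespace, the use of the SKOS `prefLabel` property, and the `to_ntriples` helper are assumptions made for the example, not choices made by the Wicri network.

```python
from urllib.parse import quote

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
BASE = "http://example.org/wicri/concept/"  # hypothetical namespace


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(concepts):
    """Serialize {concept_id: {language: label}} as one triple per label."""
    lines = []
    for concept_id in sorted(concepts):
        subject = BASE + quote(concept_id)
        for lang in sorted(concepts[concept_id]):
            label = escape_literal(concepts[concept_id][lang])
            lines.append(f'<{subject}> <{SKOS_PREF_LABEL}> "{label}"@{lang} .')
    return "\n".join(lines)


# Hypothetical concept with one label per wiki language.
export = to_ntriples(
    {"Digital_library": {"en": "Digital library", "fr": "Bibliothèque numérique"}}
)
```

A line-based format keeps the export easy to inspect and to split across wikis; the import direction would need a real RDF parser and is not covered here.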

External web mining

One paragraph -> Loria:

  • How can the information formalized in the network be used for monitoring (watch) purposes?
  • What role can metadata play?

Discussion & conclusion

:-) metadata matters!!!

More seriously, we can develop something around the idea that the earlier you start, the better... (whereas the wiki makes it possible to defer...)

The possibilities for taking advantage of the wiki network and of semantic reasoning

Acknowledgments

Thanks to the people who contributed by reading and correcting this page: Jean-Pierre Thomesse.

References

  • [c1] Charbel Rahhal, Hala Skaf-Molli, Pascal Molli, and Stéphane Weiss: Multi-synchronous Collaborative Semantic Wikis. In Wise'09: International Conference on Web Information Systems , 2009.
    < http://www.loria.fr/~molli/pmwiki/uploads/Main/Skaf09wise.pdf >
  • [d1] Jacques Ducloy, Yann Nicolas, Diane Le Hénaff, Muriel Foulonneau, Luc Grivel, Jean-Paul Ducasse. Metadata towards an e-research cyberinfrastructure - The case of francophone PhD theses. Proceedings of DC 2006, Manzanillo, Mexico, 2006.
  • [e1] Fredrik Enoksson: A MoinMoin Wiki Syntax for Description Set Profiles, DCMI Working Draft (2008)
    < http://dublincore.org/documents/2008/10/06/dsp-wiki-syntax/ >
  • [e2] Gregor Erbach - Data-centric view in e-Science information systems. Data Science Journal Vol. 5 (2006) pp.219-222
    < http://www.jstage.jst.go.jp/article/dsj/5/0/219/_pdf >
  • [h1] Eran Hodis, Jaime Prilusky, Eric Martz, Israel Silman, John Moult and Joel L. Sussman: Proteopedia - a scientific 'wiki' bridging the rift between 3D structure and function of biomacromolecules, Genome Biology 2008, 9:R121 doi:10.1186/gb-2008-9-8-r121
    < http://genomebiology.com/2008/9/8/R121 >
  • [j1] Keith G. Jeffery. CRIS + open access = the route to research knowledge on the GRID. In 71st IFLA General Conference and Council proceedings, Oslo, Norway, 2005
    < http://www.ifla.org/IV/ifla71/papers/007e-Jeffery.pdf >
  • [j2] Keith G. Jeffery - Technical Infrastructure and Policy Framework for Maximising the Benefits from Research Output. In: ELPUB2007. Openness in Digital Publishing: Awareness, Discovery and Access - Proceedings of the 11th International Conference on Electronic Publishing held in Vienna, Austria 13-15 June 2007 / Edited by: Leslie Chan and Bob Martens. ISBN 978-3-85437-292-9, 2007, pp. 1-12
    < http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.5044&rep=rep1&type=pdf >
  • [k1] Markus Krötzsch, Denny Vrandecic, Max Völkel, Heiko Haller, Rudi Studer. Semantic Wikipedia. In Journal of Web Semantics 5/2007, pp. 251–261. Elsevier 2007.
  • [l1] Christoph Lange. SWiM – a semantic wiki for mathematical knowledge management. In Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors, ESWC, volume 5021 of Lecture Notes in Computer Science, pages 832–837. Springer, 2008.
  • [o1] Gérald Oster, Pascal Urso, Pascal Molli and Abdessamad Imine. In Proceedings of the 2006 ACM Conference on Computer Supported Cooperative Work, CSCW 2006, Banff, Alberta, Canada, November 4-8, 2006, 2006.

Notes

  1. < http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#namespaces >
  2. Wicrified is a neologism derived from the term “wikified” in Wikipedia jargon. The task consists in using wiki mark-up to adapt a document to the Wicri network, i.e. setting wiki links, categories or semantic annotations.
  3. http://maquettewicri.loria.fr/fr.ticri/index.php5?title=Pittsburgh
  4. http://maquettewicri.loria.fr/fr.wicri-t-eau/index.php5?title=Pittsburgh
  5. http://maquettewicri.loria.fr/fr.wicri/index.php5?title=Pittsburgh
  6. http://maquettewicri.loria.fr/fr.artist/index.php5?title=Qu%E2%80%99est-ce_qu%E2%80%99une_biblioth%C3%A8que_num%C3%A9rique%2C_au_juste_%3F
  7. http://maquettewicri.loria.fr/fr.ticri/index.php5?title=Qu%E2%80%99est-ce_qu%E2%80%99une_biblioth%C3%A8que_num%C3%A9rique%2C_au_juste_%3F
  8. http://maquettewicri.loria.fr/fr.wicri/index.php5?title=Qu%27est-ce_qu%27une_biblioth%C3%A8que_num%C3%A9rique%2C_au_juste_%3F
  9. http://maquettewicri.loria.fr/fr.wicri/index.php5?title=Qu%27est-ce_qu%27une_biblioth%C3%A8que_num%C3%A9rique%2C_au_juste_%3F
  10. http://maquettewicri.loria.fr/fr.wicri/index.php5?title=Qu%27est-ce_qu%27une_biblioth%C3%A8que_num%C3%A9rique%2C_au_juste_%3F
  11. Zack Rosen's post "RDF Semantic web research isn't working": < http://www.zacker.org/semantic-web-research-isnt-working >
  12. < http://www.mediawiki.org/wiki/MediaWiki >
  13. Content Management System
  14. An image map is a list of coordinates relating to a specific image, created in order to hyperlink areas of this image to various destinations.
  15. < http://proteopedia.org/wiki/index.php >
  16. For instance Acer on
  17. < http://semantic-mediawiki.org/wiki/Semantic_MediaWiki >
  18. http://wiki.openmath.org/
  19. < http://m3p.gforge.inria.fr/pmwiki/pmwiki.php >
  20. Data were collected on 4 March 2010.
  21. For instance, we must avoid linking to pages that contain thousands of lines of RDF/XML as an explanation!

More biblio

  • M Krötzsch, S Schaffert, D Vrandecic. Reasoning in semantic wikis - Reasoning Web, 2007 - Springer