The DIRE Development Platform

The DIRE development platform is based on GATE. This has two implications:

  • Our projects benefit from GATE's capabilities for anything that involves processing documents.
  • Our contributions are made available to GATE users as plugins shared in our repository.

GATE in a nutshell

As presented in the GATE manual, “the basic business of GATE is annotating documents”. This is quite powerful in itself, given how broadly both terms can be understood: a document is any content, and an annotation is any content attached to any interval within a document. However, GATE only processes documents individually and sequentially, so multi-document processing requires merging the documents into a single one.
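The merging workaround mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: the classes `Span` and `Merged` and the method `merge` are hypothetical names introduced here, not part of the GATE API. The idea is to concatenate the source texts and record, for each one, the interval it occupies in the merged text, which is the same "content attached to an interval" shape as an annotation.

```java
import java.util.ArrayList;
import java.util.List;

class MergedDocumentSketch {

    /** Hypothetical stand-in for an annotation: a label attached to the interval [start, end). */
    static final class Span {
        final int start;
        final int end;
        final String label;

        Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    /** The single merged text, plus one span per source document. */
    static final class Merged {
        final String text;
        final List<Span> sources;

        Merged(String text, List<Span> sources) {
            this.text = text;
            this.sources = sources;
        }
    }

    /** Concatenates the contents, remembering where each source document starts and ends. */
    static Merged merge(List<String> names, List<String> contents) {
        StringBuilder text = new StringBuilder();
        List<Span> sources = new ArrayList<>();
        for (int i = 0; i < contents.size(); i++) {
            int start = text.length();
            text.append(contents.get(i));
            sources.add(new Span(start, text.length(), names.get(i)));
            text.append('\n'); // separator between documents, not covered by any span
        }
        return new Merged(text.toString(), sources);
    }

    public static void main(String[] args) {
        Merged merged = merge(List.of("a.txt", "b.txt"),
                              List.of("Hello world.", "Second text."));
        // Each source document can be recovered from its recorded interval.
        for (Span s : merged.sources) {
            System.out.println(s.label + " -> " + merged.text.substring(s.start, s.end));
        }
    }
}
```

In an actual GATE pipeline the recorded intervals would presumably be stored as annotations on the merged document, so that later processing steps can tell which source a given offset belongs to.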

Relationship between DIRE and GATE

DIRE-based software can be divided into three parts:

  • the DIRE part (the team's contribution), including both GATE plugins (for reuse inside and outside the team) and non-GATE modules that fall outside the scope of GATE (for instance, the software GUI);
  • the GATE part, consisting of GATE Embedded enriched with the necessary plugins (some of which are DIRE contributions);
  • external material, which is part of neither: external libraries, but also the processed documents and resources made available by partners within the scope of a project.

How to contribute

All members of the DRIM team who work on documents are strongly encouraged to contribute to and benefit from the DIRE platform. Those who are not sure how to do so will find useful information here.

Available Plugins

All plugins listed here are available through our repository, located at http://liris.cnrs.fr/dire/gate/gate-update-site.xml.

Three plugins are currently available, providing various Processing Resources:

  • Toolkit_General provides all-purpose processing tools that can be of use regardless of the current project,
  • Toolkit_PolytonicGreek provides language processing and quotation retrieval PRs aimed at Ancient Greek (mainly developed around the ANR project Biblindex),
  • Toolkit_Web provides language processing and data mining PRs aimed at the World Wide Web.

Notes

  • A resource of type Corrector modifies the content of the document being processed. Be careful not to run any annotating resource before a Corrector: annotations are characterised by their offsets from the first character of the document, and Correctors may insert or delete characters.
  • Annotators add annotations to the processed document (and may modify existing ones), while File Writers write files, as their name suggests (and usually neither modify nor add annotations).
  • Overall, plugins may or may not comply with the intended use of the GATE API. As our knowledge of the underlying principles increases, so will our mastery of the tool and its intended uses.
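
The first note, on running Correctors before any annotating resource, can be illustrated with a small plain-Java sketch. It does not use the GATE API: `Span` and `collapseSpaces` are hypothetical names, and the toy "Corrector" here merely collapses runs of whitespace. The sketch shows what goes wrong when an offset-based annotation is created first and the text is modified afterwards.

```java
class OffsetShiftSketch {

    /** Hypothetical stand-in for an annotation: a type and [start, end) character offsets. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        /** Returns the text this span covers in the given document content. */
        String coveredText(String content) {
            return content.substring(start, end);
        }
    }

    /** Toy "Corrector": collapses each run of whitespace into one space, so characters are deleted. */
    static String collapseSpaces(String content) {
        return content.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String original = "The   quick  fox";

        // Wrong order: annotate first. "quick" occupies offsets 6..11 in the original text.
        Span token = new Span(6, 11, "Token");
        System.out.println("[" + token.coveredText(original) + "]");  // prints "[quick]"

        // The correction deletes characters, but the span's offsets stay the same.
        String corrected = collapseSpaces(original);
        System.out.println("[" + token.coveredText(corrected) + "]"); // prints "[ick f]"
    }
}
```

After the correction the same offsets cover a different stretch of text, which is why the safe order is Correctors first, Annotators afterwards.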
development_platform.txt · Last modified: 2014/06/13 12:33 by sgesche