Toolkit_Web is a GATE plugin developed by the Liris DRIM team. It groups several processing resources (PRs) designed to handle data collected from the World Wide Web.
This plugin provides the following Processing Resources:
Resource | Encoding Errors Corrector |
---|---|
Resource type | Corrector. This PR modifies a Language Resource (the document itself), so do not run any annotating resource before a Corrector. Annotations are located by their character offsets from the start of the document, and a Corrector may insert or delete characters, which shifts the text and invalidates any offsets computed earlier. |
Dependencies | none |
Description | This PR corrects encoding errors in the document when the character encoding declared in the file is not the one actually used. It is primarily aimed at French-language news feeds, which are far more prone to such errors than English ones because French text contains many accented, non-ASCII characters. |
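
To make "encoding errors" concrete: a common case is UTF-8 text decoded as ISO-8859-1, which turns "é" into "Ã©". The following Java sketch (Java being the language GATE is written in) undoes that single case by re-encoding the garbled string to recover the original bytes, then decoding those bytes as UTF-8. It is a hypothetical illustration, not the Encoding Errors Corrector's actual implementation: `MojibakeRepair` and `repairUtf8ReadAsLatin1` are illustrative names, and the windows-1252 variant (for example "â€™" for a curly apostrophe) is deliberately not handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: repairs text whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
class MojibakeRepair {

    static String repairUtf8ReadAsLatin1(String text) {
        // If the text holds characters outside ISO-8859-1, it cannot be this kind of error.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(text)) {
            return text;
        }
        // Every char is <= U+00FF, so encoding back to ISO-8859-1 recovers the original bytes.
        byte[] originalBytes = text.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(originalBytes)).toString();
        } catch (CharacterCodingException e) {
            // The bytes are not valid UTF-8: assume the text was already correct.
            return text;
        }
    }

    public static void main(String[] args) {
        // "prÃ©sident": the UTF-8 bytes of "é" (C3 A9) shown as two Latin-1 characters.
        String garbled = "Le pr\u00C3\u00A9sident a d\u00C3\u00A9clar\u00C3\u00A9";
        String fixed = repairUtf8ReadAsLatin1(garbled);
        System.out.println(fixed.equals("Le pr\u00E9sident a d\u00E9clar\u00E9")); // true
    }
}
```

Correctly decoded French such as "déjà vu" is left unchanged, because its Latin-1 bytes are not valid UTF-8 and the decoder rejects them.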
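
The warning about running annotating resources before a Corrector can be illustrated with a minimal Java sketch (records require Java 16 or later). `Span` is a hypothetical stand-in for an annotation, not GATE's `Annotation` API; like an annotation, it only stores start and end character offsets. Once the correction removes one character earlier in the text, the same offsets cover the wrong characters.

```java
// Hypothetical illustration of why offsets computed before a correction become invalid.
class OffsetShiftDemo {

    // Minimal stand-in for an annotation: [start, end) character offsets into the text.
    record Span(int start, int end) {
        String coveredText(String text) {
            return text.substring(start, end);
        }
    }

    public static void main(String[] args) {
        // The garbled "prÃ©sident" is one character longer than the corrected "président".
        String before = "Le pr\u00C3\u00A9sident parle vite";
        String after = "Le pr\u00E9sident parle vite";

        // An "annotation" created BEFORE correction, covering the word "parle".
        int start = before.indexOf("parle");
        Span token = new Span(start, start + "parle".length());

        System.out.println("[" + token.coveredText(before) + "]"); // [parle]
        // After the correction deletes one character, the same offsets are off by one.
        System.out.println("[" + token.coveredText(after) + "]");  // [arle ]
    }
}
```

Running the Corrector first, before any tokeniser or other annotating PR, avoids this mismatch entirely.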