IPRI News Analyzer

IPRI News Analyzer (or IpriNA) is a multi-agent tool for quantitative measurement that was initially developed for the ANR project IPRI. Its purpose is fourfold:

  • Managing a corpus of digital media RSS feeds,
  • Reading these feeds and storing the relevant content into a database,
  • Instrumenting the classification of the stored items into news subjects,
  • Giving quantitative statistics about the relations between the media sources and the subjects over time.

IPRI News Analyzer is only available in French: the GUI language is French, and language processing assumes that the collected resources are in French.

Poster (in French)

In more detail

The general issue: measuring pluralism and redundancy in the online press

(in-depth discussion on the subject can be found on the project site)

The focus of the IPRI project was to measure the influence of the online medium on pluralism versus redundancy. Two common conceptions are that online press is more plural (since free speech on the Internet must apply to news on the Net) or, on the contrary, that it is more redundant (because of the ease of plagiarism). Several studies had been carried out before, but they were qualitative, whereas IPRI aimed to be quantitative.

The quantitative analysis of the online press was done by monitoring the RSS feeds of most general news providers, in most areas (newspaper websites, pure players, blogs and so on). IPRI News Analyzer is the tool that was developed to manage the corpus of RSS feeds, monitor the production of these feeds, and analyze their content.
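
To make the monitoring step concrete, here is a minimal sketch of reading the items of one RSS 2.0 feed and storing them in a database. It is not the code of IPRI News Analyzer: the function names, the table layout and the sample feed are hypothetical, and SQLite is swapped in for MySQL only so that the example runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real collector would download this text from a feed URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example source</title>
    <item>
      <title>First headline</title>
      <link>http://example.org/a</link>
      <pubDate>Mon, 21 Oct 2013 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/b</link>
      <pubDate>Mon, 21 Oct 2013 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one (title, link, pub_date) tuple per <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (
            item.findtext("title", default="").strip(),
            item.findtext("link", default="").strip(),
            item.findtext("pubDate", default="").strip(),
        )
        for item in root.iter("item")
    ]


def store_items(conn, feed_url, items):
    """Insert the items of one feed, skipping those already stored.

    Returns the number of rows actually added. The schema is an assumption
    made for this sketch, not the schema used by IPRI News Analyzer.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item ("
        "feed_url TEXT, title TEXT, link TEXT, pub_date TEXT, "
        "UNIQUE(feed_url, link))"
    )
    before = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO item VALUES (?, ?, ?, ?)",
        [(feed_url, title, link, date) for title, link, date in items],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    return after - before


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    added = store_items(connection, "http://example.org/rss", parse_items(SAMPLE_RSS))
    print(added, "new items stored")
```

Reading the same feed twice stores each item only once, which matters when feeds are polled repeatedly over a long period.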

The results of the study (which cover much more than what was measured using IPRI News Analyzer) are available on the project site. This page is about the software.

Using IPRI News Analyzer

IPRI News Analyzer has a few dependencies, such as a MySQL database and the TreeTagger software (which itself needs a Perl interpreter). An installation guide is available (if somewhat outdated).

Corpus definition, feed item collection and analysis can be launched either separately or simultaneously.

  • Database initialisation provides the set of RSS feeds that was used in the IPRI experiments, so you will want to replace it with your own set. The Corpus application lets you define your own tags (within some limits) and, more importantly, your own media sources and the feeds you follow for each of them (you will then tag the feeds).

  • Collection lets you define what to collect and when, and whether to process the language in real time or (if collection is already too resource-intensive) in batch mode. You can then oversee the collection and get hints on why collecting some feeds may fail.

  • A Log component lets you oversee the collection and memory usage, and gather stats on what has been stored in the database.

  • The Analysis component lets you group news articles into subjects. Automated classification methods include using Google News subjects, as well as supervised classification and clustering based on string matching. The results of automated classification must be validated manually. Fully manual classification is possible as well.

  • The Analysis component then computes various statistics about the news subjects in relation to the feeds and their tags.
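
As an illustration of the last two steps, the sketch below assigns items to subjects by simple keyword matching and then computes, for each media source, the share of its classified items that falls under each subject. It is only a sketch under stated assumptions: the keyword lists, subject names, sample items and function names are all hypothetical, and the actual classification and statistics of IPRI News Analyzer may work differently.

```python
from collections import Counter, defaultdict

# Hypothetical subjects and keywords, chosen only for this example.
SUBJECT_KEYWORDS = {
    "elections": {"élection", "vote", "scrutin"},
    "climat": {"climat", "réchauffement", "cop"},
}

# Hypothetical (source, title) pairs standing in for stored feed items.
ITEMS = [
    ("Source A", "Le vote du scrutin municipal"),
    ("Source A", "Climat : un nouveau rapport"),
    ("Source B", "Réchauffement et climat en débat"),
    ("Source B", "Résultats sportifs du week-end"),
]


def classify(title, subject_keywords):
    """Return the subject whose keywords match the most words of the title.

    Returns None when no keyword matches. Matching is on whole lowercase
    words, a deliberately naive form of string matching.
    """
    words = set(title.lower().split())
    best_subject, best_score = None, 0
    for subject, keywords in subject_keywords.items():
        score = len(words & keywords)
        if score > best_score:
            best_subject, best_score = subject, score
    return best_subject


def subject_shares(items, subject_keywords):
    """For each source, return the share of its classified items per subject."""
    counts = defaultdict(Counter)
    for source, title in items:
        subject = classify(title, subject_keywords)
        if subject is not None:  # unclassified items are left out of the shares
            counts[source][subject] += 1
    return {
        source: {subject: n / sum(c.values()) for subject, n in c.items()}
        for source, c in counts.items()
    }


if __name__ == "__main__":
    for source, shares in subject_shares(ITEMS, SUBJECT_KEYWORDS).items():
        print(source, shares)
```

A naive matcher like this one will mislabel some items, which is one reason an automated result needs a manual validation step before any statistics are trusted.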

(Note: the data provided as an example here are partial and certainly do not accurately represent the corresponding media sources. If you want accurate data, try it yourself!)

Licence

IPRI News Analyzer is available under the GNU GPL v3.

All related documentation is available under the CC Attribution-Noncommercial-Share Alike 3.0 Unported licence.

Download

Contributors: Cyril Laitang, Elöd Egyed-Zsigmond, Samuel Gesche
Release date: 2013-10-21 (slight refactoring from the 2011 release)

An installation guide and a user guide (both in French) can be found on the project site.

ipri_news_analyzer.txt · Last modified: 2014/01/07 09:36 by sgesche