Conference papers

Stratégies pour l'anonymisation systématique d'un corpus d'interactions plurilingues [Strategies for the systematic anonymisation of a corpus of multilingual interactions]

Christophe Reffay 1 François-Marie Blondel 1 Emmanuel Giguet 2 
2 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : In the field of textual interaction analysis, researchers who want to share their corpora face many difficulties when they try to remove the marks that identify physical persons. European law suggests that such marks be removed before a corpus is published. Many language-independent tools dedicated to the analysis of online discussions have already been developed in the Calico platform. Following the same approach, we propose an interactive and systematic anonymisation process that works without a dictionary and is therefore applicable to any language. This process has been applied to a first multilingual corpus from the Galanet project. This paper highlights the difficulties that arose during the anonymisation process and presents the results of this experiment. Beyond the substitution of identity marks, we propose two mining strategies that help detect new lexical forms that may reveal personal information.
Document type :
Conference papers

Contributor : Christophe Reffay
Submitted on : Monday, July 16, 2012 - 6:52:32 PM
Last modification on : Saturday, June 25, 2022 - 9:46:56 AM
Long-term archiving on : Wednesday, October 17, 2012 - 2:55:43 AM


Files produced by the author(s)


  • HAL Id : edutice-00718390, version 1


Christophe Reffay, François-Marie Blondel, Emmanuel Giguet. Stratégies pour l'anonymisation systématique d'un corpus d'interactions plurilingues. Intercompréhension, Jun 2012, Grenoble, France. pp.1-21. ⟨edutice-00718390⟩


