Stratégies pour l'anonymisation systématique d'un corpus d'interactions plurilingues

Christophe Reffay 1 François-Marie Blondel 1 Emmanuel Giguet 2
2 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In the field of textual interaction analysis, researchers who want to share their corpora face many difficulties when they try to remove the marks that identify individuals. European law suggests that such marks be removed before any publication of the corpus. Many tools dedicated to the analysis of online discussions have already been developed on the Calico platform, and they are language-independent. In the same spirit, we propose an interactive and systematic anonymisation process that works without a dictionary and is therefore applicable to any language. This process has been applied to a first multilingual corpus from the Galanet project. This paper highlights the difficulties that arose during this anonymisation process and presents the results of the experiment. Beyond the substitution of identity marks, we propose two mining strategies that help detect new lexical forms that may reveal personal information.
Document type :
Conference papers

Cited literature [16 references]

https://edutice.archives-ouvertes.fr/edutice-00718390
Contributor : Christophe Reffay
Submitted on : Monday, July 16, 2012 - 6:52:32 PM
Last modification on : Tuesday, November 19, 2019 - 11:13:31 AM
Long-term archiving on : Wednesday, October 17, 2012 - 2:55:43 AM

File

ReffayBlondelGiguet_complet.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : edutice-00718390, version 1

Citation

Christophe Reffay, François-Marie Blondel, Emmanuel Giguet. Stratégies pour l'anonymisation systématique d'un corpus d'interactions plurilingues. Intercompréhension, Jun 2012, Grenoble, France. pp.1-21. ⟨edutice-00718390⟩
