Paralelización del corpus SenSem español-catalán

  1. Vázquez García, Glòria
  2. Fernández Montraveta, Ana María
Anuari de Filologia. Estudis de Lingüística

ISSN: 2014-1408

Year of publication: 2011

Issue: 1

Pages: 167-193

Type: Article

This paper presents SenSem, a parallel corpus for Spanish and Catalan. The corpus has been annotated at several linguistic levels (morphological, syntactic and semantic), and the annotated units range from words to phrases and sentences. One of the main contributions of the resource is that it is the first corpus for Catalan annotated with sentence-level semantic information such as aspectuality and modality. The methodology was to translate into Catalan the Spanish corpus already available from a previous project. The two corpora are linked at each of the units mentioned above. All the linguistic annotation has been inherited from the Spanish corpus, with corrections made where necessary.

Bibliographic References

  • ATSERIAS, J., CASAS, B., COMELLES, E., GONZÁLEZ, M., PADRÓ, L. and PADRÓ, M. (2006), "FreeLing 1.3: Syntactic and semantic services in an open-source NLP library", in Proceedings of the Fifth International Conference on Language Resources and Evaluation, 48-55.
  • BAKER, M. (1993), "Corpus linguistics and translation studies: implications and applications", in Baker, M., Francis, G. and Tognini-Bonelli, E. (eds.), Text and Technology: In Honour of John Sinclair, Amsterdam and Philadelphia, John Benjamins, 233-250.
  • CASTELLÓN, I., FERNÁNDEZ, A., VÁZQUEZ, G., ALONSO, L. and CAPILLA, J.C. (2006), "The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level", in Fifth International Conference on Language Resources and Evaluation (LREC), 355-359.
  • FERNÁNDEZ, A., VÁZQUEZ, G. and CASTELLÓN, I. (2006A), "SenSem: a Databank for Spanish Verbs", in Proceedings of the X Ibero-American Workshop on Artificial Intelligence, IBERAMIA, Ribeirão Preto, Brasil.
  • FERNÁNDEZ, A., VÁZQUEZ, G. and TERUEL, D. (2006B), "Interfaz de explotación del corpus SenSem", in Actas del Congreso de la Asociación Española de Lingüística Aplicada (AESLA), XXXX.
  • MURATA, M., Q. MA, K. UCHIMOTO, T. KANAMARU, H. ISAHARA (2006), "Japanese-to-English translations of tense, aspect, and modality using machine-learning methods and comparison with machine-translation systems on market", Language Resources and Evaluation, 40, 233-242.
  • PUSTEJOVSKY, J., P. HANKS, R. SAURÍ, A. SEE, R. GAIZAUSKAS, A. SETZER, D. RADEV, B. SUNDHEIM, D. DAY, L. FERRO and LAZO, M. (2003), "The TIMEBANK Corpus", in Proceedings of Corpus Linguistics, 647-656.
  • SAURÍ, R., M. VERHAGEN and PUSTEJOVSKY, J. (2006), "Annotating and Recognizing Event Modality in Text", in Proceedings of the 19th International FLAIRS Conference, FLAIRS 2006, Melbourne Beach, Florida.
  • SMITH, C. (1997), "The Parameter of Aspect", in Studies in Linguistics & Philosophy: Volume 43, Dordrecht, Kluwer Academic.
  • VÁZQUEZ, G. and FERNÁNDEZ, A. (2009), "Ampliación del Banco de Datos de Verbos del español SenSem", in Castillo Carballo, M.A. and García Platero, J.M. (coords.), La lexicografía en su dimensión teórica, Murcia, Publicaciones de la Universidad de Murcia, 957-969.
  • VILLALBA, X. (2006), "Una base de datos de construcciones en catalán y español", in Actas del VII Congreso de Lingüística General (CD-ROM. ISBN: 84-475-2086-8). Universitat de Barcelona. Available at: :// Accessed: 05.09.11
  • XIAO, R. and MCENERY, T. (2004), Aspect in Mandarin Chinese: A corpus-based study, Studies in Language Companion Series, 73, Amsterdam, John Benjamins P