Published in: BMC Medical Informatics and Decision Making 1/2006

Open Access 01-12-2006 | Research article

Creating a medical English-Swedish dictionary using interactive word alignment

Authors: Mikael Nyström, Magnus Merkel, Lars Ahrenberg, Pierre Zweigenbaum, Håkan Petersson, Hans Åhlfeldt

Abstract

Background

This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P, and on its use for the semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many West European language pairs other than English-Swedish.

Methods

The medical terminology systems were collected in electronic format in both English and Swedish, and the rubrics were extracted as parallel language pairs. First, interactive word alignment was used to create training data from a sample. Next, the training data were used in automatic word alignment to generate candidate term pairs. Finally, the term pair candidates were verified manually.
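To make the candidate-generation step more concrete, the following is a minimal sketch of how term pair candidates might be proposed from parallel rubrics. It is not the interactive word alignment tool used in the study: it substitutes a simple Dice co-occurrence score for the trained aligner, and the function name `candidate_term_pairs`, the threshold `min_dice` and the four toy rubric pairs are assumptions introduced only for illustration.

```python
from collections import Counter
from itertools import product


def candidate_term_pairs(rubric_pairs, min_dice=0.9):
    """Propose (english, swedish, score) word pairs from parallel rubrics.

    Hypothetical helper: scores each word pair with the Dice coefficient
    over rubric-level co-occurrence. This is a stand-in for a trained
    word aligner, not the method used in the paper.
    """
    en_count = Counter()    # number of rubrics containing each English word
    sv_count = Counter()    # number of rubrics containing each Swedish word
    pair_count = Counter()  # number of rubrics containing both words

    for en_rubric, sv_rubric in rubric_pairs:
        en_words = set(en_rubric.lower().split())
        sv_words = set(sv_rubric.lower().split())
        en_count.update(en_words)
        sv_count.update(sv_words)
        pair_count.update(product(en_words, sv_words))

    candidates = []
    for (en_word, sv_word), both in pair_count.items():
        dice = 2 * both / (en_count[en_word] + sv_count[sv_word])
        if dice >= min_dice:
            candidates.append((en_word, sv_word, dice))

    # Highest score first, then alphabetical for a stable order.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


# Toy rubric pairs invented for illustration; not taken from the
# terminology systems themselves.
toy_rubrics = [
    ("acute bronchitis", "akut bronkit"),
    ("chronic bronchitis", "kronisk bronkit"),
    ("acute sinusitis", "akut sinuit"),
    ("chronic sinusitis", "kronisk sinuit"),
]

for en_word, sv_word, score in candidate_term_pairs(toy_rubrics):
    print(f"{en_word}\t{sv_word}\t{score:.2f}")
```

In a workflow like the one described, the candidates produced at this stage would still go to a human reviewer; the score only ranks them and does not replace manual verification.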

Results

A dictionary of 31,000 verified entries was created in less than three man weeks, with considerably less time and effort than a manual approach would require, and without compromising quality. As a side effect of our work, we found 40 different translation problems in the terminology systems, which indicates the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may make dictionary creation with similar tools even more efficient. Finally, the contribution is discussed in relation to other ongoing efforts to construct medical lexicons for non-English languages.

Conclusion

In three man weeks we were able to produce a medical English-Swedish dictionary consisting of 31,000 entries, and we also found hidden translation errors in the medical terminology systems used.
Metadata
Title
Creating a medical English-Swedish dictionary using interactive word alignment
Authors
Mikael Nyström
Magnus Merkel
Lars Ahrenberg
Pierre Zweigenbaum
Håkan Petersson
Hans Åhlfeldt
Publication date
01-12-2006
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2006
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-6-35
