Published in: Journal of Medical Systems 5/2011

01-10-2011 | Original Paper

Characterizing Mammography Reports for Health Analytics

Authors: Carlos C. Rojas, Robert M. Patton, Barbara G. Beckerman



Abstract

As massive collections of digital health data become available, the opportunities for large-scale automated analysis increase. In particular, the widespread collection of detailed health information is expected to help realize a vision of evidence-based public health and patient-centric health care. Within such a framework for large-scale health analytics, we describe the transformation of a large data set of mostly unlabeled, free-text mammography data into a searchable and accessible collection that is usable for analytics. We also describe several methods to characterize and analyze the data, including their temporal aspects, using information retrieval, supervised learning, and classical statistical techniques. We present experimental results that demonstrate the validity and usefulness of the approach: the results are consistent with the known features of the data, provide novel insights about them, and can be used in specific applications. Additionally, based on the process of going from raw data to analysis results, we present the architecture of a generic system for health analytics from clinical notes.
Footnotes
2
Breast Imaging Reporting and Data System, developed by the American College of Radiology.
 
3
This, of course, does not hold for every document and every human (within a given language) since specialized terminology is not universally accessible. It is, however, a reasonable assumption within a field, e.g., health sciences.
 
Metadata
Title
Characterizing Mammography Reports for Health Analytics
Authors
Carlos C. Rojas
Robert M. Patton
Barbara G. Beckerman
Publication date
01-10-2011
Publisher
Springer US
Published in
Journal of Medical Systems / Issue 5/2011
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI
https://doi.org/10.1007/s10916-011-9685-2
