Seminar Presentations in 2012

September 28, Lucila Ohno-Machado, MD, PhD - Professor & Division Chief, Division of Biomedical Informatics, UCSD
Sharing Biomedical and Clinical Data to Support Personalized Medicine
iDASH (integrating data for analysis, anonymization, and sharing) is a National Center for Biomedical Computing funded by the National Institutes of Health. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private cloud that is compliant with USA regulations. iDASH’s goal is to provide biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses. I will present the work of R&D teams within the center, which focus primarily on predictive modeling for personalized medicine. I will also describe the cyberinfrastructure that is available for collaborators around the world.

October 5, Aziz Boxwala, MD, PhD, FACMI - Principal Informatics Consultant, Meliorix, Inc.
Clinical Decision Support Consortium
Clinical decision support (CDS) has been shown to increase quality and patient safety, and improve adherence to guidelines for prevention and treatment. However, current adoption of CDS is limited for a variety of reasons. The Clinical Decision Support Consortium (CDSC) is a collaboration of academic and healthcare IT industry organizations. The goal of the CDSC is to assess, define, demonstrate, and evaluate best practices for knowledge management and CDS in healthcare information technology at scale, across multiple ambulatory care settings and EHR technology platforms. This presentation will describe:

  1. The multilayered knowledge representation schemes created by CDSC including the ability to specify knowledge for different CDS artifact types (e.g., reminders and alerts, order sets, documentation templates);
  2. The collaborative knowledge management processes, portal, and other tools developed or used by the CDSC; and
  3. The CDS web services developed and provided by the CDSC, which are integrated with both commercially available and in-house EMR systems and are being used in patient-care settings.

October 12, Staal Vinterbo, PhD - Associate Professor, Division of Biomedical Informatics, UCSD
Avoiding Arbitrary Privacy
Breaches of individual privacy are unfortunately a common occurrence, mainly caused by loss or theft of storage equipment containing unencrypted personal information. A different and more subtle form of breach occurs when personal information can be inferred from data and information that has been deemed safe and consequently disclosed. Examples include the Netflix Prize rating data set, where ratings could be matched with IMDb data; the AOL search log data set, from which a New York Times reporter identified Thelma Arnold; and the successful matching of individuals in complex mixtures of genotyping microarray data by Homer et al. Traditionally, attempts to eliminate this inferential type of breach have required data to be anonymous, i.e., disallowed the association of data with a unique identifier. This is evident in the definitions of personal data in the EU Data Protection Directive and protected health information in the US HIPAA Privacy Rule. It is trivial to be perfectly safe by not disclosing anything. However, real life does require the use of human subjects’ data.

In this talk, I will discuss quantitative analysis of privacy risk, and how traditional approaches based on anonymity and on definitions of privacy grounded in properties of the data, so-called syntactic definitions, make this difficult. As an alternative, I will present Differential Privacy, a measure of privacy risk based on properties of the process by which information is extracted from data. Finally, I will present recent results of mine in this context.
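
As an illustration of a privacy guarantee attached to the extraction process rather than to the data, here is a minimal sketch of the Laplace mechanism, a standard way to answer a counting query with epsilon-differential privacy. It is not taken from the talk; the function names and the toy data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon yields epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of study participants.
ages = [34, 51, 29, 62, 47, 55, 38]
noisy = private_count(ages, lambda a: a >= 50, epsilon=0.5, rng=random.Random(0))
print(noisy)  # the true count is 3; the printed value is 3 plus random noise
```

A smaller epsilon means more noise and a stronger guarantee; the guarantee holds for whatever data the query is run on, which is the sense in which it is a property of the process.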

October 19, Wendy Chapman, PhD - Associate Professor, Division of Biomedical Informatics, UCSD
Developing an Information Extraction and Visualization Toolkit
There are many barriers to developing NLP algorithms for clinical text and to applying NLP to clinical tasks. At UCSD, we are addressing some of the barriers through development of shared resources to assist developers in annotating text and evaluating NLP annotations. We are also developing shared resources to assist potential users of NLP in developing knowledge bases for particular clinical problems, in customizing NLP applications, and in visualizing the output of NLP annotations for clinical research and decision support. I will describe the Information Extraction and Visualization Toolkit (IE-Viz) we are developing to aid non-NLP experts in application of NLP to clinical tasks.

October 26, Brian Chapman, PhD - Associate Professor, Division of Biomedical Informatics, UCSD
Privacy Informatics Perspectives on Medical Imaging
Medical imaging is one of the intellectual highlights of modern medicine, transforming both the diagnosis and management of disease. However, this transformation has resulted in a variety of problems including radiation exposure to patients, information overload for physicians, and staggering costs to society. In this presentation I will review how informatics has become an essential part of the practice of modern medical imaging and sample some of the current research trends within imaging informatics.

November 2, Hyeon-Eui Kim, RN, PhD - Assistant Professor, Division of Biomedical Informatics, UCSD
PEACE: Pictogram Evaluation and Authoring Collaboration Environment

November 9, Robert El-Kareh, MD, MS, MPH - Assistant Professor, Division of Biomedical Informatics, UCSD
Clinical Informatics Research: Some of the Current Efforts at UCSD

November 16, Nathaniel Heintzman, PhD - Assistant Professor, Division of Biomedical Informatics, UCSD
The DMITRI Study: Informatics Approaches to Personalizing Diabetes Therapies
Diabetes affects hundreds of millions of people worldwide and is acknowledged to present and progress heterogeneously among diverse individuals, yet most clinical guidelines are very general and thus are of limited real-world use to most patients and caretakers. Toward gaining a more personalized understanding of living with this chronic disease, we launched the Diabetes Management Integrated Technology Research Initiative (DMITRI). The DMITRI pilot study collected large multidimensional datasets from a small cohort of physically active patients with diabetes outside of the traditional clinical setting, including data from many continuous body-worn sensors and medical devices, photographic nutrition journals, lifestyle and cognitive evaluations, medical histories, clinical lab results, and SNP genotypes. Integration and analyses of these diverse datasets are ongoing. Here, we introduce the idea of physical activity as a paradigm for disease heterogeneity, discuss new analyses and visualizations made possible with these data, identify opportunities for collaborations with diverse stakeholders in the R&D community, and share our perspectives on how such multidimensional information is critical for understanding human disease at the level of the individual and for achieving truly personalized health.

November 30, Jason Young, PhD - Assistant Professor, Division of Biomedical Informatics, UCSD
Clinical Informatics & Molecular Genetics for Understanding HIV Transmission

December 7, Brian Clay, MD - Associate Professor, Division of Biomedical Informatics, UCSD; Acting CMIO, UCSD Medical Center
Clinical-Decision Support Strategies in the Electronic Medical Record: Nudging Providers to Do the Right Thing

April 6, Michael G Kahn, MD, PhD - Associate Professor, Department of Pediatrics; Co-Director, Colorado Clinical and Translational Sciences Institute; Core Director, CCTSI Biomedical Informatics, University of Colorado Denver
Married 6-year olds and other Diseases of Data: developing a comprehensive data quality assessment framework

Large-scale comparative effectiveness research significantly expands the scope and diversity of relevant data sources that capture a widening array of interventions and outcomes. Large-scale observational studies now can access health and wellness data from electronic health records, personal health records, electronic diaries, disease-specific social media sites, and automated wearable electronic sensors that were not available 5-10 years ago. Analytic methods to extract new knowledge from these diverse observational data sources are an area of active research. Numerous publications highlight concerns about the impact of poor data quality on the validity of observational studies, quality indicators, and other secondary data uses. Yet current data quality assessment (DQA) methods are ad hoc, labor-intensive, and not based on known best practices. Terms such as “data cleaning”, “data validation”, and “extraction-transformation-and-load (ETL)” do not have consistent meaning and do not refer to a common set of evaluation practices. In addition, data quality assessment findings are not disclosed in a transparent, easily understood format. In this talk, Dr. Kahn will present a comprehensive conceptual framework for evaluating data quality. The framework is linked to a set of data quality assessment methods. He will also present future plans to link the framework and assessment methods to an existing data quality evaluation tool and to develop a set of data quality meta-data tags that can describe the results of data quality assessment in a machine-consumable manner. Although the current work is focused on data quality assessment for clinical data, it seems plausible that a modified framework with relevant evaluation methods and formalized data quality meta-data tags would provide useful annotations for biological data sets.

April 13, Emory Fry, CAPT, MC, USN - Investigator, Biomedical Informatics, Naval Health Research Center, San Diego, CA; Adjunct Assistant Professor of Biomedical Informatics, Assistant Professor of Pediatrics, Uniformed Services University of Health Sciences, Bethesda, MD
Hybrid Architectures For Knowledge Management and Clinical Decision Support
Dr. Fry will discuss the Knowledge Management Repository Architecture (KMR-II), a second generation Clinical Decision Support (CDS) platform for healthcare environments. It provides a standards-based approach for managing the structure and semantics of data obtained from local and distributed storage repositories. This canonical “fact” model can then be reasoned over using the knowledge management, business intelligence and predictive analytic technologies required for advanced cognitive and workflow optimization.

April 20, Jeffrey Gertsch, MD - Assistant Professor and Director, Interventional Neurophysiology Service, Department of Neurosciences, University of California San Diego School of Medicine; Senior Scientist, Warfighter Performance Department, Naval Health Research Center
Neurophysiology and 21st Century Technological Advances: How are we going to deal with the coming data avalanche?
Understanding the human nervous system requires a unique scalar systems biological approach. For this reason, neuropathology will likely be best understood with complex rather than simple biomarkers. This concept can be applied profitably in the medical neurosciences, where the neurological examination provides sparse data in a host of patients rendered unresponsive by anesthesia, brain injury, delirium, sleep, and other conditions. The sickest subset of patients has the most to gain; the surgical and critical care neurophysiologist can employ an alternative diagnostic paradigm by gathering continuous, real-time, multimodal electrophysiological data to secure a clinical correlation and reduce morbidity and mortality. Dr. Gertsch works in the emerging practice of Interventional Neurophysiology, a practice intimately tied to the technological advances necessary to observe tiny, faltering signals in this vulnerable inpatient population.

April 27, Charles Kennedy, MD - Head of Accountable Care Solutions, Aetna
Next Generation Health IT: how Aetna is using clinical ontologies and semantic interoperability to transform care through ACOs
Dr. Kennedy is a member of the HIT Policy Committee that developed the meaningful use specifications. He is also CEO of Aetna’s Accountable Care Solutions division, a unit dedicated to building ACOs across the nation. Dr. Kennedy will discuss the challenges of current-generation Health IT and contrast them with the next generation of Health IT. He will present concrete, real-world examples of how quality and efficiency can be improved through specific technical innovations and advances. Clinical ontologies, semantic interoperability, real-time query capabilities, and algorithm generation and advancement will also be featured.

May 4, Yuan Wu, PhD - Postdoctoral Fellow, Division of Biomedical Informatics, UCSD
Distributed Privacy-Preserving Predictive Models
Training a central predictive model in a distributed manner, without sharing raw data from local clinical sites, is a practical solution to the privacy-preservation problem. This approach is accomplished by exchanging non-sensitive intermediate results between the center and all local sites, according to the specific model training method. Distributed binary and multinomial logistic regression will be discussed. In addition, some applications of distributed logistic regression in clinical trial studies will be covered.
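
The exchange of intermediate results can be sketched as follows for binary logistic regression. This is a minimal illustration, not the speaker's method: it assumes the intermediate result is each site's log-likelihood gradient and that the center uses plain gradient ascent, and all names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(weights, features, labels):
    """Run at a site: gradient of the log-likelihood on local data only."""
    grad = [0.0] * len(weights)
    for x, y in zip(features, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (y - p) * xi
    return grad


def train_distributed(sites, n_features, rounds=200, step=0.1):
    """Run at the center: sum per-site gradients, then take an ascent step."""
    weights = [0.0] * n_features
    n_total = sum(len(labels) for _, labels in sites)  # sites share row counts
    for _ in range(rounds):
        total = [0.0] * n_features
        for features, labels in sites:
            g = local_gradient(weights, features, labels)  # only g leaves the site
            total = [t + gj for t, gj in zip(total, g)]
        weights = [w + step * t / n_total for w, t in zip(weights, total)]
    return weights


# Hypothetical toy data: each row is [1.0 (intercept), x]; labels are 0/1.
site_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0]], [0, 1, 1])
site_b = ([[1.0, 1.0], [1.0, 2.5], [1.0, 3.5]], [0, 0, 1])
weights = train_distributed([site_a, site_b], n_features=2)
print(weights)
```

Because the log-likelihood gradient is a sum over patients, adding up the per-site gradients gives the same update as training on the pooled data, so the central model matches the one that would be fitted if the raw records had been shared.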

May 11, Eric W. Triplett, PhD - Professor and Chair, Department of Microbiology and Cell Science, University of Florida
Defining the autoimmune microbiome for type 1 diabetes

The incidence of type 1 diabetes (T1D) in children increases each year in developed countries at a pace that cannot be explained by genetics. An international search for an environmental cause of insulin-dependent diabetes is underway. A role for bacteria in T1D was suggested by the observation that antibiotics and probiotics prevent diabetes in two rodent models. We showed that diabetes-resistant rats have a significantly higher population of probiotic-like bacteria in the gut than diabetes-prone rats. Data will be presented to illustrate a more complicated story in humans. The gut microbiome is significantly more diverse, in both composition and function, in healthy children than in autoimmune children. Several lines of evidence suggest that the gut microbiome of autoimmune children is significantly less healthy than that of control children. An approach to develop a diagnostic will be described.

May 18, Ali Zifan, PhD - Postdoctoral Fellow, Division of Biomedical Informatics, UCSD
Towards Realistic Computational Models of the Cardiovascular and Pulmonary System
Computational models provide a virtual environment in which naturally occurring biomechanical processes can be simulated. We discuss some new applications of image processing methods in medical image data analysis and modeling. For imaging the coronary arteries, we show how a physician can obtain a 3D view of the coronary tree, prior to any angioplasty, using only two monoplane angiogram images. Furthermore, we discuss the use of Electrical Impedance Tomography (EIT) for the diagnosis of lung pathology. EIT is a non-invasive imaging technique that maps the impedance distribution of a tissue layer or volume onto an image. EIT has the advantages of being radiation-free, inexpensive, and portable, and of producing real-time impedance images. These tools help build an intuitive understanding of the underlying processes of the cardiovascular and pulmonary systems, and provide a basis for building new medical apparatus for the diagnosis of different pathologies.

May 25, Ernesto Ramirez - Center for Wireless and Population Health Systems, Calit2@UCSD, Quantified Self San Diego
The Coming Health Revolution: How Data is Changing the Landscape of Personal and Public Health
Over the last few years there has been a swelling movement of individuals using self-tracking tools and systems to better understand and improve their health. The self-tracking community, in particular the Quantified Self community, has blossomed into a worldwide movement of individuals discussing tools, methods and outcomes derived from self-tracking experimentation. The wealth of knowledge from the Quantified Self community provides us with a number of insights into not only how people use self-tracking, but also its possible future use cases for private and public health as sensor and intelligent systems progress. Underlying these public and private systems is a rapidly growing data layer that may provide a rapid shift in how we understand health.

June 1, Shuang Wang, PhD - Postdoctoral Fellow, Division of Biomedical Informatics, UCSD
Privacy-Preserving Secure Biometrics Systems
A conventional biometric authentication system does not ensure the privacy of authorized users. Even though a user's biometric feature is typically encrypted, the encrypted feature has to be decrypted before it can be used during authentication. The raw features of a user could therefore be leaked if the system is compromised during authentication. Compared to conventional access keys, biometric identifiers are even more in need of protection, since they are unique to individuals and cannot be replaced or modified. In this presentation, the design of a privacy-preserving secure biometric system will be described, in which comparisons among biometrics are carried out directly on the encrypted features instead of the raw features. As a result, the privacy risk to each individual can be minimized.

June 8, Adela Grando, PhD - Postdoctoral Fellow, Division of Biomedical Informatics, UCSD
Permission ontology for sharing and reusing data obtained from informed consents
How can we maximize the reusability of data obtained from informed consents while protecting human subjects? Given the proliferation of biospecimen research and the advances in the field of genome research, balancing the maximization of data utility and protecting human subjects is central to policy decisions. The purpose of this talk is to provide an introduction to on-going research within the iDASH (integrating Data for Analysis, anonymization, and SHaring) project to address these concerns. Dr. Grando will explain a web-based prototype developed as an alternative to the traditional paper-based management of the informed consent process. She will also introduce a permission ontology proposed for facilitating the reusability and sharing of anonymous bio-specimens and medical data, while providing compliance with the permissions given by the subject through consent.

January 13, William Yasnoff, MD, PhD - Adjunct Professor, Division of Health Science Informatics, Johns Hopkins University; Professor, Institute for Healthcare Informatics, University of Minnesota; Managing Partner, NHII Advisors
Integrating Public Health and Clinical Care Using Electronic Health Information
In 2001, the first-ever consensus national agenda for progress in Public Health Informatics (PHI) and the National Committee on Vital and Health Statistics recommended the establishment of “an information architecture that includes a longitudinal, person-based, integrated data repository.” Such a health information infrastructure (HII) is critical to meeting public health preparedness, policy and outbreak control needs, and is essential to reducing errors, improving quality and increasing efficiency in health care. Dr. Yasnoff addresses the innovation and challenges of using health record banks to make HII possible.
Join us to learn more about:
• Public health informatics
• Effective health information infrastructure
• Health record banks

January 20, Michael Conway, PhD - Postdoctoral Fellow, Division of Biomedical Informatics, UC San Diego
Natural Language Processing for Disease Surveillance
Over the past fifteen years, the worldwide growth of the internet and the widespread utilization of Electronic Health Records have significantly changed the landscape of public health surveillance. However, much of this new data is in the form of unstructured “free text” and requires intensive Natural Language Processing (NLP) before it can be used in IT systems. This talk describes two applications of NLP in public health surveillance. First, we will survey NLP-based global health surveillance techniques, with particular reference to the BioCaster system, a Japan-based, web-accessible global health monitoring system. Second, we describe NLP-based syndromic surveillance techniques, systems, and resources that utilize chief complaints (short textual summaries of symptoms derived from Electronic Health Records) to provide early warning of disease outbreaks.

January 27, Ilkay Altintas, PhD - Deputy Coordinator for Research, Cyberinfrastructure Research, Education and Development (CI-RED); Director, Scientific Workflow Automation Technologies (SWAT) Laboratory; San Diego Supercomputer Center
On Workflow-Driven Science Using Scientific Workflows and Provenance
A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Scientific workflow systems promote scientific discovery by supporting scientific workflow design and execution. Kepler is an open-source, cross-project collaboration to develop a scientific workflow system for multiple disciplines, providing a workflow environment for scientists. The Kepler Scientific Workflow Environment supports the design, execution, and management of scientific and engineering workflows through dedicated capabilities including provenance management, run management and reporting tools, integration of distributed computation and data management technologies, the ability to ingest local and remote scripts, and sensor management and data streaming interfaces. With its built-in instrumentation to capture provenance (execution history) of workflows and related data, Kepler workflows make it easier to track information about how products were derived, and enable understanding, reproduction, and verification of scientific results. This presentation will give an overview of various scientific applications developed in Kepler and explain how different community cyberinfrastructure projects use Kepler scientific workflows and provenance as a part of their architecture.

February 3, Olivier Bodenreider, MD, PhD - Staff Scientist, Cognitive Science Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine
NLM Resources for Mining Biomedical Text
Over the past two decades, the National Library of Medicine (NLM) has developed resources and tools for helping researchers mine clinical text. At the core of these resources is the Unified Medical Language System (UMLS). The UMLS Metathesaurus is a large terminology integration system and a source of vocabulary for identifying biomedical entities in text. The SPECIALIST lexicon provides linguistic information about biomedical phrases including inflection and derivation. The UMLS Semantic Network provides broad categories of biomedical entities and their interrelations. Based on these resources, NLM has developed tools and services to support various aspects of text mining including spelling correction, normalization, term recognition (MetaMap), indexing (Medical Text Indexer), relation extraction (SemRep) and visualization (Semantic Medline), as well as a specialized search engine (Essie). These resources will be briefly presented and remaining challenges will be discussed.

February 10, Curtis Langlotz, MD, PhD - Professor and Vice Chair for Informatics, Department of Radiology, University of Pennsylvania
Using Information Technology to Improve Radiology Practice
Because information technology is essential to the measurement of clinical and financial operations, it can also accelerate the improvement of radiology practices. At the University of Pennsylvania, we have developed a variety of information technologies that we and others now use routinely to improve quality and assist with operational needs. These technologies include radiology-tailored summaries of the electronic medical record (EMR), analytic tools derived from radiology report databases, tracking systems for diagnostic errors, radiation dose monitoring programs, and structured reporting methods. Our radiology practice has motivated the development of these systems and has served as a rich test bed to refine and evaluate their effectiveness. This talk will review the motivations, development methods, evaluation criteria, and clinical results of these systems in our clinical practice.

February 17, André Skupin, PhD - Associate Professor, Dept of Geography, San Diego State University
Visualizing the Topical Structure of the Medical Sciences: A Self-Organizing Map Approach
The self-organizing map (SOM) method has been used in document visualization for some time. However, fairly little is known about how to deal with truly large document collections in conjunction with a large number of SOM neurons. Post-training geometric and semiotic transformations of the SOM also tend to be limited. In addition, little is known about whether/how domain experts could actually make sense of this type of detailed topic mapping of a knowledge domain. This presentation will report on a study meant to address these open questions. A team of researchers at Indiana University and San Diego State University implemented a high-resolution visualization of the biomedical knowledge domain using the SOM method, based on a document corpus of over two million scientific publications. Our approach includes deployment of supercomputing hardware and parallelization, in order to deal with a SOM of very large size and dimensionality. The resulting two-dimensional model of the high-dimensional input space is then transformed into a large-format map by using geographic information system (GIS) techniques and cartographic design principles. This map is then annotated and evaluated by subject experts stemming from the biomedical and other domains. In addition to reporting on the results of this study, the presentation argues for the applicability of numerous geographic and cartographic notions, including reference systems, base maps, overlays, and the duality of discrete objects and continuous fields.

February 24, Gondy Leroy, PhD - Associate Professor, School of Information Systems and Technology, Claremont Graduate University
Toward the Next Generation Biomedical Search Engine
Searching the increasing number of publications available in digital libraries via search engines and on websites is necessary but insufficient to stay up-to-date and digest all new information. Better tools are needed that can address this increasing amount of information. While existing backend components seem able to keep up by using new and optimal databases and algorithms, there have been only incremental improvements in how we search and view results. With today’s processing power, even on mobile devices, much more sophisticated search is possible, but it will require a new interface, new backend components, and a new user mental model. We are working on such a new interface and accompanying backend components. Prototypes, studies, and results for a diagram search interface and its accompanying predicate-based biomedical parser will be discussed.

March 2, Yunan Chen, MD, PhD - Assistant Professor, Department of Informatics, UC Irvine
Working Record: Designing EMR System to Support Collaborative Health Practices
Recently there has been an increasing interest in the field of Human-Computer Interaction in studying the design of Electronic Medical Record (EMR) systems. Drawing on the insights obtained from my recent ethnographic studies on the adoption, workflow, and workarounds of EMR systems in various clinical settings, I will explore the concept of the “working record” that emerged from my field observations. In contrast to the goal of documenting precise, clear, and complete records in EMR systems, the working record is the informal, transitional, and work-in-progress information that is essential for mediating and supporting actual clinical practices. In this talk, I will conceptualize the notion of the working record and exemplify the artifacts, strategies, and practices clinicians engage in to create their own working records. This work calls for the design of EMR systems to support not only record-keeping practices in healthcare, but also the clinical practices through which the records are written, read, and used.

March 9, Anupam Goel, MD - Co-Principal Investigator at San Diego Beacon Collaborative, UC San Diego
Trials and Tribulations of Starting a Regional Health Information Exchange
The San Diego Beacon Health Information Exchange is a new regional health information exchange in San Diego. Given the different health care players in San Diego, the health information exchange faces challenges around consent, patient identification and protocols for sharing data. These challenges (and a few more) are addressed during this presentation.

March 16, Ricky Taira, PhD - Associate Professor, Department of Radiological Sciences, UCLA Medical Imaging Informatics Laboratory
Structuring of Clinical Trials Results for Evidence-based Medicine
The randomized controlled trial (RCT) is the most widely accepted means of obtaining compelling evidence of cause-and-effect relationships in medical science. The primary means for reporting the results of randomized clinical trials is through the scientific literature. The representation is mostly in scientific prose and some analytical expressions. In this talk, we concentrate specifically on the problem of structuring and interpreting the numbers reported in RCT papers. Numerical information summarizing study population, disease prevalence, therapeutic doses, outcome observations, and statistical tests is scattered throughout RCT papers. The goal of this research is to provide a structured framework to connect these numerical information items in order to: 1) improve the understandability of the research paper; 2) assess the quality of the research paper; and 3) contextualize probabilities. The work is relevant to researchers involved in probabilistic disease modeling and meta-analysis.