It happens a thousand times a day. Every time someone receives medical care, a record of that care is written. Taken together, a LOT of medical records are written each day. Over time these records form an impressive and important collection of information.



For many reasons there is a wealth of information locked in those medical records. The information is important to the patient. It is important to the doctor. It is important to clinical care. It is important to research. It is important to insurers. And it is important to the community of medical providers, in general.

As valuable as these medical records are, unlocking the secrets found in them is challenging. A series of land mines awaits the person attempting to open the treasure chest.

Any one human being simply cannot sit down and read these records and make sense of them. There are far too many records to digest. And to make sense of them requires a specialized background that only a few people have. Nevertheless, there is a wealth of information hiding in those records.

So, what are the obstacles to unlocking the secrets found in medical records? There are a surprising number of formidable obstacles that the analyst looking for treasure must overcome.



The first challenge stems from the fact that there are many sources for these records. Medical records are generated by many different facets of medical care:

  • Billing records: This type of record typically contains information about the services provided to patients along with cost breakdowns.
  • Laboratory reports: These types of records provide detailed information about the patient’s medical condition.
  • Radiology reports: This type of record typically contains images and findings from radiological exams, such as X-rays and CAT scans.
  • Surgical records: These often include notes on a surgeon’s activities during an operation, such as the time of start and end, details of medications used, and any anomalies observed.

Not only are a lot of records generated, but the terminology found in the records can vary widely as well. And then there is the problem of bringing the records together in a cohesive, intelligible manner. So the sheer number and diversity of the organizations creating medical records present a challenge. The challenge is to find ways to integrate all these different types of records into a single system that can be accessed by all stakeholders involved in a patient’s care.



Another challenge is that medical records are often created with technology from different vendors. Each vendor has its own language, format, and style. Blending together the records coming from different vendors of medical record technology is neither easy nor automatic.




As if there were not enough challenges, another major challenge is that of dealing with the volume of records. Take the number of times you have to blend records together and multiply that by a HUGE number and you have a difficult task. Even the simplest task presents a challenge when it has to be repeated millions of times.

Unlocking the secrets in your medical records becomes a monumental challenge.

The volume of records, the diversity of sources, and the number of companies creating records create a big challenge.



But there is good news. All of these obstacles can be overcome in a reasonable, economically feasible, timely manner. In short, there is a path through this medical record jungle.

The story of finding the path through the jungle begins with the observation that – at the end of the day – all medical records are cast into the form of text. It doesn’t matter how old the records are, who the vendor that created the record is, or what medical discipline the patient has utilized, ultimately all medical records are in the form of text.



The path to analysis shows that there are many different sources for text. One of them is medical records that exist only on paper. To read and analyze paper records, it is necessary to pass the paper through OCR (optical character recognition). Once the paper has been passed through OCR, the result is an electronic record. Textual ETL reads the electronic medical record and creates a database.

The other path to creating the analytical database record is simply for textual ETL to read the electronic record. Once the electronic record is read the database is created.

Note that textual ETL does not care what technology is used to create the electronic record. The record can come from Epic or not. Textual ETL only cares that the record is in the form of text.



Once the database is created, Forest Rim supplies the text analytic workbench. The text analytic workbench is used to:

  1. Select the text that is needed for analysis
  2. Analyze the text
  3. Store the results for future analytical activity.

The text analytic workbench operates at electronic speeds. It is normal for an analysis to be done in seconds.

This is the path through the forest. This is how you can start to do analytical processing of medical records in seconds.

The path through the forest is a simple two-step process: textual ETL first, then the text analytics workbench.



Textual ETL is the technology that reads the raw text and edits, converts, and loads the text into a database. The text analytics workbench is the technology that reads the database and allows analysis to be done on the text.
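The two steps can be pictured with a toy stand-in. The sketch below is only an illustration of the shape of the process and is not how Forest Rim's products are implemented: the function names `load_records` and `find_sentences`, the `sentences` table, and the naive split on periods are all assumptions made for the example.

```python
import sqlite3

def load_records(conn, records):
    """Step 1 (ETL stand-in): split each record into sentences and store them as rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentences "
        "(record_id TEXT, position INTEGER, sentence TEXT)"
    )
    for record_id, text in records.items():
        # Naive sentence split on periods; real clinical text needs far more care.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        conn.executemany(
            "INSERT INTO sentences VALUES (?, ?, ?)",
            [(record_id, i, s) for i, s in enumerate(sentences)],
        )
    conn.commit()

def find_sentences(conn, term):
    """Step 2 (workbench stand-in): select the sentences that mention `term`."""
    rows = conn.execute(
        "SELECT record_id, sentence FROM sentences "
        "WHERE sentence LIKE ? ORDER BY record_id, position",
        (f"%{term}%",),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
load_records(conn, {
    "r1": "Patient given furosemide. Discharged home.",
    "r2": "No medications.",
})
print(find_sentences(conn, "furosemide"))
```

Once the text sits in rows, finding the relevant passages is an ordinary database query rather than a reading exercise.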




The text then becomes the lowest common denominator among all medical records. That’s the good news. The fact that there even is a lowest common denominator is indeed very good news.

But just because there is a lowest common denominator – text – among the different medical records, does not mean that finding commonality and communication among the records is an easy or natural thing to do. In other words, you can’t just strip the medical record of its text and combine it with other medical records. The problem of combining the text of different medical records is in itself a challenge.

You can't just take the text found in medical records and throw it all into the same pot. If you do, you can end up with an indecipherable mess.


So why exactly do you end up with a mess on your hands when you just randomly merge records together? In fact, there are a lot of reasons for this phenomenon.


One of the many reasons why you can’t just throw text together and expect useful results is that medical terminology is full of terms that have different names but mean the same thing. As a simple example, consider the medication Lasix. Lasix is also known as furosemide. And Lasix is also known as Lo-Aqua. These are all valid names for the same thing. If you are going to have a cogent analysis, you have to recognize that the same item is being discussed using different names.
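A minimal sketch of this kind of synonym resolution is shown below. The lookup table `DRUG_SYNONYMS` and the function `canonicalize_drugs` are hypothetical names invented for the illustration, and the table holds only the names from the example above; a real vocabulary would be far larger and curated by someone with a clinical background.

```python
import re

# Hypothetical synonym table: every known name maps to one canonical name.
DRUG_SYNONYMS = {
    "lasix": "furosemide",
    "lo-aqua": "furosemide",
    "lo aqua": "furosemide",
    "furosemide": "furosemide",
}

# Try longer names first so multi-word names win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(DRUG_SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def canonicalize_drugs(text: str) -> str:
    """Replace every known drug synonym in `text` with its canonical name."""
    return _PATTERN.sub(lambda m: DRUG_SYNONYMS[m.group(1).lower()], text)

print(canonicalize_drugs("Patient on Lasix 40 mg; previously Lo Aqua."))
```

After this step, a query for furosemide finds every record that mentions the drug, whichever name the author of the record happened to use.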



Another mundane issue is that of common formatting of common variables. Take something as simple as the date. In one document a date is formatted as 07/20/1945. In another document, the date appears as July 20, 1945. Logically these are the same date, but they have very different physical presentations. When a person reads both documents, he or she knows that they refer to the same date. But when a computer reads the same documents, the computer must be told that these dates are the same. When you multiply this translation by 10,000,000, the task of equating like dates becomes a nontrivial task.
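As a rough illustration, the snippet below resolves both presentations to one value using Python's standard `datetime` module. The list `KNOWN_FORMATS` and the function `parse_date` are assumptions for the example, and the slash format is read month-first, which would be wrong for records written day-first.

```python
from datetime import date, datetime

# Formats this sketch knows about; the list is an assumption and would grow in practice.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d")

def parse_date(raw: str) -> date:
    """Try each known format in turn and return the first successful parse."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

# Both presentations resolve to the same date object.
print(parse_date("07/20/1945") == parse_date("July 20, 1945"))
```

Raising an error on an unknown format, instead of guessing, keeps bad dates out of the database and makes the gaps in the format list visible.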





A similar issue is that of the names that we call something. In the case of furosemide and Lasix, the reference was made to a specific substance. The same issue arises when we speak of classifications of objects (called “metadata”). As a simple example, there are many kinds of drugs and there are many kinds of medications. But for the purposes of medical care, drugs and medication are the same thing.

In order to have a meaningful dialogue between different medical documents, this difference needs to be resolved.




Yet another important difference is in the simple formatting of variables. In one case blood pressure is recorded as diastolic/systolic. In another case blood pressure is recorded as systolic/diastolic. This is a simple condition to correct. But in order to correct it, the condition has to be recognized. Then the correction needs to be repeated 10,000,000 times. When you multiply even a simple condition by these numbers, something that is simple becomes something that is complex.
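One way to recognize the condition automatically is to lean on the fact that systolic pressure is the higher of the two numbers. The sketch below does that; the names `BloodPressure` and `parse_blood_pressure` are made up for the illustration.

```python
import re
from typing import NamedTuple

class BloodPressure(NamedTuple):
    systolic: int
    diastolic: int

# Two numbers of two or three digits separated by a slash.
_BP = re.compile(r"(\d{2,3})\s*/\s*(\d{2,3})")

def parse_blood_pressure(raw: str) -> BloodPressure:
    """Return the reading in a fixed order, whichever order it was written in."""
    m = _BP.search(raw)
    if m is None:
        raise ValueError(f"no blood pressure reading found in {raw!r}")
    a, b = int(m.group(1)), int(m.group(2))
    # Systolic is the higher value, so the written order does not matter.
    return BloodPressure(systolic=max(a, b), diastolic=min(a, b))

print(parse_blood_pressure("BP 80/120"))
print(parse_blood_pressure("120/80 mmHg"))
```

Both readings come out as systolic 120, diastolic 80, so they can be compared directly.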





Yet another confusion in trying to make sense of many documents is that of recognizing the meaning of acronyms. In three medical documents, there appears the term “ha”. In one document “ha” refers to a heart attack. In another document, “ha” refers to a headache. And in yet another document “ha” refers to hepatitis A. If a proper interpretation is not made, the analysis will assume that a headache is a heart attack, and this surely leads nowhere productive.
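A simple way to picture the interpretation step is to look at the words surrounding the acronym. The sketch below is a toy under stated assumptions: the cue words in `HA_SENSES` are invented for the example and are nowhere near a clinical vocabulary, and `expand_ha` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical cue words per meaning; a real system would use a curated vocabulary.
HA_SENSES = {
    "heart attack": {"cardiac", "chest", "troponin", "ecg", "cardiology"},
    "headache": {"migraine", "aura", "nausea", "neurology", "photophobia"},
    "hepatitis A": {"liver", "jaundice", "vaccine", "hepatic", "hav"},
}

def expand_ha(text: str) -> Optional[str]:
    """Pick the meaning of "ha" whose cue words appear most often in the text.

    Returns None when no cue word is present or when the top two meanings tie,
    so that ambiguous records can be sent to a person for review.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & words) for sense, cues in HA_SENSES.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)

print(expand_ha("Pt c/o ha with photophobia and nausea, hx of migraine"))
print(expand_ha("ha, troponin elevated, ECG changes"))
```

Returning nothing for an ambiguous record is a deliberate choice here: a wrong expansion is worse than an unresolved one.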




Yet another mundane problem is that of the interpretation of misspellings and colloquialisms. For a variety of reasons, proper spelling and the use of common language foster understanding. While such corrections are usually easy to accommodate, in the face of having to make edits and corrections 10,000,000 times, the edit is no longer trivial.
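For misspellings, one common approach is to compare each word against a list of known terms and accept only very close matches. The sketch below uses `difflib.get_close_matches` from Python's standard library; the word list `VOCABULARY`, the helper `correct_term`, and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib

# Tiny stand-in vocabulary; a real one would hold many thousands of terms.
VOCABULARY = ["furosemide", "hypertension", "pneumonia", "diabetes"]

def correct_term(word: str, cutoff: float = 0.8) -> str:
    """Return the closest vocabulary term, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct_term("furosemid"))
print(correct_term("hypertenson"))
print(correct_term("aspirin"))
```

The cutoff is kept high on purpose. Many drug names differ by only a letter or two, so a loose match could silently turn one medication into another.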




Yet another obstacle to a meaningful combination of text is the commonality of basic measurements. Suppose one medical report lists a person’s weight as 130 pounds. Another report lists the person’s weight as 59 kilograms. In order to do an incisive analysis, there needs to be one common measurement of a person’s weight. (Logically the weights are the same. But physically they are not. In order to do proper analysis, the weights must be physically resolved.)
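The arithmetic behind this resolution is small: one pound is defined as 0.45359237 kilograms, so 130 pounds is about 58.97 kilograms, which rounds to 59. A sketch of the conversion follows; the helper `weight_in_kg` and the unit spellings it accepts are assumptions for the example.

```python
import re

LB_TO_KG = 0.45359237  # exact, by definition of the international pound

# A number followed by a unit word in one of a few common spellings.
_WEIGHT = re.compile(
    r"(\d+(?:\.\d+)?)\s*(pounds?|lbs?|kilograms?|kgs?)\b", re.IGNORECASE
)

def weight_in_kg(raw: str) -> float:
    """Parse a weight written in pounds or kilograms and return kilograms."""
    m = _WEIGHT.search(raw)
    if m is None:
        raise ValueError(f"no weight found in {raw!r}")
    value, unit = float(m.group(1)), m.group(2).lower()
    kg = value * LB_TO_KG if unit.startswith(("p", "l")) else value
    return round(kg, 1)

print(weight_in_kg("130 pounds"))
print(weight_in_kg("59 kilograms"))
```

Rounding to one decimal place is a choice made for this example; how much precision to keep depends on the analysis being done.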



None of these issues is unresolvable. But the fact that all of them need to be resolved, and that the resolution must occur over many, many documents, means that lumping together the text from medical records is a nontrivial process.




The good news is that there is a way to accomplish exactly what has been described in an efficient, cost-effective manner. The method is through a technology called textual ETL. Textual ETL reads medical records and turns those medical records into a standard database. In creating the database, terms and measurements are standardized into a common format and meaning.

Because textual ETL does things on a computer and in an automated manner, there is no limit to the number of records that can be processed. Textual ETL frees the doctor from having to manually read records.

And because the processing is done on a computer it is fast and inexpensive.

One way to think of textual ETL is to think of it as a means of reading a document and finding all the important data, removing the extraneous data, and placing the data in a database. Suppose you want to find language about a procedure. You don’t have to read the entire document. You let the computer read and organize the document. Now finding the text that you want is easy and efficient for even the largest document. And textual ETL removes the clutter that isn’t relevant to the nexus of the document.



Because textual ETL has built into it the ability to automatically edit and transform text, you can now bring together text from different disciplines. That is easy and natural in textual ETL.



Furthermore, textual ETL does not care who the provider of the text is. The vendor providing the medical records is irrelevant to the analytics that can be done against the records. The records can be very old records that are combined with very new records. It simply doesn’t matter to textual ETL. The only requirement for textual ETL is that the records be in the form of text.


But perhaps the biggest challenge in creating the database for analytical processing is doing the detailed recognition and editing as the database is created. As textual ETL reads the raw text and creates the database, it can edit and transform the data so that the data arrives in the database in an integrated manner. This means that meaningful analysis can be done on the database immediately.



But perhaps the single most advantageous feature of textual ETL is that there is no limit on the number of medical records that can be processed. Furthermore, reading and processing medical records in an automated fashion is not expensive.



The result is that now, for the first time, you can start to do analytical processing against medical records. What once was an expensive, laborious, error-prone activity is now fast, inexpensive, and accurate.



That is why it is said that today you can start to do analytical processing against medical records as you have never been able to do it before.



Bill Inmon’s company Forest Rim Technology builds and services textual ETL and the Text Analytics Workbench. Textual ETL reads raw text and turns that raw text into a standard database. Forest Rim Technology is located in Denver, Colorado.

Bill’s latest books are TURNING TEXT INTO GOLD (Technics Publications), HEARING THE VOICE OF THE CUSTOMER (Technics Publications), and DATA ARCHITECTURE: SECOND EDITION (Elsevier).

Bill teaches a class on the Internet – PRACTICAL TEXT ANALYTICS. To find out more about the class, contact Bill at 
