Skip to content

ehrQL

ehrQL (rhymes with circle), the Electronic Health Records Query Language, is a query language and command line interface that we designed for the specific application domain of EHR data. Researchers write dataset definitions with the query language and execute them with the command line interface to generate datasets with one row per patient.

ehrQL allows researchers to access data sources from primary and secondary care, as well as from organisations such as the Office for National Statistics (ONS).

Why did we create a new query language? You can find out about ehrQL's philosophy and design goals on the Bennett Institute blog.

ehrQL's documentation🔗

ehrQL's documentation has four main sections:

  1. The tutorial provides practical steps for studying ehrQL.

  2. The how-to guides provide practical steps for working with ehrQL in your project.

  3. The reference provides background knowledge for working with ehrQL in your project.

  4. The explanation provides background knowledge for studying ehrQL.

Conventions🔗

  • 💻 steps you should follow on the computer you are using for studying or working with ehrQL
  • 🗒 explanatory information
  • ⚠ important information
  • ❔ questions for you to think about

The four main sections are written to be read in order. You can navigate between them by:

  • pressing the N or . keys to go to the next page, and the P or , keys to go to the previous page;
  • using the Previous and Next links in the page footer;
  • referring to the navigation pane to the left of the page and clicking a link.

Asking for help🔗

If you need help with ehrQL or ehrQL's documentation, then please ask for help on the #ehrql-support Slack channel. (If you're unsure how to join, then please ask your co-pilot.)

A dataset definition🔗

The following dataset definition selects the date and the code of each patient's most recent asthma medication, for all patients born on or before 31 December 1999.

from ehrql import create_dataset
from ehrql.tables.beta.core import patients, medications

dataset = create_dataset()

dataset.define_population(patients.date_of_birth.is_on_or_before("1999-12-31"))

asthma_codes = ["39113311000001107", "39113611000001102"]
latest_asthma_med = (
    medications.where(medications.dmd_code.is_in(asthma_codes))
    .sort_by(medications.date)
    .last_for_patient()
)

dataset.asthma_med_date = latest_asthma_med.date
dataset.asthma_med_code = latest_asthma_med.dmd_code

When the dataset definition is executed with the command line interface, the command line interface generates a dataset with one row per patient. For example, it may generate the following dummy dataset:

patient_id asthma_med_date asthma_med_code
1 2023-05-14 39113611000001102
2 2023-05-26 39113611000001102
3 2018-07-23 39113311000001107
5 2004-09-25 39113611000001102
6 2007-04-25 39113611000001102
7 1949-10-18 39113311000001107
9 1966-05-15 39113311000001107
10 1966-03-14 39113611000001102