How a computational biologist tracks COVID-19 (pt 1 of ?)

Hi all,

I’m Will, a professor in Philadelphia, and I study the HIV epidemic. I’ve been an HA user for a while; you might know my previous post, How Bayes Sensors work, from a Statistics Professor (with working Google Sheets!). I wanted to give the community here my perspective on tracking COVID-19 with Home Assistant at the local level.

In this post I will walk through creating an interactive tool, using base HA components, that will let you understand the local risk of COVID-19 in your immediate area. I suggest reading in full, as you’ll have to make a lot of non-automatable decisions and go on some sleuthing expeditions. Following along with the logic will help you adapt it to your specific situation.

I know this is an international audience, so this will apply differently to everyone. I am in Philadelphia, PA, USA, with 76 confirmed cases and zero deaths so far, so we’re very early. But I’m going to try to tailor the HA code I’m sharing to a wide range of individual situations. Also, I mostly do my HA development through the templating engine; if someone wants to convert this stuff to an integration, I’m happy to collaborate, I’m just not good with the HA backend.

Now, our goals: 1) estimate the true number of cases in our area given the number of deaths, and 2) quantify the maximum number of individuals that could gather WITHOUT ANYONE being infected. This last number will be useful as a metric for whether it is safe to go grocery shopping or whether you shouldn’t even leave your house. The answer to the first will let us calculate the second.

I’m drawing this methodology from the Medium article Coronavirus: Why You Must Act Now. In it, the author provides a handful of Google Sheets to help you estimate the number of TRUE CASES in your area from the number of fatalities. He also describes a method that uses confirmed cases, but here in the US testing has been so terrible as to make it virtually useless. I believe I can at least trust the government to count the dead.

“But Home Assistant has an integration to track COVID,” I hear you say. For some people that may be enough, but here in the US the country is vast, and the death count across the country means very little to my individual situation. “Build an integration to scrape local data,” I hear you say. Good luck: every state, county, and city is putting together their own websites (none of which allow downloads or provide machine-readable formats), and those websites are changing structure every 12 hours as they figure out their response. You’d spend all your time fighting that. Instead, we’ll just use our eyes and a few input_numbers.

Before we get started programming, think about your “population”. This should include all people within 3-4 interactions of your daily life: who are the people that interact with the people that interact with the people that interact with you? I live in the city, so I assume that would be everyone in Philadelphia County or its 7 bordering counties across PA and NJ. Think about your situation and come up with a reasonable guess. Then do some internet sleuthing and find some state/local government websites that break the data out by county. Bookmark those for later. I put mine in a markdown lovelace card.

content: >
  # Links for quick Info


  [PA Health Department
  Info](https://www.health.pa.gov/topics/disease/Pages/Coronavirus.aspx)


  [Philly Health Department
  Info](https://www.phila.gov/services/mental-physical-health/environmental-health-hazards/covid-19/whats-new/)


  [Camden Health
  Department](https://www.camdencounty.com/service/health-human-services/covid-19-updates/) 
type: markdown

While you’re Googling, find the combined population of all of those counties. Mine comes to 5,370,926. Save that number; we’ll need it later.

Now for some biostats. First, in words:

As epidemiologists track deadly outbreaks they focus on only a few numbers: the number of deaths, the time between infection and death, the doubling time of the infection, and the percentage of infections that lead to death (the case fatality rate, CFR). The logic: if we know the number of deaths and the percentage of infections that lead to death, we can calculate the total number of infections. But remember, there’s a time lag between infection and death, so the deaths we see now are due to infections in the past. We can use the doubling time of the infection to estimate the number of TRUE infections right now and project that into the future. We can also use the CFR to calculate the number of deaths in the future, to check our predictions and adjust parameters.

I’m taking the constants from either Worldometer (for the CFR) or the Medium article referenced above.

In a Jinja template:

{% set deaths = 1 %}
{% set lag = 17.3 %}
{% set doubling_time = 6.18 %}
{% set CFR = 0.04 %}
{% set doubings_during_lag = lag/doubling_time %}
{% set true_cases_lagged = deaths/CFR %}

{{ doubling_time }} days ago there were ~{{true_cases_lagged | round}} true cases.

{% set true_cases_now = true_cases_lagged*(2**doubings_during_lag) %}

During that time there were {{doubings_during_lag | round(1) }} doublings.
This means there are {{ true_cases_now | round }} cases now.

{% set true_tom = (true_cases_now * (2**(1/doubling_time))) | round %}
{% set true_two = (true_cases_now * (2**(2/doubling_time))) | round %}
{% set true_week = (true_cases_now * (2**(7/doubling_time))) | round %}

Tomorrow there will be {{true_tom }} with {{ true_tom*CFR | round}} deaths.
In 2 days there will be {{ true_two }} with {{ true_two*CFR | round}} deaths.
In a week there will be {{ true_week }} with {{ true_week*CFR | round}} deaths.

You can use the template editor to play around with the model and get a feel for it. Then you can grab out the sensors that are relevant to you. I made these:

- platform: template
  sensors:
    current_covid_cases:
      friendly_name: Current COVID Cases
      unit_of_measurement: 'people'
      value_template: >- 
        {% set deaths = states.input_number.greater_pa_covid_deaths.state | float %}
        {% set lag = 17.3 %}
        {% set doubling_time = 6.18 %}
        {% set CFR = 0.04 %}
        {% set doubings_during_lag = lag/doubling_time %}
        {% set true_cases_lagged = deaths/CFR %}
        {% if deaths > 0 %}
        {{ (true_cases_lagged*(2**doubings_during_lag)) | round }}
        {% else %}
        {{ ((5/CFR)* 2**(lag/doubling_time)) | round }}
        {% endif %}


    tom_covid_cases:
      friendly_name: Tomorrow's COVID Cases
      unit_of_measurement: 'people'
      value_template: >- 
        {% set doubling_time = 6.18 %}
        {{ ((states.sensor.current_covid_cases.state|float) * (2**(1/doubling_time))) | round }}
    
    tom_covid_deaths:
      friendly_name: Tomorrow's COVID Deaths
      unit_of_measurement: 'people'
      value_template: >- 
          {% set CFR = 0.04 %}
          {{ ((states.sensor.tom_covid_cases.state | float) * CFR) | round }}
        
    week_covid_cases:
      friendly_name: COVID Cases in 7 days
      unit_of_measurement: 'people'
      value_template: >- 
        {% set doubling_time = 6.18 %}
        {{ ((states.sensor.current_covid_cases.state|float) * (2**(7/doubling_time))) | round }}
    week_covid_deaths:
      friendly_name: COVID Deaths in 7 days
      unit_of_measurement: 'people'
      value_template: >- 
        {% set CFR = 0.04 %}
        {{ ((states.sensor.week_covid_cases.state | float) * CFR) | round }}

You’ll notice that in the current_covid_cases template I used an if tag to deal with situations where the number of deaths is 0. With how poor testing is in the US, you should probably hedge and assume the values associated with 5 deaths until you have concrete data.

Cool, now you can use your input_number to experiment and see how this changes things. That’s a perfectly useful tool for scaring the shit out of you, but it doesn’t give you any actionable information. We’ll now use this information to calculate a maximum safe group size.

First, in words. If we know (or think we know) the number of true cases in a population of known size, we can calculate the likelihood that any given individual is infected. From that we can calculate the likelihood that a group of X individuals has ZERO infected people: if healthy_rate is the fraction of people who are healthy, a group of X is all healthy with probability healthy_rate**X. Since we can’t force that risk to be exactly 0% (that only happens with group sizes of 0 people), we have to tolerate some level of risk. I’m willing to accept a 1% risk (you can tailor accordingly). Using the magic of logarithms we can then solve healthy_rate**X = 1 - risk for the maximum safe size, X = log(1-risk)/log(healthy_rate). This could be used to modulate your behaviour.

In a template:

{% set current_cases = states.sensor.current_covid_cases.state | float %}
{% set tom_cases = states.sensor.tom_covid_cases.state | float %}
{% set week_cases = states.sensor.week_covid_cases.state | float %}
{% set population = 5370926 | float %}
{% set risk = 0.01 %}
{% set current_healthy_rate = (1-(current_cases/population)) %}

According to the data there are {{ current_cases }} cases in your area with a population of {{ population }}.

This implies that {{ (current_healthy_rate*100) | round(3) }}% of people are healthy.

In a group of 5 people at a dinner party there is a {{ (100*(1-current_healthy_rate**5)) | round(5) }}% chance that there is 1 or more infected people.
In a group of 50 people at a small store there is a {{ (100*(1-current_healthy_rate**50)) | round(5) }}% chance that there is 1 or more infected people.
In a group of 500 people at a large store there is a {{ (100*(1-current_healthy_rate**500)) | round(5) }}% chance that there is 1 or more infected people.
In an event of 5000 people there is a {{ (100*(1-current_healthy_rate**5000)) | round(5) }}% chance that there is 1 or more infected people.
In a large event of 50000 people there is a {{ (100*(1-current_healthy_rate**50000)) | round(5) }}% chance that there is 1 or more infected people.

{% set current_max_size = (1-risk) | log(current_healthy_rate) %}
A crowd size of {{ current_max_size | round }} is the largest crowd where the risk of 1 or more infected people is below {{100*risk}}%.

{% set week_healthy_rate = (1-(week_cases/population)) %}
In a week there will be {{ week_cases }} cases in your area, meaning {{ (week_healthy_rate*100) | round(3) }}% of people are healthy.

In a group of 5 people at a dinner party there is a {{ (100*(1-week_healthy_rate**5)) | round(5) }}% chance that there is 1 or more infected people
In a group of 50 people at a small store there is a {{ (100*(1-week_healthy_rate**50)) | round(5) }}% chance that there is 1 or more infected people
In a group of 500 people at a large store there is a {{ (100*(1-week_healthy_rate**500)) | round(5) }}% chance that there is 1 or more infected people
In an event of 5000 people there is a {{ (100*(1-week_healthy_rate**5000)) | round(5) }}% chance that there is 1 or more infected people
In a large event of 50000 people there is a {{ (100*(1-week_healthy_rate**50000)) | round(5) }}% chance that there is 1 or more infected people

{% set week_max_size = (1-risk) | log(week_healthy_rate) %}
A crowd size of {{ week_max_size | round }} is the largest crowd where the risk of 1 or more infected people is below {{100*risk}}%.

And then I pulled out these sensors.

- platform: template
  sensors:
    today_max_size:
      friendly_name: "Today Max Safe Group Size"
      unit_of_measurement: 'people'
      value_template: >-
        {% set cases = states.sensor.current_covid_cases.state | float %}
        {% set population = 5370926 | float %}
        {% set risk = 0.01 %}
        {% set healthy_rate = (1-(cases/population)) %}
        {% set max_size = (1-risk) | log(healthy_rate) %}
        {{ max_size | round }}
    tomorrow_max_size:
      friendly_name: "Tomorrow's Max Safe Group Size"
      unit_of_measurement: 'people'
      value_template: >-
        {% set cases = states.sensor.tom_covid_cases.state | float %}
        {% set population = 5370926 | float %}
        {% set risk = 0.01 %}
        {% set healthy_rate = (1-(cases/population)) %}
        {% set max_size = (1-risk) | log(healthy_rate) %}
        {{ max_size | round }}
    week_max_size:
      friendly_name: "Next Week's Max Safe Group Size"
      unit_of_measurement: 'people'
      value_template: >-
        {% set cases = states.sensor.week_covid_cases.state | float %}
        {% set population = 5370926 | float %}
        {% set risk = 0.01 %}
        {% set healthy_rate = (1-(cases/population)) %}
        {% set max_size = (1-risk) | log(healthy_rate) %}
        {{ max_size | round }}

For me the relevant numbers are the max group sizes, but if you’re in an office deciding about closing (you should; please READ the Medium article), or a business weighing options, you could make a sensor for your employee count and decide when that likelihood gets too high. You can also just use the input_number to get a sense of what life may be like for the next few weeks. If you “walk along” and repeatedly put the “tomorrow” number into the “today” input_number, you can see how exponential growth really makes this disease dangerous.
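If you’d rather not do that walk-along by hand, here’s a rough sketch you can paste into the template editor. The starting count is made up; the constants are the ones from above:

{% set start_cases = 100 %}
{% set doubling_time = 6.18 %}
{% for day in range(1, 15) %}
Day {{ day }}: ~{{ (start_cases * 2**(day/doubling_time)) | round }} cases
{% endfor %}

Two weeks at a 6.18-day doubling time is roughly a 5x increase; tighten the doubling time to 2-3 days and the same two weeks gives you 25-130x.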

I just put them into an entities card.

entities:
  - entity: input_number.greater_pa_covid_deaths
  - entity: sensor.current_covid_cases
  - entity: sensor.tom_covid_cases
  - entity: sensor.tom_covid_deaths
  - entity: sensor.week_covid_cases
  - entity: sensor.week_covid_deaths
  - entity: sensor.today_max_size
  - entity: sensor.tomorrow_max_size
  - entity: sensor.week_max_size
title: Greater PA True Cases
type: entities


I’ll make some plots and use the “deaths tomorrow” and “deaths next week” numbers to assess how well we’re bending the curve. This model, by its very nature, assumes a “well mixed population”. If we plot the number of deaths predicted in the future along with the actual number of deaths, we can see how well the social distancing measures are helping. I’ll make a follow-up post in a week or so with plots for my area. Follow along and do it yourself if you’d like.
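If you want to start on those plots yourself, here’s a minimal sketch using the custom mini-graph-card, with the entity names from above. Keep in mind the prediction sensor runs a day ahead of the actuals, so compare today’s actual against yesterday’s predicted value:

aggregate_func: last
entities:
  - entity: sensor.tom_covid_deaths
    name: Predicted
  - entity: input_number.greater_pa_covid_deaths
    name: Actual
group_by: date
hours_to_show: 168
name: Predicted vs Actual Deaths
type: 'custom:mini-graph-card'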

A note of caution: LISTEN TO YOUR LOCAL AUTHORITIES. If they say to stay home, STAY HOME! They have data available to them, and people with much better models who know way more than this. This is a very basic model; it will only give you an idea of what’s happening and SHOULD NOT be considered an excuse to hold your 1000-person concert or keep your 50-person office open. STAY SMART. STAY SAFE. STAY HOME!


How does this work if the number of deaths in your county is actually 0? And as of now the number of confirmed cases reported in my county is 0 as well.

I’m in Indiana, where we have a population of 6.7 million, and as of today there are only 30 cases and 2 reported deaths in the entire state. None are in my county or in any county close to mine.

There is only one confirmed case in a single neighboring county. Beyond that, the next layer is one case each in two counties that are two removed from mine (IOW, one county between mine and the county with the confirmed case).

How would you handle estimating the risk with numbers that low? Obviously if I plugged in my values then my risk should be 0 (or almost 0). Is that realistic right now based on that data?

@finity, that’s the same situation I’m in now. The Medium article links to a Google Sheet that you can update using what he calls the “cases method”. It tries to guess what day of a “model outbreak” you are in and then project forward. However, it relies on programming utilities like find and matrix math that are just outside what the Jinja template engine can really do. Also, due to our poor testing infrastructure, the number of cases is VASTLY undercounted relative to the countries he uses in his model, so its predictions may not match reality.

I’ve been filling in a death count of 5 until I have a true observation. That seems like a reasonable precaution in my pretty populated area. If you’re living in a more rural area, maybe 1 is the more realistic guess. Here in the US we’re in this liminal space where we don’t know if the shit has already hit the fan (i.e., there are 100K cases that are less than 2 weeks old, so they haven’t shown up as deaths yet) or we’ve turned off the fan pre shit-hit by strict social distancing. We just don’t know yet.

Unfortunately, all we can do is wait.


Yeah, I kind of figured as much.

And that sucks too. I’m an “information is power” kind of person.

I hate not knowing, because that can result in two polar-opposite reactions (over-reaction and under-reaction). And those two opposite reactions end up with potentially equally horrible outcomes: the economy gets destroyed over a relatively “minor” thing (leading to more poverty and ultimately more deaths due to all that entails), or we have lots of unnecessary deaths from a virus that could have been stopped by additional precautions which might seem “unreasonable” in light of current information.

We need information to navigate this thing and find the “middle road” in our response.

Maybe then buying a 1000 rolls of toilet paper might not be the only thing people think they can do to save themselves… :roll_eyes:

I don’t know what it is about TP that everyone went nuts over. I always have a 1-2 month supply between what’s in the closet and under the sink. Are people living roll to roll?

I always have a mindset of “information is nothing without context”. I built this to give myself context about the numbers coming out over the news. I’m hoping it will help other people do the same. You can also use it in a “simulation mode” and just get an idea of what number of deaths you should “pay attention” to for your situation.

We can also use it to understand how well our social distancing policies are working… but keep in mind, everything will have a 2-week lag. Across the US we’ve been doing this for barely a week (some places haven’t even started yet), so we won’t know for a bit. Another reason to consider this hyper-locally.


I’ve been playing with your templates above and something wasn’t looking right to me.

I think in the first template, in line 5, you might have the numerator and denominator backwards.

this:

{% set doubings_during_lag = lag/doubling_time %}

I think should be:

{% set doubings_during_lag = doubling_time/lag %}

Or it could possibly (probably…) be that you meant to use {{ lag }} in line 7, and this:

{{ doubling_time }} days ago there were ~{{true_cases_lagged | round}} true cases.

should be:

{{ lag }} days ago there were ~{{true_cases_lagged | round}} true cases.

which then makes line 5 correct.

But even then, if your calculations are correct, the estimated future deaths based on the input data don’t hold true.

According to the above calculations, with 3 current reported deaths, and even with a much lower CFR of 1.5%, there should be an additional 23 deaths tomorrow based on the doubling rate and lag as you have them in your templates.

But I did this yesterday and there have been no additional deaths at all. So something in the assumptions (doubling or lag) can’t be correct.

I understand this is only an estimate but I’m not sure this is a good tool yet even as an estimate.

I’ll be glad to help fix it since I’m obviously interested in knowing this kind of thing but I’m not smart enough to know what the best assumptions are or how to figure those out. :slightly_smiling_face:

I’ll try to dig in a bit and see if I can figure out where it’s gone wrong.

It should definitely be this way.

{% set doubings_during_lag = lag/doubling_time %}

What we’re doing here is saying: the deaths we see today are really infections from ~17 days ago (the average lag between infection and death). The number of infections doubles every 6.18 days. So, by the time we see today’s deaths, the infections from back then have doubled a little under 3 times.
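To make that concrete, you can paste this into the template editor (same constants as the post above):

{% set lag = 17.3 %}
{% set doubling_time = 6.18 %}
{{ (lag/doubling_time) | round(2) }} doublings during the lag, a {{ (2**(lag/doubling_time)) | round(1) }}x multiplier.

With these constants that renders as 2.8 doublings, i.e. about 7x as many true cases now as when today’s deaths were infected.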

You are right about the second part, though: that display line should be {{ lag }} days ago, not {{ doubling_time }}.

Welcome to the wild world of modeling stochastic events. The smaller the sample size, the more likely there will be spurious results. All of these “constants” differ from place to place, change through time, are altered by our own actions, and our samplings are subject to noise. I would take any predictive result with a hefty grain of salt in these times, especially one simple enough to fit into a Jinja template.


I’m not sure I understand how this is working either.
I am using figures for the UK and I get zero projected deaths.
I must be doing something wrong but I can’t see what.
Or is my population size too big with such a relatively small number of deaths?

In case it is something obvious, here’s my config.
(But honestly, if you have better things to do, this isn’t super important!)

      {% set current_cases = states('sensor.current_covid_cases') | float %}
      {% set deaths =        states('input_number.covid_deaths') | float %}
      {% set tom_cases =     states('sensor.tom_covid_cases') | float %}
      {% set week_cases =    states('sensor.week_covid_cases') | float %}
      {% set population =    states('input_number.covid_total_population') | float %}
      {% set risk =          states('input_number.covid_tolerable_risk') | float / 100 %}
      {% set doubling_time = states('input_number.covid_doubling_time') | float %}
      {% set lag =           states('input_number.covid_lag') | float %}
      {% set CFR =           states('input_number.covid_cfr') | float / 100 %}

      {% set current_healthy_rate = (1-(current_cases/population)) %}
      {% set doubings_during_lag =  lag / doubling_time %}
      {% set true_cases_lagged =    deaths / CFR %}
      {% set true_cases_now =       true_cases_lagged * (2**doubings_during_lag) %}
      {% set true_tom =             (true_cases_now * (2**(1/doubling_time))) | round %}
      {% set true_two =             (true_cases_now * (2**(2/doubling_time))) | round %}
      {% set true_week =            (true_cases_now * (2**(7/doubling_time))) | round %}

      {{ lag }} days ago (avg time between infection and death) there were ~{{true_cases_lagged | round }} true cases.


      During that time there were {{doubings_during_lag | round(1) }} doublings.


      This means there are {{ true_cases_now | round }} cases now.


      Tomorrow there will be {{true_tom }} with {{ true_tom*CFR | round }} deaths.


      In 2 days there will be {{ true_two }} with {{ true_two*CFR | round }} deaths.


      In a week there will be {{ true_week }} with {{ true_week*CFR | round }} deaths.


      ### TODAY

      According to the data there are {{ current_cases | int }} in your area (pop. {{ population | int }}).


      This implies that {{ (current_healthy_rate*100) | round(2) }}% of people are healthy.


      The chance that there is 1 or more infected people:

      In a group of 5 at a dinner party is {{ (100*(1-current_healthy_rate**5)) | round(2) }}%

      In a group of 50 at a small store is {{ (100*(1-current_healthy_rate**50)) | round(2) }}%

      In a group of 500 at a large store is {{ (100*(1-current_healthy_rate**500)) | round(2) }}%

      In an event of 5000 there is {{ (100*(1-current_healthy_rate**5000)) | round(2) }}%


      {% set current_max_size = (1-risk) | log(current_healthy_rate) %}
      A crowd size of {{ current_max_size | round }} is the largest crowd where the risk of 1 or more infected people is below {{100*risk}}%.

      {% set week_healthy_rate = (1-(week_cases/population)) %}

      ### IN A WEEK

      In a week there will be {{ week_cases | int }} cases in your area.


      This implies {{ (week_healthy_rate*100) | round(2) }}% of people are healthy.


      The chance that there is 1 or more infected people:

      In a group of 5 at a dinner party is {{ (100*(1-week_healthy_rate**5)) | round(2) }}%

      In a group of 50 at a small store is {{ (100*(1-week_healthy_rate**50)) | round(2) }}%

      In a group of 500 at a large store is {{ (100*(1-week_healthy_rate**500)) | round(2) }}%

      In an event of 5000 is {{ (100*(1-week_healthy_rate**5000)) | round(2) }}%


      {% set week_max_size = (1-risk) | log(week_healthy_rate) %}
      A crowd size of {{ week_max_size | round }} is the largest crowd where the risk of 1 or more infected people is below {{100*risk}}%.

Yours comes from a quirk of the template language.

In Jinja, the | takes precedence over math operations, so it’s rendering as:

{{ true_tom*(CFR | round) }}

Since CFR | round evaluates to 0, true_tom gets multiplied by 0.
It should be:

{{ (true_tom*CFR) | round }}
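
You can see the difference with a toy example in the template editor:

{{ 100 * 0.04 | round }} renders as 0.0, because the round applies to 0.04 first.
{{ (100 * 0.04) | round }} renders as 4.0.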

That one gets me every time.

Thanks!

I probably should have at least thought about trying that myself.

It works now but gives a frightening jump from 233 to 1814 deaths in 24 hours!
But as you said, if modelling this were as easy as a bit of Jinja, I doubt we’d have such a problem!

Playing around with stuff after looking at @finity’s thoughts.

I made the “deaths” prediction as the CFR times true_cases_now. But I forgot about the lag: the cases now lead to deaths in the future, so that prediction was really for ~18 days in the future. That’s why it jumped so ridiculously. The deaths we see tomorrow are actually from cases lag-1 days ago.

{% set true_tom_lag = (true_cases_lagged * (2**(1/doubling_time))) | round %}
{% set true_two_lag = (true_cases_lagged * (2**(2/doubling_time))) | round %}
{% set true_week_lag = (true_cases_lagged * (2**(7/doubling_time))) | round %}


Tomorrow there will be {{true_tom }} with {{ (true_tom_lag*CFR) | round}} deaths.
In 2 days there will be {{ true_two }} with {{ (true_two_lag*CFR) | round}} deaths.
In a week there will be {{ true_week }} with {{ (true_week_lag*CFR) | round}} deaths.

I updated the sensors like so:

- platform: template
  sensors:
    current_covid_cases:
      friendly_name: Current COVID Cases
      unit_of_measurement: 'people'
      value_template: >- 
        {% set deaths = states.input_number.greater_pa_covid_deaths.state | float %}
        {% set lag = 17.3 %}
        {% set doubling_time = 6.18 %}
        {% set CFR = 0.04 %}
        {% set doubings_during_lag = lag/doubling_time %}
        {% set true_cases_lagged = deaths/CFR %}
        {% if deaths > 0 %}
        {{ (true_cases_lagged*(2**doubings_during_lag)) | round }}
        {% else %}
        {{ ((5/CFR)* 2**(lag/doubling_time)) | round }}
        {% endif %}


    tom_covid_cases:
      friendly_name: Tomorrow's COVID Cases
      unit_of_measurement: 'people'
      value_template: >- 
        {% set doubling_time = 6.18 %}
        {{ ((states.sensor.current_covid_cases.state|float) * (2**(1/doubling_time))) | round }}
    
    tom_covid_deaths:
      friendly_name: Tomorrow's COVID Deaths
      unit_of_measurement: 'people'
      value_template: >- 
          {% set deaths = states.input_number.greater_pa_covid_deaths.state | float %}
          {% set lag = 17.3 %}
          {% set doubling_time = 6.18 %}
          {% set CFR = 0.04 %}
          {% set doubings_during_lag = lag/doubling_time %}
          {% set true_cases_lagged = deaths/CFR %}
          {% set true_tom_lag = (true_cases_lagged * (2**(1/doubling_time))) | round %}
          {{ (true_tom_lag*CFR) | round}}
        
    week_covid_cases:
      friendly_name: COVID Cases in 7 days
      unit_of_measurement: 'people'
      value_template: >- 
        {% set doubling_time = 6.18 %}
        {{ ((states.sensor.current_covid_cases.state|float) * (2**(7/doubling_time))) | round }}
    week_covid_deaths:
      friendly_name: COVID Deaths in 7 days
      unit_of_measurement: 'people'
      value_template: >- 
          {% set deaths = states.input_number.greater_pa_covid_deaths.state | float %}
          {% set lag = 17.3 %}
          {% set doubling_time = 6.18 %}
          {% set CFR = 0.04 %}
          {% set doubings_during_lag = lag/doubling_time %}
          {% set true_cases_lagged = deaths/CFR %}
          {% set true_week_lag = (true_cases_lagged * (2**(7/doubling_time))) | round %}
          {{ (true_week_lag*CFR) | round}}

Now that I have a week’s worth of data, I’ll post some visualizations soon.

You’re, of course, absolutely correct. Good catch!

I kept looking at it myself to see where it was failing and I didn’t see it either.

I thought it was that the doubling rate was wrong, based on some further research in the US. It unfortunately seems to be closer to 2-3 days rather than 6, but that would, and did!, make the numbers even worse.

I haven’t found any additional information on the lag time tho.

Thanks again for this.

Well…
Yesterday it predicted we’d have 334 deaths today and we actually have 335…

Any updates? Looking forward to the visuals.

Tracking Arizona here.

Part 2: The visuals

Now that I’ve collected a week’s worth of data in most places, I think I’ve worked out a good dashboard for myself. I’m tracking the outbreak across two areas of the state, because my wife works as a doctor in a different region, and I’m tracking PA as a whole to give myself some comparisons. Again, most of this code won’t be copy-pastable to your situation, but it might give you ideas for your own dashboard.

I’ve tried to make this useful both as a big-screen, data-heavy dashboard (which I use) and as a phone display that my wife occasionally glances at.

Across the top I’ve pulled out the most relevant information in badges: the number of cases, what percent of tests are positive, what percent of sick people die (the case fatality rate), the number of deaths, and the rate at which those deaths are doubling. In a week or so I’ll add info on how that doubling rate is changing (is this thing getting better or worse).

The cards:

The upper map is an iframe card that references a Google map that is updated by the city. Useful overall picture.

[Screenshot: the map iframe card]

aspect_ratio: 50%
type: iframe
url: 'https://www.google.com/maps/d/embed?z=6&mid=1lpLPPrltIuVkwFtXhoWdWEA4B1DvNmne'

The next is county-by-county level information. I’ve used the multiple-entity-row and fold-entity-row custom cards to show high-level information while still giving me access to the underlying input_numbers. This is important because I’m still hand-entering data every day at 1 PM when our state releases it. There still isn’t a machine-readable version, and because the city uses a dynamically created .aspx website I can’t use the scrape sensor or my own python script.

[Screenshots: the By County card with all folds closed, and with one fold open]

entities:
  - entities:
      - entity: input_number.berks_county_covid_cases
        icon: 'mdi:counter'
        secondary_info: last-changed
      - entity: input_number.bucks_county_covid_cases
        icon: 'mdi:counter'
        secondary_info: last-changed
     ....
    head:
      label: Cases
      type: section
    type: 'custom:fold-entity-row'
  - entities:
      - entity: sensor.philadelphia_area_covid_cases
        name: Current
    entity: sensor.philly_area_cases_doubling_time
    icon: 'mdi:counter'
    name: Philadelphia Area
    secondary_info: last-changed
    state_header: Doubling Every
    type: 'custom:multiple-entity-row'
   ....
  - entities:
      - entity: input_number.berks_county_covid_deaths
        icon: 'mdi:counter'
        secondary_info: last-changed
    ....
    head:
      label: Deaths
      type: section
    type: 'custom:fold-entity-row'
  - entities:
      - entity: sensor.philadelphia_area_covid_deaths
        name: Current
    entity: sensor.philly_area_death_doubling_time_filter
    icon: 'mdi:counter'
    name: Philadelphia Area
    secondary_info: last-changed
    state_header: Doubling Every
    type: 'custom:multiple-entity-row'
  ....
  - entities:
      - entity: input_number.confirmed_positive_covid_tests
        icon: 'mdi:counter'
        secondary_info: last-changed
      - entity: input_number.negative_covid_tests
        icon: 'mdi:counter'
        secondary_info: last-changed
      - entity: input_number.greater_pa_covid_deaths
        icon: 'mdi:counter'
        secondary_info: last-changed
    head:
      label: Overall
      type: section
    type: 'custom:fold-entity-row'
show_header_toggle: false
title: By County
type: entities

I use the multiple-entity-row to show the number of cases as well as the doubling time (more on this later).

Next I have a collection of two graphs showing the state of testing. These are mini-graph-cards that show the positive test rate and the number of positive, negative, and total tests.

[Screenshot: the positive test rate graphs]

aggregate_func: last
color_thresholds:
  - color: green
    value: 30
  - color: yellow
    value: 60
  - color: red
    value: 70
  - color: crimson
    value: 90
color_thresholds_transition: hard
entities:
  - entity: sensor.covid_current_rate
    show_fill: false
    show_state: true
    state_adaptive_color: true
    title: Current
group_by: date
hours_to_show: 156
icon: 'mdi:home-thermometer'
lower_bound: 0
name: Positive Test Rates
show:
  labels: true
show_state: false
smoothing: true
type: 'custom:mini-graph-card'
upper_bound: 20

I calculate the positive test rate with a simple template sensor:

- platform: template
  sensors:
    covid_current_rate:
      friendly_name: Positive Case Rate
      unit_of_measurement: '%'
      value_template: >- 
        {% set pos = states.input_number.confirmed_positive_covid_tests.state |float %}
        {% set neg = states.input_number.negative_covid_tests.state |float%}
        {% set p_pos = states.input_number.presumptive_positive_covid_tests.state |float %}
        {% set pending = states.input_number.pending_covid_tests.state |float%}
        {% set total = pos+neg+p_pos %}
        {{(100*(pos+p_pos)/(total)) | round(1) }}

These are useful to see if there is a surge in testing and how that impacts the rate of positive tests. Currently awaiting said surge #45.

The next is the case fatality rate. I’m plotting it across the multiple sites and through time. This will help me understand how overwhelmed the system is: if there are too many cases and not enough resources, this will increase; if we’re managing things well, it will stay constant.

[Screenshot: the case fatality rate graphs]

- platform: template
  sensors:
    philly_area_case_fatality_rate:
      friendly_name: Philly Case Fatality Rate
      unit_of_measurement: '%'
      value_template: >- 
        {% set deaths = states.sensor.philadelphia_area_covid_deaths.state | float %}
        {% set cases = states.sensor.philadelphia_area_covid_cases.state | float %}
        {{ (100*(deaths/cases)) | round(2) }} 

The next is a set of graphs for the number of cases/deaths and the change in doubling times. Currently there are only 3 days’ worth of doubling data (because of trouble with sensors, more on this later).

Calculating the doubling time turned out to be obnoxious. At first I figured I could go with the Trend sensor; it measures gradients well, and I’ve used it for my temperature sensors and found it particularly useful. What I didn’t realize is that it doesn’t reload its buffer of samples on a HASS restart, so it won’t give you a gradient until at least 2 more updates come in. When you’re dealing with daily-level data and you’re stuck inside hacking on your Home Assistant, there are a lot of restarts, and you’ll never get a gradient.

My solution was to use my InfluxDB to track and aggregate the data and then calculate the rates, then use the influxdb sensor to bring that data back into Home Assistant. I ended up making a separate InfluxDB database so that if I find another datasource I can merge things later. If someone can think of a pure-HA solution, I’d love to learn a new trick.

Here’s the influx query that works for my setup.

This one gathers my sensor data, aggregates it by day, then inserts it into my covid database.

SELECT max("value") AS "philly_area_deaths" INTO "covid"."autogen"."raw" FROM "home_assistant"."autogen"."value" WHERE "entity_id"='philadelphia_area_covid_deaths' GROUP BY time(1d) FILL(previous);

The next one takes the day-level data and uses the derivative function to build the equation for the instantaneous doubling rate between any two days. The equation boils down to log2(today/yesterday) (in words: to what power do we have to raise 2 to get the growth we saw between yesterday and today). That is the growth rate in doublings per day; most people prefer the doubling time, which is just its inverse. I had to use the derivative function to construct the “yesterday” denominator because Influx doesn’t seem to have a way to lag values. So I calculated yesterday as “today minus the day-over-day change”: today - (today - yesterday) = yesterday.

SELECT 1/log2((max("philly_area_deaths")/(max("philly_area_deaths")-derivative(max("philly_area_deaths"), 1d)))) AS "philly_area_deaths_doubling_time" INTO "covid"."autogen"."rates" FROM "covid"."autogen"."raw" WHERE time > :dashboardTime: GROUP BY time(1d) FILL(null);

I have these running on a continuous query.
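For a quick sanity check of that math without Influx, here’s the same calculation in the template editor, with made-up totals standing in for two consecutive days of data:

{% set today = 48 %}
{% set yesterday = 36 %}
{% set diff = today - yesterday %}
{# diff is what Influx's derivative gives us: today minus yesterday #}
Doubling every {{ (1 / log(today / (today - diff), 2)) | round(1) }} days

A jump from 36 to 48 is a 1.33x daily growth factor, which works out to doubling every ~2.4 days.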

Then I use the influxdb sensor to get the doubling time back out and graph it. Remember, higher numbers are better (6 days to double is better than 2 days to double). A word of caution on interpreting changes in doubling times: when we surge testing, I expect the case doubling time to decrease (because we’re DETECTING more cases per day); this is a good thing. If the case doubling time is increasing (like in my graphs), it means that testing is actually slowing relative to the spread. The death doubling time is what to look at when assessing whether social distancing is working: if that number is increasing, we’re seeing improvement; if it’s decreasing, that’s a symptom of faster spread and a more overloaded system.

Lastly, I have these prediction tabs. I keep them in folded cards, like a “spoiler tag”, because sometimes you may not want to see them. The predictions come from modifications of the sensors from the earlier post. I’ve made them specific to the data from each region, and I’ve used the filter sensor to clean up some noise.

[Screenshot: the Predictions card]

entities:
  - entities:
      - entity: input_number.greater_pa_covid_deaths
        name: Deaths Seen Today
      - entity: sensor.pa_area_death_doubling_time_filter
        name: Doubling every
      - entity: sensor.pa_area_deaths_tom
        name: Tomorrow
      - entity: sensor.pa_area_deaths_week
        name: In a week
    head:
      label: PA
      type: section
    type: 'custom:fold-entity-row'
title: Predictions
type: entities

As before, I need the CFR and the doubling time to make my predictions. But as I get values day to day there’s going to be noise, so I’m putting them through a low-pass filter with a 7-day time constant. My doubling time and CFR will thus be roughly an average of the past week, which seemed reasonable. I’m also using a time_throttle filter to remove the effect of my data entry on the CFR: since I enter values by hand, only one value gets updated at a time, causing my template sensors to update, and those intermediate values aren’t “real”, which throws off the filter’s time constant. The time_throttle filter removes them. I also put on a range filter just in case I enter a 0 somewhere and get out-of-bounds values.

# Get the doubling time back out of influx
- platform: influxdb
  host: localhost
  username: 
  password: 
  queries:
    - name: Philly Area Deaths Doubling Time
      unit_of_measurement: days
      value_template: '{{ value | round(1) }}'
      group_function: last
      where: >-
        time > now() - 20d
      measurement: rates
      database: covid
      field: philly_area_deaths_doubling_time

- platform: filter
  name: "Philly Area Case Fatality Rate Filter"
  entity_id: sensor.philly_area_case_fatality_rate
  filters:
    - filter: time_throttle
      window_size: "00:05"
    - filter: range
      lower_bound: 0
      upper_bound: 100
    - filter: lowpass
      time_constant: 7
      precision: 2


- platform: filter
  name: "Philly Area Death Doubling Time Filter"
  entity_id: sensor.philly_area_deaths_doubling_time
  filters:
    - filter: range
      lower_bound: 0
    - filter: lowpass
      time_constant: 7
      precision: 2

Then I use these filtered values in my prediction sensors.

- platform: template
  sensors:

    philly_area_deaths_tom:
      friendly_name: Tomorrow's COVID Deaths (Philly)
      unit_of_measurement: 'people'
      value_template: >- 
          {% set deaths = states.sensor.philadelphia_area_covid_deaths.state | float %}
          {% set lag = 17.3 %}
          {% set doubling_time = states.sensor.philly_area_death_doubling_time_filter.state | float %}
          {% set CFR = (states.sensor.philly_area_case_fatality_rate_filter.state | float)/100 %}
          {% set doubings_during_lag = lag/doubling_time %}
          {% set true_cases_lagged = deaths/CFR %}
          {% set true_tom_lag = (true_cases_lagged * (2**(1/doubling_time))) | round %}
          {{ (true_tom_lag*CFR) | round}}
    
    philly_area_deaths_week:
      friendly_name: Week COVID Deaths (Philly)
      unit_of_measurement: 'people'
      value_template: >- 
          {% set deaths = states.sensor.philadelphia_area_covid_deaths.state | float %}
          {% set lag = 17.3 %}
          {% set doubling_time = states.sensor.philly_area_death_doubling_time_filter.state | float %}
          {% set CFR = (states.sensor.philly_area_case_fatality_rate_filter.state | float)/100 %}
          {% set doubings_during_lag = lag/doubling_time %}
          {% set true_cases_lagged = deaths/CFR %}
          {% set true_week_lag = (true_cases_lagged * (2**(7/doubling_time))) | round %}
          {{ (true_week_lag*CFR) | round}}

I’m still working out a way to measure my error correctly. It’ll probably have to be another InfluxDB trick.

Has anyone else out there come up with another good metric to track? Anyone having better luck getting data in machine-readable format at the local level?


I just want to say, bravo! The work you are doing is well beyond anything I’ve seen anywhere else. I hope you continue to keep the energy up on this project.


Thank you for your valued contributions. I have managed to get most of the information from your first posts into my setup. The second set of data is much further above my level of thinking. You’re smart; me, not as much.

Again. Thank you. This is much better than the mainstream garble we are being fed. Looking forward to any future posts on this from you.

Much appreciated.

And by the way: yesterday we were at 15 and it predicted 17; today, the number was 17.


That post was made merely 14 days ago. Anyone looking for a sobering definition of ‘exponential growth’ only needs to check today’s tally of confirmed cases and deaths in the USA (more than 164,000 and 3,100, respectively; from here: COVID-19 Map - Johns Hopkins Coronavirus Resource Center).

We’re in for a long wait until a vaccine becomes widely available (or a ‘plateau’ in confirmed cases). Please take all recommended precautions to minimize the spread/contraction of this virus. It’s more virulent and lethal than ‘seasonal flu’ so act accordingly. Good luck to all of you.


Part 3: Some better sensors

Ugh, managing this data is a pain. My state’s website has changed format 4 times in the past 3 weeks, and my suggestions to the “OpenCity data” projects to create wget-able APIs have gone completely unanswered.

But yesterday (I think; time is a flat circle now) the NY Times released a GitHub project that contains county-level data. They’ve been keeping it updated with about a day’s worth of lag, and so far its data has matched what I’ve collected by hand.

Nicely, it has an easily parseable, wget-able, and consistent web address for the data:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv

I made a simple Python script that parses out the data by county for a queried state. It returns the data as a JSON object that you can easily parse with the command line sensor.

Here’s the python script.

import requests
import sys
import csv
import json

url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
resp = requests.get(url)
decoded_content = resp.content.decode('utf-8')
reader = csv.DictReader(decoded_content.splitlines(), delimiter=',')

state = sys.argv[1]
county = {}
for row in reader:
    if row['state'] == state:
        county[row['county']+'_cases'] = int(row['cases'])
        county[row['county']+'_deaths'] = int(row['deaths'])    
        county['last_date'] = row['date']
        
# CSV is in sorted order, so whatever is left is the most recent.

# Calculate state-level info
state_cases = 0
state_deaths = 0
for key, val in county.items():
    if key.endswith('deaths'):
        state_deaths += val
    elif key.endswith('cases'):
        state_cases += val

county['state_level_cases'] = state_cases
county['state_level_deaths'] = state_deaths

# This will print out the dict in json format for easy parsing in HA.
print(json.dumps(county))

I saved this as covid_county.py in the base of my config directory, then made two command line sensors.

- platform: command_line
  name: PA Covid Stats
  unit_of_measurement: people
  scan_interval: 7200 # Every two hours, they have day-level updates but at random times. This seems reasonable.
  value_template: '{{ value_json.state_level_deaths }}'
  command: "python covid_county.py Pennsylvania"
  json_attributes:
    - state_level_deaths
    - state_level_cases
    - last_date
    - Berks_cases
    - Berks_deaths
    - Bucks_cases
    - Bucks_deaths
    ...
 
- platform: command_line
  name: NJ Covid Stats
  unit_of_measurement: people
  scan_interval: 7200 # Every two hours, they have day-level updates but at random times. This seems reasonable.
  value_template: '{{ value_json.state_level_deaths }}'
  command: "python covid_county.py 'New Jersey' " # note the extra quotes around states with multiple words
  json_attributes:
    - last_date
    - state_level_deaths
    - state_level_cases
    - Camden_cases
    - Camden_deaths

You can adjust the json_attributes or the value_template to pull out the data you want, then use template sensors to aggregate the attributes as needed. You can also set up an automation that alerts you when new data lands.
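For example, here’s a rough sketch, using the two sensors above, of a template sensor that sums the state-level deaths across both states, plus an automation that fires when the NYT data rolls over to a new date. The entity ids follow from the sensor names above; notify.mobile_app_phone is a placeholder for whatever notify service you use, and the attribute trigger assumes a reasonably recent HA version:

- platform: template
  sensors:
    nyt_tracked_deaths:
      friendly_name: NYT Tracked Deaths (PA + NJ)
      unit_of_measurement: 'people'
      value_template: >-
        {{ (state_attr('sensor.pa_covid_stats', 'state_level_deaths') | int)
           + (state_attr('sensor.nj_covid_stats', 'state_level_deaths') | int) }}

# in automations.yaml
- alias: New COVID data alert
  trigger:
    - platform: state
      entity_id: sensor.pa_covid_stats
      attribute: last_date
  action:
    - service: notify.mobile_app_phone  # placeholder notify service
      data:
        message: >-
          NYT county data updated for {{ state_attr('sensor.pa_covid_stats', 'last_date') }}.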


As this grows, the county information is so much more important to me than the state or country level information, so thank you for the work on this! It is working for me flawlessly.