How a computational biologist tracks COVID-19 pt 1 of?

JudoWill · April 3, 2020, 2:30pm

Yeah, and having a place that has recorded all of the information into CSV files lets us look at how things were changing before we (personally) started recording. My state website only provides the “today” results and doesn’t even publish rates of change, which makes it really hard to see whether the doubling rate is trending upward or downward.

texanbull14 · April 3, 2020, 3:32pm

Absolutely. I tried to figure this out and couldn’t, so I appreciate it.

My lovelace config with the data:

finity · April 3, 2020, 3:38pm

Has anyone tried to figure out a way to make a logarithmic graph? I’ve looked around but haven’t been able to find anything.

JudoWill · April 3, 2020, 4:11pm

I haven’t been able to do it through mini-graph-card or the standard history graph. My only solution has been to use the log Jinja filter. The default base is e, so {{ 100 | log }} ~= 4.6; you can change the base like so: {{ 100 | log(10) }} == 2. I made a template sensor that reports the log-10 values (rounded to 2 decimal places) and then plot that sensor.
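For anyone who wants to sanity-check those numbers outside of HA, here is a minimal Python sketch of the same calculation. The name log10_rounded is only for illustration, and returning None for zero or negative inputs is an assumption about how you might want to treat values that have no logarithm.

```python
import math


def log10_rounded(value, digits=2):
    """Return log10(value) rounded to `digits`, or None if value is not positive."""
    if value is None or value <= 0:
        # log is undefined here; an uninitialized or zero input ends up in this branch.
        return None
    return round(math.log10(value), digits)


natural = math.log(100)         # natural log, the counterpart of {{ 100 | log }}
base_ten = log10_rounded(100)   # the counterpart of {{ 100 | log(10) }}, rounded to 2 places
print(natural, base_ten)
```

The guard is there because a sensor that has not reported a value yet cannot be log-transformed.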

JudoWill · April 3, 2020, 4:15pm

@texanbull14, sweet skinning. Are you using a picture-element to show the map? I was thinking the same trick myself.

texanbull14 · April 3, 2020, 4:28pm

You got it!

finity · April 3, 2020, 5:25pm

Thanks!

Thanks again! That just answered another question I had.

I saw an error in my logs at restart about division by zero in some of my template sensors. When digging through them I noticed a template with the above format, and I couldn’t see what it was supposed to be doing since I had never seen the log base notation before. Now I get it.

I still haven’t solved the error, but it works after startup, so it must just be that one of my inputs hadn’t initialized yet. Not really a big deal.

finity · April 4, 2020, 8:59pm

I’m missing something here…

I’ve tried to set up the latest python script and command line sensor and it isn’t working. I keep getting a “command failed…” error in the log and the sensors are “unknown”.

Here is the python script saved as "county_covid.py" in my python_scripts directory. It’s the same as yours but I just changed the states to my own:

import requests
import sys
import csv
import json

url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
resp = requests.get(url)
decoded_content = resp.content.decode('utf-8')
reader = csv.DictReader(decoded_content.splitlines(), delimiter=',')

states = {'Indiana', 'Ohio', 'Michigan'}
state = sys.argv[1]
county = {}
for row in reader:
    if row['state'] == state:
        county[row['county']+'_cases'] = int(row['cases'])
        county[row['county']+'_deaths'] = int(row['deaths'])
        county['last_date'] = row['date']

# CSV is in sorted order, so whatever is left is the most recent.

# Calculate state-level info
state_cases = 0
state_deaths = 0
for key, val in county.items():
    if key.endswith('deaths'):
        state_deaths += val
    elif key.endswith('cases'):
        state_cases += val

county['state_level_cases'] = state_cases
county['state_level_deaths'] = state_deaths

# This will print out the dict in json format for easy parsing in HA.
print(json.dumps(county))

Then here is one of the command line sensors; I just changed the state/counties to mine:

- platform: command_line
  name: IN Covid Stats
  unit_of_measurement: people
  scan_interval: 600 #7200
  value_template: '{{ value_json.state_level_deaths }}'
  command: "python covid_county.py Indiana"
  json_attributes:
    - state_level_deaths
    - state_level_cases
    - last_date
    - Dekalb_cases
    - Dekalb_deaths
    - Allen_cases
    - Allen_deaths

In the script, what is the purpose of states = {'Indiana', 'Ohio', 'Michigan'}? I don’t see the “states” variable used anywhere.

I can run the python script without error in the console of my HA docker container, so I know that part is working, but even there it doesn’t return any county info at all, just {"state_level_cases": 0, "state_level_deaths": 0} as the result.
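An all-zero result means that no CSV row matched the state argument. Here is a small sketch for checking that, assuming the mismatch is something like capitalization or stray whitespace in the argument (that is a guess, not a diagnosis). The SAMPLE rows are made up in the NYT column layout, and rows_for_state is a hypothetical helper, not part of the original script.

```python
import csv

# Made-up rows in the same column layout as us-counties.csv.
SAMPLE = """date,county,state,fips,cases,deaths
2020-04-03,Allen,Indiana,18003,40,1
2020-04-03,DeKalb,Indiana,18033,3,0
2020-04-03,Lucas,Ohio,39095,150,5
"""


def rows_for_state(lines, state):
    """Return the rows whose state matches, ignoring case and surrounding spaces."""
    wanted = state.strip().lower()
    reader = csv.DictReader(lines)
    return [row for row in reader if row['state'].strip().lower() == wanted]


matches = rows_for_state(SAMPLE.splitlines(), ' indiana ')
print(len(matches), [row['county'] for row in matches])
```

If this kind of lenient comparison finds rows where the exact comparison finds none, the argument passed on the command line is the place to look.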

And to add I have other python scripts that are working just fine from that same directory.

I can’t see why it’s failing.

EDIT:

After running the script as a service and adding "data: 'Indiana'" in the data field, I get the following error in my log:

2020-04-04 17:09:43 ERROR (SyncWorker_7) [homeassistant.components.python_script.covid_county.py] Error executing script: __import__ not found
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/python_script/__init__.py", line 196, in execute
    exec(compiled.code, restricted_globals)
  File "covid_county.py", line 1, in <module>
ImportError: __import__ not found

EDIT 2:

I figured out why my script call in the sensor was failing. I didn’t write out the entire path for the script location. I assumed HA would know the script is in the “python_scripts” folder but apparently not.

HOWEVER, I’m still getting the “import” error in my logs from above and the state is always 0 with no county attributes.

How do I run the script from inside HA and get the “import” command to work in the script?

vermium-sifell · April 4, 2020, 9:01pm

Sorry for saying this, but it’s very cool to calculate the next day’s statuses.

JudoWill · April 19, 2020, 4:00pm

Part 4: How is the prediction holding up?

How’s everyone doing? Staying home whenever possible I hope.

As with any good model, it is important to look at how well the predictions match reality. I pulled out my predictions for the Philadelphia area and offset them forward by a single day to match the next day’s results. I ended up doing this in Grafana because I couldn’t figure out an HA way to do the same. Unfortunately, InfluxDB/Grafana doesn’t have a true way to offset the data, so I had to use first(tomorrow_sensor) and last(today_sensor) aggregations to cheat this single-day offset. That means I can’t use it to check my week-long predictions.

It is off by less than 1% on any given day. I’ve found the low-pass filter on the doubling rate and CFR to be pretty important since they have a lot of day-to-day variance. For example, fewer deaths are reported on weekends, and a low-pass filter has helped smooth these dips out.
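To illustrate the smoothing idea, here is a minimal sketch of simple exponential smoothing in Python. It is a generic low-pass filter, not necessarily the exact one used in the sensors above; the function name, the time_constant default, and the daily numbers are all made up for the example.

```python
def low_pass(values, time_constant=3):
    """Exponential smoothing: each output moves 1/time_constant of the way
    from the previous smoothed value toward the new reading."""
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # the first reading passes through unchanged
        else:
            previous = ((time_constant - 1) * previous + value) / time_constant
        smoothed.append(previous)
    return smoothed


daily_deaths = [10, 12, 4, 5, 14, 15, 16]  # made-up series with a weekend dip
smoothed = low_pass(daily_deaths)
print([round(v, 1) for v in smoothed])
```

A larger time_constant smooths more but also reacts more slowly to a real change in the trend.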

By eye, the 1-week predictions are off by significantly more. That is not too surprising: small differences in the parameters lead to wild differences with exponential growth, and none of these values are “constants”.
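A quick worked example of that sensitivity, as a sketch only: it assumes plain exponential growth with a fixed doubling time, and both the starting count and the two doubling times are made-up numbers.

```python
def project(cases, doubling_time_days, days_ahead):
    """Project a case count forward, assuming a constant doubling time."""
    return cases * 2 ** (days_ahead / doubling_time_days)


current = 1000  # made-up starting count

# Same one-day error in the doubling time (5 vs 6 days), two horizons.
one_day_gap = project(current, 5, 1) - project(current, 6, 1)
one_week_gap = project(current, 5, 7) - project(current, 6, 7)
print(round(one_day_gap), round(one_week_gap))
```

The same parameter error that barely matters for tomorrow’s estimate grows to a much larger gap a week out.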

Keeping a daily track of all of these metrics has helped to notice trends like this:

The case fatality rate has been steadily increasing, a troubling sign.

And this:

The number of deaths per day has been steadily increasing. The blue line marks the PA stay-at-home order, four weeks ago. Either the order isn’t being well followed, the lag is significantly greater than 17 days, or the data hasn’t caught up with the true CFR. Realistically, it is a mix of everything.

Has anyone else come up with good visualizations that look at the month-long local state of this pandemic?

dowden.asst · May 7, 2020, 2:20am

finity ~
how did you define your path?

finity · May 7, 2020, 2:45pm

command: "python /config/python_scripts/covid_county.py Indiana"

However I don’t know if I ever fully got that part to work.

All of the other previous sensors were working OK so I never went any further troubleshooting the errors I was getting as noted in that post above.

AndyRPH · June 5, 2020, 9:10pm

I’m having trouble since my state has cities that are not a part of any county.

Should I be able to run python covid_county.py Virginia from the command line and get output? I get an error:

Traceback (most recent call last):
  File "covid_county.py", line 14, in <module>
    for row in reader:
  File "/usr/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 13: ordinal not in range(128)

My sensors yaml entry reads:

- platform: command_line
  name: VA Covid Stats
  unit_of_measurement: people
  scan_interval: 7200 # Every two hours, they have day-level updates but at random times. This seems reasonable.
  value_template: '{{ value_json.state_level_deaths }}'
  command: "python /config/misc/covid_county.py Virginia"
  json_attributes:
    - state_level_deaths
    - state_level_cases
    - last_date
    - Augusta_cases
    - Augusta_deaths
    - Staunton_city_cases
    - Staunton__city_deaths
    - Waynesboro_cases
    - Waynesboro_deaths

However, looking at the CSV file, the city names contain spaces:
2020-04-05,Staunton city,Virginia,51790,1,0

What’s the proper way to escape a county name with spaces in it?
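One way to sidestep the escaping question is to normalize the names inside the script so that the keys never contain spaces. The sketch below assumes that approach; attribute_key is a hypothetical helper, and the row is the CSV line quoted above. Separately, the traceback path shows Python 2.7, so running the script with python3 instead is likely to help with the UnicodeEncodeError.

```python
def attribute_key(county, suffix):
    """Build a key like 'Staunton_city_cases' from a name like 'Staunton city'."""
    return '_'.join(county.split()) + '_' + suffix


# The row quoted above, as csv.DictReader would yield it.
row = {'date': '2020-04-05', 'county': 'Staunton city', 'state': 'Virginia',
       'fips': '51790', 'cases': '1', 'deaths': '0'}

county = {}
county[attribute_key(row['county'], 'cases')] = int(row['cases'])
county[attribute_key(row['county'], 'deaths')] = int(row['deaths'])
print(county)
```

With keys built this way, the json_attributes entries can use single underscores throughout, for example Staunton_city_cases and Staunton_city_deaths.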

rak · June 8, 2020, 6:56am

Hi guys,

You know Germans like numbers… German officials set rules for people coming back from a country that had more than 50 new infections per 100,000 inhabitants (over 7 days). That’s why I am very much interested in how those numbers evolve.

Can anyone provide me with a source or some hint on where to get or derive these numbers? The official integration delivers cumulative confirmed cases. I need the delta between today and yesterday. How can this be done in HA? That would already be an interesting figure.
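As a rough sketch of the arithmetic itself (outside of HA), the daily delta and a 7-day figure per 100,000 inhabitants can be derived from a list of cumulative counts. The function names, the cumulative series, and the population are all made up for illustration.

```python
def daily_new_cases(cumulative):
    """Difference between each day's cumulative count and the previous day's."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def seven_day_incidence(cumulative, population):
    """New cases over the last 7 days per 100,000 inhabitants."""
    if len(cumulative) < 8:
        raise ValueError('need at least 8 daily values to cover 7 days of change')
    new_last_week = cumulative[-1] - cumulative[-8]
    return new_last_week * 100000 / population


cumulative = [1000, 1010, 1025, 1040, 1050, 1070, 1085, 1100]  # made-up daily totals
deltas = daily_new_cases(cumulative)
incidence = seven_day_incidence(cumulative, population=200000)
print(deltas, incidence)
```

Eight cumulative values are needed because seven days of change span eight daily readings.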

Regards
Ralf

rak · June 10, 2020, 10:56am

Hi,

I was able to solve the above question with the derivative integration. However, I would love to see the numbers from the ECDC. I created the following Python script to extract the numbers I am interested in (the official incidence metric the EU uses to decide whether a country is deemed critical).

This official incidence metric is “cumulative new cases over the last 7 days per 100,000 capita”.

I think this is a very good number for seeing where the problem is still critical and where it is under control. The official EU view is that >50 is critical.

import datetime
import io

import pandas as pd
import requests

# Download the ECDC case distribution data as CSV.
url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"
s = requests.get(url).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))

# Keep only the rows reported within the last 8 days.
df['dateRep'] = pd.to_datetime(df['dateRep'], format='%d/%m/%Y')
df = df[df.dateRep > datetime.datetime.now() - pd.to_timedelta("8day")]

# Sum cases and deaths per country over that window.
incidence = df.groupby("countriesAndTerritories").agg({
    "continentExp": 'first',
    "countryterritoryCode": 'first',
    "geoId": 'first',
    "cases": 'sum',
    "deaths": 'sum',
    "popData2018": 'max',
})
# Cases per 100,000 inhabitants, rounded to a whole number.
incidence['incidence'] = round(incidence["cases"] / incidence["popData2018"] * 100000)

The result for today is:

Qatar                       483.0
Bahrain                     280.0
Chile                       201.0
Armenia                     142.0
Kuwait                      130.0
Oman                        124.0
Andorra                     113.0
Peru                        105.0
Brazil                      102.0
Panama                       81.0
Sweden                       80.0
Djibouti                     79.0
Belarus                      72.0
Saudi_Arabia                 64.0
Singapore                    57.0
United_States_of_America     52.0

Is there any chance we could integrate this metric into the official COVID integration, or how simple would it be to create an integration from the above Python code? I must admit I can do some Python, but I have never built my own integration.

Sincerely.
Ralf