What you need to change has nothing to do with the terminal.
You need to change it in the config of the AppDaemon addon, so that BeautifulSoup gets installed for that addon.
That is a Python package which is normally installed with pip, but you can't use pip inside an addon.
Thanks guys, this is what I had to put.
{
  "log_level": "info",
  "system_packages": [],
  "python_packages": [
    "beautifulsoup4"
  ]
}
The sensor shows up, I'm so EXCITED! Thank you so much!!!
Remember, I wasn't sure which format HA wants for a time, so it's possible HA doesn't recognize it as a time.
You need to try that out with automations.
And don't forget that this webpage is only for this year. It is very well possible that they change something in the page next year, and then, with the help of my comments, you should be able to figure out how to change your code accordingly.
You are awesome M8, I'm going to study this and try to do more like it! Thanks so much.
Actually the time is set for the next game to be at "17:00:00", but the time should be "13:00:00". It is EST (Eastern Standard Time), maybe that's why?
Also I may try to change the way it shows the date. I have an updated US sensor, so it shows it the way we're more used to seeing it.
It takes the time that is written on the website,
so I think that needs a small change to make it your local time.
You could find the line that converts the time to a time object, this one:
game_time = datetime.datetime.strptime(game_start, "%Y-%m-%dT%H:%M:%SZ")
and add or subtract the number of hours for your timezone, like this:
game_time = datetime.datetime.strptime(game_start, "%Y-%m-%dT%H:%M:%SZ") + datetime.timedelta(hours=4)
And in this line:
next_game_str = next_game_time.strftime("%Y/%m/%d %H:%M:%S")
you can change the way the time is set in the sensor.
I don't know what the format should be so that HA sees it as a time and not as a string.
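As a side note: instead of adding or subtracting a fixed number of hours, you could let Python do the timezone conversion for you. This is only a minimal sketch, assuming your AppDaemon runs on Python 3.9 or newer (for zoneinfo) and you want US Eastern time; it also handles daylight saving automatically:

import datetime
from zoneinfo import ZoneInfo  # assumption: python 3.9+ in your appdaemon addon

game_start = "2018-09-09T17:00:00Z"  # example value as it appears on the website

# parse the string and mark it explicitly as UTC (the trailing Z means UTC)
game_time = datetime.datetime.strptime(game_start, "%Y-%m-%dT%H:%M:%SZ")
game_time = game_time.replace(tzinfo=datetime.timezone.utc)

# convert to US Eastern time; daylight saving is handled for you
local_time = game_time.astimezone(ZoneInfo("America/New_York"))
# local_time.strftime("%Y/%m/%d %H:%M:%S") gives 2018/09/09 13:00:00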
Could I do this?
next_game_str = next_game_time.strftime("{{ now().strftime('%I:%M %p') }}")
I really don't know what you are trying there,
but it won't work.
Why would you want to put the actual time (now()) in the place of the game time?
You could use
next_game_str = next_game_time.strftime("%I:%M %p")
but then the sensor would only have the time,
and when you want to use the value in an automation you need a date and a time, in a format that HA knows is a time.
But you could add another attribute that sets the datetime for HA and automate on that.
You still want to see the date in the sensor value though, so I would say:
next_game_str = next_game_time.strftime("%Y/%m/%d %I:%M %p")
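To make that attribute idea concrete, here is a minimal sketch (it would go inside the app, after next_game_time is set). I'm assuming here that an ISO 8601 string like 2018-09-09T13:00:00 is the safest format for an HA timestamp, and "next_game_iso" is just a name I made up:

# readable value for the sensor state
next_game_str = next_game_time.strftime("%Y/%m/%d %I:%M %p")

# machine-readable extra attribute; "next_game_iso" is a made-up name
self.set_state(self.sensorname, state=next_game_str, attributes={
    "friendly_name": self.friendly_name,
    "next_game_iso": next_game_time.isoformat()})

# in an automation template you could then use as_timestamp() on that attribute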
It's because I'm a noob lol
This worked perfectly!
next_game_str = next_game_time.strftime("%m/%d/%y %I:%M %p")
and this:
game_time = datetime.datetime.strptime(game_start, "%Y-%m-%dT%H:%M:%SZ") + datetime.timedelta(hours=-4)
Thanks so much, it's perfect!
I was thinking about trying this. I also found this:
…damn. You can go crazy with this, as always.
Hi, Rene
I hope all has been well for you! I've got a few questions, if you would be willing to take a look.
I haven't been able to understand the AppDaemon scraping, so I tried to modify the one you sent me last year. I left your instructions in so I could try to follow along. I'm sure I'm waaaaaay off, but this is what I was trying to get.
https://www.nfl.com/standings
I'm trying to get just the AFC North team information, so I can place it on my Lovelace page (I have a Cleveland Browns section).
It would be really cool to get this information on my page and have it update each week. I have tried iframe, but I couldn't get it to give me specific parts of the page. Is that even possible with an app like the one you supplied?
Don't laugh too much, this is what I tried, but it is getting errors.
###########################################################################################
# an app that creates a sensor out of data collected from #
# https://www.nfl.com/standings #
# #
###########################################################################################
import appdaemon.plugins.hass.hassapi as hass
import datetime
import time
import requests
from socket import timeout
from bs4 import BeautifulSoup
class standings(hass.Hass):
def initialize(self):
#################################################################
# when initialising the sensor needs to be created #
# but we need to run the same code again to get the next values #
# that's why i only start the callback from here #
#################################################################
self.get_values(self)
def get_values(self, kwargs):
#################################################################
# first we set some values, this could be done in the yaml #
# but this app is specialized and will only work for this #
# webpage, so why bother #
#################################################################
self.url = "https://www.nfl.com/standings"
self.sensorname = "sensor.standings"
self.friendly_name = "AFC North Standings"
afc_standings = None
#################################################################
# now we read the webpage #
#################################################################
try:
response = requests.get(self.url, timeout=10)
except:
self.log("i couldnt read the nfl standings page")
return
page = response.content
#################################################################
# now that we got the webpage we make the data readable #
#################################################################
soup = BeautifulSoup(page, "html.parser")
#################################################################
# in the google chrome console we are going down the tree from #
# body. every time an indentation is visible we add the next #
# element, until we see main, which contains a lot of section #
# elements. nextSibling makes us go to the next element on the #
# same level, until we reach the table containing the schedule #
# cards. some invisible empty siblings mean that we need more #
# times nextSibling than the number of sections #
#################################################################
cards_table = soup.body.div.main.section.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling.nextSibling
#################################################################
# to see if we got the right data we log it. uncomment when #
# you suspect that the webpage has changed #
self.log(cards_table) #
#################################################################
#################################################################
# now we find the first card inside the table #
#################################################################
first_card = cards_table.div.div
#################################################################
# the first card is the title card containing "regular season" #
# now we are going to loop through the cards following that first one #
#################################################################
for standings_card in first_card.find_next_siblings():
#############################################################
# let's find the date we want out of the card #
#############################################################
try:
afc_north_standings = standings_card.div["data-gametime"]
except:
#########################################################
# there is no date found in this card (probably an ad) #
#########################################################
standings = ""
#############################################################
# if we find a date, then we need to translate the date to #
# a time we can compare. in this case we find a date like #
# this: 2018-09-09T17:00:00Z, which is %Y-%m-%dT%H:%M:%SZ #
# (the python datetime lib docs tell us that) #
#############################################################
#if standings != "":
# standings = datetime.datetime.strptime(
# game_start, "%Y-%m-%dT%H:%M:%SZ") + datetime.timedelta(hours=4)
#########################################################
# find out if this date is in the future #
#########################################################
# if game_time > datetime.datetime.now():
#####################################################
# check if we didn't find one before; if not, set it #
#####################################################
# if next_game_time == None:
# next_game_time = game_time
#################################################
# now that we know that this is the next game #
# let's also look up the opponent in the card #
# it will make a nice attribute for the sensor #
# to remove all whitespace we use strip() #
# again we can find that by looking at the #
# google chrome console #
#################################################
afc_north_standings = standings_card.div.div.div.nextSibling.nextSibling.nextSibling.p.nextSibling.nextSibling.string.strip()
#################################################
# and we want to find the channel that it will #
# be on. #
#################################################
cleveland_browns = standings_card.div.div.nextSibling.nextSibling.div.nextSibling.nextSibling.div.div.span.nextSibling.nextSibling.string.strip()
#################################################################
# now we've got all the data we need but the date isn't what we need #
# we translate that again to the time format we want to see #
# for the HA sensor #
#################################################################
#next_game_str = next_game_time.strftime("%Y/%m/%d %H:%M:%S")
#################################################################
# now we've got all the info we need and we can create a sensor. #
# the first time the code is run it will create a warning #
# that the sensor doesn't exist. if we see that in the log we #
# know that the sensor is created. #
#################################################################
self.set_state(self.sensorname, state=afc_north_standings_str, attributes={
"friendly_name": self.friendly_name, "Cleveland Browns": cleveland_browns, "Pittsburgh Steelers": pittsburgh_steelers, "Baltimore Ravens": baltimore_ravens, "Cincinnati Bengals": cincinnati_bengals})
#################################################################
# now all we need to do is make sure that the sensor stays up #
# to date. we could check the webpage every minute, but that #
# would be unnecessary traffic. we don't know exactly when the #
# webpage is updated, so we use a short time after the game, #
# but we don't want it to be too long. #
# if the sensor isn't up to date, just check the page, restart #
# the app and/or change the extra time we now add #
#################################################################
update_standings = afc_north_standings
#################################################################
# so we've got a time at which we want to update the sensor, #
# so we run this code again at that time. #
#################################################################
self.run_at(self.get_values, update_standings)
and
afc_north_standings:
  module: afc_north_standings
  class: afc_north_standings
Am I really far off? If you don't have time or don't wish to look at it, I completely understand, as I clearly learned nothing since the original post. I can't even get that one to work either.
Thanks either way!
Corey
If you want to learn scraping, there is one thing that is very important:
you need to know how pages are built, and you need to be able to read them.
When you open your webpage in Chrome and then use [ctrl][shift] I, your page splits into two parts:
on the left you have your page, and on the right the way it is built.
The first thing to do is to learn to read that right part.
The webpage is built up in levels.
The first one is html; you see that at the top, and it is like the root dir.
The second one is head, which is like a subdir of html.
A subdir is called a child, and html is the parent of head.
Then below head you see the subdir body.
body is a sibling of head and also a child of html.
Now you go down like it's an ancestry tree.
When you start scraping, you need to keep trying and looking at your logging, until you only see the value you want to save in a sensor.
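To make that tree idea concrete before we look at the real code, here is a tiny sketch with a made-up mini page, so you can see parent, child and sibling in BeautifulSoup terms:

from bs4 import BeautifulSoup

# a made-up mini page: html is the parent, head and body are its children,
# and head and body are siblings of each other
page = """
<html>
  <head><title>demo</title></head>
  <body>
    <div>first div</div>
    <div>second div</div>
  </body>
</html>
"""

soup = BeautifulSoup(page, "html.parser")
print(soup.body.div.string)                          # first div
print(soup.body.div.nextSibling.nextSibling.string)  # second div

# the first nextSibling is only the invisible whitespace between the two divs;
# that is why you often need nextSibling more than once, like in the code below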
Let's look at the code:
###########################################################################################
# an app that creates a sensor out of data collected from #
# https://www.nfl.com/standings #
# #
###########################################################################################
import appdaemon.plugins.hass.hassapi as hass
import datetime
import time
import requests
from socket import timeout
from bs4 import BeautifulSoup
class standings(hass.Hass):
def initialize(self):
#################################################################
# when initialising the sensor needs to be created #
# but we need to run the same code again to get the next values #
# that's why i only start the callback from here #
#################################################################
self.get_values(self)
def get_values(self, kwargs):
#################################################################
# first we set some values, this could be done in the yaml #
# but this app is specialized and will only work for this #
# webpage, so why bother #
#################################################################
self.url = "https://www.nfl.com/standings"
self.sensorname = "sensor.standings"
self.friendly_name = "AFC North Standings"
afc_standings = None
#################################################################
# now we read the webpage #
#################################################################
try:
response = requests.get(self.url, timeout=10)
except:
self.log("i couldnt read the nfl standings page")
return
page = response.content
This part you never need to change. What is done here is that you get the webpage and save all its content (what you see in Chrome on the right side) in the variable called "page".
Then we use this command:
soup = BeautifulSoup(page, "html.parser")
to get everything from the page into the variable called "soup", but in a way we can work with it.
Now you get to the hard part.
You can use a line like
self.log(soup)
to show the content of the variable in your logs.
In this case it shows everything from the webpage.
In Chrome you find the next level you want.
Of course all the information you want is in body, so you can use
self.log(soup.body)
Now you already get a bit less information in your log.
Let's take another look at our Chrome page.
We already knew we needed soup.body.
As you can see in the page, everything from the page is inside a sublevel (child) called div with id="content".
So we change our log to
self.log(soup.body.div)
Inside that div is another div (data_radium="true"),
and inside that another div, which has no settings.
So we are already at
self.log(soup.body.div.div.div)
Now we get to a difficult level. When we log that, or look in Chrome, we see a few divs as children of the div we have chosen in our log.
If we wanted the first one, we could just use div and go on.
But we want the next one, the one with class="application-shell".
So instead of just div, we use div.nextSibling.
div means we choose the first child with the name div, but we don't want number one, we want its nextSibling.
So our log now looks like:
self.log(soup.body.div.div.div.div.nextSibling)
Now you go deeper and deeper: every time you save the app, look at the log and check that the first line of the log is the same as what you see in Chrome, so you know where you are.
All the way until you see the value you are looking for.
When we are on the level where our info is, and we don't need any more divs (or other html elements),
we can use .string to view the text inside that child.
Now we can put everything we got between the ( ) of the self.log into a variable:
my_value = soup.body.div.div.div.div.nextSibling # and everything that you need to put behind here
and then all you need is set_state.
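Putting it all together, the skeleton always ends up looking something like this; the navigation path and the sensor name here are just examples, yours will be different:

# keep logging while you search for the right path
self.log(soup.body.div.div.div.div.nextSibling)

# once the log shows only the value you want, save it in a variable
my_value = soup.body.div.div.div.div.nextSibling.string.strip()  # example path

# and create or update the sensor with it ("sensor.my_scraped_value" is made up)
self.set_state("sensor.my_scraped_value", state=my_value,
               attributes={"friendly_name": "My Scraped Value"})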
I hope this made clear how you need to work to get your data.
Of course there are a lot of other ways.
If the way I explain it isn't clear enough, or I chose a way that is too difficult, then you can look at one or more of the many online tutorials.
To find those, google for "scraping with beautifulsoup tutorial" or something like that,
because the library you are using is BeautifulSoup.
I wish you success, and hope to hear soon that you understood it and that it worked.
Thank you so very much, you really are the best! I can't believe you are so willing to help that you typed all that out for me. Again, thank you my friend.
I tried running through this a few times and set up what I thought I should.
Maybe I did something wrong. It may take me some time to figure out; between the family and work (I'm in pest control) I'm swamped. But I'll watch a few more tutorials, try to follow your guide, and see if I can figure out what I'm doing wrong. I appreciate you bro!
appdaemon>apps>standings.py
appdaemon>apps>apps.yaml (placed this:)
standings:
  module: standings
  class: standings
Is that correct?
###########################################################################################
# an app that creates a sensor out of data collected from #
# https://www.nfl.com/standings #
# #
###########################################################################################
import appdaemon.plugins.hass.hassapi as hass
import datetime
import time
import requests
from socket import timeout
from bs4 import BeautifulSoup
class standings(hass.Hass):
def initialize(self):
#################################################################
# when initialising the sensor needs to be created #
# but we need to run the same code again to get the next values #
# that's why i only start the callback from here #
#################################################################
self.get_values(self)
def get_values(self, kwargs):
#################################################################
# first we set some values, this could be done in the yaml #
# but this app is specialized and will only work for this #
# webpage, so why bother #
#################################################################
self.url = "https://www.nfl.com/standings"
self.sensorname = "sensor.standings"
self.friendly_name = "AFC North Standings"
afc_standings = None
#################################################################
# now we read the webpage #
#################################################################
try:
response = requests.get(self.url, timeout=10)
except:
self.log("i couldnt read the nfl standings page")
return
page = response.content
soup = BeautifulSoup(page, "html.parser")
self.log(soup.body.div.div.div.div.nextSibling)
my_value = soup.body.div.div.div.div.nextSibling
self.set_state(self.sensorname, state=afc_north_standings_str, attributes={
"friendly_name": self.friendly_name, "Cleveland Browns": cleveland_browns, "Pittsburgh Steelers": pittsburgh_steelers, "Baltimore Ravens": baltimore_ravens, "Cincinnati Bengals": cincinnati_bengals})
That on its own is correct, except that I didn't do all the work for you.
So the last two lines are not needed yet (only when you are ready and you get the actual value in your log).
First you need to understand what I did to get to that point, and then you need to go deeper into the structure.
So the line
soup.body.div.div.div.div.nextSibling
will get a lot longer before you're finished, and every time in between you look at your logs and add one step, look at your logs again, add another step, until there is only the value you want.
So you are not doing anything wrong, your journey just hasn't ended yet.
And you're welcome; if I succeed in helping you learn how to do it, I have a good day.
Hi buddy, I have been trying to wrap my head around this, but I was never able to figure it out.
This led me to finding Jupyter Notebook; with that I was able to scrape a page using pandas. Actually, that was the easiest way. I got everything to print perfectly. Now I was wondering if this could be used in AppDaemon. I tried a few different ways but couldn't get it to work.
This is what I came up with:
import pandas as pd
d = pd.read_html('https://www.clevelandbrowns.com/team/standings/', index_col=0)
df = d[0]
df
this is the printout:
Is this something that can be done? Or do you have a suggestion where I should look to get this into a Lovelace view? Thanks man, I hope all is well with you.
Sorry mate,
but this kind of info can't be used like that.
In a sensor you can put a line of text or a value (a state is limited to 255 characters), but not a whole webpage.
You could however save that html into a file and view that file on a dashboard with iframe.
It would be visible on a dashboard, but the data would not be usable inside HA.
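A minimal sketch of what I mean, assuming your HA config dir has the usual www folder (files in there get served under /local/) and that it is reachable as /config/www on your install:

import pandas as pd

# scrape the standings table like you did in jupyter
tables = pd.read_html("https://www.clevelandbrowns.com/team/standings/", index_col=0)
df = tables[0]

# write the table as html into HA's www folder
# (the path is an assumption, adjust it to your install)
df.to_html("/config/www/standings.html")

# then point an iframe / webpage card at /local/standings.html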
Maybe we should have a chat on Discord at some point so that I can explain it better step by step.
This post of yours has just helped me write my first Python program and change it so it can be initialized by AppDaemon.
This post of yours is probably the best example on the web that demonstrates AppDaemon use.
The solution is not yet finished, but it is finally working.
So, I just wanted to say: thank you!
You're very welcome.
In that case I advise you to also read this: