Basic web scrape

Luis_Sammy · July 6, 2022, 9:07am

Hi there and thanks in advance for your help.

I'm trying to scrape a really, really simple website (see the picture), but I can't make the sensor work.

I just need the time, i.e. the value in the "Tiempo (min)" (time in minutes) column.

This is the page source:

<!DOCTYPE html>
<html lang="es">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="description" content="">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title></title>
  <link rel="stylesheet" href="/stylesheets/style.css">
</head>
<body>
  <h1></h1>
  <table>
    <tr><th>Línea</th><th>PMR</th><th>Destino</th><th>Tiempo (min)</th></tr>
    <tr><td>4</td><td></td><td>PINAR DE JALON</td><td>17</td></tr>
  </table>
  <p><a href="/">Inicio</a></p>
</body>
</html>

I have tried multiple selectors, but no luck!


- platform: scrape
  name: bus
  resource: https://auvasa-scraper.herokuapp.com/web/stop/1190
  select: "Tiempo (min)"

- platform: scrape
  name: bus
  resource: https://auvasa-scraper.herokuapp.com/web/stop/1190
  select: "td"

etc...

If I use select: "td", it gives me the number 4, which is the line, but I need the fourth td.

Can anyone tell me how to extract that information? THANKS AGAIN!!

AdmiralStipe · July 6, 2022, 9:46am

I would try:
select: "body > table > tbody > tr:nth-child(2) > td:nth-child(4)"
That’s the selector I get in Microsoft Edge for your time.

Maybe even only
select: "td:nth-child(4)"
would be enough.
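
For reference, here is a minimal sketch of the shorter selector dropped into the sensor from the first post. The name and resource are copied from that post and nothing else is added. One caveat, which is an assumption and not something tested here: the page source posted above has no tbody element (browsers add one when they build the DOM), so if the longer selector finds nothing, removing "tbody >" from it may help.

```yaml
# Sketch only: the scrape sensor from the first post with the shorter selector.
- platform: scrape
  name: bus
  resource: https://auvasa-scraper.herokuapp.com/web/stop/1190
  # td:nth-child(4) matches a <td> that is the fourth child of its row.
  # The header row uses <th>, so only the "Tiempo (min)" value cell matches.
  select: "td:nth-child(4)"
```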

Troon · July 6, 2022, 9:54am

- platform: scrape
  name: bus
  resource: https://auvasa-scraper.herokuapp.com/web/stop/1190
  select: "td"
  index: 3

(I think the index is 0-based, so index: 3 should give you the fourth cell)

Luis_Sammy · July 6, 2022, 9:59am

Both options work like a charm! Thank you very much!