If you are comfortable with databases and SQL, a good first iteration is to set up a MariaDB database server and use it as Home Assistant's data store for its recorder function (see the first link below). Depending on how beefy your HA server is, you can run the database server on the same machine using standard Linux install methods, Docker, or even a VM, or run it on a separate machine. With this setup it becomes easy to access the data coming into Home Assistant with standard database query tools and data science tools like JupyterLab.
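Once the recorder is pointed at the external database (via the recorder integration's `db_url` option in configuration.yaml), everything after that is plain SQL. Here is a minimal sketch of the query pattern, using an in-memory SQLite stand-in with a simplified `states` table — the real recorder schema has more columns (and recent HA versions split entity IDs out into a `states_meta` table), and against MariaDB/PostgreSQL you would connect with a driver like mysqlclient or psycopg2 instead:

```python
import sqlite3

# Stand-in for the recorder database; in practice you would connect to
# MariaDB or PostgreSQL with the appropriate driver instead of sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE states (      -- simplified; the real schema has more columns
           entity_id    TEXT,
           state        TEXT,
           last_updated TEXT)"""
)
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("sensor.living_room_temp", "21.5", "2024-01-01 10:00:00"),
        ("sensor.living_room_temp", "21.7", "2024-01-01 10:05:00"),
        ("sensor.porch_temp",       "3.2",  "2024-01-01 10:00:00"),
    ],
)

# Typical question: the latest recorded state for each entity.
rows = conn.execute(
    """SELECT entity_id, state, MAX(last_updated)
       FROM states
       GROUP BY entity_id
       ORDER BY entity_id"""
).fetchall()
for entity_id, state, ts in rows:
    print(entity_id, state, ts)
```

The entity names and timestamps here are made up for illustration; the point is just that once the data lives in a normal relational database, a three-line query answers questions the HA UI makes awkward.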
I do this using PostgreSQL in Docker on an Intel i7 that also runs HA in Docker, and it has worked very well. I have three years of history in PostgreSQL.
I ran InfluxDB and Grafana as well for a while, but did not find enough value to justify staying current on another set of tools when I was already using Python, matplotlib, and Jupyter for ML and visualization. Others find Influx and Grafana well worth the effort.
I have a quarter trillion records in the PostgreSQL database and have found that database performance is fine. Home Assistant is inserting between 30k and 40k records per hour into the PostgreSQL server. And when I ran Influx in parallel with PostgreSQL, the storage compression was not significant, as HA's rows are relatively narrow (horizontally small).
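If you want to see what your own install is doing, you can measure the insert rate straight from the `states` table. A sketch of the counting query, again against an in-memory SQLite stand-in with a simplified schema and synthetic timestamps (on a real recorder database you would compare against the actual `last_updated` column and the current time):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, last_updated TEXT)")

# Fake a day of recorder activity: one row per minute, newest first.
now = datetime(2024, 1, 1, 12, 0, 0)
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [
        ("sensor.x", (now - timedelta(minutes=m)).isoformat(" "))
        for m in range(24 * 60)
    ],
)

# Rows recorded in the last hour -> a rough inserts-per-hour figure.
cutoff = (now - timedelta(hours=1)).isoformat(" ")
(per_hour,) = conn.execute(
    "SELECT COUNT(*) FROM states WHERE last_updated >= ?", (cutoff,)
).fetchone()
print(per_hour, "rows in the last hour")
```

Running that same `COUNT(*)` window against your real database tells you quickly whether your entity set is chattier or quieter than my 30k–40k per hour.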
The nice thing about first moving to an external relational database is that you can size up the performance and then decide whether your needs require a time series database for analytics. Or, since you already have an Influx instance running, you can run MariaDB in parallel and compare your experiences.
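Much of what a time series database buys you for this workload, a relational database can do with a GROUP BY. A sketch of hourly downsampling — the bread-and-butter analytics query — using the same simplified SQLite stand-in (on PostgreSQL you would use date_trunc rather than strftime; the sensor name and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("sensor.temp", "20.0", "2024-01-01 10:05:00"),
        ("sensor.temp", "22.0", "2024-01-01 10:50:00"),
        ("sensor.temp", "24.0", "2024-01-01 11:10:00"),
    ],
)

# Hourly average for one sensor. HA stores state as text, so CAST to a
# number before aggregating.
rows = conn.execute(
    """SELECT strftime('%Y-%m-%d %H:00', last_updated) AS hour,
              AVG(CAST(state AS REAL)) AS avg_state
       FROM states
       WHERE entity_id = 'sensor.temp'
       GROUP BY hour
       ORDER BY hour"""
).fetchall()
print(rows)
```

If queries like this come back fast enough on your hardware at your data volume, that is a decent signal you do not need a dedicated time series engine for your analytics.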
Home Assistant has some Jupyter notebooks (second link below) that will give you a start on the data model directly in HA. I started there but do most of my work directly against the PostgreSQL database with my own notebooks. There are other ML packages as well; Orange is a great starting place. Link below.
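In a notebook, the usual first move is pulling a query result into a DataFrame. A sketch with pandas over the same in-memory SQLite stand-in — against a real recorder database you would pass a PostgreSQL/MariaDB connection instead, or use the HASS Data Detective package that the official HA notebooks are built around:

```python
import sqlite3
import pandas as pd

# Stand-in for the recorder database with a simplified states table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("sensor.temp", "21.5", "2024-01-01 10:00:00"),
        ("sensor.temp", "21.7", "2024-01-01 10:05:00"),
    ],
)

df = pd.read_sql("SELECT * FROM states", conn, parse_dates=["last_updated"])
# HA stores state as text; coerce to numeric, turning non-numeric
# states (e.g. 'unavailable') into NaN so aggregations just work.
df["state"] = pd.to_numeric(df["state"], errors="coerce")
print(df["state"].mean())
```

From there you are in ordinary pandas/matplotlib territory, which is exactly why I stopped maintaining a separate Influx/Grafana stack.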