Voice over IP Integration - Call from Any SIP Softphone

Oef, not that advanced over here. How can I do that? ssh into HA and then run the tcpdump command with what parameters?

Yeah, sorry, if you are up for learning I can try to describe how I would do it. Start by connecting to HA with ssh, then find out what your network interface is called with the ifconfig command. There might be a few listed so you want the one associated with the IP address you use to connect to HA. Once you have that I would run tcpdump with something like:

tcpdump -i <iface name> 'udp port 5060'

You might want to write that out to a file by adding ‘-w <file name>’. You can then read it back with ‘tcpdump -r <file name> -A’.

Great, always keen to learn.

See below the output from the network interface when I call the VoIP integration. Basically it rings but the call is not being picked up.

There is nothing else in this log

Sorry, I think you will need to add -A to that command to get it to print out the packet contents.

When I use the app the successful log info is:

Speech-to-text 2.22s ✅
Engine: stt.faster_whisper
Language: en
Output: What is the outside humidity?

Natural Language Processing 0.06s ✅
Engine: conversation.home_assistant
Language: en
Input: What is the outside humidity?
Response type: query_answer
Prefer handling locally: true
Processed locally: true

I see the error “speech-to-text failed” when running with the HT801V2 and the log is below. I have the codec (all entries) set to OPUS and followed the tutorial for setting up the rest, though the info for setting up the GrandStream device is rather light in the tutorial.

stage: done
run:
  pipeline: 01jgm6dzfctr1t7ngkfnwymrv4
  language: en
events:
  - type: run-start
    data:
      pipeline: 01jgm6dzfctr1t7ngkfnwymrv4
      language: en
    timestamp: "2025-01-16T21:38:37.193354+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-01-16T21:38:37.193614+00:00"
  - type: error
    data:
      code: stt-stream-failed
      message: speech-to-text failed
    timestamp: "2025-01-16T21:38:42.552141+00:00"
  - type: run-end
    data: null
    timestamp: "2025-01-16T21:38:42.552638+00:00"
stt:
  engine: stt.faster_whisper
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false
error:
  code: stt-stream-failed
  message: speech-to-text failed

Indeed, see below the output (which then repeats)

Any ideas?

That is interesting, it looks like in the From header there is no space between the description and the actual URI. I’ll have to take a closer look at the SIP spec, I think the space is required.

I stand corrected. RFC 3261 (https://www.ietf.org/rfc/rfc3261.txt), section 20.10 Contact, says:

There may or may not be LWS between the display-name and the “<”.

Assuming LWS means linear white space, it looks like we need to update the header parsing to account for that.
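To illustrate what "account for that" could mean, here is a minimal sketch of a parser that accepts a display name followed by the URI in angle brackets, with or without white space in between. It is not the voip-utils code; the pattern, the function name parse_name_addr and the example addresses are all made up for illustration.

```python
import re

# Hypothetical pattern (not taken from voip-utils): an optional display name,
# either quoted or bare, then optional white space, then the URI in <...>.
NAME_ADDR = re.compile(
    r'^\s*(?:"(?P<quoted>[^"]*)"|(?P<bare>[^<"]*?))\s*<(?P<uri>[^>]+)>'
)


def parse_name_addr(value: str) -> tuple[str, str]:
    """Split a From/To/Contact header value into (display_name, uri)."""
    match = NAME_ADDR.match(value)
    if match is None:
        raise ValueError(f"not a name-addr: {value!r}")
    name = match.group("quoted")
    if name is None:
        # Bare (unquoted) display name; may be empty if only <uri> was given.
        name = match.group("bare").strip()
    return name, match.group("uri")


# With and without a space before "<" (example values are invented).
with_space = parse_name_addr('"Alice" <sip:alice@example.com>;tag=1')
without_space = parse_name_addr('"Alice"<sip:alice@example.com>;tag=1')
print(with_space, without_space)
```

Both calls return the same name and URI, which is the behaviour the RFC wording seems to require. Escaped quotes inside the display name are not handled in this sketch.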

As you may have noted it appears your STT processing is taking longer than 2 seconds, which I suspect is the cause of the problem.
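As a rough check on timing for the failed run, the gap between the stt-start and error events can be worked out from the two timestamps in the log posted above. This is only a sketch using the Python standard library; the variable names are my own.

```python
from datetime import datetime

# Timestamps copied from the stt-start and error events in the failed run.
stt_start = datetime.fromisoformat("2025-01-16T21:38:37.193614+00:00")
stt_error = datetime.fromisoformat("2025-01-16T21:38:42.552141+00:00")

# Elapsed time between the start of speech-to-text and the failure.
elapsed = (stt_error - stt_start).total_seconds()
print(f"stt-start to error: {elapsed:.2f} s")  # about 5.36 s
```

That only says how long the pipeline waited before reporting the error, not why it failed.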

Great to hear you might have found the issue. I am sure other 3cx users will be happy to get this resolved. Let me know if you want more details on my set-up to further help resolve.

It was a pretty easy fix. If you want to watch for if/when it gets merged, see "Fix SIP header parsing" by jaminh (Pull Request #26 in home-assistant-libs/voip-utils on GitHub).
