Qt QNetworkAccessManager causing latency spikes on wifi

I don’t post a lot from my other life, the programming personality, most of my posts come from my adventures in virtualization. I have a bit more fun writing about virtualization than I do writing about programming, plus the number of coding blogs out there is enormous. But I’ve been troubleshooting an extremely hard to pin down issue with one of my applications (WhatPulse), where there would be a latency spike every 10 seconds (or less), only when primarily connected on wifi on Mac OS 10.12 (lower versions don’t seem affected). I’ve been pulling my hair on this because there was nothing to be found on it. Hence this contribution to the interwebs.

Pinging to your default gateway IP address looked like this:

After digging a bit further and looking at the files being opened by the app, I discovered that the Mac OS network settings files were being opened during these latency spikes.

Martijns-iMac:PowerShell martijn$ sudo opensnoop -ve 2>&1 | grep -i whatpulse
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /Library/Preferences/SystemConfiguration/preferences.plist 
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:44   501  93561 WhatPulse     14   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:54   501  93561 WhatPulse     14   0 /Library/Preferences/SystemConfiguration/com.apple.airport.preferences.plist 
2017 Jun 17 21:20:54   501  93561 WhatPulse     -1   2 /Users/martijn/Library/Preferences/com.apple.eap.profiles.plist 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /Library/Preferences/SystemConfiguration/preferences.plist 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
2017 Jun 17 21:20:54   501  93561 WhatPulse     28   0 /System/Library/Frameworks/SystemConfiguration.framework/Resources/English.lproj/NetworkInterface.strings 
^C

Is it me?

WhatPulse counts and graphs network traffic, so it has a pcap integration to get a copy of the headers of each network packet to analyse. Due to this feature, I naturally thought the latency spikes were related to my programming. After removing all modules of the application and basically leaving a GUI without any data services attached to it, the issue was still happening. So, removed more code. And then some more. The spikes were still happening. After the code carnage, I decided to rule out the application itself and went looking on the OS. Thus far only Mac OS 10.12 users had reported issues and I didn’t have a Windows instance connected wifi-only, so had to rely on other peoples tests.

Posts on the Apple forums mentioned some issues with similar symptoms, so I submitted a bug report to Apple. No response of course.

After a while and some more code carnage, I got to the point where I removed the last QNetworkAccessManager class from being instantiated and something wonderful happened; the latency spikes stopped.

No, no it isn’t.

I could now focus the search on QNetworkAccessManager and these Qt bug reports reports turned up:

Cause

It appears as the culprit is here: qtbase / src / network / bearer / qnetworkconfigmanager_p.cpp

void QNetworkConfigurationManagerPrivate::startPolling()
{
...
  int interval = qEnvironmentVariableIntValue("QT_BEARER_POLL_TIMEOUT", &ok);
  if (!ok)
    interval = 10000;//default 10 seconds
  pollTimer->setInterval(interval);
  pollTimer->setSingleShot(true);
  connect(pollTimer, SIGNAL(timeout()), this, SLOT(pollEngines()));
...
}

This looks simple and harmless enough; just check for configuration changes on network connectivity. It’s supposed to be able to carry over network connections create in Qt to new connections if the connectivity changes. But here’s the thing; it basically rescans and reconnects your network connectivity when you’re on wifi. Hence the latency spikes, it’s a reconnection not bad enough to drop network packets, but bad enough to cause a latency spike and affect real-time network applications (like games).

Work around

The Qt bugs are still open (oldest one is from July 2014, so it’s probably never going to be fixed?!?), so there’s no permanent fix. I also didn’t have the motivation or time to dive into the Qt code itself to look for a fix. BUT, we can work around it. The timer that triggers the reconnects looks for the environment variable QT_BEARER_POLL_TIMEOUT as its timer interval. So we can set that environment variable to fix the latency spikes. I simply put this in my main():

// When QNetworkAccessManager is instantiated it regularly starts polling 
// all network interfaces to see if anything changes and if so, what. This 
// creates a latency spike every 10 seconds on Mac OS 10.12+ and Windows 7 >=
// when on a wifi connection. 
// So here we disable it for lack of better measure.
// This will also cause this message: QObject::startTimer: Timers cannot 
// have negative intervals
// For more info see:
// - https://bugreports.qt.io/browse/QTBUG-40332
// - https://bugreports.qt.io/browse/QTBUG-46015
qputenv("QT_BEARER_POLL_TIMEOUT", QByteArray::number(-1));

After doing so, the ping output looks much better:

I hope this saves someone else the time I’ve put in to troubleshoot this.



Share the wealth!

4 Comments

  1. Frank Fleming

    July 14, 2017 at 23:00

    Bless you.

  2. Life saver, spend hours trying to work out why this random network freeze was happening!
    Thanks!

  3. Considering how many applications use Qt, this is quite a major bug. Well done on digging into its cause and releasing a workaround. Appreciated.

  4. Funny, it’s nearly 6 years on this blog post. It gave an environment variable to use to fix Medibang Paint which uses Qt. I was getting massive latency on a Win11 gaming PC during voice chat. Ping slowed down every 10 seconds from the PC, but other PC’s on the network ping’ing into the gaming PC showed lost packets, around 1-2seconds of lost packets every 10s.

    Wireshark didn’t show much on either end. On the remote I would see the gaming PC ask for arp mac addresses to respond to the ping echo “who has .2, tell me?” and then ping worked again until the 10s mark.

    I knew the hangs didn’t happen every day, and the wireshark traffic did show some odd connection timeouts to some port :443s. On a whim I shutdown all background appstores, updaters, etc., then I noticed my paint program (which shouldn’t use the network!). Once that was killed the problem went away.

    After that once google led to another and I found this documented temp fix.

    Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *

© 2024 Lostdomain

Theme by Anders NorénUp ↑