How 7-Zip, Hyper-V, and DNS Paralyzed a VOIP Phone System

Today was a tour de force in unintended consequences. It started with an old coworker, a kind of boomerang: they came to work for us, moved on, and then came back. That coworker boomerang is the premise of this story.

The task was really straightforward: decompress the previously compressed user files for this particular coworker, so that when they log in, they see exactly what they left behind. It was modest, about 36GB worth of data, and the intended target had 365GB of open space, so plenty of room. I started with 7-Zip on Windows, opened the archive, and extracted it to the drive with all the space. Near the end of the operation, 7-Zip threw an error, “Out of Disk Space,” and I frowned and scratched my head. 365GB of open space, and… this? Turns out, 7-Zip on Windows, at least this copy of it, unpacks the archive to a temporary folder in the location Windows assigns, which by default lands on the C: drive. The process was filling an already low-on-capacity primary OS drive. I chased down the temporary folder and removed it, correcting the issue. Or so I had thought.
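In hindsight, a quick pre-flight check of both drives would have flagged the trap before 7-Zip sprang it. Here is a minimal sketch of that check in Python; the drive letters and archive size are placeholders, not my actual paths:

```python
# Pre-flight check before unpacking a large archive on Windows: 7-Zip's GUI
# can stage the extraction in the %TEMP% folder, so the temp drive needs the
# headroom too, not just the destination. Paths and sizes here are placeholders.
import os
import shutil

ARCHIVE_SIZE_GB = 36          # rough unpacked size of the user files
DESTINATION = r"D:\restore"   # hypothetical target drive with the free space
TEMP_DIR = os.environ.get("TEMP", r"C:\Windows\Temp")

def free_gb(path: str) -> float:
    """Free space on the drive that holds `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

for label, path in (("destination", DESTINATION), ("temp staging", TEMP_DIR)):
    free = free_gb(path)
    status = "OK" if free > ARCHIVE_SIZE_GB else "TOO SMALL"
    print(f"{label:>13}: {path} has {free:.1f} GB free -> {status}")
```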

An hour later, seemingly out of the blue, around 12:30pm today, all the VOIP desk phones suddenly went “NO SERVICE”. I scrambled, naturally, feeling that rising panic: nothing had changed, there were no alarms, just sudden, total phone failure. I called the VOIP support line, and the official word from support was to reboot my network. A stack of eight fully packed Cisco Catalyst switches, three servers, and a gaggle of networking gear providing at least a dozen vital services: reboot all of that. While talking with support, I opened a console to my Linux box running on Hyper-V on one of my servers, which is to say, plugged into the very network core I was being asked to reboot. I then found my out-of-service desk phone; its IP was fine and it was totally functional. I grabbed the SIP password, logged into the phone, went to where it lists the VOIP endpoint for our phone carrier, and then asked mtr to show me the packet flow across the network, from my humble little wooden box of an office to that endpoint. The utility was clear: the path was fine. No issues. 500 packets and counting, all arriving promptly, no flaws, no errors, and still NO SERVICE.
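For the record, the check was nothing fancier than mtr in report mode against the carrier’s SIP endpoint. A rough sketch of the same thing wrapped in Python, with the hostname standing in for the real endpoint I pulled off the phone’s config page:

```python
# Run mtr in report mode against the carrier's SIP endpoint to see whether
# packets actually make it across the network. The hostname is a stand-in;
# the real endpoint came out of the phone's own config page.
import subprocess

SIP_ENDPOINT = "sip.example-carrier.com"  # hypothetical carrier endpoint

result = subprocess.run(
    ["mtr", "--report", "--report-cycles", "100", SIP_ENDPOINT],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    print(result.stderr)
```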

So I was growing more vexed with support, really unwilling to reboot the entirety of my network core when mtr was merrily popping packets directly to the correct VOIP endpoint deep inside the carrier’s network. My traffic could get where it needed to go, yet the phones still read NO SERVICE. Support was flat-footed. I stopped myself, because I could feel the rage build, my old companion, the anger that comes when people aren’t listening to what I am trying to tell them. I stopped. It was not going anywhere, and I promised myself that I would fight this anger tooth and claw to the best of my ability. So I simply and calmly asked for the ticket number on their side, thanked them for their time, and hung up my cell phone. I obviously muttered some choice phrases in a small voice, but otherwise I was very proud of myself. I derailed what could have become a very ugly scene.

Everything works. I am not going to reboot the core. The phones simply say NO SERVICE. Then other reports rolled in: network faults, adjacent but not the same, Wifi failures in Houston, Texas. Hmmm. What does Wifi out in Houston have to do with dud phones in Kalamazoo?

I had this sinking feeling; my gut was screaming that the PDC, the Wifi, and the phones were all touching something common that had failed, and failed silently. I chuckled to myself as the old IT chestnut came to mind: “It’s always DNS.” In deference to that, I opened the Hyper-V management window on the PDC and looked for my twin OpenDNS resolvers, the VMs that have run quietly and flawlessly for years on end without a peep deep within Hyper-V. There it was, right there, right in front of me: the two resolver VMs, and just to the right of their names, the quaint little status indicator from Hyper-V. “PAUSED.”

The moment I saw that, I yelled out “PAUSED” and “NO SERVICE” and screamed. Right-click on both VMs, click Resume, and Hyper-V gleefully, in a heartbeat, resumed both little VMs. Just like that, another reboot of the VOIP phone and, bleep-bloop-blunk, the phone was functional and just fine.
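If it ever happens again, the same fix is scriptable. A minimal sketch that shells out to the Hyper-V PowerShell cmdlets (Get-VM / Resume-VM) from Python; it has to run elevated on the host itself, and if the host is still starved for disk, free the space first or the resume won’t stick:

```python
# The click-click fix, scriptable: find any paused Hyper-V VMs and resume
# them by shelling out to the Hyper-V PowerShell module. 'Paused*' catches
# both Paused and PausedCritical (the low-disk-space variety).
import subprocess

command = (
    "Get-VM | Where-Object { $_.State -like 'Paused*' } | "
    "ForEach-Object { Resume-VM -Name $_.Name; $_.Name }"
)
result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command", command],
    capture_output=True,
    text=True,
)
print("Resumed:", result.stdout.strip() or "nothing was paused")
```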

It is always DNS. I have three resolvers; two of them were on the same host, the host had a wee panic over the disk space the 7-Zip adventure chewed up, and Hyper-V silently paused everything it was running. Then, after a short while of cooking, the phones and the Wifi, which also use those resolvers, all went kaput in one happy bunch.

Obviously the answer is to round-robin the resolvers: the primary on the PDC, then the resolver running in VMware nearby, and then the secondary on the PDC. A sandwich right down the middle. I both thanked my past self and kicked my past self, for having the wits to set up a third resolver, which was, for a short while, the only resolver there was, though only choice parts of my network were pointed at it.
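What I really want is something watching all three resolvers, so that a silent pause shows up before the phones do the telling. A minimal sketch of that probe, assuming the dnspython package; the resolver addresses here are placeholders for my real ones:

```python
# Probe each resolver directly and report which ones actually answer.
# A paused VM doesn't log anything; a failed lookup here is the only tell.
# Assumes the dnspython package; the resolver addresses are placeholders.
import dns.exception
import dns.resolver

RESOLVERS = {
    "primary (PDC / Hyper-V)":   "10.0.0.11",
    "third resolver (VMware)":   "10.0.0.12",
    "secondary (PDC / Hyper-V)": "10.0.0.13",
}
TEST_NAME = "example.com"

for label, address in RESOLVERS.items():
    probe = dns.resolver.Resolver(configure=False)
    probe.nameservers = [address]
    probe.lifetime = 2.0  # don't hang on a dead resolver
    try:
        probe.resolve(TEST_NAME, "A")   # dnspython 2.x; use query() on 1.x
        print(f"{label}: answering")
    except dns.exception.DNSException:
        print(f"{label}: NOT answering -- go look at Hyper-V")
```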

So it ended happily; all’s well that ends well. The next step is to spread this round-robin resolver correction throughout my network, to help keep this from ever happening again. But then I laughed as I considered the gamut of what had transpired: 7-Zip, well-meaning and purely by accident, set off a disk space crunch; Hyper-V silently and studiously paused its charges; the network kind of rolled on over the speed bumps; and at the end it all proved, once again, “It’s always DNS.”

Dodgy Clouds

The recent outage in the Google Cloud infrastructure certainly revealed a fair amount of vulnerability in their cloud offerings. So many services were affected, and I heard tales of Nest owners who couldn’t unlock their homes or control their HVAC systems because the devices couldn’t function without Google’s side being up and running.

This has always worried me about cloud infrastructure and, beyond that, IoT designs. We have come to depend on a great deal of this technology recently, and it can be tough for those of us who understand how it all works to let things like HVAC controls and door-lock security go off to be managed by a company, without any sort of manual override.

Google Chrome and Ads

It isn’t the first time that Google has turned on us. They used to have “Don’t be Evil” as a company motto, but when they ran into a profit wall, they realized they had to accept evil into the company to make more money. So now, Google is Evil. Recently, details came to light about how Google will be changing Google Chrome: they are going to cripple the programming API that much of the ad-blocking software relies on to function correctly. Honestly, I was expecting this sort of thing long ago. It was the perfect reason to look into moving ad-blocking away from the computer level and further into the network itself. At work I use Cisco Umbrella, which places the filter at the DNS level. When I was playing around with Raspberry Pi computers a long while back, another GitHub project caught my attention: Pi-Hole.

Pi-Hole

The GitHub project Pi-Hole is a very straightforward installation that provides DNS filtering of malware and adware based on community-developed blocklists. I originally ran it on my Raspberry Pi, until I discovered that the Pi wasn’t really all that reliable a platform. Since then I have installed Debian Linux on my original Mac Mini, and that machine, which also serves as the central entertainment hub for my household, now provides Pi-Hole services too. I have set my home router to use the Pi-Hole for its upstream DNS requests, so every device attached to my home network funnels all of its DNS traffic through the Pi-Hole. With all the DNS requests going through the Pi-Hole, ads are stripped for Google Chrome and any other browser on my computer, iPhone, iPad, or whatever, without any settings to change or fuss with. To that end, thank you, Google, for giving me the push to help eliminate ads throughout my home.
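If you want to confirm a setup like this is actually doing its job, a quick DNS query against the Pi-Hole will tell you. A small sketch, assuming the dnspython package; the Pi-Hole address is a placeholder for my Mac Mini, and doubleclick.net is just a domain that should be on most blocklists:

```python
# Spot-check that the Pi-Hole is sinkholing ad domains. In the default
# blocking mode a blocked name resolves to 0.0.0.0 (older setups answer
# with the Pi-Hole's own address). The address and domain are placeholders.
import dns.resolver

PIHOLE = "192.168.1.2"          # hypothetical Pi-Hole / Mac Mini address
AD_DOMAIN = "doubleclick.net"   # a domain that should be on a blocklist

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [PIHOLE]

answer = resolver.resolve(AD_DOMAIN, "A")   # dnspython 2.x; query() on 1.x
addresses = [rdata.address for rdata in answer]
print(f"{AD_DOMAIN} -> {addresses}")
print("Blocked." if addresses == ["0.0.0.0"] else "Not blocked -- check the blocklists.")
```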

Sirius/XM Outages

In line with what happened when Google Cloud malfunctioned, there was another event earlier today that posed a challenge for me, IT-wise. I was driving into work, and I oftentimes listen to XM’s Channel 33, First Wave. I was enjoying the music, and the announcer mentioned the channel schedule. That reminded me that I have the XM app on my iPhone and could stream the XM signal into my workplace just as easily as I stream Spotify. So I tried to use the app and ran into Error 1025. What the hell is that? I eventually got into a chat with a Sirius/XM representative, and they told me there were system-level issues at Sirius/XM that were giving everyone challenges. I have to remind myself that my first stop should be DownDetector.com! I browsed to that site while I was on the chat with the XM representative, and there it was, Sirius/XM, with a huge complaint spike. I should have started there! Lesson learned!

The way of things, for cloud infrastructure and all these interconnected devices, will not go away anytime soon. The settings on your phone and computer might well be causing your connectivity issues, but keep in mind that sometimes the biggest systems are also the most fragile. Keep sites like DownDetector in mind, because if you are having a problem with a website, chances are a whole lot of other people are too.

I know why the caged bird is stark-raving insane…

Many moons ago I found an online web hosting company called Hosting4Less.com. They had good service, and I established a domain with them on behalf of one of my family members. Everything was going swimmingly until a dust-up started me looking at other web hosting providers. The web hosting market is jam-packed with competitors, and the people at Hosting4Less can’t compete with the service I found, called iPage. Moving the domain, however, was less than easy.

The domain was managed by a bulk domain registrar, “OpenSRS” something or other. In order to get the domain transferred to a new registrar I needed a password, a Domain Transfer Password. It took me two weeks to wheedle this sucker out of the previous registrar and then email it to everyone trying to help me. The transfer failed three times, and on the fourth it got halfway there, some sort of mutant half-life, living between registrars. After asking for help a fourth time, the fine people at iPage did get it resolved for me, but the domain was evidently “Locked,” so I had to get a username and password, log into manage.opensrs.net, unlock the domain, and change the domain’s nameservers.
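Once the dust settled, the only way I trusted that the nameserver change had actually taken was to ask DNS directly. A small sketch of that check, assuming the dnspython package; example.com and the nameserver names are stand-ins for the family member’s actual domain and iPage’s servers:

```python
# Quick check that a domain's delegation actually changed after a registrar
# transfer. Assumes the dnspython package; example.com and the nameserver
# names below are placeholders, not the real values.
import dns.resolver

DOMAIN = "example.com"
EXPECTED = {"ns1.ipage.example.", "ns2.ipage.example."}  # hypothetical new nameservers

answer = dns.resolver.resolve(DOMAIN, "NS")   # dnspython 2.x; use query() on 1.x
actual = {rdata.target.to_text() for rdata in answer}

print("Delegated to:", ", ".join(sorted(actual)))
if actual == EXPECTED:
    print("Nameserver change has propagated.")
else:
    print("Still seeing the old delegation; wait longer or re-check the registrar.")
```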

Each step is predicated on a drug-addled pharmacy schedule: we’ll get around to it in either 1 hour or 7 days, depending on how much crack we have to smoke. There is no rhyme or reason to it; I think they put these obnoxious time estimates down to keep people from going completely apeshit when a single change can take a week for someone to pay attention to.

The Domain Name System is secure, I have no doubt about that. How is this security vouchsafed? It’s soaked in various username/password combinations (160 bits of security on that password!), but most of all it is a bureaucratic abyss. You stare into it and it stares right back into you, alternately claiming your soul and then blasting it into teeny tiny little shreds. I can’t imagine anything else being constructed this way. It is as if they placed a scapegoat at the village doors, hung the word “efficiency” around its neck, and left it to wander off during a blizzard with a million ravenous wolves running around. It’s designed to be obtuse; the road is not so much a road as it is little strands of pavement showing you where the potholes are, since they are the majority of the ‘road’ and each one is big enough to grab a tire and pop off an axle! It’s as if a paranoid schizophrenic was given the keys to the kingdom and let loose on a security rampage. Nothing about this makes any sense to me, so I must have faith that each wave of my feather-and-chicken-bone dreamcatcher gets me that much closer to my target, which is to have domain.com point to the IP address that’s right.

Nobody should worry about terrorists subverting the DNS system; even with virgins promised, nothing could be proper compensation for this bureaucratic nightmare! Damn!