New Year Resolutions

This new year I resolved to be done with Twitter, Facebook, and Reddit. I abandoned Twitter a long time ago, Reddit was easy because I was never really invested in that platform anyway, and most recently I left Facebook behind.

It needs a little characterization. I haven’t deleted my Facebook account, but what I have done is ceased to engage on that platform. I still do check in once a week just to mop up my timeline recommendations from people putting my name on their posts and otherwise just establishing a heartbeat there so that the people who are on the service and follow me notice that I still live. I suppose that eventually even bothering with the heartbeat updates will grow tiresome and I’ll give up on that as well.

I have instead moved my entire social networking existence to a new service called Imzy. It’s at imzy.com, and I encourage everyone to join me there. There are some pretty good AUP rules in place and communities can also have extended rules, building off the core AUP of the site itself. Imzy is a perfect place to have real discussions with people online. There is a culture in Imzy which I haven’t found anywhere else. It’s this lack of trolling that I witnessed and it’s what led me to dump Facebook.

I don’t know what this means for this blog. Imzy is a great platform all on its own, and when it comes to blogging, my user community there has a lot of features that my blog can’t match. The sense of community, I think, is what is missing from a lot of services, including my blog. This service is mostly just a billboard for me to yell at the darkness. There aren’t any real conversations going on here, unlike in Imzy.

I figure if I don’t post more blog entries I may just archive all of this stuff and shutter the service completely. Then again, I may just be lazy and let this blog long-tail it to eternity. Only time will tell.

Assert The Win

Sometimes it’s the best thing to assert you win and walk away from a toxic problem. So far today I’ve done that quite a bit. What have I abandoned?

I’ve walked away from Facebook. It’s been four days since I even logged into Facebook and since then I haven’t missed it. I’ve been catching up on my news; the Spiceworks Community board consumes a lot of time. Then after that, I turned my attention to my Pocket list. There just isn’t enough time anymore to deal with Facebook. When I logged into it, I had eighteen notifications, and I frowned and realized that I didn’t care that much. I’m writing a lot of my thoughts into my journal after coming to the realization that sharing with others isn’t going to be a positive experience. Now nearly everything on Facebook is an unpleasant experience. So, abandoning toxic things seems to be a good thing for me.

Another toxic system is Office365. Microsoft and I go back for a long while, right along with my almost palpable hate for that company and their products. Going into just how Office365 lets me down is very dull. Nearly every interaction has me wishing I could just close my laptop, put it in my backpack and run away from my life. Everything that has some Microsoft technology associated with it has me frowning in deep disappointment. Alas, there is no way to escape the Great Beast of Redmond, so we gnash our teeth and endure the horrors.

The final horror is WordPress itself. I use a stock theme, Twenty-Twelve. It’s not a custom theme. It’s not slick or responsive. It’s just a dumb theme. So while reading my blog, I realized just how much I wanted to change the line-spacing for my post entries. This is where my expectations fork: there is an Apple fork and an “Everything Else” fork. The Apple fork has proven time and time again that the answer is simple, shallow, and easy to get to; you understand what the change will do, and you make it work. Then there is everything else. Here we have WordPress itself. I wanted to change the line-spacing on my theme. So I go to the Dashboard, and I spend ten minutes blindly stabbing at possible places where this option might be hiding, to no effect. Then I do a Google search, which is the first and last place that most possible solutions are born and die. A good Google search almost always results in the answer you are after. So, “WordPress vertical line spacing” led to a place that eventually had the solution in it, but the theme didn’t match what I was expecting. This is the core of frustration, so I modified the search to include the theme’s name itself, and that helped. I found the setting, and it was in a CSS stylesheet file. I left the WWW when it was still HTML only. CSS irritates me. But anyways, hack CSS, that’s the answer. It’s a dumb answer, but that’s it. So I find about 130 places where line-height is an option. I laugh bitterly at the number. Which section to edit? Are you sure? So I gave it a shot. I set the line-height to 2.0 and then looked at my site. I can’t tell if it improved or not. But the most adaptive solution is to assert it did what I wanted. Mark the win as a notch and move on. Do I care? Well, I wanted to do something. I did something. Did it work? Probably not.

But then we get back to that first fork. That’s why I love Apple so much. Nearly everything they touch MAKES SENSE. I don’t have to struggle with some labyrinthine mystery. Maybe my edits will work, maybe they will break whatever it is, maybe it won’t matter. Maybe any setting I change will be overridden somewhere else, by something that was never documented. That’s the core design principle of both WordPress and Microsoft. I suppose we should just be happy that the most basic functions work. Much like the Internet itself, the fact that any of this works is a daily miracle.

So instead of writing a huge rant, one that nobody wants to read and nobody cares about I will assert that I won, psychologically move forward and be able to forget the conditions that led me to those particular experiences. The blog doesn’t work like you want? Don’t go there. Facebook a cesspool of ugly humanity? Skip it. Microsoft? Ah, if only it would burn to the ground. But we can’t have what we wish, even if we’d do anything for our dreams to come true.

So! Hooray! A Win! Facebook, WordPress, Office365! Just stop worrying about the bomb. It’s “Someone Else’s Problem®”

Google Schmoogle

Today is a day for Google to let me down. Generally, a lot of technology companies end up in the same dustbin. They always promise some glittering awesomeness, but when you start to engage with them, you discover that the awesome is rather half-baked. In this particular case, the first two Google technologies were their Music Play property and Android.

Google Music, or Google Play, whatever it’s called, has a lot of the music that I uploaded from iTunes back when I still had music files that I played on my iPod. My musical use has migrated to streaming, specifically Spotify, with which I am very pleased. I oftentimes miss my old iPod with my music loaded on it. There was something about the shuffle feature on my old iPod Nano that fascinated me. The old shuffler felt almost psychic or at least sensitive to my environment and conditions. I think it is because the device had its RNG on-device and it was a wearable device. There is something still there I think, and I think back on it fondly. A lot of my music is on Google Music, and today I thought I might uncork some of it. I opened my Safari browser and discovered that Google Music doesn’t work without Adobe Flash. As a general rule of thumb, I don’t use Adobe products at all if I can help it, and that is especially true of Adobe Flash. There was a point in the past where you could switch the Google Music site over to HTML5 playback, but Google has since eliminated that option as far as I can tell. So, strike one for Google.

The next strike came when I tried to use my Samsung Galaxy Nook device. This device is loaded with Google’s Android operating system, and I’ve railed against this before. In this particular case, it is related somewhat to the dead horse I keep on beating in regards to Google Android. I had my Nook open, and I was trying to use it. The interface is sluggish as hell, but I have grown to accept that. There is an app I have on my Nook called “Clean Master”, and it’s designed to be a system maintainer for Android. My experience, paired with what the “Clean Master” application claims, tells me that Android is a wet hot mess. Every time I use the app, it finds 350MB or more of “Junk files”, and does scans for “Obsolete APKs.” This scan takes an exceptionally long time. So I’ve fallen down a rabbit hole with the device, trying to get it “cleaned up” because it’s “dirty”. This application is dutifully chugging away, apparently just circling around the same batch of directories for about ten minutes, accomplishing nothing. I tap the big button at the bottom. “STOP”. Nothing happens. I then tap it a few more times. “STOP”. “STOP”. “STOP”. In the end it was a comedy, and I started to mumble “STAHP” to the device. At the top of the application is another control that says “Advanced Settings”, and I tapped it thinking maybe I could turn the scan for “Obsolete APKs” off. Nope. Tap, nothing, tap, nothing. Tap tap tap tap tap tap. The device stops working altogether and then boop, new screen and it’s back to working! But the options there are useless. So then I try to use the “Home” button, and the Nook just dwells there, thinking. about. it. Then the Home switcher screen appears, and I make the throwaway gesture to get rid of the “Clean Master” app. There is “nothing” running on the device, but it’s still just sluggish as hell.

So that is what informs my opinions about these companies: Google, Samsung, and Apple. I include Apple because I have a lot of Apple devices, and they don’t behave like this. Even with two giant corporations working together, Google and Samsung can’t even touch what Apple does. My iPhone 6 behaves for me, mostly, and in comparison, it is far better than what Samsung and Google bring to the table. My chief issue is the disconnect with the hardware stats: the Samsung is supposed to have more resources than the Apple products, so does it come down to the OS? It may simply be a fight between iOS and Android in the end. To really focus on my issue, it is all about user interrupt. On my iPhone, the user interrupt, which is to say the events that the user triggers, takes top priority. The interface is “snappy” and “gets my wishes” and “performs”. Whereas in Android, the user input seems to be treated like a queued wishlist that the user submits and then waits for the device to act on if it wants to, or not. I know it’s not designed to behave this way, or at least it shouldn’t be. But the behavior is what informs my opinions. I’ve got an Apple device that is snappy and responsive to me versus a Samsung/Android Nook that seems to want to do its own thing. There is another company represented, and that’s B&N. Mostly at this point I think of B&N as a bystander. They aren’t really involved anymore with Samsung or Android; they’re just marketing books through a channel, and they happened to choose this channel. For the core function that I use the Samsung Galaxy tablet for, which is an eBook reader, it is satisfactory. For a general use tablet or a mobile device capable of more than just eBooks, though? No. And I can’t understand why people who use Android accept this behavior so blindly. Perhaps that’s what being a fan is all about. If you are fond of the underdog, the scrappy alley fighter, then I suppose Android has some romance to it. You want the sad, somewhat over-concussed street-fighter who sometimes pisses himself and forgets his name to come out on top in the end and win the day.

So with these two starting experiences today, the answer is to lower your expectations. I expected too much of Google and of Samsung. The device is just a simple eBook reader, it really can’t be anything else. I will never willfully purchase another Android device, so there isn’t any reason to declare that Android is dead to me, it was dead on arrival after all. The only thing that I can say is that other people seem to enjoy it, and in the end that’s all that matters. After seeing what this Samsung Galaxy can do, I don’t understand the why behind Android’s success, but they are successful and in that, well, that’s good. It’s just not for me.

As for the music, I again lower my expectations. Instead of searching for some way to access my Google Music without Adobe Flash, I’m instead going to try an application that can help me migrate my music collection off to a Spotify playlist, maybe. In that, I have very little faith, and I’ll probably just give up and stop thinking about it altogether. I find myself not really fighting about technology anymore. I find that I’m more apt just to turn it off, put it in a drawer and forget about it for a few decades. If I were a technology company, I would really love to find out what kind of technologies people have put in their drawers and forgotten about, and find out why. That would create a great laundry list of things “not to do” when devising new technologies.

Sample Malware

Today I received a sample email that some of my coworkers caught. They asked me to look into it. The link in the email was a bit.ly link, which I was able to extract. There is a clever little trick here: appending a + character to a bit.ly link doesn’t load the site that the link goes to, but instead tells you about the link. This link has been clicked on about 7000 times. Already I know we’re dealing with malware, so now it’s not a question of if it’s a rabbit hole, but rather, how deep does it go?
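The + trick is easy to script. This is only a minimal sketch: the function name and the example short link are made up for illustration, and it assumes you pass in a full URL including the scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def bitly_info_url(short_url: str) -> str:
    """Return the info-page form of a short link by appending '+'.

    Hypothetical helper; it only rewrites the URL and never fetches it.
    """
    parts = urlsplit(short_url)
    # Make sure exactly one '+' ends the path, even if one was already there.
    path = parts.path.rstrip("+") + "+"
    # Drop any query string or fragment; the info page only needs the path.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# "ExAmPlE" is a placeholder, not the link from the email.
print(bitly_info_url("https://bit.ly/ExAmPlE"))
```

Building the info URL this way means the suspicious destination is never requested while you look at the click statistics.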

I pulled the bit.ly link contents out and handed them to curl in the terminal on my MacBook Pro. I don’t expect curl to do anything but show me the text of where this bit.ly link goes. It heads to a PHP file on a presumably hacked web server or blog. The PHP page itself is an HTTP refresh-redirect to a Dropbox-hosted file. So I opened up my Virus Lab VM and followed where this led. The Dropbox content said it was a 1MB PDF file, but when I opened that, it led to a phishing attempt.

The phishing hack had an obnoxious URL attached to it, so I pulled that out and discovered it was base64-encoded. I decoded that text chunk online, and it revealed a JavaScript script block formed by a single call to document.write(unescape()).
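The same two layers can be peeled offline instead of pasting the chunk into a website. What follows is a rough sketch, not the exact payload from the email: the function name is made up, the sample data is a harmless stand-in, and urllib's unquote is only an approximation of JavaScript's unescape (it handles %XX sequences but not the %uXXXX form). Nothing here executes the script; it is treated as text the whole way.

```python
import base64
import re
from urllib.parse import quote, unquote


def peel_layers(b64_blob: str) -> str:
    """Decode base64, then undo a document.write(unescape('...')) wrapper."""
    decoded = base64.b64decode(b64_blob).decode("utf-8", errors="replace")
    # Look for the quoted argument handed to unescape(...).
    match = re.search(r"unescape\(\s*['\"](.*?)['\"]\s*\)", decoded, re.DOTALL)
    if match is None:
        # No wrapper found, so return the first layer as-is.
        return decoded
    # Approximate JavaScript unescape() for plain %XX escapes.
    return unquote(match.group(1))


# Build a harmless stand-in payload with the same two layers.
inner = "<form action='http://example.invalid/collect'>"
script = "document.write(unescape('" + quote(inner, safe="") + "'))"
blob = base64.b64encode(script.encode("utf-8")).decode("ascii")

print(peel_layers(blob))
```

Doing this inside the lab VM keeps the sample off third-party decoder sites.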

Whoever it was went to great lengths to obfuscate their malware. Ultimately it led nowhere because we caught it. I find this sort of thing fascinating to pull apart, like an easy little puzzle to unravel. The phishing attempt is for an email username and password, and if someone falls for that, then thanks to people usually being lazy with passwords, once you have one password, chances are you have all of them on every other site.

Just another reason to use a password manager and have individual passwords per individual sites. If one breaches, then the damage is limited to that one site, not all of them.

What Roy Batty Saw

We hired a new coworker and learned that he needed a Cisco VOIP phone. I had one spare unit left, an older Cisco 7912. I went to plug it in, and the phone simply wasn’t registering Power over Ethernet (POE). I knew for a fact that the phone itself was fine, and the switch I was plugging the phone into was functioning well. I believed that my station cables were fine too, so I used my Fluke LinkRunner to test the cables and the port to be sure. Everything checked out; the Fluke indicated proper POE. However, when I plugged the phone in, nothing at all.

I knew that this port had a history of being troublesome, but previously I had a Cisco 7940 phone working well in this spot, so it was a mystery as to why a 7912 wasn’t also working. I tested the port a few times, each time seeing proper POE voltage and wattage. Even the switch itself noticed my Fluke tester and was registering that a device was consuming POE supply on the port in question. I couldn’t understand why a phone that works well in one place doesn’t work in another when everything is equal. Obviously, not everything was as equal as I thought. Something had to be wrong.

I looked at the Fluke LinkRunner. It listed POE as coming in on pins 1 and 2 for the positive circuit and pins 3 and 6 for the negative circuit. So then I took the Fluke to my testing lab and looked at POE coming from a Cisco Catalyst 3560 switch. The Fluke indicated that 3 and 6 were positive, and 1 and 2 were negative. I immediately figured out what the issue was. Ethernet jacks can conform to T568A or T568B; the difference is subtle, because the two standards swap the green and orange pairs, which are the pairs that land on pins 1-2 and 3-6. I did a little desk diving and popped the cover off the jack in the wall. Everything that I deal with is always T568B. Always. The jack in the wall? T568A. So armed with what I knew, I tugged the old keystone jack out and replaced it with the last good one that I had. I punched it down and tested it again. The Fluke indicated POE, 3-6-1-2, I plugged in the phone and pop! The phone came to life!
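To see why a mismatched jack flips the POE reading, it helps to write the two pinouts down and trace where each conductor lands. This is a sketch under stated assumptions: the color orders are the standard T568A and T568B assignments, the function names are made up, and the "switch side is T568B, wall jack is T568A" arrangement mirrors what I found rather than anything the Fluke reports directly.

```python
# Conductor color on pins 1..8 for each termination standard.
T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]


def pin_map(near_end, far_end):
    """Map each near-end pin to the far-end pin holding the same conductor."""
    return {pin: far_end.index(color) + 1
            for pin, color in enumerate(near_end, start=1)}


def power_pins_at_far_end(positive, negative, near_end, far_end):
    """Follow the POE positive and negative pins through the cable run."""
    mapping = pin_map(near_end, far_end)
    return (sorted(mapping[p] for p in positive),
            sorted(mapping[p] for p in negative))


# In the lab the switch put positive on pins 3 and 6, negative on 1 and 2.
# Trace that through a run punched down T568B at one end, T568A at the other.
print(power_pins_at_far_end([3, 6], [1, 2], T568B, T568A))
```

With both ends on the same standard the mapping is the identity, which is why a matching keystone jack put the polarity back where the phone expected it.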

So, just when you think you can just get on with things, always check the standards. You always have to assume that nobody else is. What a mess. But at least it was an easy fix.

FreeBSD Crater

I started out looking at FreeBSD based on a draw from FreeNAS, which then led to ZFS, the file system that FreeNAS is built around and that FreeBSD supports. At work, I am looking at the regular handling of enormous archival files, and the further along I went the more I realized that I would also need storage for a long time. There are a lot of ways to ensure that archival files remain viable: error correcting codes, using the cloud, rotating media. So all of this has led me to learn more about ZFS.

I have to admit that at first, ZFS was very strange to me. I’m used to HFS and EXT3 and EXT4 type file systems with their usual vocabularies. You can mount it, unmount it, and check it with an option to repair it. ZFS adds a whole new universe of vocabulary to file systems. There are two parts: the zpool command defines the devices and files you want to use for your file system, and the zfs command lets you manipulate it, in terms of mounting and unmounting. When it comes to error-checking and repair, that is the feature called scrub. The commands themselves aren’t difficult to grasp, but the nature of this new file system is very different. It enables the administrator to perform actions that other file systems just don’t have. You can create snapshots, manipulate them, and even draw older snapshots forward as clones, in any order. So let us say that you have a file system, and you’ve been making regular snapshots every 15 minutes. If you need something from that file system at snapshot 5 out of 30, you don’t have to roll back the file system manually; you can just pluck snapshot 5 and create a clone. The cloning procedure feels a lot like “mounting” a snapshot so you can access it directly. If you destroy a clone, the snapshot is undamaged; it just goes back into the pile from whence it came. The big claim to fame for ZFS is that it is regarded by many as the safest file system: if one of the devices in the zpool should fail, the file system can heal itself, provided the pool was built with redundancy. You can tear out that bad part, put in a new part, and the file system will rebuild and recover. In a lot of ways, ZFS is a lot like RAID 1, 5, or 6. Apparently there is a flaw with RAID 5 when you get to big data volumes, and from what I can gather, ZFS is the answer to those problems.
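The snapshot-to-clone idea is easier to follow as a sequence of commands. The sketch below only builds the command lines and prints them; it does not run anything. The pool, dataset, and snapshot names are hypothetical, and the helper functions are made up for illustration. The subcommands themselves (zfs snapshot, zfs clone, zfs destroy, zpool scrub) are the standard ones.

```python
def snapshot_name(dataset: str, label: str) -> str:
    """ZFS names a snapshot as dataset@label."""
    return f"{dataset}@{label}"


def clone_workflow(dataset: str, label: str, clone: str):
    """Commands to take a snapshot, surface it as a clone, then drop the clone."""
    snap = snapshot_name(dataset, label)
    return [
        ["zfs", "snapshot", snap],      # take the point-in-time snapshot
        ["zfs", "clone", snap, clone],  # expose it as a writable clone
        ["zfs", "destroy", clone],      # remove the clone; the snapshot stays
    ]


def scrub_command(pool: str):
    """Command that asks ZFS to verify, and where possible repair, a pool."""
    return ["zpool", "scrub", pool]


# "tank" and the dataset names below are placeholders.
for command in clone_workflow("tank/archive", "snap05", "tank/archive-snap05"):
    print(" ".join(command))
print(" ".join(scrub_command("tank")))
```

Each list could be handed to subprocess.run on a machine with ZFS installed, but printing them first is a safer way to check what would happen.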

So I have ZFS ported over to my MacBook Pro, and I’ve been playing around with it for a little while. It works as advertised, so I’ve been enjoying that. One of the biggest stumbling blocks I had to deal with was the concept of zfs mounting and unmounting and how it relates to zpool’s export and import commands. I started with a fully functional ZFS file system: I created the zpool, then mounted it to the operating system. The next step was to unmount the file system and export the zpool. I was exploring the way you can fully disconnect a ZFS file system from a host machine and then reverse the process. While doing this, I was reluctant to use actual physical devices, so I instead used blank files as members in my zpool. I was able to create, mount, and then unmount the entire production, and then export the zpool. When I tried to reverse that and import the zpool I had just exported, the system told me that there weren’t any pools in existence to import. This had me thinking that ZFS was a crock. What is the point of exporting a zpool if there is no hope of importing it afterwards? It turns out there is a switch, -d, which tells zpool import which directory to search for the pool’s members, and for a pool made of files you have to use it. That’s the trick of it. So once I got that, I became much more comfortable using ZFS, or at least exploring it.
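Here is the file-backed round trip written out, including the -d switch on the import. It is a sketch, not a transcript of what I typed: the pool name, file names, and helper functions are made up, and the 64 MiB size is an assumption about the smallest backing file ZFS will accept. The code creates the blank files but only prints the zpool commands rather than running them.

```python
import tempfile
from pathlib import Path

VDEV_SIZE = 64 * 1024 * 1024  # assumed minimum size for a file-backed member


def make_backing_files(directory, count=2, size=VDEV_SIZE):
    """Create blank (sparse) files to stand in for disks."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(count):
        path = directory / f"vdev{index}.img"
        with open(path, "wb") as handle:
            handle.truncate(size)  # extend the empty file to the target size
        paths.append(path.resolve())  # zpool wants absolute paths for files
    return paths


def pool_round_trip(pool, vdev_paths):
    """Commands to create, export, and re-import a file-backed pool."""
    search_dir = str(Path(vdev_paths[0]).parent)
    return [
        ["zpool", "create", pool, *map(str, vdev_paths)],
        ["zpool", "export", pool],
        # Without -d, import does not search the directory holding the files.
        ["zpool", "import", "-d", search_dir, pool],
    ]


with tempfile.TemporaryDirectory() as scratch:
    files = make_backing_files(scratch)
    for command in pool_round_trip("labpool", files):
        print(" ".join(command))
```

The scratch directory is deleted when the block exits, so for a real experiment the files would need to live somewhere that persists between the export and the import.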

So then today I thought I would explore the source of FreeNAS, which is FreeBSD. FreeBSD is a Unix-like operating system, a cousin of Linux rather than a flavor of it, and so I thought I would download an installation image and try it out in VirtualBox on my MacBook Pro. So, I started with the image FreeBSD-10.2-RELEASE-amd64-dvd1.iso and got VirtualBox up and running. The installation was very familiar and I didn’t run into any issues. I got the FreeBSD OS up and running and thought I should add the VirtualBox Guest Additions. I thought I could just have VirtualBox add the additions as an optical drive and that the OS would notice and mount it for me in /mnt or /media. No. So that was a no-go. I then looked online and searched for VirtualBox Guest Additions. I found references to procedures to follow in the “ports” section of the FreeBSD OS. I tried it, and it told me that it couldn’t proceed without the kernel sources. So then I searched for that. This turned into a fork/branch mess and I knew that familiar sinking feeling all too well. You try to fix something and that leads to a failure, so you look for help on Google and follow a fix, which leads to another failure, and then you keep on going. This branching/forking leads you on a day-wasting misadventure. The notion that you couldn’t get what you wanted from the start just sits there on your shoulder, reminding you that everything you do from this point forward is absurd. There is a lot of bullshit you are wading through, and the smart move would be to give up. You can’t give up because of the time investment, and you want to fight it out, to justify the waste of time. The battle with FreeBSD begins. At the start we need the kernel sources, okay, use svn. Not there, okay, how to fix that? Get svn. Sorry, can’t do it as a regular user. Try sudo, command doesn’t exist, look for su, nope, not that either. Try to fix that, can’t. Login as root and try, nope. So I pretty much just reached my limit on FreeBSD and gave up. I couldn’t get VirtualBox Additions added, svn is impossible to load, sudo is impossible to load. Fine. So then I thought about just screwing around with ZFS on FreeBSD, to rescue some semblance of usefulness out of this experience. No, you aren’t root, piss off. I even tried SSH, but you can’t get in as root, and without sudo there is no point in going forward.

So, that’s that for FreeBSD. We’re up to version 10 here, but it is still firmly bullshit. There are people who are massively invested in BSD and they no doubt are grumpy when I call out their OS for its obnoxiousness. Is it ready for prime time use? Of course not. No kernel sources included, no svn, no sudo, no su, no X for that matter, but honestly, I wasn’t expecting X.

It points to the same issues that dog Linux. If you don’t accept the basic spot where you land post-install then you are either trapped with Google for a long while or you just give up.

My next task will be to shut down the FreeBSD system and dump all the files. At least I only wasted two hours of my life screwing around with the bullshit crater of FreeBSD. What have I learned? Quite a lot. BSD I’m sure is good, but to use it and support it?

Thank god it’s free. I got exactly what I paid for. Hah.

Surprise! Scan-to-Folder is broken!

That’s what we faced earlier this week in our Grand Rapids office. It was a mystery as to why all of a sudden a Canon iR-3235 copier would stop working when it came to its “Scan to Folder” function. For Canon, the “Scan to Folder” function opens a CIFS connection to wherever you tell it to go and deposits a scanned PDF file to the destination. Everything up to Monday was working well for us.

After Monday, it was broken. Thanks to a Google Form linked to a Google Spreadsheet, I have a handy way to log changes I make to the network. I open up the form, enter my name and the change, and the Google spreadsheet catches the timestamp automatically. So what changed on Monday? I was using Wireshark and found a flurry of broadcast traffic using two protocols, LLMNR and NBNS. The first protocol, LLMNR, is only useful for small ad-hoc networks that don’t have a standard DNS infrastructure. Since we do have a fully-fleshed DNS system running, LLMNR is noisy and superfluous. NBNS is an old protocol, and turning it off system-wide is an accepted best practice. So I turned off NBNS for all the workstations and turned NBNS off on the servers also. It’s 2016, what could need NBNS?

Then we discovered that our older Canon iR-3235 copiers suddenly couldn’t save data to CIFS folders. We verified all the settings, and there was no reason whatsoever that the copiers couldn’t send data to the server, or so we thought. The error from the copier was #751, which was a vague error code, and nothing we could find online pointed to error #751 being a protocol problem.

I can’t recommend instituting some change tracking system enough for any other IT shop. Having a log, and being able to pin down exactly what happened and when, was invaluable to solving this problem. As it turns out, these Canon copiers depend on what I turned off, though not on the NBNS name service itself. When you turn off NetBIOS on a server, that also closes port TCP/139, the NetBIOS session port. The other port for CIFS traffic, TCP/445, is used by modern implementations of CIFS. These Canon copiers only use TCP/139. So when I turned off NBNS to tamp down the broadcast traffic, I accidentally made the server deaf to the copiers. Turn NBNS back on, re-open TCP/139, and that fixes these old Canon copiers.
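A quick way to see what a file server is actually listening on is to try a TCP connection to each CIFS port. This is a minimal sketch with made-up function names; the hostname in the usage note is a placeholder, and a blocked connection can also mean a firewall is in the way rather than the service being off.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def cifs_port_report(host: str) -> dict:
    """Check the two ports CIFS clients use: 139 (NetBIOS session) and 445."""
    return {port: port_is_open(host, port) for port in (139, 445)}
```

Calling cifs_port_report("fileserver.example.internal") against the server before and after a change like mine would have shown 139 flipping from open to closed while 445 stayed up, which is exactly the condition that strands an old copier.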

Cisco Phone Outage Solved!

Since early November of 2015, I’ve been contending with the strangest behavior from my Cisco infrastructure. A small number of Cisco IP Phones appear to go out around 7:30 am and then pop back to life right at 8 am. The phones fail and fix all by themselves, a very strange turn of events.

My first thought was that it was power related, so I started to debug and play around with POE settings. That wasn’t the answer. I then moved some phones from the common switch to a different switch nearby and the problem went away for the moved phones. That right there proved to me that it wasn’t the phones, and it wasn’t the Cisco CallManager. It had something to do with the switch itself. So I purified the switch, moved everything that wasn’t a Cisco IP Phone off it and the problem continued.

I eventually got in touch with Cisco support, and they suggested a two-pronged effort: set SPAN up on the switch and run a packet capture there, and set SPAN up on a phone and run a packet capture there as well. The capture on the switch showed a switch in distress, with many ugly lines where TCP was struggling. The phone capture was immense, and I got it opened up in Wireshark and went to the start of the day’s phone failure event. The minute I got to the start, 7:33:00.3 am, the first line appeared. It was an ICMPv6 “Multicast Listener Report” packet, one of many that filled the rest of the packet capture. Millions of packets, all the same.

The multicast packets could explain why I saw the same traffic curves on every active interface. When a switch receives a multicast packet and nothing like snooping is narrowing it down, it floods that packet out of every active port, as if the packet had been sent to each of them. As it turns out, once I extracted the addresses of where all these offensive packets were coming from, sorted the list, and dropped the duplicates, I ended up with a list of four computers. I poked around a little bit more on Google and discovered to my chagrin that there was a specific Intel Network Interface Controller, the I217-LM, which was at the center of this particular network flood scenario. I looked at the affected machines, and all of them were the same, HP ProDesk 600 G1 DMs. These tiny computers replaced a good portion of our oldest machines when I first started at Stafford-Smith and I never even gave them a second thought. Each of these systems had this very Intel NIC in them, with a driver from 2013. The fix is listed as updating the driver, and the problem goes away. So that’s exactly what I did on the workstations that were the source of the multicast packet storm.
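The extract, sort, and de-duplicate step is a few lines once the capture is exported as rows of source address and info text. This is a sketch: the function name is made up, the row layout is an assumption about how the export looks, and the addresses are invented sample data, not the ones from my capture.

```python
from collections import Counter


def flood_sources(rows, needle="Multicast Listener Report"):
    """Count matching packets per source address, busiest source first.

    `rows` is an iterable of (source_address, info_text) pairs.
    """
    counts = Counter(src for src, info in rows if needle in info)
    # Sort by descending packet count, then by address for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented sample rows standing in for an exported capture.
sample = [
    ("fe80::1", "Multicast Listener Report Message v2"),
    ("fe80::2", "Multicast Listener Report Message v2"),
    ("fe80::1", "Multicast Listener Report Message v2"),
    ("fe80::3", "Neighbor Solicitation"),
]

for address, packets in flood_sources(sample):
    print(address, packets)
```

Keeping the per-source counts, rather than only the unique addresses, makes it obvious which machines are flooding and which just sent one legitimate report.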

I can’t believe that anyone would design a NIC like this, where there is a possibility of a multicast flood, which is the worst kind of flood I think. All it takes is a few computers to start flooding and it sets off a cascade reaction that drags the switch to the ground.

We will see in the days to come if this solved the issue or not. This has all the hallmarks of what was going on and has filled me with a nearly certain hope that I’ve finally overcome this three-month headache.

Sparks and Shorts

Today I had a chance to get to work earlier than usual and connected up all my equipment and logged into the switch that was causing all the grief with my phones. Everything looked good, and all the phones were up and behaving fine. I logged into the switch, and I had a thought, Cisco devices have impressive debug features. If power is an issue, why not debug power?

So I turned on the debug traps and adjusted the sensitivity for the log-shipper and turned on the inline power debug for both the events manager and the controller. My Syslog system started to flood with new debug logs from this switch. The phones continued to behave themselves, so I sat down and looked at the log output. There were notable sections where the switch was complaining about an IEEE short on a port. OMFG. DC power being sent down twisted-pair Ethernet, and we’ve got a short condition!? Why did Cisco never even look for short conditions? Upon further investigation, the ports connected to servers, computers, and printers were all randomly reporting shorts. These shorts were causing the POE police system to scream debugs to the Syslog system. So I found all the ports that did not have Cisco IP Phones on them and were also not uplinks to the backbone switch and turned off their POE.
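Turning POE off on a long list of ports is tedious by hand, so the configuration lines can be generated instead. This is only a sketch: the function name and the interface names are made up, and it assumes the switch accepts the interface-level "power inline never" command, which is how Catalyst switches of this family disable inline power on a port. It prints the lines for review and does not talk to a switch.

```python
def disable_poe_config(interfaces):
    """Build IOS configuration lines that turn inline power off per port."""
    lines = ["configure terminal"]
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(" power inline never")  # never supply POE on this port
    lines.append("end")
    return lines


# Hypothetical ports that feed servers, computers, or printers, not phones.
non_phone_ports = ["GigabitEthernet0/5", "GigabitEthernet0/12"]

for line in disable_poe_config(non_phone_ports):
    print(line)
```

Reviewing the generated lines before pasting them in is the point; a phone port or an uplink that sneaks into the list is much easier to catch on paper than after the fact.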

Now that all the POE is off for devices that would never need it, the debug list has gone silent for shorts. It is still sending out debugs, but mostly that is the POE system regularly talking back and forth to the Cisco IP Phones, and that output looks tame enough to ignore. I updated my Cisco TAC case, and now we will wait and see if the phones fail in the mornings. At least, there can’t be any more POE shorts in the system!

Incommunicado

Here at work, I’ve got a peculiar sort of failure with my Cisco phones. In the mornings, sometimes, all the phones connected to two Cisco Catalyst 3560-X 48-port POE switches fail around 7:35 am and then un-fail, all by themselves, around 7:50 am.

I’ve tried to engage with Cisco TAC over this issue. I started a ticket when we first started noticing it, in November 2015. Yesterday I got in touch with the first Cisco TAC Engineer and was told that it was a CallManager fault, not a switching fault and that the ticket on the switches would close.

So I opened a new ticket for CallManager. Once I was able to prove my Cisco entitlements I was underway with a new Cisco TAC Engineer. So we shall see how this goes. What concerns me is that Cisco told me that obviously it wasn’t the Catalyst switches. I am a little at odds with this determination because as part of the early diagnosis of this problem we had a small group of users who couldn’t endure phone failures at all, so I moved their connections from the Catalyst 3560 to the other switch, a Catalyst 3850. For the phones that failed on the 3560, they stopped failing when attached to the other switch. That shows me that the issue isn’t with the phones, or the CallManager, but rather the switches themselves. But now that TAC has ruled out the switches, we’re looking at phones and CallManager.

My experience with Cisco so far is checkered. Their hardware is very handsome and works generally. That’s as far as I can go under the auspices of “If you can’t say anything nice, don’t say anything at all.” because the rest of what I have to say is unpleasant to hear. Alas, they have a name and public credibility, and one checkered customer isn’t going to alter the path of a machine as large and determined as Cisco.

We’ll see what TAC has for me next. I am eagerly in suspense, and I’ll update this blog if we find the answer. Holding the breath is probably inadvisable.