Rejected automation?

Oliver Lowe@lemmy.sdf.org · 1 year ago

thisisnotgoingwell@programming.dev · edit-2 1 year ago

I work in networking, a job that has traditionally been done through a terminal and vendor-specific syntax. I used to hate the thought of automation when I was younger. Why would something as important as networking be automated? I’ve made my career on being the clutch guy, troubleshooting complex problems. I love the art of understanding every cog in the machine and being able to visualize it. Then I started learning Python, and learning it was extremely difficult for me. It felt like an eternity between the time I poured myself into learning Python and the time I could actually make things people would want to use.

I was a supervisor in a NOC that had many bureaucratic requirements which got in the way of break/fix operational support: manually writing an email to every customer that had an alarm, calling every point of contact for that customer, and notifying the field techs of outages in their areas, all while managing real operational issues. So many times I had to let real work slip through my hands because there were so many calls, so many cases, so many things to do.

Like most NOCs, we viewed alarms from SNMP. When something failed to ping, it would generate a loss of comms alarm. I had this idea to automatically notify the field tech for the specified area when a customer site was down for more than 30 minutes. That was a very complex thing to do, because it required that I clean a lot of data… I spent days converting things like date strings into proper formatting. I wrote it, tested it as a proof of concept, and spec’d out the costs (the MRC for the API I was using to send text messages was extremely cheap; it would cost the company about 6 dollars per month). Once I presented it, I was told that we couldn’t do this, because some political agreement required the NOC to provide “positive contact” to other groups. Just like that, it was dead.
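For anyone curious what that kind of thing looks like, here is a minimal sketch in Python, not the actual program. It assumes the alarm feed hands you timestamps in a few inconsistent formats and that you already have a lookup from area to field tech. The format list, the dictionary keys (`site`, `area`, `raised_at`), and the function names are all made up for illustration, and the text message API call is left out entirely.

```python
from datetime import datetime, timedelta

# Hypothetical timestamp formats; the real feed's formats are not given in the post.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%m/%d/%Y %H:%M",
)

# The 30-minute threshold comes from the post.
OUTAGE_THRESHOLD = timedelta(minutes=30)


def parse_timestamp(raw):
    """Try each known format in turn; raise ValueError if none match."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {raw!r}")


def techs_to_notify(alarms, tech_by_area, now):
    """Return (tech, site) pairs for sites down longer than the threshold."""
    pending = []
    for alarm in alarms:
        down_since = parse_timestamp(alarm["raised_at"])
        if now - down_since > OUTAGE_THRESHOLD:
            tech = tech_by_area.get(alarm["area"])
            if tech is not None:  # skip areas with no assigned tech
                pending.append((tech, alarm["site"]))
    return pending
```

You would call `techs_to_notify` on a schedule with the currently active loss of comms alarms, then hand each pair to whatever sends the text message.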

My director then wanted me to do something similar for our phone systems. Since our queue depended on user agent availability (your presence status), my boss wanted me to write a program to notify him if someone was unavailable for too long, and the reason why. Yes, he wanted to know if someone took more than a few minutes to take a shit or get coffee.

That’s when I learned, boomers only care about micro management, not efficiency.

Oliver Lowe@lemmy.sdf.org · 1 year ago

Thanks for sharing. I did a bit of work for a NOC and know exactly what you mean about letting real work slip through your hands. I wasn’t directly responsible for managing the alarms, but it felt strange to be writing software to streamline the workflow. The whole time, I felt like I could have just helped the technicians solve the problems they faced day to day, to stop the alarms going off in the first place!

thisisnotgoingwell@programming.dev · 1 year ago

To be fair, most of the work that you have to do in a NOC is total bullshit. About 30% of the time you will be working on technical issues, and for most other people in the NOC, that would mean escalating the technical issues to me. Unfortunately, I had to earn the stripes, which means I had to work harder than everyone else, which meant doing their work as well as handling all escalations. Eventually, I was promoted to a supervisor for my efforts, but I did not want to be in a managerial role.

The real bulk of NOC work that is tiresome is the number of unnecessary alarms. Managing SNMP is a nightmare, and configuring it properly involves a deep level of engineering knowledge. You can either tune the alarm board to only show certain alarms (which means parsing through many alarms to find out what is necessary and what isn’t), or you make sure that devices being onboarded are configured locally for which SNMP traps they will alert on. Typically, the devices’ SNMP settings are not configured, so all alarms get sent to the SNMP server, and the SNMP server was never tuned to know which alarms it should or shouldn’t show. So there are alarms which don’t really “mean anything” and alarms that “could potentially mean something if it’s correlated with this other alarm.” Most of the work is sifting through so much shit, to then have to troubleshoot a network issue for a network that was never documented in the first place.
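To make the tuning idea concrete, here is a rough Python sketch of the board-side approach: always show some alarms, show others only when a companion alarm is active on the same device, and drop the rest. The alarm names and the `device`/`name` keys are invented for the example; a real SNMP manager would have its own rule format.

```python
# Hypothetical alarm names, purely for illustration.
ALWAYS_SHOW = {"loss_of_comms", "power_failure"}

# Alarms that only matter when a companion alarm is active on the same device.
SHOW_IF_CORRELATED = {"high_latency": "interface_errors"}


def filter_alarms(alarms):
    """Keep alarms that are always relevant or correlated with a companion."""
    active = {(a["device"], a["name"]) for a in alarms}
    shown = []
    for alarm in alarms:
        name = alarm["name"]
        if name in ALWAYS_SHOW:
            shown.append(alarm)
        elif name in SHOW_IF_CORRELATED:
            companion = SHOW_IF_CORRELATED[name]
            if (alarm["device"], companion) in active:
                shown.append(alarm)
    return shown
```

The hard part is not the code, it is deciding what belongs in those two tables, which is exactly the engineering knowledge nobody had time to apply.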
