Having spent the last 9 years in industry, it’s been a refresher diving back into academic waters, not unlike taking a cold shower on a blistering hot day. It shocks you a bit at first, but once acclimated you feel so much better. One of the primary reasons for my return to the ivory internet tower is to expand the horizon of knowledge and understanding, both my own and that of the systems/networking field, in large part motivated by my experience in industry. There, innovation and expansive thinking are often scant luxuries, indulged in only when the rare respite between development cycles arises. Over the last decade or so, I’ve strolled the halls of a telco giant in its waning glory, battled through the trenches of an internet media dot-com, and sunk deep roots in the realm of network monitoring, planning, and OSS. From these varied yet interrelated threads, I’ve come to appreciate the need for structured yet organic software engineering. Lessons learned at all levels need to permeate the development chain, lest we hemorrhage time and effort spinning our wheels in the same mud pits of system support, feature accretion, and redesign. The overall takeaway is that we need to work smarter, not necessarily harder, with efficient processes that facilitate rather than impede creativity and diligent design.
Moving from the confines of a telco giant, with its rather suffocating approach to the software development process, to an internet media startup was a contrast in extremes. On one hand, there was a strict indoctrination of process and well-defined, often limited responsibility, while on the other it was the great westward frontier with unbounded avenues of exploration and the unremitting burden of responsibility for survival. From the moment I set foot among the cubicles of the megacorp, I was bombarded with process. Training sessions brought me up to speed on CMM, while manuals and documentation preached an endless stream of SOP and industry jargon. Once I was actually coding, work would come in fits and starts, as we had meeting upon meeting, sometimes meetings about meetings, with every tiny sliver of work scrutinized and every simple MR debated until productivity came to a standstill. We were all trapped in the bowels of a giant, inefficient network with far too many layers of abstraction (middle management) in the stack. Like ATM, our standards were over-argued and our inflexible MTU (work division) was dominated by packet overhead.
Transitioning to the wide-open plains of a startup turned everything on end. Zero process and 110% workload and responsibility meant long days in the co-lo, fiddling with firewalls and DNS, wiring load balancers, and setting up servers, on top of building web applications. Freed from the shackles of bureaucracy, productivity soared, just not exactly in the right areas. We became jacks of all trades, yet masters of none. It was the ultimate chart-your-own-course adventure, but as with any uncharted territory, it was fraught with the perils of survival. Creativity soon drowned in the unrelenting torrent of technical administrivia while funding issues threatened to capsize the ship. The realities of an unsustainable business model and premature broadband permeation finally waylaid our crew and left us adrift amidst the flotsam of shipwreck.
Somewhere in between the polar extremes of megacorp and struggling startup was my next extended stop, in network simulation and monitoring, at a small company with stable revenue that was quick to adapt and fluid in execution. There were meetings, but thankfully few. Each developer had free rein to design and code as they pleased, yet was spared the burden of system administration. The whole machine seemed to hum with the smooth resonance of productivity as client requests came in and features were churned out on rapid development cycles. Yet, there was still something amiss in this idyllic setting. It’s one thing to build a powerful but sometimes buggy tool for planners, but it’s a whole other ballgame when you step onto the OSS field. Systems in the NOC need to run 24/7 with five-nines reliability. Our flexibility was starting to become an Achilles’ heel, pointing to the need for a better balance between process and “productivity”. Otherwise, developers would soon drown in the sea of support, unable to cope with the huge discrepancy in scale between deployment and evaluation.
Optimally, there needs to be a zen-like equilibrium between the yin-yang ebb and flow of process and organic creativity. Too much emphasis on process degrades performance and efficiency, with its excessive overhead of evaluating every “process gate” and modification. Rigid development paths preclude new ventures and stifle the creativity of able talent. Even if developers have the time, the ideas, and the ability, the approval process and business case justification for every side project simply stops innovation dead in its tracks. Google’s 1/5 policy seems to address this well. Yet without process, and the resources required for each stage (roughly: system requirements, design, development, testing, QA, and OAM support), the team will crumble under the sheer weight of customer demands and SLAs. Support teams at all levels, from sys admins, to testers/QA, to the multi-tiered customer-facing technical support, are necessary and worthwhile investments to improve product quality and ensure customer satisfaction, while insulating the core development team from orthogonal workloads. This support base is difficult for small companies to realize, given resource constraints, but without it the company and its products will have difficulty scaling.
For developers, the onus is to treat system architecture, as well as reliability and robustness, as first-class citizens, and not to focus on features alone. Refactor and redesign from the ground up when appropriate to adapt to new enabling technologies, rather than just adorning a brittle infrastructure with incremental functionality. Glean from the research literature and leverage available frameworks and packages. Yet, don’t assume they always work 100% as advertised. Scour the forums and evaluate the system code and performance. Apply distributed systems design principles to ensure consistency and availability. As always, unit test, regression test, and system test, and then test some more – if you have the time. Of course, having the right automated testing tools helps as well. If you can’t find them, build them, again if you have the time, which brings us to the next point.
Perhaps most importantly, everything starts with vision and direction to guide the evolution of process and product. With an overarching goal in mind, daily decisions can be made with the proper level of discernment and discretion. Does this direction align with our goal and values? Does it take us one step further and improve our company and its offerings? Having the flexibility to respond quickly to a changing market and customer demands is essential. However, decisions cannot rest solely on reactive tendencies. Truly innovating and producing discontinuous change requires proactive, often difficult decisions at every level, balancing the need to explore new territory against the safety of exploiting existing expertise. Crafting new architectures, adopting new development paradigms, and inventing new products that customers have yet to realize they need often means giving developers the leeway to tinker and dream, diverting resources away from cash cows, and just saying no to some distracting, low-hanging fruit. People need the time and space to incubate ideas, prototype solutions, and build the necessary tools for support in an organic fashion. For change to occur, this mentality has to pervade the organization from top to bottom.
Now that I’m back on the graduate path, hopefully these opportunities will arise in interesting and unforeseen ways, organically and collegially. Certainly, the mentality is there, with a seemingly boundless license to explore. While it’s taken some time to catch up on recent research and to adjust to the pace of learning in the academic setting, being able to think long and hard about interesting problems again has been a jolt and a revelation, and I’m glad to be back.