By Bill Higgins.
I work in the part of IBM Software Group that builds Service Management products. If you have no idea what “Service Management” is, it’s just software that manages other software and systems. Think monitoring, deployment, mobile device management, etc. And not just IT. As more and more physical devices become smart and become connected to the Internet, we build software that also manages things.
I wanted to share my experience of what can happen when a large organization adopts the culture, processes and tools of a web startup. In our case, my very small IBM development team became the catalyst for driving a significant and positive change in a multi-billion dollar software business in five months. To be clear, it wasn’t easy – changing how we functioned led to frequent friction and conflict – but in the end it worked, and we made a difference.
In mid-2013, the IBM Service Management business and development leaders decided to make a big bet on moving our software to the cloud. Traditionally we have sold “on premises” software products. These are software products that a customer buys, downloads and installs on their own equipment, in their own data centers and facilities. Although we love the on premises business, we realized that cloud delivery of software is also a great option, and as our customers evolved to a hybrid on premises / cloud future, we needed to be there to help them.
We decided that initially we would make three capabilities – Performance Management, Help Desk and Workload Automation – available on the cloud. We also decided that we wanted a uniform and delightful learn, explore, try and buy experience for each of the capabilities. This implied we needed a common Internet-facing web property. This idea eventually became IBM Service Engage.
There was only one problem: our imagined web front-end didn’t exist, and we wanted to announce the offerings at our IBM Pulse 2014 conference in February 2014. This is where my team comes into the story.
Building a close-knit team
I was fortunate to join a great team when I was a young developer. In the mid-2000s, IBM reassigned many of the developers who had built the Eclipse platform to build a new platform for team-based tools. This became the IBM Rational Jazz platform and resulted in products like IBM Rational Team Concert. I joined this team in the early days and was able to work alongside brilliant developers, designers and architects, all masters of agile development.
In the early 2010s, I left Rational and joined the Tivoli brand, which is responsible for Service Management products. In some ways Rational provides “dev stuff” and Tivoli provides “ops stuff.” This led me to learn about and embrace the DevOps movement.
When I took a lead development role on the new SmartCloud Orchestrator (SCO) product, I recruited a small team of passionate and smart developers who, like me, worked for IBM in Raleigh. I spent a lot of energy on this recruitment because, as I had seen from my Jazz experience and by reading about high-performance software development teams, I believed that a small, gelled group of passionate and smart developers could significantly outperform a large, distributed project team. And long-term, these top developers would recruit other top developers, resulting in a virtuous circle.
Our team was passionate about software development; we all embraced the ideas of DevOps culture and supporting practices like continuous deployment. In our work on SmartCloud Orchestrator, we created a build system that could take a new level of SCO code, deploy all of its components – including OpenStack – on a set of hardware, and then run automated tests against it. This automation was critical to helping us ship our first release of SCO in mid-2013.
In 2013, my wife and I welcomed our third child to the world, but the day of her birth also brought an exciting new project at work. My beeping iPhone indicated it was the CTO of the Service Management division, Dave Lindquist. He had sent me a cryptic but intriguing text message: “We’re starting a new project and we want you to lead it. You just need to build the team.” I replied, “Sounds very interesting. Already have the team. About to have a baby. Will ping you in a few weeks!”
After I returned from paternity leave, Dave was one of the first people I contacted. He told me about our strategic shift to make cloud-based offerings a major focus, and told me about the idea of the new web front-end which became known as Service Engage. He shared his idea that to have the results of a successful web startup, we should structure the team like a web startup.
We hadn’t talked in a while so I explained how I had quietly recruited an A-Team for the work on SmartCloud Orchestrator and about our progress on modern practices like continuous deployment. Over the next couple of months, Dave and the senior leadership team shuffled things around so that we could transition our SCO work to some new developers and we could begin work on Service Engage. Dave and his boss Chris O’Connor, the VP of Strategy & Engineering for our division, worked with the real estate team to give us a shared space in which we could physically work like a web startup.
Creating an app with less than five months to go
We moved into the lab and wrote our first lines of code in mid-September 2013. Interestingly, we spent the first two weeks building a new continuous deployment (CD) system that incorporated all the good stuff from our previous CD system, chucked the stuff that didn’t work, and added support for a new deployment target: IBM Bluemix, then in an internal alpha state.
Our team was full of passionate programmers who had already experimented with modern platform-as-a-service (PaaS) systems like Heroku. We heard from people like Dave that IBM was investing major resources in its own PaaS, based on Cloud Foundry and codenamed Bluemix, which was unannounced at the time. The effort was spearheaded by some of our old friends like Danny Sabbah, who had been the General Manager of Rational when we built Jazz.
We made a bet that even though Bluemix was early, it would mature quickly enough that we could make it the foundation for the IBM Service Engage front-end. So we built a continuous deployment system that could do fully-automated zero-downtime deploys to Bluemix, writing a huge amount of test automation to ensure that the app only improved and never regressed.
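To make the zero-downtime idea concrete, here is a minimal sketch of one common way to do it, a blue-green switch gated by a smoke test. This is an illustration only and not our actual deployment system: the `Router` class, the slot names and the `smoke_test` callback are all hypothetical, and nothing here calls a real Bluemix or Cloud Foundry API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Router:
    """Hypothetical traffic router: `live` names the slot that serves users."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None}
    )
    live: str = "blue"

    def idle(self) -> str:
        """Return the name of the slot that is not serving traffic."""
        return "green" if self.live == "blue" else "blue"


def deploy(router: Router, version: str, smoke_test: Callable[[str], bool]) -> bool:
    """Stage `version` in the idle slot and switch traffic only if it passes.

    Returns True when traffic was switched, False when the deploy was rejected.
    The live slot keeps serving users throughout either outcome.
    """
    target = router.idle()
    previous = router.slots[target]
    router.slots[target] = version       # stage next to the live version
    if not smoke_test(version):          # assumed hook for automated tests
        router.slots[target] = previous  # put back what the idle slot held
        return False
    router.live = target                 # one pointer flip moves all traffic
    return True
```

The point of the design is that a failed test run never touches the slot users are on, which is why heavy test automation and zero-downtime deploys go together.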
One other safeguard we put in place was a mandatory code review system. We learned about this from our work with OpenStack and having read about continuous deployment at places like Facebook. We saw several benefits to requiring code reviews:
- An extra set of eyes to spot any non-obvious problems that are hard to catch with test automation prior to pushing to production
- Positive peer pressure to align to team standards, e.g. our 100% automated unit test coverage goal
- Improved situational awareness for code moving to production, in case an unexpected problem occurred
- Improved tribal knowledge of the evolving code base
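A review requirement like this can be pictured as a simple merge gate. The sketch below is hypothetical and not the tooling we used: the `Change` fields and the `may_merge` function are invented for illustration, and treating the coverage goal as a hard threshold is an assumption (for us it was a team standard reinforced by peer pressure).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Change:
    """Hypothetical record describing a proposed code change."""
    author: str
    approvers: FrozenSet[str]
    tests_passed: bool
    unit_coverage: float  # fraction of lines covered, 0.0 to 1.0


def may_merge(change: Change, required_coverage: float = 1.0) -> bool:
    """Allow a merge only with an independent approval, green tests and coverage."""
    # An author's approval of their own change does not count as a review.
    independent_reviewers = change.approvers - {change.author}
    return (
        bool(independent_reviewers)
        and change.tests_passed
        and change.unit_coverage >= required_coverage
    )
```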
We also set a goal for the team that everyone should be a full-stack developer, i.e. able to make high-quality contributions to any part of the system – backend web code, front-end web code, model and persistence logic, authentication, etc. We enabled this in a simple way: if you were weak on any one part of the system, you got the next task related to modifying that part of the system. We used supplementary practices like pair-programming to spread knowledge more quickly.
Because we had a short time period – only five months to the Pulse conference – we had to be selective about what we worked on. We used the “minimum viable product” concept that we had learned from a team book club reading of Eric Ries’s “The Lean Startup” to help define the simplest possible system that would help us meet our business goals for IBM Pulse.
I won’t lie, this was hard to intuit since our definition of “minimum” was often much less than other people’s idea of “minimum.” But we were resolute that we had to ship something of extremely high quality, so we simply had to carve it down to the basics. This was a frequent source of tension and conflict, but we stuck to our guns.
Finally, I’d like to talk about two steps we’ve added since the initial launch.
Shortly before Pulse, we started adding instrumentation to the site based on IBM Digital Analytics to help us understand how users respond to the user experience we design. Before, our design decisions were driven entirely by intuition. While intuition from experienced and talented people is essential, we now use empirical observations of user behavior to help us understand what we get right and what we need to change.
The second addition: now that we’re live on the Internet, we roll out new features not via a staging site but with techniques that we’ve learned from our friends at Etsy, like ramp-ups and dark launches. For example, if we’re working on a new version of the ‘request free trial’ experience, we deliver an early version to the production site, limit access via various means, and gain confidence in the technical quality of the code and the new user experience via monitoring and instrumentation.
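One common way to “limit access” during a ramp-up is a percentage gate keyed on a stable hash of the user. The sketch below shows that general technique under stated assumptions; it is not our production code, and the function names, the `allowlist` parameter and the feature name are all hypothetical.

```python
import hashlib
from typing import FrozenSet


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for the given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    user_id: str,
    feature: str,
    ramp_percent: int,
    allowlist: FrozenSet[str] = frozenset(),
) -> bool:
    """Decide whether this user sees the new experience.

    Users on the allowlist (for example, the team itself during a dark
    launch) always see it. Everyone else sees it only if their bucket
    falls below the current ramp percentage.
    """
    if user_id in allowlist:
        return True
    return bucket(user_id, feature) < ramp_percent


# Example: a dark launch (0% of the public, team only), then a 10% ramp.
team = frozenset({"dev-1", "dev-2"})
dark = is_enabled("dev-1", "new-free-trial", 0, team)
ramped = is_enabled("visitor-42", "new-free-trial", 10)
```

Because the bucket depends only on the user and the feature, a given user keeps the same experience from one visit to the next, and raising the percentage only ever adds users to the new experience.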
Big difference in a short time
In many ways the IBM Service Engage web app is just another nice-looking web app in a world that has tens of thousands of other nice-looking web apps. But it was a special project for us. We proved to ourselves that a small team could operate like a web startup inside a huge company, and make a big difference in a short amount of time.
Although I authored this blog entry, I’d like to make it clear that I contributed at most 10% of the ideas I’ve talked about here. The good ideas came from the talented and passionate team members. Thank you, team, for what you have taught me.
If you’d like to check out Service Engage, it’s here: https://www.ibmserviceengage.com. I hope you like it.