Cloud | Q&A
Inside Apollo's Massive Learning Platform
As the company behind the University of Phoenix and numerous other educational holdings around the globe, Apollo Education Group knows something about scale. CT asked CIO Mike Sajor about the institution's new learning platform, how it supports hundreds of thousands of students, and the choice to host it in a private cloud.
- By Dian Schaffhauser
Few institutions of higher education match the size of the University of Phoenix, with its 250,000 students and campuses in 37 states. But the educational holdings of Apollo Education Group don't end there. There's also Western International University, with a more modest 4,700 students; colleges and universities in the United Kingdom, Chile, Mexico and Australia, as well as companies dedicated to delivering financial and other career courses; Carnegie Learning, which develops math programs for grades 6-12 and beyond; and, more recently, the beta launch of Balloon, an online service that helps job seekers identify the skills they need for the jobs they want and the courses that can help them attain those skills (and not just courses delivered by Apollo).
The man in charge of the IT infrastructure behind those endeavors is Mike Sajor, who became CIO of Apollo in 2012 after similar stints in the retail segment (ANN Inc., which operates Ann Taylor and related brands) and the pharma industry (Merck).
For several years the company has been developing a proprietary classroom platform to support its huge student population. Initially, the system was designed to run in a public cloud environment. But when the Apollo Compute Platform (ACP) launched late in 2013, the company took a new approach to the infrastructure supporting it. The system was built on top of VCE, a "data center in a box" packaged up and supported by Cisco, EMC, VMware and Intel.
Here, Sajor explains why the company moved its new learning system off the public cloud and onto a private one (for now), why it chose VCE as the go-to vendor and what ACP delivers to Apollo as a result.
CT: Explain the classroom platform.
Sajor: It's a very social learning platform, and it really exploits interpersonal communication between students as a mechanism to advance learning concepts. The discussion platform within the classroom is richly exploited in the curriculum and allows students to interact with each other and with faculty members. If you talk about it, you tend to learn it better.
Another aspect: We leverage the platform to collect a vast amount of data about students as they traverse their learning journey. We know what they're doing, when they're doing it, how long it takes, anything they do along the journey that might not have been the right choice. We collect that data ... and use it to create some set of information about student behaviors. We generate insight, and an insight tells us an interesting fact about a student or even a cohort of students. Then we use that insight to create an intervention that will change the probability of the student outcome.
CT: Give an example of how that might work.
Sajor: You're a student and you're going along submitting assignments, doing reading, doing all those things one would normally do in the course of a class. Assignments are generally due in your class Sunday night. In the first few weeks you turn your assignments in on Friday. And suddenly, you turn in an assignment on Saturday evening, and the next week you turn one in mid-day Sunday. Well, we're going to notice that in our analytics. We'll pick that up and say, "Wait a second. Sally Student now has a perturbation in her behavior. She was exhibiting a behavioral pattern over time since she started as a student. Now her pattern has shifted." That becomes an insight. What we do at that point is flag the faculty member or an academic adviser or enrollment adviser to contact Sally using her preferred mode, e-mail or phone call. And we'll ask, "Hey Sally, we noticed you're turning in your assignments a little bit later than you normally did. Is there anything we can do to help you?" You'd be amazed at the answers we get, like, "My childcare on Thursday and Friday night fell apart." That gives us an opportunity to intervene. We can say, "You're in Spokane. We know some childcare providers. We can't recommend anybody, but we can give you a list that might help you."
We built that in the public cloud environment — not in our own data center, not on VCE.
CT: What was going on behind the scenes that made you realize the public cloud wasn't exactly what you needed when ACP went live?
Sajor: First, just plain economics. Cloud infrastructure is an area that is changing and changing fast. The world is just starting to get used to this thing. It's been around a few years. It's starting to hit its stride. The economic models are starting to shift. A rapidly changing environment leads to rapid changes in terms of technology. Those changes in technology change the economics again, so you have this vicious cycle going on.
The second thing was, from my vantage point, public cloud providers are great. They really perform a highly useful service in terms of being able to acquire capacity fast — like overnight — and turn things up and down. They're terrific from that vantage point. But when you start getting to the massive scale that we're talking about here, where we're running a quarter of a million students on a platform, there's a conversation about reliability, stability, scalability, disaster recovery. Today, there's more flexibility in assuring that kind of performance in a private cloud environment than in a public cloud environment.
Others' mileage may vary. At Apollo we're blessed with a state-of-the-art data center. And we've got a great disaster-recovery facility up in Las Vegas. All the infrastructure was there to let us do it. The data center, the generators, the air conditioners — that was all there. But the private cloud equipment wasn't.
The decision we made was to go out and acquire the private cloud equipment: the VCE hardware and VMware software and all the other accoutrements to enable us to stand up our classroom environment in this private cloud. We are as cloud today as we were in the public space. It just happens to sit on my floor. I can go hug it.
CT: How did you happen to settle on VCE? Were you already using Cisco and EMC and VMware?
Sajor: Show me a professional IT shop that isn't already using all those guys, and I will show you a professional IT shop that's lying. Everybody uses those guys. You can't not.
There are four or five alternatives out there — we looked at all of them. We went VCE for two reasons, one more important than the other. One, the economics stood out compared to the alternatives. Two, VCE was a true partner in this process. It was the degree to which VCE stepped up in the partnership.
This whole thing is about student outcomes. It's about making sure we can deliver education to a student 24/7/365 anywhere in the world. What I don't need is a platform that goes bump in the night. If that does happen, I want to make sure that my partners are going to be there and that they've got my back. I know if the chips are down and I've got a problem, I can call Frank Hauck, the president of VCE, and he'll answer the phone. That gives me comfort.
CT: Partnership is a vague term. Besides being able to call Frank, what else is there?
Sajor: Good grief. It took all kinds of forms. Unsurprisingly, when we were standing up our intermediation layers that handle release automation and some of the subtleties of virtualization, we were pushing on VMware in some strange and interesting ways, ways they weren't really used to. They put their line engineers on our premises, coupled into their engineering organizations, to tune the product to hit some of the points that we needed hit in order to be able to stand up the platform.
That to me is partnership. When you're changing your base product in order to make it work — not as a customization unique to us, which I abhor, but changing the base product, which means it's supported for the long haul — that's good stuff.
CT: Talk about the switchover.
Sajor: About three or four months ago, we completed the move. In one night we moved the entire classroom platform and all the students on it from that public cloud provider onto VCE and nobody noticed. If you think I got any sleep that night, you'd be sadly mistaken. Thank our lucky stars, everything worked perfectly.
CT: This is described as a hybrid cloud. How did you figure out what should be handled on the public cloud part and what's handled on the private cloud?
Sajor: In this first iteration we're doing almost everything in the private environment. That's after looking at it really closely. But we have an intermediation layer we developed that would allow us — if we decide to do things differently — to vector releases through automated release management, deployment, distribution management. We can vector releases on to other cloud providers as we see fit. If we say, OK, we need capacity fast, I can flip those switches and drive that load to a public cloud provider.
The reason why we built that mediation layer is that we knew darn well this market wouldn't stand still. The reality of what's happening is that you're seeing infrastructure start to evolve into an open marketplace where vendors are competing — public, private, doesn't matter. They're competing on an equal footing for my business based on all the things that I care about: cost, reliability, disaster recovery, sustainability, scalability, latency — whatever set of metrics I worry about. I still have the ability to treat this evolving ecosystem of players as an open marketplace. And that gives me an advantage that I really, really want to have.
CT: Is your IT team managing the infrastructure and then bringing in the VCE folks when you're having trouble?
Sajor: Oh, yeah, you bet. Don't get me wrong. We have support arrangements with VCE. But we only need them if things go twang. We manage this environment on our own. That's critical for us. The nice part about it: With the platform we've created, we actually don't have to do that much management of it. It kind of manages itself.
To give you an idea, in standing up the classroom from the day VCE turned the machines over to us to today, we have stood up 39,000 virtual machines on our VCE farm. We've turned off 36,000. You can think about computing in an entirely different way. You turn on a machine, you use it. If you blow it and screw something up, throw it away. It's virtual. Go get another one.
CT: What are the benefits to this new approach?
Sajor: Velocity. It's all about velocity.
Here's the old way: A developer says, "I've got a great idea. I want to create a new application or service. I need some infrastructure to drive that." They would file a ticket for a new server. The order would go out to your favorite server provider. The server shows up 60 days later. It has to be racked and stacked. An image would be put on it. Tested, hardened, blah, blah, blah. You're talking two to three months from request to "OK, here you go. Here's your new box."
So what do people like me do? We maintain an inventory in the data center of servers sitting there — running, consuming power, consuming monitoring, HVAC — such that when Tom the Developer asks for a new server, I could say, "Ah, I've got one." We'd run this dead inventory that we'd use to backfill. Very manual. Lots of people involved.
This environment: The developer says, "Yeah, I'd like a new server." They log into ACP, which is a portal that sits on top of the VMware infrastructure. They answer a few questions and check a few boxes, hit go, and about 30 seconds later, it comes back and says, "Terrific. Here's your server. Here's its name. It's been provisioned with an image. You're all set."
The velocity factor has increased by orders and orders of magnitude. That gives us flexibility. Developers can stand them up, tear them down, move them around. If developers forget to tear them down and they're not using them, the little dust bin squad goes in and cleans them up.
It's a totally different paradigm. We keep the whole thing nice and neat and tidy. We are enjoying orders of magnitude of velocity improvement over the old way of doing business. Not in an ungoverned way, but in a religiously, tightly governed way, so we know exactly where we are, what our capacity is, what free capacity is. We plot the rate of resource consumption, so we know when we have to go back to VCE and spend some more money on some more boxes. We know where we stand with those purchases. If they have to happen, we know when, down to the week. It gives us an entirely different level of visibility and responsiveness to our development community.
Our developers are the folks working on cognitive learning, adaptive learning, the folks working on the analytics and insights interventions — I don't want them spending more than two nanoseconds of their time thinking about where they're going to get their server capacity. I want every nanosecond they've got on the learning problem and how to help our students succeed. All the mechanics are receding into the background.
Editor's note: This article has been modified since its original publication to correct a factual error. We previously stated that Apollo made a $2.5 billion investment in McGraw-Hill's education business. A different company named Apollo actually made that investment. We apologize for the error. Last modified May 27, 2014, 11:40 a.m. — D.N.