How Institutions Can Prepare Their IT Environments for a Data Science Program
- By Damien Eversmann
- 08/28/23
Data scientist is a hot job, with a median six-figure salary and projected 36% growth in positions over the next several years. Recent advances in artificial intelligence (AI) and other forms of data analytics only expand the potential opportunities for budding data scientists.
These trends have captured the attention of colleges and universities. By one count, more than 1,000 institutions now offer undergraduate or graduate degrees in data science.
There are two aspects of data science institutions should focus on, each of which can benefit the other. First is the actual data science education program that schools offer students. Colleges and universities that don't yet offer such a degree — or that offer only older forms of data-focused study, such as statistics — should consider launching a data science program. A robust program can help them attract and educate high-caliber students who can excel in tomorrow's jobs.
The second aspect is the data science projects that institutions conduct themselves as part of their business operations. Increasingly, schools need data science to sift through troves of student, market, financial, and other data to gain insights that can contribute to more effective education and stronger competitive advantage.
Both aspects of data science require some foundational resources and capabilities. Beyond fielding faculty and staff with data science expertise, institutions should invest in two specific areas.
First is to develop an IT department with an adaptive culture. Data science is a relatively new discipline that's advancing rapidly. The types of data analytics that the IT team must support are evolving at a dizzying pace. The tools that researchers and students must be educated on could be different next year than they are today — as evidenced by the sudden interest in generative AI tools like ChatGPT. Those realities mean the IT department must be agile enough to support this rate of change.
Second is a stable IT environment where security is prioritized. Schools of all types have become prime targets of cybersecurity attacks, making strong data security an imperative. Data science by its nature involves large quantities of data, further raising the security stakes. Two strategies can help:
Place compute and data close together. In the past, organizations operated datacenters on site, where computer servers and data storage were housed behind locked doors. More recently, workloads and data have migrated to public clouds. Today, institutions are rethinking where they maintain their most sensitive information, with some returning certain data on premises.
This centralized approach presents challenges, however. More data is being generated at the edge of the network. For data science researchers, this could be in labs and satellite facilities. For data science practitioners, it could be on internet of things (IoT) devices. In either case, transmitting vast data streams to a central location can be costly, and it risks exposing large quantities of sensitive information.
Rather than transfer data to centralized compute resources, place the compute close to the data. This is achievable today using a containerized IT architecture. A container is a lightweight, standalone package that combines an application with its necessary files and settings.
Containers give institutions the ability to run data analytics applications on small devices. Analysis of the data can take place at the edge, and only the output needs to be transmitted. This can help reduce the amount of data that must be transferred. NASA, for instance, is using containers to conduct scientific analysis on the International Space Station.
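To make the edge pattern concrete, here is a minimal Python sketch of the idea: the raw readings are analyzed where they are collected, and only a small summary would be sent onward. The function name, the synthetic readings, and the choice of summary fields are assumptions made for illustration; they are not drawn from any particular institution's or vendor's deployment.

```python
import json
import statistics


def summarize_readings(readings):
    """Reduce a batch of raw readings to a small summary record."""
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "min": min(readings),
        "max": max(readings),
    }


# Hypothetical raw data gathered on an edge device, such as a lab sensor.
# Synthetic values stand in for a real data stream.
raw_readings = [20.0 + (i % 50) * 0.1 for i in range(10_000)]

# The analysis runs at the edge, next to the data.
summary = summarize_readings(raw_readings)

# Compare what would cross the network in each approach.
raw_payload = json.dumps(raw_readings)
summary_payload = json.dumps(summary)

print(f"Raw data kept on the device: {len(raw_payload)} bytes")
print(f"Summary transmitted instead: {len(summary_payload)} bytes")
```

In a containerized setup, a script like this would be packaged with its dependencies and run on the device itself, so the sensitive raw readings never leave the lab or satellite facility.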
Secure the technology supply chain. Following the infamous Sunburst supply chain hack of 2020 — an attack that spread to thousands of organizations through popular IT monitoring software — many institutions now worry about supply chain security. And for good reason.
Colleges and universities use all sorts of technology — some of which is "shadow IT" not authorized by the IT department. It's easy to understand how this happens. Faculty members receive grants to conduct research, they require specialized tools, and they don't want to wait for approval from IT. Instead, they build out their own technology portfolio — without necessarily knowing whether it's safe or how best to secure it.
One solution is for the IT department to become as responsive as possible to faculty and staff needs. Another is to invest in an IT architecture and application stack built around open source software. Open source code is developed in a decentralized and collaborative way, relying on community production and peer review.
Commercial IT solutions based on open source software can be more secure than proprietary products, because they benefit from transparency and diverse input. A vibrant open source community can foster best practices in cybersecurity. It can also quickly identify and remediate security issues. And if the solution provider is an established member of the open source community, it can contribute to a more secure supply chain, tracking the code's provenance and confirming it has been thoroughly tested.
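One small, concrete piece of provenance checking is confirming that a downloaded package matches the digest its publisher released. The Python sketch below illustrates that single step with the standard library; the function names and sample bytes are hypothetical, and a real supply chain program would also involve signatures, trusted sources, and ongoing monitoring.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Check that the data matches the digest published for it."""
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Hypothetical package contents; in practice these bytes would be read
# from the downloaded file.
artifact = b"example package contents"

# Stand-in for the digest a publisher would list alongside the download.
published_digest = sha256_hex(artifact)

# A copy that was altered somewhere between publisher and campus.
tampered = artifact + b"!"

print("Original verifies:", verify_artifact(artifact, published_digest))
print("Tampered verifies:", verify_artifact(tampered, published_digest))
```

A check like this only shows that the file is the one the publisher released. It says nothing about whether the publisher's own build was compromised, which is why the provider's track record and testing practices still matter.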
Boston University deployed a commercial solution built on an open source machine learning (ML) platform for its computer science program. The solution enables researchers to rapidly train and manage ML models either on premises or in the public cloud. It simultaneously provides an environment for an open source textbook, interactive lectures, and demonstrations. Students use a web browser to access a personalized virtual space for completing assignments and exploring ML models.
Investing in such IT strategies and capabilities can empower institutions to offer a robust data science program and establish their own data science practice. What's more, these two aspects of data science — program and practice — can enhance each other.
An effective data science program can help schools attract faculty and students alike. Likewise, the data science research and freshly minted data scientists that emerge from such programs can advance the schools' own data science practice, helping them gain new insights to educate students more effectively and better compete in the marketplace.