Why Universities Need to Align Data Storage with Data Value
Universities are voracious data generators, with one well-known institution of around 40,000 students currently producing in excess of 15TB per day from research activities alone. This kind of volume places storage requirements firmly in the petabyte range, comparable to those of large enterprises, with infrastructure needs set to grow further as data-intensive AI tools are more widely adopted.
In many environments, unchecked data growth is now outpacing the ability of IT teams to manage it effectively. This has potentially serious knock-on effects on everything from system performance and research timelines to budgets, which remain under significant pressure.
Central to the challenge is that institutions tend to address data growth in a one-dimensional way: When storage fills up, keep adding more. Compounding the problem is that a significant proportion of university data estates consists of inactive or low-access information that remains on primary storage simply because it has never been assessed or classified. At the same time, universities are understandably risk-averse, to the point that data is retained indefinitely because institutions lack the confidence to archive or delete it.
While this approach provides a certain level of reassurance, in practical terms, it also means high- and low-value data are treated in the same way. This not only increases overall costs but also limits the effectiveness of technology investments in the long term.
Viewing the data growth problem primarily through a storage-capacity lens also misses a critical point: A lack of visibility into what data exists, where it resides and how it is used creates a fundamental disconnect between expenditure and the value that data actually delivers.
A Shift in Approach
The first step is taking back control of data so it can be managed and budgeted for in line with its value; the second is managing access requirements. Both demand a shift in approach: Institutions need to move away from the reactive habit of expanding storage and towards a more deliberate data management model based on understanding and control.
The starting point is visibility, because without a unified view of the data estate it is difficult, if not impossible, to distinguish between data that supports active research, for example, and data that is no longer accessed but continues to consume costly, high-performance storage resources.
Achieving that visibility depends on the ability to analyze large volumes of unstructured data at university scale, which typically means billions of files across multiple systems and locations. This is a data management software challenge: Modern platforms can index file estates of that size and surface the usage patterns needed for informed decision-making.
At this scale, data management cannot rely on manual processes; it depends on automated intelligence to bridge the gap between requirements and resources. This provides the foundation for making consistent, data-driven decisions about how different datasets should be handled, ensuring that storage infrastructure is aligned with each dataset's actual value, access requirements and compliance obligations.
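To make the analysis concrete, the minimal sketch below walks a file tree and totals capacity by last-access age. The directory path and the 90-day and one-year thresholds are hypothetical, and a single-threaded walk is purely illustrative; purpose-built data management platforms perform the equivalent scan across billions of files and multiple systems in parallel.

```python
import os
import time
from collections import defaultdict

# Hypothetical age thresholds (in days) separating "active", "warm" and "cold" data.
ACTIVE_DAYS = 90
WARM_DAYS = 365

def classify(path: str) -> dict:
    """Walk a directory tree and total file sizes by last-access age bucket."""
    now = time.time()
    totals = defaultdict(int)  # bucket name -> total bytes
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # skip files we cannot read
            age_days = (now - st.st_atime) / 86400
            if age_days <= ACTIVE_DAYS:
                bucket = "active"
            elif age_days <= WARM_DAYS:
                bucket = "warm"
            else:
                bucket = "cold"
            totals[bucket] += st.st_size
    return dict(totals)

if __name__ == "__main__":
    # Example: report how much data in a (hypothetical) research share is active, warm or cold.
    for bucket, size in classify("/mnt/research-share").items():
        print(f"{bucket}: {size / 1e12:.2f} TB")
```

Even a rough breakdown like this is often enough to show how much primary capacity is being consumed by data that nobody has touched in over a year.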
Regardless of where data resides, institutions also need to ensure that access permissions are consistently defined and maintained across environments. Without this level of control in place, sensitive or regulated data can remain exposed even if it has been moved to a more appropriate storage tier, potentially undermining both governance and compliance.
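As a simplified illustration of that kind of control, the sketch below flags files whose POSIX mode bits leave them readable or writable by any user on the system. The directory is hypothetical, and a real audit would also need to cover ACLs and share-level permissions across every environment the data passes through.

```python
import os
import stat

def find_world_accessible(path: str):
    """Yield files that are readable or writable by any user, with their mode bits."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                mode = os.stat(full).st_mode
            except OSError:
                continue  # skip files we cannot stat
            if mode & (stat.S_IROTH | stat.S_IWOTH):
                yield full, oct(mode & 0o777)

if __name__ == "__main__":
    # Example: audit a (hypothetical) archive tier for files left open to all users after a migration.
    for path, mode in find_world_accessible("/mnt/archive-tier"):
        print(f"world-accessible: {path} (mode {mode})")
```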
Armed with definitive insight, institutions can then begin making informed decisions about which datasets should remain on high-performance infrastructure and which can be moved to more cost-effective archival environments or deleted altogether. This offers a solid foundation for adopting policy-driven lifecycle management, in which data is actively governed throughout its lifespan and, when defined stages are reached, can be moved to a more appropriate storage tier or deleted permanently.
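A policy engine of this kind can be thought of as a set of rules evaluated against each file's metadata. The sketch below is a minimal illustration, assuming hypothetical thresholds of one year without access before archiving and roughly seven years before flagging data for deletion review, plus a simple legal-hold flag; real policies would also reflect retention schedules and funder requirements.

```python
from dataclasses import dataclass

# Hypothetical policy thresholds for the purposes of illustration.
ARCHIVE_AFTER_DAYS = 365         # move to archive after a year without access
REVIEW_DELETE_AFTER_DAYS = 2555  # flag for deletion review after roughly seven years

@dataclass
class FileRecord:
    path: str
    days_since_access: float
    on_legal_hold: bool = False

def lifecycle_action(record: FileRecord) -> str:
    """Return the lifecycle action a policy engine might take for one file."""
    if record.on_legal_hold:
        return "retain"                    # holds always override age-based rules
    if record.days_since_access >= REVIEW_DELETE_AFTER_DAYS:
        return "flag-for-deletion-review"  # deletion still needs human sign-off
    if record.days_since_access >= ARCHIVE_AFTER_DAYS:
        return "move-to-archive"
    return "keep-on-primary"

if __name__ == "__main__":
    sample = FileRecord("/mnt/research-share/project-x/raw.dat", days_since_access=900)
    print(lifecycle_action(sample))  # -> "move-to-archive"
```

The point of keeping deletion as a "review" action rather than an automatic one is that it preserves the risk-averse instincts described earlier while still moving the bulk of cold data off expensive primary storage.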
The shorter-term impact is typically a reduction in pressure on primary storage systems and a more controlled approach to capacity planning. More importantly, it allows budgets to align with actual data needs, so investment is directed towards core institutional priorities rather than continuing to be absorbed by storage expansion when it could be better used elsewhere.
And let's be clear, this isn't just about reducing storage costs, important as that is. It's also about improving how institutions operate at scale and preparing them for a future in which data volumes will grow even further. Breaking the cycle of periodic storage expansion and replacing it with a more predictable model is fundamental to sustainable IT investment. Institutions that get the balance right can enjoy a win-win of improved cost control and more effective support for research and innovation.
About the Author
Steve Leeper is VP of product marketing at Datadobi.