Transforming Research Data Management for Greater Innovation
Discovery depends on data. It's what fuels research, tests our ideas, and drives breakthroughs in science and engineering. One well-crafted dataset can unlock a new drug, reveal hidden climate patterns, or expose insights into human behavior that reshape public policy. Data can be highly sensitive or openly accessible, timeless or ephemeral, irreproducible or disposable, structured or chaotic.
Research institutions face both opportunity and complexity when it comes to harnessing data effectively. Failure to properly manage it can lead to stalled progress, wasted resources, and limited collaboration.
Data only becomes valuable when used, and reuse can multiply that value. Institutions that want to maximize their research investments need a strategic management approach that balances preservation, accessibility, and security while satisfying stakeholders' needs.
The Data Deluge
Managing, transferring, and wrangling multiple copies and versions of enormous datasets is resource-intensive and costly. Many data archives lack efficient mechanisms to distinguish duplicates from originals, track active versus abandoned datasets, manage version histories, or automate retirement.
Furthermore, researchers often lack the training, time, and motivation to develop and maintain disciplined data storage practices, creating difficulties for data managers down the line. Providing researchers with transparent, intuitive tools and workflows enables seamless integration of best practices into their existing processes with minimal effort, thereby making the entire curatorial process more efficient.
As research data grows exponentially in volume, variety, and velocity, traditional management practices that depend heavily on ad hoc, dispersed individual and departmental efforts are breaking down. Data becomes buried in nested folders with cryptic naming conventions. Storage administrators constantly free up space with no visibility into what they're deleting or how important it is. Data scientists spend up to 80% of their time wrestling with data rather than conducting actual research.
The "just keep everything" approach that worked with gigabytes becomes financially and operationally unsustainable at petabyte scale. Yet the alternative of deciding what to delete feels like gambling with potentially groundbreaking discoveries.
Managing research data extends far beyond simple storage provisioning. Institutions must invest in curation, migration, and infrastructure while addressing governance, compliance, and resilience requirements. Costs can easily mount due to data misuse, misinterpretation, and legal exposure when releasing data, thereby discouraging data sharing.
A New Paradigm for University Data Infrastructure
Universities are transitioning to a more secure, scalable, and interoperable infrastructure to accommodate changing demands. But adapting to new systems can sometimes lead to solutions that are less than satisfactory. Cloud storage models with unpredictable retrieval costs and governance gaps create new vulnerabilities while solving old problems.
Institutions are forced to rethink long-term data management strategies at a fundamental level. The traditional model of universal preservation is losing ground to more sophisticated approaches that balance value, risk, and sustainability. Data managers, IT administrators, and researchers get caught in an endless cycle of digital crisis management. Strategic frameworks are needed now that data lifecycle management is not merely beneficial but essential for institutional survival.
The RDMS Framework
Forward-thinking institutions are embracing the Research Data Management Strategy (RDMS) framework, an approach defined by eight essential attributes that should guide any data platform:
- Resilience
- Discoverability
- Manageability
- Accessibility
- Governance
- Scalability
- Versatility
- Security
RDMS is a shared language for evaluating whether current systems actually serve science. It is implementation-neutral, meaning these principles can be applied whether you're working with existing infrastructure or planning a comprehensive upgrade.
Keeping these eight attributes in mind, the most successful research data management transformations focus on a few key areas that address real pain points:
Metadata as the foundation. Metadata is data about data. It holds the descriptive information that explains what a file contains, how it was created, and how it should be used. But metadata should be thought of as the foundation, not the footnote. Comprehensive, automatically harvested metadata should include everything from basic file attributes to instrument specifications, funding codes, retention policies, and governance requirements. The crucial part is that metadata needs to live outside the data files themselves, in easily accessible, searchable repositories that work across your entire ecosystem.
Researchers should be able to inspect their data without having to move massive files around. When a researcher can quickly search across all institutional data holdings regardless of storage tier or location, the need for expensive hot storage is eliminated, and security risks from unnecessary data migrations are significantly reduced.
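To make this concrete, here is a minimal sketch of the idea: file attributes and policy fields (the `instrument`, `funding_code`, and `retention_years` names are hypothetical examples, not part of any specific product) are harvested into a catalog that lives outside the data files, so searches never touch the files themselves.

```python
import hashlib
import os
import sqlite3

# Hypothetical minimal catalog: metadata lives in SQLite, outside the data files.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE metadata (
    path TEXT PRIMARY KEY,
    size_bytes INTEGER,
    modified REAL,
    sha256 TEXT,          -- content fingerprint, useful for deduplication
    instrument TEXT,      -- e.g., which sequencer or telescope produced the file
    funding_code TEXT,    -- grant that paid for the work
    retention_years INTEGER
)""")

def harvest(path, instrument=None, funding_code=None, retention_years=10):
    """Harvest basic file attributes plus policy fields into the catalog."""
    stat = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    db.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?, ?, ?, ?)",
               (path, stat.st_size, stat.st_mtime, digest,
                instrument, funding_code, retention_years))

def find_by_funding(code):
    """Answer 'what data did this grant produce?' without moving any files."""
    rows = db.execute("SELECT path FROM metadata WHERE funding_code = ?", (code,))
    return [r[0] for r in rows]
```

Once the catalog exists, queries like `find_by_funding` run against the metadata alone, which is what makes searching across tiers cheap regardless of where the underlying bytes sit.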
Automated decision-making. The solution to the "what should we delete" issue has nothing to do with better human judgment. Humans should be removed from the equation altogether. Metadata-driven workflows can be implemented to automatically handle deduplication, tiering, and retirement based on rules set at data creation time. This approach shifts responsibility back to researchers (who actually understand the value of their data) while freeing data managers from the moral dilemmas associated with deletion.
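A rule engine of this kind can be sketched in a few lines. The thresholds and field names below (`retention_days`, the 90- and 365-day idle windows, the tier names) are illustrative assumptions, not prescribed values; the point is that every decision reads only metadata captured at creation time.

```python
import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400

@dataclass
class Record:
    """Metadata captured when the dataset is created; field names are hypothetical."""
    path: str
    created: float          # epoch seconds
    last_access: float      # epoch seconds
    tier: str               # "hot" | "warm" | "archive"
    retention_days: int     # set by the researcher at creation time

def lifecycle_action(rec, now=None):
    """Return the action a policy engine would take, based only on metadata."""
    now = time.time() if now is None else now
    age_days = (now - rec.created) / SECONDS_PER_DAY
    idle_days = (now - rec.last_access) / SECONDS_PER_DAY
    if age_days > rec.retention_days:
        return "retire"                  # retention window chosen at creation
    if rec.tier == "hot" and idle_days > 90:
        return "demote_to_warm"          # untouched for a quarter
    if rec.tier == "warm" and idle_days > 365:
        return "demote_to_archive"
    return "keep"
```

Because the rules are attached to metadata rather than to a storage system, the same policy can run unchanged across every tier, and researchers set the retention terms once rather than being asked to adjudicate deletions later.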
Breaking down data silos. Data often resides across multiple locations and storage systems, creating a maze of silos — isolated pockets of information. Silos are problematic because they can slow down research, force duplicate storage, and create blind spots where valuable datasets are hidden. They also increase costs, since organizations frequently over-provision "hot" storage to ensure critical data is accessible.
Instead of trying to centralize everything physically, institutions should create a metadata-aware global namespace that makes distributed data appear as a unified repository. Researchers get seamless access to all available holdings, while administrators maintain control through a single management interface.
Intelligent, not restrictive, security. Typically, the most restrictive policies are applied to entire systems when only a subset of data requires special protection. Comprehensive metadata enables more granular security, protecting sensitive data without unnecessarily restricting everything else. Additional controls include role-based access, multifactor authentication in the data path, and automated compliance workflows to enhance security while improving accessibility.
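Granular, metadata-driven security can be sketched as a per-dataset policy check. The sensitivity levels and rule names below are assumptions for illustration; the point is that the controls applied are exactly those this dataset's metadata demands, rather than the strictest policy applied system-wide.

```python
# Hypothetical policy: controls are decided per dataset from its metadata.
SENSITIVITY_RULES = {
    "public":     set(),                    # no extra controls
    "internal":   {"role_check"},
    "restricted": {"role_check", "mfa"},    # MFA enforced in the data path
}

def allowed(user_roles, dataset_meta, mfa_passed):
    """Grant access only as strictly as this dataset's metadata requires."""
    required = SENSITIVITY_RULES[dataset_meta["sensitivity"]]
    if "role_check" in required and dataset_meta["owner_role"] not in user_roles:
        return False
    if "mfa" in required and not mfa_passed:
        return False
    return True
```

Under this model, opening up a public dataset costs nothing in risk, and tightening a restricted one never forces extra friction onto the rest of the collection.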
Transforming Research Data Management One Step at a Time
Whether you're starting with an existing system or planning a major overhaul of your research data management environment, start by conducting an honest assessment of your current state using the RDMS framework: Where are your biggest gaps? Is it discoverability (can researchers find their own data)? Governance (do you know what policies apply to what data)? Scalability (are you constantly running out of space)?
Focus on the areas causing the most pain for your stakeholders rather than trying to address everything at once. Often, implementing comprehensive metadata management provides the foundation for addressing multiple challenges simultaneously.
Successful RDM transformation demands both technology and culture change. Researchers need to view RDM tools as enabling their work, data managers need systems that reduce their workload, and leadership needs to understand that strategic data management is an investment in the institution's future research capacity.
The goal is to replace chaotic, ad hoc responses with systematic, strategic approaches that scale with your institution's research ambitions. More effective data management means more time for actual research, and that's something everyone can get behind.
About the Author
Eric Polet is director of product marketing at Arcitecta.