U California Researchers Release Beta for Big Data Management

Researchers at three University of California campuses have released a beta version of a system for managing big data along with more traditional forms of data. Teams from UC Irvine, UC Riverside, and UC San Diego have banded together to create AsterixDB, a Java-based "big data management system" (BDMS).

The work began in 2009 with funding from the National Science Foundation and, eventually, the state of California and others. The goal was to create a set of new technologies for "ingesting, storing, managing, indexing, querying, and analyzing vast quantities of semi-structured information." The researchers pulled ideas from three areas — semi-structured data, parallel databases, and data-intensive computing — to create a "next generation" open source application that could run on large clusters of commodity computers.

At the heart of the system, the AsterixDB engine operates on a "shared nothing" architecture. Each computer in the cluster runs independently and is self-sufficient.
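To make the "shared nothing" idea concrete, here is a minimal sketch in Python. It is not AsterixDB code and does not use AsterixDB's API; the `Node` and `Cluster` classes, the hash-based routing, and the sample records are all hypothetical, chosen only to illustrate how each machine can own its slice of the data outright while a query fans out and combines per-node results.

```python
import zlib


class Node:
    """One machine in the cluster: it owns its storage and shares nothing."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.records = {}  # local storage, never touched by other nodes

    def put(self, key, record):
        self.records[key] = record

    def get(self, key):
        return self.records.get(key)

    def local_count(self, predicate):
        # Each node answers a query using only its own data.
        return sum(1 for r in self.records.values() if predicate(r))


class Cluster:
    """Routes each key to exactly one node (hypothetical hash partitioning)."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        # crc32 is stable across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, record):
        self.owner(key).put(key, record)

    def get(self, key):
        return self.owner(key).get(key)

    def count(self, predicate):
        # Scatter the query to every node, then combine the partial results.
        return sum(node.local_count(predicate) for node in self.nodes)


cluster = Cluster(num_nodes=4)
for i in range(100):
    cluster.put(f"user-{i}", {"id": i, "active": i % 2 == 0})

active_users = cluster.count(lambda r: r["active"])
```

Because no node reads another node's storage, adding machines adds capacity without introducing a shared disk or shared memory as a bottleneck. A real system would also have to handle replication, node failure, and rebalancing, which this sketch omits.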

"We're providing a next-generation platform for storing, managing, coordinating, and making use of Big Data," said Michael Carey, a UC Irvine professor leading the work. Big data is, of course, the output generated moment by moment by numerous online sources, including blogs, micro-blogging sites, transactions, sensors, status updates, and other computing activities. The challenge of managing that data with traditional database management technologies is that it is generated at an ever-increasing rate, takes multiple forms, and isn't easily categorized for rapid analysis.

According to an overview posted on the AsterixDB site, the work targets a range of scenarios, from cases where information is well-typed and highly regular (and predictably so) to situations where the content is textual, irregular, and therefore "hard to anticipate up front." The technical work has focused on highly scalable data storage and indexing, query processing of semi-structured data on very large clusters, and the merging of techniques from parallel database processing and data-intensive computing.

"Big Data crosses a lot of domains, from government to health care to business," noted Carey. "It's hard for us to imagine an area where AsterixDB can't contribute."

Now the authors of the system are hoping to extend real-world testing by finding partners that can use the platform in various domains generating big data. Those environments may currently be using data management schemes based on Apache projects Hadoop, Pig, Hive, and HBase as well as MongoDB, among others.

"We're putting AsterixDB out in an unrestricted open-source form," Carey explained. "Users can do whatever they want with it, and we can learn from what they do and further improve our platform based on their needs."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
