Major Updates to our Unstructured Data Monitoring Product and a Series B Extension to Fuel R&D

Full press release on VentureBeat

As generative AI use surges, there is record enterprise demand for data quality. To address this need, Anomalo has added a slew of new features to our unstructured monitoring product and raised a Series B extension to accelerate R&D. 

This announcement comes at the end of a remarkable year for the company. We’ve more than doubled our customers in the Fortune 500 (check out recent customer stories with ADP and Lebara), and deepened our partnerships with Databricks, Snowflake, and Google. We’ve also received the prestigious Emerging Partner of the Year award from Databricks, who participated in our initial Series B. 

New funding from Smith Point Capital

We’ve raised $10 million in a Series B extension round from Smith Point Capital, a firm founded by the former co-CEO of Salesforce, Keith Block. This funding will help us accelerate investment in R&D for unstructured data monitoring and deliver the future of data quality for generative AI applications. The team at Smith Point Capital has decades of experience scaling enterprise software companies, coupled with an incredible network across the industry. Given our growing enterprise customer base, the partnership is a strong fit as we continue to scale.

“Anomalo is rewriting the enterprise playbook for data quality in the AI era,” said Block. “The complexity in managing the enterprise data estate is growing dramatically, driven by a step function change in the proliferation of structured and unstructured data. Maximizing the quality of data in the enterprise has become mission-critical and an important area of investment for Fortune 500 executives. We are proud to lead Anomalo’s Series B extension as they emerge as the leading platform in this space.”

As generative AI surges, data quality takes center stage 

A recent McKinsey Global Survey found that 65 percent of companies across sizes, geographies, and industries now use GenAI regularly, twice as many as last year. A major barrier to GenAI adoption, however, is enterprise data quality. We hear about it everywhere—for instance, Gartner predicts that 30% of GenAI projects, after going through a proof-of-concept, will be abandoned by 2025 due to data issues. Let’s unpack why data quality is so hard when it comes to GenAI projects. 

  • There is no off-the-shelf GenAI model that will “just work” for an organization. These models (from LLM-powered customer support chatbots to financial forecasting models) require a company to apply an unprecedented volume of its enterprise data toward training and fine-tuning.
  • Ninety percent of all enterprise data is unstructured data (think documents, call transcripts, or order forms). Unlike data quality for structured data, there is no playbook when it comes to data quality for unstructured documents. These documents are often cluttered with duplicates, errors, private information and even abusive language. 
  • Organizations that want to leverage their unstructured data need to identify and resolve quality issues before that data is incorporated into GenAI models and impacts their performance and behavior. Otherwise, the gold mine of data that enterprises are sitting on to power their new LLMs may lead to inaccurate, biased, and generally untrustworthy outputs at best, and to security risks at worst if private information is used to train a model.

In response to these challenges, Anomalo launched our unstructured data monitoring product in June. This was a significant expansion to our data quality platform that monitors the quality of structured data in data warehouses and data lakes. 

With Anomalo’s unstructured data monitoring product, you can curate unstructured text documents and evaluate them for data quality around various document and document collection characteristics, including document length, duplicates, topics, tone, language, abusive language, PII, and sentiment. 

You can quickly assess the quality and fitness of a document collection and identify issues in individual documents, dramatically reducing the time needed to curate, profile, and leverage high-value unstructured text data.
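
To make two of these characteristics concrete, here is a minimal Python sketch of a length check and an exact-duplicate check over a small document collection. It is an illustration only, not Anomalo's implementation or API: the `profile_collection` function, the `min_chars` threshold, and the sample documents are all hypothetical, and a real system would need fuzzier duplicate detection than a hash of normalized text.

```python
import hashlib
from collections import defaultdict


def profile_collection(docs, min_chars=20):
    """Flag very short documents and exact duplicates in a collection.

    `docs` maps a document id to its text. Both the function and the
    `min_chars` threshold are hypothetical, chosen for illustration.
    """
    seen = defaultdict(list)
    report = {"too_short": [], "duplicates": []}
    for doc_id, text in docs.items():
        # Normalize case and whitespace so trivial variations still match.
        normalized = " ".join(text.lower().split())
        if len(normalized) < min_chars:
            report["too_short"].append(doc_id)
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        seen[digest].append(doc_id)
    # Any hash shared by more than one document id marks a duplicate group.
    report["duplicates"] = [ids for ids in seen.values() if len(ids) > 1]
    return report


docs = {
    "a": "The order arrived late and the box was damaged.",
    "b": "the order arrived   late and the box was damaged.",
    "c": "Thanks!",
}
report = profile_collection(docs)
print(report)
```

Here documents "a" and "b" differ only in case and spacing, so they land in the same duplicate group, while "c" falls under the length threshold.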

Writing the playbook for unstructured data monitoring

Unlike data quality for structured data, where the issues and resolution strategies are well known, there is no playbook for data quality when it comes to unstructured data. Quality depends entirely on the context of the business, its data, and the ways in which documents are used. 

We’re working with the largest enterprises in the world to invent the playbook from scratch. And what we’re hearing from our customers is that it’s not even about “quality” in the traditional sense, i.e., measuring if a document passes or fails a check. It’s now about understanding what information exists within documents and giving enterprises the keys to deploying these insights at scale across their organization.

To this end, today we’re launching several major advancements to our product’s capabilities:

Custom issues

Enterprises can now customize issues to describe any criteria they want to look for within the document collection. You can also assign severity scores to issues, whether they’re custom issues or the out-of-the-box (OOTB) issues Anomalo provides. 

Anomalo’s initial product contained 15 OOTB issues we could detect within unstructured documents, including document length, duplicates, topics, tone, language, abusive language, PII, and sentiment. Custom issues empower our customers to create their own criteria and designate what classifies as high or low quality for their documents. We’ve also made severity scores customizable, so you can decide which data quality issues matter the most to you.

For example, you may want to be notified if a document’s language appears to be truncated or is clearly missing expected content. Or, you might want to assign a higher data quality severity if Anomalo detects PII in the document.

This feature is so flexible that it can go beyond purely detecting issues and allow customers to assign any criteria they want to look for within the document. Consider a large quick-service restaurant that wants to identify customer reviews complaining that a new product is too expensive. Anomalo’s custom issues feature allows them to create a “too expensive” issue, assign a score of importance, and run that against a collection of documents. 
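
The restaurant example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Anomalo's API: the `CustomIssue` class, the 1-to-5 severity scale, and the keyword matcher are hypothetical, and the keyword check merely stands in for the LLM-based judgment the product relies on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CustomIssue:
    """A user-defined issue: a name, a severity, and a matching rule."""
    name: str
    severity: int  # hypothetical scale: 1 (low) to 5 (high)
    matches: Callable[[str], bool]


def too_expensive(text: str) -> bool:
    # Keyword stand-in for an LLM judgment about price complaints.
    lowered = text.lower()
    phrases = ("too expensive", "overpriced", "not worth the price")
    return any(phrase in lowered for phrase in phrases)


def run_issue(issue, docs):
    """Return one finding per document that matches the issue."""
    return [
        {"doc_id": doc_id, "issue": issue.name, "severity": issue.severity}
        for doc_id, text in docs.items()
        if issue.matches(text)
    ]


reviews = {
    "r1": "The new wrap is tasty but way too expensive for the size.",
    "r2": "Loved it, will order again.",
    "r3": "Overpriced and cold when it arrived.",
}
issue = CustomIssue(name="too expensive", severity=4, matches=too_expensive)
findings = run_issue(issue, reviews)
print(findings)
```

Keeping the matching rule separate from the name and severity means the same structure covers both quality checks (such as truncated content) and business criteria like this one.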

Cloud hosted model-as-a-service support

We do not shy away from the fact that this product is using the latest and greatest foundational large language models (LLMs) to deliver this experience. In fact, we are leaning into it as a strength in this release. First, we make it easy for customers to specify which foundational LLMs they want to use with our multi-model support of LLMs from Anthropic, Google, Meta and OpenAI. 

This is about more than offering variety, though: it addresses a pressing challenge of scaling data quality within GenAI, namely enterprise security. One of the biggest blockers to more rapid GenAI adoption is the concern of exposing enterprise data across the public web, or having enterprise data leaked into training data for LLMs. We understand that, as a result, enterprises may only have certain LLMs approved to run within their cloud environment. With Anomalo’s cloud hosted model-as-a-service support, enterprises can now leverage the large language models approved to run within their own cloud environment (and hosted by Google Vertex, AWS Bedrock, and Azure AI) to power our unstructured monitoring product. Paired with Anomalo’s existing ability to seamlessly integrate with cloud providers and run entirely within a virtual private cloud (VPC), this keeps data within enterprise data teams’ control and minimizes the risk that data is ever used to train or fine-tune models.

Anomalo has always focused on the enterprise segment, and this announcement reinforces our commitment to this vertical. Enterprise data is messy, sensitive, and often highly restricted. Anomalo’s best-of-breed enterprise security and access controls guarantee compliance with even the most stringent privacy and security requirements for your data. 

Join us on this journey

Today’s announcements bring us a significant step closer to reinventing what enterprise data quality means in the context of GenAI and unstructured data. 

Want to join us on this mission? Apply today for private beta access to our unstructured data monitoring product. If you want to join our team, check out our open positions here.
