Introducing Striim Labs: Where AI Research Meets Real-Time Data

AI research has a proliferation problem. Machine learning conferences such as NeurIPS report being overwhelmed by new submissions: 21,575 papers were submitted this year, up from fewer than 10,000 in 2020.

At the crux of the issue is the questionable quality of many of those papers, whether written with AI tools or rushed to publication without robust review. Amid the noise, it’s increasingly difficult for practitioners to discern genuine innovation from “slop,” or to find applicable methodologies that might be perfect for their use cases.

That’s why we’re launching Striim Labs.

We focus specifically on the intersection of AI/ML research and real-time data streaming: the part of the Venn diagram where promising techniques meet production-grade, low-latency systems. Our team will wade through the deluge of research papers to find the most applicable examples for streaming machine learning use cases. We’ll even test them out to make sure they can perform as claimed.

By exploring emerging techniques, collaborating with Striim customers on real scenarios, and building working prototypes, we aim to produce actionable templates (“blueprints”) that teams can replicate and deploy themselves. Every blueprint will be publicly accessible via a GitHub repository with deployment instructions, and we’ll maintain an open line of communication for feedback and collaboration.

What is Striim Labs?

Striim Labs is an applied AI research group we’re launching at Striim: a team dedicated to learning and experimentation at the intersection of AI and real-time data.

Striim Labs will draw on the collective knowledge and experience of a team of data scientists and experts in streaming machine learning. First and foremost, our work focuses on real-time, low-latency use cases that enterprise teams can actually use.

Striim Labs isn’t a purely academic exercise. Nor is it a Striim product demo disguised as thought leadership. It’s a genuine attempt to take promising techniques from recent research and stress-test them against the messiness of real-time data: schema drift, late-arriving events, volume spikes, and all the other things that break what worked in a notebook.

We’ll document what we find honestly, including what didn’t work, what we had to adapt, and where the gap between a paper’s benchmarks and streaming reality turned out to be wider than expected. That transparency is the point. If a technique falls apart under latency pressure, that’s a finding worth sharing too.

The result, we hope, will be a series of prototypes we’re calling “Applied Blueprints” that practitioners (ML engineers, architects, and data scientists) can experiment with themselves, giving us feedback and suggestions drawn from their own experience.

What is an Applied Blueprint?

An applied blueprint is a self-contained, reproducible prototype that implements a technique or model from a recent research paper.

We’ll build our blueprints using open source tools and technologies (Kafka, Apache Spark, PyTorch, Docker, and others), each with defined minimum acceptance criteria (precision, recall, latency). Our starting point for each blueprint is open source, framework-agnostic tooling, so anyone can run it (not just Striim customers, though we encourage them to check it out!). Each blueprint will live in a public GitHub repository with full deployment instructions. We’ll also publish our work via the Striim resources page and elsewhere to make it more accessible.
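To make “defined minimum acceptance criteria” concrete, here’s a minimal sketch of how a blueprint might encode and check its gates. The metric names and threshold values below are illustrative assumptions, not the actual criteria any Striim Labs blueprint will use.

```python
# Hypothetical acceptance-criteria gate for a blueprint.
# Metric names and thresholds are illustrative, not Striim Labs' actual gates.
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    min_precision: float
    min_recall: float
    max_p99_latency_ms: float

    def passes(self, precision: float, recall: float, p99_latency_ms: float) -> bool:
        """Return True only if every measured metric clears its threshold."""
        return (
            precision >= self.min_precision
            and recall >= self.min_recall
            and p99_latency_ms <= self.max_p99_latency_ms
        )


# Example gate: >= 0.90 precision, >= 0.80 recall, p99 latency under 50 ms.
criteria = AcceptanceCriteria(min_precision=0.90, min_recall=0.80, max_p99_latency_ms=50.0)
print(criteria.passes(precision=0.93, recall=0.85, p99_latency_ms=42.0))  # True
print(criteria.passes(precision=0.93, recall=0.85, p99_latency_ms=75.0))  # False: too slow
```

The point of a gate like this is that a blueprint either measurably meets its bar or it doesn’t; there’s no “it seems to work.”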

Ultimately, our intention for each blueprint is first to validate a technique within a streaming context, then to integrate it into Striim’s platform natively, extending what Striim offers to our customers out of the box. But again, we stress that each blueprint will be available to everyone, not just Striim users.

What Makes Striim Labs Different?

Here are a few ways we aim to set Striim Labs apart from other data science initiatives.

  • Everything ships with code: Every applied blueprint we publish will feature code you can test, within its own GitHub repo. Not just theoretical whitepapers.
  • Every blueprint has defined, measurable acceptance criteria: We’ll test our models and share real results, not a vague promise that they work.
  • Open-source-first approach: You won’t need Striim’s platform, or to be working within a particular cloud environment, to learn from or run a blueprint.
  • Transparency about tradeoffs: We’ll be clear and open from the start about model failures and breakages, rather than just sharing polished results.
  • Clear path from prototype to production: Our blueprints will be designed to graduate from prototypes into systems we’ll build into Striim’s platform as native capabilities.

What’s next?

Our first area of focus will be a subject many real-time enterprises are interested in: anomaly detection. Anomaly detection has benefited from a rich body of recent research, but the gap between research papers and production results remains particularly wide. That makes it a great place for us to start, especially since it’s one of the most requested capabilities in a streaming context.
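To illustrate why that gap exists, consider the simplest possible streaming detector. In a notebook, you can compute statistics over the whole dataset; in a stream, they must be updated online, one event at a time, in constant memory. The sketch below (an illustrative toy, not a Striim Labs blueprint) flags events far from the running mean using Welford’s online algorithm:

```python
# Toy streaming anomaly detector: rolling z-score via Welford's online
# algorithm. Illustrative only; not a Striim Labs blueprint.
import math


class RollingZScoreDetector:
    """Flags values more than `threshold` standard deviations from the
    running mean of everything seen so far."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Score x against history, then fold it into the running stats."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's single-pass update: constant memory per event.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


detector = RollingZScoreDetector(threshold=3.0)
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 50.0]  # final event is a spike
flags = [detector.update(v) for v in stream]
print(flags)  # only the final spike is flagged
```

Even this toy surfaces the questions a real blueprint has to answer: how to handle concept drift, late-arriving events, and volume spikes, none of which show up when the same math runs over a static dataset.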

We’ll be launching a series of anomaly detection blueprints, along with our findings on the underlying models, in the near future.

Your Move: Get Involved

Striim Labs is designed to be an open, collaborative exercise. We welcome input, feedback, and ideas from practitioners wrestling with data science problems who are curious about the latest innovations in the market. Here are a few ways you can take part:

  • Suggest papers, techniques, or focus areas you’d like us to test against real-time data.
  • Try our blueprints, and give us real feedback! Tell us where we can improve, and let us know what works and what breaks in your environment.
  • Share your work. We’d love to hear from you if you’re working on similar projects. Feel free to share your GitHub repos or related initiatives.

Where you can find us:

We’re excited to bring new insights, prototypes, and research to you in the coming weeks. Thanks for being part of our journey.