How We Used MinIO, ClamAV, TensorFlow, and MeiliSearch to Create a Supercharged Image Search – and Had Fun Along the Way
adorsys were given a fun challenge by one of our clients. They had tons of files being uploaded to a MinIO server (deployed on Kubernetes, naturally), and they wanted to search their images. But not just any old search: they needed to search by image quality. Challenge accepted! We saw it as a perfect opportunity to flex our machine learning muscles, with a side of security. And, because we're overachievers, we threw in a bonus feature: virus scanning with ClamAV. Yep, we went the extra mile, and the client loved it.
Here’s how it all came together – from virus scanning (bonus!) to AI-powered image quality analysis and searchability.
The Flow: From MinIO to AI-Powered Image Search
Our system works like this: whenever a file is uploaded to MinIO, a series of tasks kicks off to keep everything secure and give us those sweet quality scores for images. Let’s break it down:
- Bonus Feature! ClamAV to the Rescue
- First up, every uploaded file is scanned for viruses using ClamAV. This was a task we proposed to the client as a little security bonus. They loved the idea – who doesn’t want an extra layer of protection? If ClamAV finds anything nasty, the file is deleted on the spot, no questions asked.
- Is It an Image?
- If the file gets the green light from ClamAV, we then check if it’s an image. If it’s just a regular ol’ document or video file, it can hang out in MinIO in peace. But if it’s an image, we move on to the fun part: machine learning.
- AI-Powered Image Quality Analysis
- The image gets sent to a TensorFlow model that analyzes its quality, scoring it like a highly critical AI art critic. Once we’ve got that score, we store it in MeiliSearch, a fast, scalable search engine, so the client can easily query images by quality later.
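Under the hood, a ClamAV scan like this usually goes through the clamd daemon's INSTREAM command. Here is a minimal sketch of that wire format, assuming clamd is reachable over TCP (port 3310 by default). It covers only the framing and the reply parsing, not the socket handling, and `frameInstream` and `parseVerdict` are hypothetical helper names, not part of any ClamAV client library.

```typescript
// Sketch of the clamd INSTREAM wire format (assumes clamd reachable over TCP).
// Each chunk is prefixed with its length as a 4-byte big-endian unsigned int;
// a zero-length chunk marks the end of the stream.
const encoder = new TextEncoder();

function frameInstream(data: Uint8Array, chunkSize = 64 * 1024): Uint8Array[] {
  // The "z" prefix asks clamd for null-terminated commands and replies.
  const frames: Uint8Array[] = [encoder.encode("zINSTREAM\0")];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    const chunk = data.subarray(offset, offset + chunkSize);
    const frame = new Uint8Array(4 + chunk.length);
    new DataView(frame.buffer).setUint32(0, chunk.length, false); // big-endian length
    frame.set(chunk, 4);
    frames.push(frame);
  }
  frames.push(new Uint8Array(4)); // four zero bytes terminate the stream
  return frames;
}

type ScanVerdict =
  | { status: "clean" }
  | { status: "infected"; signature: string }
  | { status: "error"; message: string };

// clamd replies e.g. "stream: OK" or "stream: Eicar-Test-Signature FOUND".
function parseVerdict(reply: string): ScanVerdict {
  const text = reply.replace(/\0+$/, "").trim();
  if (text === "stream: OK") return { status: "clean" };
  const found = /^stream: (.+) FOUND$/.exec(text);
  if (found) return { status: "infected", signature: found[1] };
  return { status: "error", message: text };
}
```

Writing the frames to a socket in order and reading the null-terminated reply would complete the round trip; the `infected` verdict is the case where the file gets deleted from MinIO. Streams larger than clamd's `StreamMaxLength` setting are rejected, which this sketch surfaces as an `error` verdict.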
Here’s the flow in a handy diagram:
TensorFlow and the Client’s GPU Power – A Match Made in the Cloud
The client had an AWS EKS GPU cluster just begging to be used. Naturally, we seized the opportunity to flex some AI-powered muscle and trained a TensorFlow model to score images. We used ArgoCD to define a pipeline that collected and cleaned image data, trained the model, and deployed it to a TensorFlow server. (A whole process for another blog post, but trust me, it was a breeze with a few trusty Python scripts.)
The pipeline flowed like this:
Note: Why Not TensorFlow Serving?
Good question! We thought about using TensorFlow Serving, but it wasn’t compatible with all the CPUs in our environment at the time. Instead, we built a custom TensorFlow server. Don’t worry, though – we’ll switch to TensorFlow Serving in the future when things are more standardized.
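Since the plan is to move to TensorFlow Serving, here is a minimal sketch of what a quality-score request could look like against its REST predict endpoint, `POST /v1/models/<name>:predict` (port 8501 by default), which takes `{"instances": [...]}` and answers with `{"predictions": [...]}`. The model name `image-quality`, the input layout (one H x W x 3 image scaled to [0, 1]) and the single-score output are assumptions for illustration; the post doesn't describe the custom server's API or the model's signature.

```typescript
// Sketch: building a TensorFlow Serving REST "predict" call for a quality model.
// Assumptions: the model takes one H x W x 3 float image scaled to [0, 1]
// and returns one quality score per instance.
type Pixel = [number, number, number];

interface PredictRequest {
  url: string;
  body: { instances: Pixel[][][] };
}

function buildPredictRequest(
  host: string,
  modelName: string,
  rgba: Uint8ClampedArray, // raw RGBA bytes of an already-decoded image (assumption)
  width: number,
  height: number,
): PredictRequest {
  const image: Pixel[][] = [];
  for (let y = 0; y < height; y++) {
    const row: Pixel[] = [];
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4; // 4 bytes per pixel; alpha is dropped
      row.push([rgba[i] / 255, rgba[i + 1] / 255, rgba[i + 2] / 255]);
    }
    image.push(row);
  }
  return {
    url: `http://${host}:8501/v1/models/${modelName}:predict`,
    body: { instances: [image] }, // a batch of one image
  };
}

// TF Serving returns {"predictions": [...]} with one entry per instance.
function readScore(response: { predictions: unknown[] }): number {
  const first = response.predictions[0];
  const score = Array.isArray(first) ? first[0] : first; // handles [[s]] or [s]
  if (typeof score !== "number") throw new Error("unexpected prediction shape");
  return score;
}
```

In the webhook app, the request would be sent with something like `fetch(req.url, { method: "POST", body: JSON.stringify(req.body) })` and the score read with `readScore(await res.json())`. Real images would normally be resized to the model's expected input size before building the request.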
The Webhook App – Powered by NestJS and RabbitMQ
Our NestJS webhook app sits in the middle of this process, making sure all the file upload events get processed smoothly. Here’s how it works:
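- Step 1: When a file is uploaded to MinIO, an event is sent through RabbitMQ. Our app catches it and starts processing: first, it sends the file to ClamAV for a virus scan.
- Step 2: We act on the scan result. If ClamAV flags the file, it's deleted from MinIO on the spot; if it's clean, we check whether it's an image.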
- Step 3: If the file is virus-free and an image, it’s forwarded to the TensorFlow server for quality analysis.
- Step 4: The TensorFlow server returns a quality score, which gets indexed into MeiliSearch for fast querying later. RabbitMQ carries all these events in real time and makes sure none of them get lost in the shuffle. It's our event management hero behind the scenes.
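To tie the four steps together, here is a framework-free TypeScript sketch of the per-event logic; a NestJS app would wrap something like this in a RabbitMQ consumer. It assumes MinIO publishes S3-style notification JSON with `Records[].s3.bucket.name` and a URL-encoded `Records[].s3.object.key`. The `Deps` interface and the magic-byte image check are hypothetical stand-ins, since the post doesn't say how the real app talks to ClamAV, MinIO, the TensorFlow server and MeiliSearch, or how it detects images.

```typescript
// Framework-free sketch of the webhook's per-event logic.
// Assumptions: S3-style MinIO notification JSON, and a hypothetical Deps
// interface standing in for the ClamAV, MinIO, TensorFlow and MeiliSearch clients.
interface MinioRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}

interface Deps {
  read(bucket: string, key: string): Promise<Uint8Array>; // MinIO download
  isInfected(data: Uint8Array): Promise<boolean>; // ClamAV scan
  remove(bucket: string, key: string): Promise<void>; // MinIO delete
  score(data: Uint8Array): Promise<number>; // TensorFlow server
  index(doc: { id: string; bucket: string; key: string; quality: number }): Promise<void>; // MeiliSearch
}

type Outcome = "deleted" | "skipped" | "indexed";

// Object keys in S3-style events are URL-encoded, with spaces encoded as "+".
function objectsFromEvent(message: string): { bucket: string; key: string }[] {
  const event = JSON.parse(message) as { Records?: MinioRecord[] };
  return (event.Records ?? []).map((r) => ({
    bucket: r.s3.bucket.name,
    key: decodeURIComponent(r.s3.object.key.replace(/\+/g, " ")),
  }));
}

// Recognise common image formats by their leading "magic" bytes.
function looksLikeImage(b: Uint8Array): boolean {
  const starts = (...sig: number[]) => sig.every((v, i) => b[i] === v);
  return (
    starts(0x89, 0x50, 0x4e, 0x47) || // PNG
    starts(0xff, 0xd8, 0xff) || // JPEG
    starts(0x47, 0x49, 0x46, 0x38) || // GIF
    (starts(0x52, 0x49, 0x46, 0x46) && // WebP: "RIFF" .... "WEBP"
      b[8] === 0x57 && b[9] === 0x45 && b[10] === 0x42 && b[11] === 0x50)
  );
}

async function handleUpload(message: string, deps: Deps): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const { bucket, key } of objectsFromEvent(message)) {
    const data = await deps.read(bucket, key);
    if (await deps.isInfected(data)) {
      await deps.remove(bucket, key); // Step 2: infected files are deleted
      outcomes.push("deleted");
    } else if (!looksLikeImage(data)) {
      outcomes.push("skipped"); // documents, videos, etc. stay in MinIO untouched
    } else {
      const quality = await deps.score(data); // Step 3: quality analysis
      // MeiliSearch ids allow only a-z A-Z 0-9 - _; this naive mapping can
      // collide (e.g. "a/b" vs "a.b"), so a hash would be safer in practice.
      const id = `${bucket}_${key}`.replace(/[^A-Za-z0-9_-]/g, "_");
      await deps.index({ id, bucket, key, quality }); // Step 4: make it searchable
      outcomes.push("indexed");
    }
  }
  return outcomes;
}
```

Checking magic bytes rather than the file extension means a document renamed to `.png` isn't mistaken for an image. A production consumer would also acknowledge the RabbitMQ message only after every step succeeds, so a crash midway leads to redelivery instead of a lost event.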
Here’s how the app’s logic looks:
Docker Compose: Keeping It All Together
We used Docker Compose to spin up the whole environment easily.
Bonus: Monitoring and Smaller Apps
We didn't stop there! For monitoring and keeping track of what's going on in our system, we use a bunch of smaller apps like TensorBoard (to monitor the AI model) and file-browser (to manage files). We'll dive deeper into these tools in another article, but suffice it to say, they make life a lot easier when you're juggling multiple services and deployments.
Helm – The Real MVP of Deployment
And of course, all of this is deployed using Helm. Helm made it easy to package everything, from MinIO to ClamAV, TensorFlow, and MeiliSearch, and deploy it onto our Kubernetes cluster. Helm charts let us manage the complexity of our multi-service system, so we could scale and deploy everything in a way that didn’t make our heads explode.
Conclusion: Fun, Stress, and a Happy Client
In the end, we delivered a system that could not only scan files for viruses but also analyze image quality with AI, and make everything searchable with MeiliSearch. And the best part? The client loved it.
So what did we learn along the way?
- MinIO is awesome for scalable file storage.
- ClamAV gives your system an extra layer of security.
- TensorFlow + GPUs = machine learning magic.
- MeiliSearch makes querying fast and easy.
- Helm makes deployment a breeze.
Was it all smooth sailing? Not exactly. But we had fun, learned a lot, and most importantly, the client was thrilled with the result.
Stay tuned for our next article, where we’ll dive into the smaller apps and tools we used for monitoring and file management. Until then, stay virus-free and keep innovating!
Keen to explore how adorsys can guide your company into this world? Reach out to us here; our team will be delighted to discuss tailored solutions for your organisation.
Written by Stephane Segning (Senior Software Architect at adorsys).