We’re working on text/image prompted vision models at DirectAI (https://directai.io). We help clients do detection of bespoke objects by defining them with a single phrase or prompt image. I’d be happy to talk about helping you build out a system like this without having to collect large amounts of data, annotate it, and then train a bunch of custom models.
I think something like detectron2 [1] could help. It is released under the Apache 2.0 license, so the code is commercial friendly. That said, the pre-trained weights may not be, so you’ll want to check on that.
Also, the fast.ai course [2] is a good starting point for understanding the basics. If you are pressed for time, just go through Lesson 1.
[1] https://github.com/facebookresearch/detectron2
Edit: added fast.ai, grammar
YMMV though; accuracy is ultimately going to depend on the quality of the labelled data and your use case.
There might be models better suited to your specific needs too, but you’re always going to need the training dataset.
[0]: https://labelstud.io/blog/getting-started-with-image-classif...
Xcode has drag-n-drop support for this (Create ML, linked below) if you're an iOS person. Otherwise you could use, e.g., TensorFlow.
https://developer.apple.com/documentation/createml/creating-...
1. *Data Collection*: Gather a large dataset of images relevant to the task you want the AI to perform [recognize pieces of equipment from various manufacturers]. The dataset should cover a wide range of variations and scenarios that the AI might encounter in real-world situations.
2. *Data Preprocessing*: Clean and preprocess the images to ensure uniformity and remove noise. This might involve resizing, cropping, normalizing pixel values, and augmenting the dataset with techniques like rotation, flipping, and adding noise to increase variability.
3. *Model Selection*: Choose a suitable deep learning architecture for image recognition, such as Convolutional Neural Networks (CNNs), which are highly effective for this task. Popular pre-trained models like VGG, ResNet, Inception, and MobileNet are often used as starting points.
4. *Training*: Split your dataset into training, validation, and test sets. Train the selected model on the training data using techniques like stochastic gradient descent (SGD) or Adam optimization. During training, the model learns to map input images to their corresponding labels or classes.
5. *Evaluation*: Assess the performance of the trained model using the validation set. Common evaluation metrics for image recognition tasks include accuracy, precision, recall, and F1-score. Adjust hyperparameters and model architecture based on the validation results to improve performance.
6. *Fine-Tuning*: Improve the model further with transfer learning: start from a model pre-trained on a large-scale dataset (e.g., ImageNet) and fine-tune it on your specific dataset. This usually achieves better performance with less training data than training from scratch.
7. *Testing*: Once satisfied with the model's performance on the validation set, evaluate it on the test set to assess its generalization ability to unseen data. This step helps ensure that the model performs well in real-world scenarios.
8. *Deployment*: Deploy the trained model in production environments, whether it's on edge devices, servers, or cloud platforms, depending on your application requirements. Implement mechanisms for model monitoring and updates to maintain performance over time.
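To make steps 4, 5 and 7 a bit more concrete, here is a minimal sketch in plain Python of splitting a labelled set and computing per-class precision, recall and F1. It is an illustration only, not a recommended pipeline: `split_dataset`, `per_class_metrics`, the 70/15/15 split, and the "acme"/"globex" manufacturer labels are all made up for the example, and in practice you would probably use a library such as scikit-learn for this.

```python
import random
from collections import Counter


def split_dataset(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of items once, then cut it into train / val / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the class that was actually there
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


if __name__ == "__main__":
    # Hypothetical (filename, manufacturer) pairs standing in for real labelled images.
    samples = [(f"img_{i:03d}.jpg", "acme" if i % 2 else "globex") for i in range(100)]
    train, val, test = split_dataset(samples)
    print(len(train), len(val), len(test))

    # Pretend these are validation labels and the model's predictions for them.
    truth = ["acme", "acme", "globex", "globex"]
    preds = ["acme", "globex", "globex", "globex"]
    for label, (p, r, f1) in per_class_metrics(truth, preds).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Per-class numbers matter here because with many manufacturers, overall accuracy can look fine while one rarely seen piece of equipment is almost never recognized. Tune against the validation metrics and only look at the test split once at the end.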
Throughout the process, it's essential to iterate and refine each step based on insights gained from experimentation and evaluation. Additionally, staying updated with the latest research and techniques in the field of computer vision can help you improve the performance of your image recognition AI.