Why generic auto-tagging fails — and how custom-trained AI turns a product-heavy photo and video library into a searchable asset.
Most DAMs and cloud photo tools advertise “AI tagging.” What they’re actually offering is a generic computer-vision model — usually a wrapper around a third-party vision API like Google Vision, AWS Rekognition, or an open-source CLIP model. These models are trained on billions of images from the open internet and can recognize general concepts: car, beach, woman, laptop, pizza.
That’s useful if you’re organizing a personal photo library. It’s close to useless for a business.
It’s also not new. Generic auto-tagging is roughly 15-year-old technology. It was a real breakthrough when it first appeared — a computer that could actually look at a photo and say “this is a dog” was a step forward in the early 2010s. But the output hasn’t meaningfully improved in the last decade. What has improved is how many vendors ship it as a checkbox feature, often still marketed as “AI-powered” as if it were new. It isn’t. And for any business whose team searches photos by specific product names, it never worked in the first place.
A retail brand looking for “the Aurora Shell jacket in Storm Blue from the SS26 lookbook” is never going to find that photo through generic tags. The generic AI tagged it jacket, outdoor, blue, person, mountain. Those tags are technically correct. They are also entirely unhelpful — because every other jacket in the library has the same tags. You don’t need a blue jacket. You need that blue jacket. And generic AI has no way to tell one from another.
The problem compounds with scale. A library of 500 photos is browseable manually. A library of 50,000 photos with generic tags is worse than a library with no tags at all, because the tags create a false sense of searchability that wastes everyone’s time.
Generic AI has a ceiling on how specific it can get. It knows dog but not Aussiedoodle. It knows chair but not Herman Miller Aeron. It knows cereal but not your specific SKU of oat clusters with the new holiday packaging. For any business whose catalog lives at a level of specificity below what’s in a public vision model — which is nearly every business — the ceiling is reached on day one.
Generic AI is also inconsistent across similar images. The same product shot from two angles might get tagged with overlapping but non-identical terms — shirt on one photo, top on the next, blouse on the third. None of those are wrong. But when you search “shirt,” you get a third of the results you should.
What teams actually do when generic AI doesn’t work: they give up and tag manually. A designer, a marketer, or an intern sits with a spreadsheet and types the real tag onto each photo. For a catalog with 5,000 SKUs and 20 photos per SKU, that’s 100,000 tagging decisions. Even at a minute per photo, that’s more than 1,600 hours, most of a working year. No team actually does this. Instead, tagging stops, the library decays, and “findability” becomes “ask Sarah, she took the photos.”
Custom AI tagging is a different class of product. Instead of running photos through a pre-trained general-purpose model, you train a model on your actual products — with your own photos of your own catalog as examples. The AI learns what each of your products looks like and applies the right tag on every new upload.
The core use case is products. A retail brand trains the AI on every product in its catalog. A furniture rental company trains it on every piece of rentable inventory. A hardware manufacturer trains it on every SKU, including visually similar variants. From that point forward, every new photo that enters the library is tagged with actual product names — not generic synonyms.
Products are the main focus of custom AI tagging in practice, but the same approach works beyond products. A university can train a custom model to recognize specific buildings across its campus. An event company can train it on specific venues. A sports team can train it on specific stadiums or facilities. Anything visually consistent that your team needs to find by name is a fair candidate. The rest of this guide focuses primarily on the product use case, because that’s where most of the value is — but keep the broader applicability in mind.
Custom AI tagging is not manually adding tag options to a dropdown. That’s metadata management. Every DAM has it, and every DAM still requires a human to do the actual tagging.
It’s also not keyword-based auto-tagging, where the system scans filenames or descriptions and copies text into tag fields. That works when filenames are clean; it fails when they aren’t (which is most of the time).
Custom AI tagging is a model that looks at the visual content of a new photo or video and identifies the right product — with no human in the loop for the tagging step itself.
An important nuance: custom AI tagging is not one problem. It’s a family of problems with very different difficulty levels. Distinguishing two visually distinct chairs is straightforward. Distinguishing two nearly identical tables can be genuinely hard. Electronic devices often carry readable text on the product itself — model numbers, logos, labels — which a well-built tagging system can exploit. Fashion items need visual texture and silhouette recognition. Rental gear often has serial numbers and brand marks that help.
A serious custom-tagging product uses different techniques for different problem types — modern vision transformers for some cases, classical machine learning on top of visual embeddings for others, OCR and text extraction when the product carries its own identifying text. A platform that applies the same approach to every customer’s catalog is going to underperform on the cases where that approach isn’t the right fit.
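To make the “classical machine learning on top of visual embeddings” idea concrete, here is a purely illustrative sketch. The product names, vectors, and threshold are invented; in a real system the embeddings would come from a trained vision encoder, not hard-coded toy values, and 20 to 50 example photos per product would collapse into each centroid.

```python
# Illustrative only: nearest-centroid tagging over visual embeddings.
# Toy 3-D vectors stand in for real image embeddings.
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def centroid(vectors):
    # Mean vector of a product's training examples.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical training data: a few embeddings per product.
training = {
    "Aurora Shell jacket": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Storm Parka":         [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}
centroids = {tag: centroid(vs) for tag, vs in training.items()}

def predict(embedding, threshold=0.8):
    # Return the best-matching product tag, or None below the confidence
    # bar (low-confidence images would be routed to human review).
    tag, score = max(
        ((t, cosine(embedding, c)) for t, c in centroids.items()),
        key=lambda p: p[1],
    )
    return tag if score >= threshold else None

print(predict([0.85, 0.15, 0.05]))  # → Aurora Shell jacket
```

The confidence threshold is the design choice worth noticing: it is what lets a system hold ambiguous photos for review instead of tagging them silently and wrongly.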
You don’t need to understand the technology to evaluate this. What you do need to know: ask a prospective platform how they handle your category specifically. A vendor that can explain why fashion tagging is different from electronics tagging is doing real engineering. A vendor that says “our AI does it all the same way” probably isn’t.
Training and hosting a per-customer vision model is expensive. It requires infrastructure, ML engineers, and a product team willing to build a feature that many smaller customers won’t use. Most DAMs took the easier path: plug in a generic vision API, call it “AI tagging,” and ship it — often still presenting that 15-year-old technology as a flagship feature.
Custom-trained vision models do exist outside the DAM world. Cloud providers like AWS, Google, and Azure all offer custom vision services. But those are developer tools. They require an engineering team to integrate, maintain, and operate. They’re not the same thing as a custom AI model built into a DAM and usable by the marketing, ops, or event team who actually owns the photo library.
As far as we’re aware, Tagbox is currently the only DAM that offers custom AI tagging as a fully productized, team-usable feature — not as a developer API you have to wire up yourself.
The workflow is simpler than most people expect. There are four stages.
The first step is just a list — what do you want the AI to recognize? For most brands, that’s a product catalog: every SKU, product line, variant, or model you want to be able to search for. For a university, it might be a list of campus buildings. For an event company, a list of venues.
Within the product case, you can go further and add structure around each product — colorway, season, shot type, campaign, and so on — but at its core the list is just: these are the things we need the AI to identify.
For each item on your list, you provide example photos. In current systems, 20 to 50 examples per item is typical, and in the harder edge cases — products that look very similar to other products in the catalog — up to 100 may be needed. The technology here is improving fast; next-generation models are pushing the typical number down to as few as five examples for most tags.
For fuzzier or more visually varied tags like “lifestyle shot” vs. “packshot,” you want the higher end of the range to cover the visual diversity the model needs to generalize.
This is the only meaningful manual step. For most teams, it’s a one-time effort of a few days — dramatically smaller than tagging the entire historical library by hand.
The platform trains a custom model on your examples. This is typically done by the platform’s team as part of initial setup, not by the customer directly. The end-to-end project — from listing out your products through training, validation, and deployment — usually takes two to four weeks depending on how many items are in scope and how clean the initial training data is.
This isn’t a self-service process in most serious implementations. Getting a custom AI model actually working well requires some collaboration: agreeing on what’s in scope, reviewing the initial model’s outputs, iterating when something tags poorly, and validating accuracy against a holdout set. A platform that claims a custom model will be live five minutes after upload is overpromising.
Once live, every new photo uploaded to the library is automatically evaluated against your trained model. Matching tags are applied. The model also runs retroactively across the existing library to tag historical content, so the investment pays off for everything you already have, not just what’s uploaded going forward.
From that point on, a photographer uploads a batch of product shots and within seconds every photo is tagged with the right product, colorway, season, and shot type — no manual work. A retail marketer searches “Aurora Shell Storm Blue packshot” and gets exactly those photos, not 200 photos of blue jackets.
Catalogs change. New products launch, new campaigns run, new SKUs replace old ones. Custom AI tagging systems let you retrain or add tags incrementally — add a small batch of examples for the new SS27 collection, retrain, and the model picks it up without touching the rest of the catalog.
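Under a centroid-style index like the one above, incremental addition is cheap to picture: a new collection means computing new centroids, and the existing entries are untouched. This sketch is an assumption about one possible design, not a description of any specific vendor’s internals.

```python
# Illustrative only: adding new product tags without retraining the rest.
def add_tags(index, new_training):
    """index: {tag: centroid embedding}; new_training: {tag: [embeddings]}."""
    for tag, vectors in new_training.items():
        dim = len(vectors[0])
        # Each new tag gets its own mean embedding; existing tags are untouched.
        index[tag] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return index

index = {"Aurora Shell": [0.85, 0.15]}  # existing catalog entry
add_tags(index, {"SS27 Windbreaker": [[0.2, 0.8], [0.4, 0.6]]})
print(sorted(index))          # both products now searchable
print(index["Aurora Shell"])  # unchanged by the update
```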
Training doesn’t have to be done in English. A French retail brand can train its custom AI on French product names and French category names, and the resulting tags will be applied in French. This matters more than it sounds — for international teams, a forced-English taxonomy creates a permanent translation layer between the photo library and the people who use it.
Photos can be skimmed. You can open a folder of 500 product shots and eyeball your way to the one you need in a few minutes if you really have to. It’s painful, but it’s possible.
Video is different. Video cannot be skimmed. A 60-minute event recording might contain a two-second shot of your client’s product. A product launch video might hit a specific SKU at minute 38. A keynote might cut to a branded slide 22 minutes in. Without search, the only way to know is to watch the whole thing. Multiply that by a library of hundreds or thousands of videos and the entire library becomes unusable — not difficult, unusable.
This is where custom AI tagging’s value is most dramatic. When the model runs on video as well as photos, every product appearance in every video becomes searchable. You don’t scrub through hours of footage hoping to find the right moment. You search the product name and the results surface every clip it appears in, with timestamps.
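One common way to implement timestamped video search is to sample frames at an interval, classify each frame with the trained model, and merge consecutive hits into appearance spans. The sketch below assumes that pipeline; the frame data and tag names are invented.

```python
# Illustrative only: merging per-frame detections into timestamped spans.
def find_appearances(frame_tags, interval_s=2):
    """frame_tags: list of (timestamp_s, tag_or_None), one per sampled frame.
    Returns {tag: [(start_s, end_s), ...]} merging adjacent detections."""
    spans = {}
    for ts, tag in frame_tags:
        if tag is None:
            continue  # no known product in this frame
        runs = spans.setdefault(tag, [])
        # Extend the previous span if this frame directly follows it.
        if runs and ts - runs[-1][1] <= interval_s:
            runs[-1] = (runs[-1][0], ts)
        else:
            runs.append((ts, ts))
    return spans

# Hypothetical model output for part of a long event recording,
# sampled every 2 seconds:
frames = [(2280, "lounge setup"), (2282, "lounge setup"),
          (2284, "lounge setup"), (2286, None), (3100, "lounge setup")]
print(find_appearances(frames))
# → {'lounge setup': [(2280, 2284), (3100, 3100)]}
```

Searching “lounge setup” then surfaces two clips with start and end times, instead of a four-hour file you have to scrub through.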
For product-heavy brands this is table stakes now. Social content is video-first. Product demos and unboxing are video. Marketing reels, TikTok content, YouTube Shorts, launch recordings, customer testimonials — all of it is video. A tagging platform that only handles stills is leaving more than half of your library invisible, and that share grows every year.
For events, rental companies, and anyone producing long-form content, the video case is even stronger. Finding “a 90-second shot of our new lounge setup from the Chicago activation” inside a four-hour event recording is the difference between reusing that footage and shooting it again from scratch.
Bottom line: if your custom AI tagging only works on photos, you’ve solved the easier half of the problem and left the hard half alone.
The use cases are wider than “eCommerce auto-tag your products.” Here are the workflows where custom AI tagging changes how teams actually work.
This is the canonical use case. A brand with thousands of SKUs and a photographer shooting every week has a rapidly growing media library — packshots, lifestyle shots, campaign photos, social content, user-generated content, video stills. Without custom tagging, every new asset is effectively invisible the moment it enters the library.
With custom tagging, every photo is automatically labeled with the actual product name, colorway, season, and shot type. The marketing team can pull “every Storm Blue lifestyle shot from SS26” in seconds when building a campaign. The social team can find “all detail shots of the Aurora line” without asking anyone. PDP teams can locate primary hero images by SKU for launches.
Secondary benefit: the library stays organized as it grows. A brand that doubles its catalog doesn’t need to double its content operations headcount.
Best for: Fashion and apparel | Footwear | Beauty and skincare | Home goods | Furniture | Accessories | Multi-brand marketplaces
Event rental companies live and die by their proposals. A client asks for “something like your lounge from last year’s Chicago activation, but with the brass side tables instead,” and someone on the sales team has to find reference photos fast. Without custom tagging, that’s folder archaeology across thousands of event photos.
Custom AI tagging trained on every piece of rentable inventory — specific couches, bars, staging, lounge pieces, side tables, and so on — solves this. New event photos are automatically tagged with the actual products that appear in them. Sales pulls reference shots in seconds. Design teams build mood boards from real past events. Marketing has a living case-study library organized by product rather than by event.
A real example: Blueprint Studios, a San Francisco-based event production and rental company, uses custom AI tagging to organize its inventory of event furniture and décor across thousands of event photos. Their team’s tagging — specific lounge pieces, bar styles, product families — is applied automatically as new event photos are added, so sales and design teams pull relevant reference imagery for new proposals in seconds rather than combing through folders.
Best for: Event furniture and décor rental | Wedding and party rental | Tenting and staging | Corporate event production | Venue styling and design
Product catalogs with high part-number specificity — PCB layouts, components, assembly variants, industrial machinery, laboratory equipment, automotive parts. The items photographed are visually similar (a rack of identical-looking servers; ten variants of the same PCB) and generic AI simply can’t tell them apart.
These are also the cases where on-product text matters most — model numbers, serial numbers, and branded labels are often the clearest distinguishing feature. A custom AI tagging system that can use OCR alongside visual recognition tags these catalogs far more accurately than one that only looks at the shape of the object.
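A minimal sketch of that OCR-plus-vision combination, assuming invented SKUs, scores, and label text: when the photo carries a readable model number, the text settles it; otherwise the system falls back to the visual match. Real systems would call an OCR engine and a vision model where the stand-in values appear.

```python
# Illustrative only: combining OCR'd on-product text with visual scores
# to separate near-identical SKUs. All names and scores are hypothetical.
CATALOG = {
    "PCB-1042-A": {"model_number": "1042-A"},
    "PCB-1042-B": {"model_number": "1042-B"},
}

def tag_photo(ocr_text, visual_scores):
    """visual_scores: {sku: similarity} from the trained vision model."""
    # Trust a readable model number first: it is often the only reliable
    # difference between visually near-identical parts.
    for sku, meta in CATALOG.items():
        if meta["model_number"] in ocr_text:
            return sku
    # Otherwise fall back to the best visual match.
    return max(visual_scores, key=visual_scores.get)

# Two similar boards: vision alone slightly prefers the wrong SKU,
# but the OCR'd label on the board settles it.
scores = {"PCB-1042-A": 0.71, "PCB-1042-B": 0.69}
print(tag_photo("REV 3  MODEL 1042-B  QC PASS", scores))  # → PCB-1042-B
```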
The payoff is real: engineering and product teams find the right reference image by SKU or part number. Technical documentation is automatically illustrated with the correct variant. Marketing doesn’t accidentally ship a datasheet with a photo of the wrong model.
Best for: Electronics manufacturers | Industrial equipment | Automotive parts | Laboratory and scientific instruments | Semiconductor and components
Events benefit from custom AI tagging in two distinct ways. First, sponsor logo detection: a conference with 20 sponsors generates thousands of photos. Every sponsor wants a curated set of photos featuring their brand, and they want it within 48 hours of the event. Custom AI trained on each sponsor’s logo, booth design, or brand mark automatically collects sponsor-specific photo sets from the general event library. What used to be three days of manual sorting becomes an automated output.
Second, recurring assets across events — specific staging elements, signage templates, activation setups — can be tagged consistently across every event the company runs, creating a searchable archive of what’s been built before.
Sporting events are a particularly strong fit. Broadcast and sponsor rights require knowing exactly when and where specific sponsor logos appear in every minute of footage. Custom AI, especially on video, surfaces that information automatically.
Best for: Conferences with multiple sponsors | Sporting events with branded environments | Trade shows | Corporate events with booth activations | Award ceremonies
Food brands face a specificity problem similar to fashion: generic AI tags everything as food or bowl or plate, which is useless. Custom AI trained on actual SKUs, product lines, packaging variants, and recipe styles produces real searchability across the full content library.
Additionally, food brands often have campaign-specific assets — holiday packaging, limited-edition flavors, regional variants. Custom tagging keeps those searchable long after the campaign ends, so a photo shoot from two years ago is still findable when the campaign is revived.
Best for: Packaged food and snacks | Beverage brands with multiple SKU lines | Restaurant chains | Meal-kit and DTC food | Beauty and personal care CPG
Agencies photograph on behalf of clients. The library is organized by client, but within each client, the same product-level specificity applies. Custom AI tagging trained per client turns a shared agency library into a searchable client-specific asset.
Bonus use case: flagging assets by rights, approval status, or usage — approved for social, PR-cleared, legal-reviewed. When those flags are applied automatically based on visual content (e.g., specific talent or trademarked elements), rights management moves from a manual process to an automated one.
Best for: Creative agencies | Production studios | Content agencies with multiple client libraries
For most brands above 10,000 photos, custom AI tagging is the only economically sensible option. Manual tagging doesn’t scale. Generic AI — still being sold as a flagship feature fifteen years after it was a breakthrough — produces tags that aren’t worth the disk space they’re stored on. Custom AI is the one path that matches the specificity of your business to the scale of your library.
A custom-trained model doesn’t have to stay inside the DAM. Because the same model can be exposed via API, the tagging your team spent a few weeks building can power search and organization across every platform your team already uses.
Practical examples: the same trained model can power product search in an eCommerce backend, sync tags into a PIM, feed marketing automation, or drive search inside internal tools.
The library in the DAM becomes the source of truth. The model trained on that library becomes an asset that lives across your stack. This is the argument for choosing a DAM that offers custom AI tagging with an API over either a closed DAM feature or a bare developer API: you get the productized UI and the cross-platform reach.
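As a purely illustrative sketch of what that cross-platform reach looks like in code: a small client builds a tagging request and filters the response by confidence. The endpoint URL, payload fields, and response shape here are assumptions for illustration, not a real Tagbox API reference.

```python
# Illustrative only: calling a custom-tagging model over a hypothetical
# REST API so other systems (PIM, eCommerce backend) can reuse its tags.
import json

API_URL = "https://api.example-dam.com/v1/tag"  # hypothetical endpoint

def build_tag_request(image_url, model_id, min_confidence=0.8):
    """Construct the JSON payload a tagging call would send."""
    return json.dumps({
        "model": model_id,              # the customer's trained model
        "image_url": image_url,
        "min_confidence": min_confidence,
    })

def parse_tags(response_body):
    """Keep only tags above the caller's confidence bar."""
    data = json.loads(response_body)
    return [t["name"] for t in data["tags"] if t["confidence"] >= 0.8]

# A response shaped like the assumption above: one confident product
# match, one low-confidence guess that gets filtered out.
sample = ('{"tags": [{"name": "Aurora Shell", "confidence": 0.96},'
          ' {"name": "packshot", "confidence": 0.55}]}')
print(parse_tags(sample))  # → ['Aurora Shell']
```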
Not every “AI tagging” feature marketed as custom is actually custom. The questions below cover both what to look for and what to watch out for in practice.
Is it a DAM feature, or a developer tool?
AWS, Google, and Azure all offer custom vision APIs. They are powerful but they are not photo management platforms — you or your engineering team has to integrate them, maintain them, and build the UI your marketing team uses. For most brands, the right answer is custom AI tagging built into a DAM, not a dev tool that pretends to be one.
Is the model actually trained on your data, or is it a shared model with a keyword filter?
A real custom model learns your visual patterns. A keyword filter just hides tags that don’t match your list — it can only work on terms the generic model already recognizes.
How does the vendor handle your specific category?
Ask directly: how do you handle tagging for our kind of product? A vendor that can explain why fashion tagging is different from industrial-parts tagging, or how their system uses on-product text for electronics, is doing real engineering. A vendor that answers “our AI handles everything the same way” is not.
How many training examples per product?
Systems that work with 20 to 50 examples per item reflect the current state of the art. Systems that require thousands of examples are pushing the cost onto you.
Can you retrain or add products incrementally?
Your catalog changes. If every new product requires a full retrain from scratch, the system doesn’t scale.
How long is onboarding?
Realistic: two to four weeks for a serious custom-AI implementation. Much longer and the vendor is doing professional services work they should have productized. Much shorter and they’re overpromising.
What accuracy should I expect?
Well-trained models typically reach 95%+ accuracy. The real-world number depends on how visually distinct the items are, the quality of the training data, and the quality of the photos being tagged.
Does it work on video as well as photos?
For most product-heavy businesses, this is now a make-or-break criterion. A tagging system that only handles stills is leaving more than half of the content library invisible.
What languages does it support?
Custom AI trained on English tags is only useful if your team works in English. Multi-language brands need a platform where both the training and the search interface work in their languages.
Is there an API?
If you want the same trained model to power search across your eCommerce backend, PIM, marketing automation, or internal tools, API access is essential. Without it, the value of the model is locked inside one platform.
What if the AI gets it wrong?
Two things to check. First, most platforms let you correct tags in the UI, and those corrections feed back as additional training data. Second, low-confidence predictions should be held for human review rather than applied silently.
What about data privacy?
Reputable platforms train on customer data in isolation — your training examples and resulting model are not used to improve other customers’ models, and are not shared with the underlying vision provider.
What if I don’t have tagged examples yet?
Most platforms will help you generate them. Common starting points: export product images already labeled in your eCommerce backend (Shopify, Magento, custom PIM), use your last photo shoot’s delivery spreadsheet, or ask the photographer for their raw organization.
Custom AI tagging isn’t a marginal productivity win. It’s the difference between a media library that compounds in value as it grows and one that decays as it grows.
A brand that tags correctly from day one can still find the SKU, the colorway, the lifestyle shot six years later. The library becomes a real asset — an indexed archive of the brand’s visual history that marketing, social, sales, PR, and customer service can all pull from in seconds. And because the same model can tag video and be accessed across every tool via API, that asset works everywhere the team works.
A brand that doesn’t tag — or tags with generic AI labels — ends up in the same place most teams end up: with a hard drive full of photos, an institutional memory that lives in one person’s head, and a weekly reshoot of content they already own but can’t find.
The underlying technology has existed for years. The reason it hasn’t been ubiquitous is that almost no platform built it in a way that works for customers outside the enterprise-scale, engineering-team-on-staff world. That’s changing. For product-heavy businesses — retail, rental, hardware, events, food, and the agencies serving them — custom AI tagging is moving from an enterprise-only luxury to a practical requirement.
Part of the Tagbox guides series on media library management.