What’s the story with AI Detection AF (AIAF), Tracking AF (TR-AF), Continuous AF, Target AF, etc.?
There's been a lot of confusion about how the OM-1's AI-based subject detection feature works with autofocus tracking. It turns out that the tracking done in C-AF + TR mode is an entirely separate system from AI subject detection, which is why OMDS recommends Target AF with C-AF instead when using subject detection. See the text below for an explanation of the differences.
RDE: Finally, a rather detailed question about autofocus operation: What’s up with AF tracking and continuous AF mode, vs using Target AF in continuous AF mode. It seems that the combination of AF tracking and continuous AF doesn’t work well, to the point that you advise against it. Could you explain a bit about how the various modes work, and what it is about continuous AF mode that interferes with AF tracking?
OMDS: We recommend using Target AF in C-AF when shooting with AI Detection AF. Movements of the detected subject are predicted, allowing AF with a high level of tracking performance.
OMDS: When Tracking AF is used, priority is given to tracking the primary subject by integrating data beyond subject detection, such as color and whole-screen motion-vector detection. Subject detection is handled as just one part of the information being tracked, so other detection results may be given higher priority than subject detection alone.

OMDS: When tracking, if the subject is lost, AF stops; it restarts when the subject is detected again, or when the button is half-pressed once more. When operating with subject detection alone, the camera switches to AF using the selected zone settings whenever no subject is detected.
OMDS: If you are trying to shoot a subject such as a cat or a bird, we recommend Target AF with C-AF -- not Tracking AF -- when using AI subject detection. On the other hand, for some other subject [not an AI-recognized subject, just a moving object], we recommend C-AF + TR-AF. The AF process is different between C-AF and Tracking AF: if you use AI Detection AF (AIAF), the camera works only with the information from AI detection, but if you use Tracking AF, the camera's AF uses not only detection but also other information, for example color information or movement direction and so on. Sometimes the AI detection process alone is better than several AF processes combining several pieces of information. This is the reason why, if you turn on AI Detection AF or AIAF, we recommend C-AF; but if you are trying to shoot a moving subject whose movement is unpredictable, we recommend Tracking AF.
RDE: So I think I understand. The AI detection, the neural network, is looking at the image, and all of the colors, the shapes, where the eyes are, that sort of thing. But it’s not relying on the PDAF information to find the subject; it’s just looking at the picture like a human would. On the other hand, when you’re using Tracking AF, it’s looking at distance over time and how the subject is moving, but not at what the subject actually is. Although you said that Tracking AF also uses color and shape. So… let me try to boil that down.
RDE: So AIAF is really looking to find the subject just based on what the image looks like to it. And once it identifies the subject, then it will use PDAF just in that area (I think) to focus. Whereas Tracking AF is looking for general moving shapes: it’s a blob of red that’s moving, tracked via distance information and so on. So really they’re just two different things, and trying to combine them doesn’t work well. [Actually, they don’t combine at all; they’re two entirely different systems.]
My summary of how AIAF and Tracking AF work…
There was a lot of back and forth in the above, so I've made this summary a separate subhead here. This is my take on the key points about how AIAF and Tracking AF work, and why you can't combine the two:
- AIAF and Tracking AF are two fundamentally different and separate systems.
- AIAF looks at the scene in front of the camera the way a human would, identifying subjects by their appearance. Once a subject has been identified, the camera uses the PDAF pixels in that specific area to set focus.
- Tracking AF is the conventional AF that we’ve long been familiar with: It uses a combination of distance (from the PDAF pixels), color and shape to identify the subject (or to follow one that you’ve told it to via the user interface), then uses its movement over time to predict its likely future position. The color and shape help it avoid being confused by other objects in the scene as the subject moves around, but there isn’t the sort of AI-based “intelligence” to recognize an object as a specific type of subject.
- If you’re doing continuous shooting with AIAF, the camera is basically re-identifying the subject in each frame and then focusing on it. It doesn’t make any predictions about the subject's future position based on past behavior.
The two processes are quite distinct. Here's a narrative description of how I think they each proceed:
- AIAF’s subject tracking frame by frame is like:
- “Ah, there’s a bird. Focus, click.”
- (next frame) “Ah, there’s a bird. Focus, click.”
- etc., etc...
- Tracking AF is like:
- “OK, the human told me to focus on this thing here. It’s a pink, roundish blob about so big, and it’s this far away from me right now. Focus, click.”
- “OK, that blob is a bit more to the right now and 2 feet closer to me. Focus, click.”
- “All right, between the previous two frames, the blob moved 2 feet closer to me and a bit to the right, so I’m expecting it to show up a bit more to the right and another 2 feet closer.”
- “Shift focus by 2 feet.” (This can happen ahead of time, before it’s time to grab the next frame.)
- “Look at the scene. Ah, sure enough, there’s that blob right where I expected it to be, but it’s only about 1.8 feet closer to me this time. I’ll make a note of that for next time...”
- “Focus, click.”
- and so on…
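For the programmatically inclined, the two loops above can be sketched as toy code. To be clear, this is a made-up illustration, not OM-1 internals: the function names, the one-frame velocity estimate, and the simple 50/50 prediction/measurement blend are all assumptions invented for the example.

```python
# Toy models of the two AF loops described above. Distances are in feet.
# Everything here is illustrative; it is not how the camera's firmware works.

def aiaf_loop(detections):
    """AIAF-style loop: re-identify the subject in every frame and focus
    on it directly, with no prediction based on past frames."""
    focus = []
    for d in detections:       # d = detected subject distance, or None if lost
        focus.append(d)        # None would mean falling back to zone AF
    return focus

def tracking_af_loop(measurements):
    """Tracking-AF-style loop: predict the next subject distance from recent
    motion, pre-position focus, then correct against the new measurement."""
    history = []
    focus = []
    for z in measurements:     # z = measured subject distance this frame
        if len(history) >= 2:
            velocity = history[-1] - history[-2]   # distance change per frame
            predicted = history[-1] + velocity     # shift focus ahead of time
        elif history:
            predicted = history[-1]
        else:
            predicted = z
        # Blend the prediction with the fresh measurement (crude correction step):
        focus.append(0.5 * predicted + 0.5 * z)
        history.append(z)
    return focus

# A subject approaching by about 2 feet per frame, then slowing slightly
# (the 1.8-foot step in the narrative above):
print(tracking_af_loop([20.0, 18.0, 16.0, 14.2]))
```

Note how the tracking loop's estimate lags a little when the subject's motion changes (the "I'll make a note of that for next time" step), while the AIAF loop never anticipates anything at all.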
Hopefully that makes it all a bit clearer. AIAF and Tracking AF are two fundamentally different processes, and the camera is either in one mode or the other. In the future, we can hope for the two to be more closely integrated, but it's understandably complex to combine the two separate flows of information. It would require a whole new higher level of processing, looking at and evaluating both the AI-based and the conventional distance/shape/color-based tracking information. Ultimately (probably in some distant future), AI could handle the whole job, but the challenge will be coming up with the human-interpreted raw data to train the algorithms. It's one thing to manually label tens of thousands of still images to train an AI system; creating labeled video streams that also incorporate full PDAF data will be another matter entirely.
Personally, I suspect the integration of distance information into AI-based AF algorithms is something that the R&D departments of multiple camera companies are probably looking at right now. As incredibly capable as modern AF systems are, this is still the area where there's the most room for future improvement. That said though, the speed, accuracy and intelligence of current AF systems have reached levels film-era photographers could only have dreamt of.