I’m currently contributing to a game that will be something of a “lite-MMO” format. The action will rely on things like hotbars and hotkey actions, so there’s no physics-based collision for attacking hitboxes or arbitrarily interacting with some object in front of you (like opening a door). The big AAA examples would be games like World of Warcraft and Final Fantasy XIV - the latter of which I’ll be referring to frequently here.
As a result, the crux of a system like this is some form of targeting. Ultimately targeting is just a way of communicating to the player what they are interacting with. If you want to cast a fireball at an enemy, you gotta target that enemy first. Want to pick that book up off the ground? Target it, then you can interact with it. Robust interaction systems will do their own thing automatically (or at least as “automatic” as a complex system like that can be). They know how to trigger opening the door or punching the goblin. But targeting is what gives it focus, telling both the player and the interaction system what is… well, the target.
So, for this game we needed a targeting system. I took on the challenge of figuring out what that looked like and implementing it. It was hard. The technical side took some work, but the hardest part is building it in such a way that it feels intuitive to use. So intuitive that it feels like the game is reading your mind when you use it.
This will be a bit of a technical write-up with a lot of focus on what each piece in the system looks like and what purpose it serves. The game is also being built in Godot, so much of the perspective will come from using that engine’s tooling. But first let’s cover what “targeting” should be in this context.
Targeting + Interaction
Quick definitions for the sake of this discussion:
Interaction = Anything the player can engage with that changes world state.
This will always require some physical input from the player, whether that be pressing a button or positioning their character in relation to some other thing. Simple examples would be things like opening a door, attacking an enemy, or talking to an NPC. You can get more nuanced and argue something like standing in a PvP control zone to “capture” it is also a form of interaction.
Targeting = A visual interpretation of the player’s current focus.
The player might simply target something and not interact with it (though maybe their character turns their head to look at it?), but they are likely targeting it with some intent. What’s the name of that creature? Can I target this random wall? Oh, I’m being attacked and need to fight back!
In our context we’ll assume the player can only directly interact with one object at a time. Something like an AoE attack would be a form of interaction, but the player doesn’t “target” the AoE itself (or the enemies it’s attacking), and it doesn’t make much sense to allow the player to target multiple creatures at once for an attack. That gets tough to represent clearly on the UI side and can be cumbersome for the player to manage.
With that in mind, this is our barebones requirements for a targeting system for the time being:
The player can “interact” with one thing at a time, therefore the player can also only have one “target” at a time.
The player can only interact with something they are currently targeting. So targeting must first occur before the player interaction is initiated.
The Window
When playing a game the player needs some way of perceiving the game state. You can see where on the board your opponent is and you can see how much cash they have in front of them (if they aren’t hiding some like a dirty cheater). You can hear the crack of a bat against a baseball telling you it was a strong hit. For a video game, the visual component is naturally a “screen” of some form. So the player can only see the game state through their TV or monitor.
Our targeting system is intricately tied to this screen. It may have a sound effect when you target or un-target something, but the main purpose of the targeting is to visually show the player “THIS IS THE THING YOU ARE LOOKING AT”. Like a big arrow above an enemy or a circle around their feet. And 99.9% of the time the player will want to target something they can see. Another way to look at it is the player will NOT want to target something they can NOT see, or at the very least they don’t want to target something they don’t know can be interacted with.
This adds another limitation to our targeting system. For the player to target something, they must be able to SEE that target. Add it to the barebones list:
The player can only target something displayed on the screen.
Where to start?
For now, this is enough to get us started. However I’m going to add one more as a personal addition:
The player must be able to engage the targeting system with both a controller and mouse & keyboard.
I have wrist issues so using a mouse for lots of small, rapid motions will shut down my gaming session after ~5 minutes. So I want players to be able to use nearly any input device to successfully utilize the targeting system.
While it might not seem like much, this will add quite a bit of complexity to the problem. To make our lives easier though, let’s look at an example of a game that implements a targeting system like this: Final Fantasy XIV.
I could gush about how much I love FF14’s controller support generally, but we’ll laser in on how their targeting system works. This system has become commonly known as “tab targeting” due to games using the literal “Tab” key on the keyboard to select a target. A player would press Tab to select a target, press Tab again to select the next target, and press Shift+Tab to select the previous target. Or if you have a mouse you just left-click on the thing.
Don’t think too hard about the actual “tab” or mouse targeting parts for now; we’ll come back to that in a bit. Right now we need to focus on what that targeting system functionally does:
One input will select a target
If there is no target currently selected, it will pick a new target
If the player is currently targeting something, it will select the “next” target
One input will select the “previous” target
We can get way, way deeper into that rabbit hole… but we won’t for now. Instead let’s finish the food on our plate before we decide we need seconds.
FF14 Targeting Breakdown
To give yourself a better idea of how FF14 does its targeting system, I strongly recommend reading this guide from AkhMorning. If you don’t want to read through all of it, focus on the bits under “Tab Targeting” as that basically covers everything I’ll be writing about here.
This is what I used to capture the requirements for my own targeting system (though I stripped out a LOT of what is mentioned here).
“Controller Targeting - FFXIV” by AkhMorning: https://www.akhmorning.com/resources/controller-guide/controller-targeting/#quick-targeting-reference
Tab some Targets
OK, recap our barebones list that is quickly becoming less barebones:
The player can “interact” with one thing at a time, therefore the player can also only have one “target” at a time.
The player can only interact with something they are currently targeting. So targeting must first occur before the player interaction is initiated.
The player can only target something displayed on the screen.
The player must be able to engage the targeting system with both a controller and mouse & keyboard.
The player needs an input for selecting a new/next target and an input for selecting a previous target.
In a feigned attempt at making this article only “too long” instead of “absurdly long”, I won’t be covering the intricacies of what interaction looks like or how to bind the inputs. Assume that “interacting” with an object is a simple button press, and any form of selecting a target is some other simple button press. The inputs will matter when it comes to the input method (controller vs m&k), but I’ll cover that as it becomes relevant.
Our broad goal is to allow the player to target something while simultaneously communicating what object the player is currently targeting. Now is where we begin the technical parts.
The Tech Tree
First things first: we have to more clearly define what each of our goals above looks like.
The player can only have one target at a time.
OK, simple enough. We need something to store a reference to a single object that will represent the player’s “current target”. That reference also needs to be able to be `null` in case the player doesn’t currently have a target selected. That’s not too bad.
Targeting must occur before a player can interact with something.
This gets a bit more nuanced. There is a hidden bit of complexity in here that I’ve intentionally glossed over thus far: nowhere in this requirement does it say the targeting and the interaction have to be distinct, separate player actions. In other words, if the player hits the interact button without a current target, the system can try to select a new target first and then automatically start interacting with that object.
We’re only saying that the interaction system should work only when the player currently has a target, even if the interaction logic is initiated later on the same frame. Alternatively, you could think in reverse and say the following: interaction will always fail when the player doesn’t have a target and can’t acquire a new one. This muddies the water a bit on defining requirements for the interaction system though, so let’s avoid that.
Here’s the takeaway: If the player tries to interact while they don’t currently have a target, the targeting system will be engaged to try and find one first - then the interact logic will be engaged if a target is found.
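To make that concrete, here’s a minimal sketch of the flow in GDScript. The `try_select_new_target()` and `interact_with()` helpers are hypothetical names of mine, not anything built into Godot:

var current_target: Node3D = null  # one target at a time, possibly none

func _on_interact_pressed() -> void:
    if current_target == null:
        # No target yet - engage the targeting system first.
        current_target = try_select_new_target()  # may still return null
    if current_target != null:
        interact_with(current_target)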
The player can only target something displayed on the screen.
I usually like a requirement that limits scope, but this one became my personal hell. The game I’m building this system for is a 3D game, so something “on the screen” is really “something in view of the camera relative to 3D world space”. That may sound scary, but it’s honestly a pretty solved problem - Godot has lots of ways to make this fairly easy. But it took me longer than I’d like to admit to finally get everything to click into place.
We’re not there yet though! Boiling it down, this requirement is also saying that the player can NOT target anything that is NOT on the screen. In our game’s backend it’s easy to track all of our targetable objects (like an enemy list), so when the player wants to target something we’ll need to ensure we only consider the objects in that list that are ALSO visible to the camera. More on that later.
The player must be able to engage the targeting system with both a controller and mouse & keyboard.
Not much to this for the time being. Controller doesn’t have a cursor to poke at things on the screen with, just digital buttons (no, we’re not considering using the analog sticks for the targeting system, don’t even try). We could implement a virtual cursor (OK, you can try a little bit), but I’m not going that route for now.
This is where the “player mind-reading” comes into play. When the player clicks directly on a target, it’s pretty easy to assume what they are trying to target. But if they are ever so slightly off, the targeting would miss! That doesn’t feel good, so we’ll need to add some leeway. And controller doesn’t have any specificity at all. If the player doesn’t have a target and they want to select a new one, we’ll need a way to decide what they’re trying to target.
Unfortunately this gets really fuzzy, so this falls more into the design space. I’ll cover it below as well, but the guide I linked above describes and visualizes it pretty clearly. So go read that if you get sick of reading my words.
The player needs an input for selecting a new/next target and an input for selecting a previous target.
There will be buttons for “select next target” and “select previous target”. We’ll bundle “select NEW target” in with each of those, so if either is pressed and the player doesn’t currently have a target we’ll just trigger some other logic to automatically find a “new” target. These buttons will also be on the keyboard, so we’ll have true “Tab” targeting as well (or whatever key you want to bind it to). The mouse will also be able to select a target by clicking on an object.
Oh, and don’t forget we need a “de-select target” button so the player can clear their current target and not just target that NPC while they waltz around the whole town. Easy enough with either input method.
Build Mode
Do you like technical requirements? I like technical requirements. The ones above kinda suck, but we’ll make do. Fair warning: I’ll be getting more specific from this point forward as I dig into how I implemented most of the technical pieces in Godot. Let’s get our hands dirty!
Targeting with Godot
I like to try and build my game systems to be as componentized as I can. I’m not a pro that can envision all the components instantly, so it takes some time for me to finally settle on the architecture of it all.
I’ll do my best to describe each of the logical components I came up with. To do so, I feel the need to cover one concept in digital 3D worlds in order to properly describe the rest.
AABB
An axis-aligned bounding box, or AABB, is a representation of a 3D object in the form of a rectangular box. Even if it’s a complex creature with lots of geometry, it will have an AABB to help represent that object in 3D space.
If you want to get a deeper understanding of AABBs and their use cases, I’ll let you do that research on your own. For simplicity’s sake I’m going to describe the AABB as 8 points in 3D space - basically two equally sized rectangles on the top and bottom of the object. Whenever I mention an AABB, just picture the object inside a cardboard box that perfectly encapsulates the extremities of the object.
This gives us 4 points at the bottom of the object which should be close to whatever surface it’s on top of, such as the ground for an enemy. We also get 4 points on top of the object giving us easy access to estimate how “tall” the object is. As a whole, we will also use the full 8 points to vaguely estimate where the object is within our 3D world.
All of this is critical to how we’ll design our targeting system, so keep that in mind as we go forward.
Targeting Indicator
To inform the player of their current target, we’ll display something that visually marks an object as the current target. My solution was a circle near ground level of the object alongside an arrow that floats a little above the highest point of the object. The visual part of these is easy enough, but the logic part is a little trickier.
Our first usage of the AABB shows up here. We can use one of the lowest vertices of the AABB to estimate at what height the circle should be placed. We can do the same for the arrow by using one of the highest vertices. We’ll also add a small buffer to the height of the arrow so it doesn’t sit literally right on top of the object; let it float up above a little bit.
Next we run into an issue with the circle though. If the object is really wide, we need to adapt the circle’s size to surround the object. We can use the AABB again to accomplish this: take two opposite corners of the lowest rectangle of vertices, split the distance in half, and use that as the radius for our circle. By deriving the radius from the corners, we’ll draw a circle that perfectly touches the edges of the AABB’s bottom face. Increase that radius a little bit to give it some breathing room and it should look pretty decent.
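Here’s a rough sketch of that placement math, assuming the object’s visuals come from a MeshInstance3D we can pull the AABB from (the buffer values are arbitrary):

var aabb: AABB = mesh.get_aabb()       # local-space bounding box of the mesh
var circle_height := aabb.position.y   # lowest point of the box
var arrow_height := aabb.end.y + 0.25  # float a little above the highest point
# Two opposite corners of the bottom face give us the diagonal.
var corner_a := aabb.position
var corner_b := aabb.position + Vector3(aabb.size.x, 0.0, aabb.size.z)
var circle_radius := corner_a.distance_to(corner_b) * 0.5 * 1.1  # +10% breathing room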
Targetable Component
If an enemy can be “targeted”, then it needs to be “targetable”. Same with a book, a door, an NPC, whatever. If you want to get nitpicky, we will also define that something that is “targetable” is actually just something that can become the “current target”.
Our Targetable component will need to handle setting up anything that object needs to become that current target. This primarily means that the broader system needs to know that this object is in fact a valid target. It will also need to handle visualizing this object as the current target, which may vary based on the object itself.
Remember that we defined a valid target as something on screen. Godot has some built-in helpers for this to determine if an object has entered or exited the screen view. So the Targetable component will instantiate one of these and attach it to the body of the object. We’ll use that information for another component later, but at least it’s wired up now!
To visualize it, the Targetable component also instantiates an Indicator as was described above. Attach that to the body as well and the Indicator’s logic will handle the rest. The one caveat is that the indicator shouldn’t display unless this object is actively the current player target. So we won’t display it by default, then when this object becomes the current target the Indicator can be made visible.
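Here’s a minimal sketch of the component. The screen-visibility helper is Godot’s VisibleOnScreenNotifier3D; the TargetingManager autoload is sketched in the next section, and the Indicator scene path is hypothetical:

class_name Targetable
extends Node3D

const INDICATOR_SCENE := preload("res://ui/target_indicator.tscn")  # hypothetical path

var indicator: Node3D

func _ready() -> void:
    # Built-in node that signals when this object enters/exits the camera view.
    var notifier := VisibleOnScreenNotifier3D.new()
    add_child(notifier)
    notifier.screen_entered.connect(func(): TargetingManager.register(self))
    notifier.screen_exited.connect(func(): TargetingManager.unregister(self))

    indicator = INDICATOR_SCENE.instantiate()
    indicator.visible = false  # only shown while we're the current target
    add_child(indicator)

func set_as_current_target(is_current: bool) -> void:
    indicator.visible = is_current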
Targeting Manager
Many systems in games will need some logic that sits above the rest, outside of the scope of levels and characters. These can be described as a “global” class because they persist regardless of what the game may be currently doing. This manager component will be the global class for the targeting system.
The Targeting Manager gets an easy job though. We really only need a global class to connect the dots of our smaller components - notably our Targetable objects. So whenever a Targetable enters or exits the screen, we’ll have it register itself with the Manager to say “Hey, I’m on the screen now!”
The Manager will then keep track of which objects are on the screen in a list. And if an object leaves the screen view, the Manager will remove it from the list if needed. This also abstracts the logic of “what is a valid target” away from other parts of the system, so we could alter how this list is utilized (or even add more to it) without affecting other bits of targeting logic.
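The Manager itself can stay tiny. A sketch, assuming it’s registered as an autoload named TargetingManager in Project Settings:

extends Node  # registered as the "TargetingManager" autoload

var on_screen_targets: Array[Targetable] = []

func register(target: Targetable) -> void:
    if not on_screen_targets.has(target):
        on_screen_targets.append(target)

func unregister(target: Targetable) -> void:
    on_screen_targets.erase(target)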
Player Targeting Component
At a bare minimum, the player needs to have a way to engage the targeting system. We have our player as its own componentized structure as well, handling things like movement and interaction. So really we just need to tie this system into that object like all the rest.
This player targeting system should have one primary job: Process inputs and execute certain pieces of logic relevant to the input pressed. Those pieces of logic will be responsible for finding a target based on whatever logic it deems fitting. The only other thing this player system needs to do is keep track of the player’s current target.
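A sketch of that input handling, assuming input actions named “target_next”, “target_previous”, and “target_clear” exist in the Input Map (bind them to Tab, Shift+Tab, a controller button, whatever you like) - the strategy hooks are placeholders for the targeting logic covered below:

extends Node

var current_target: Targetable = null

func _unhandled_input(event: InputEvent) -> void:
    if event.is_action_pressed("target_next"):
        _set_target(select_next_target())      # placeholder strategy hook
    elif event.is_action_pressed("target_previous"):
        _set_target(select_previous_target())  # placeholder strategy hook
    elif event.is_action_pressed("target_clear"):
        _set_target(null)

func _set_target(new_target: Targetable) -> void:
    if current_target != null:
        current_target.set_as_current_target(false)
    current_target = new_target
    if current_target != null:
        current_target.set_as_current_target(true)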
Thankfully a pretty simple component, but this is the end of the easy stuff. Next come the bits of logic that actually find a target… whatever that means…
Targeting Logic
Finally I can stop dancing around this part and get specific. This section is where the hard stuff comes in. We have some components that create a sort of general mapping between the player and a possible target. But we will almost never have only one possible target in the game. We want multiple enemies to fight at once, towns full of lively NPCs, and lots of bits and bobs for the player to interact with.
The player can be really specific by selecting a target with the mouse. But the only reason the mouse cursor is specific is because it points at a particular pixel on the screen. Again, the player can only select a target on the screen. The mouse cursor inherently can only “click” on stuff on the screen.
If the player instead tries to use the traditional “Tab” targeting (regardless of input tool), we’ll need to simply use quantum physics and computing to enter the player’s brain and extract the singular, subconscious thought that is their intended target. Unfortunately, Godot doesn’t support that feature… nor do any computers that I’ve used. We’ll have to get clever.
Player Intent
We may not be able to read the player’s mind, but we have a pretty good idea of what they’re looking at. It’s the screen! And how do they control what’s on the screen? That’s right, the camera! The player moves their character and points the camera towards whatever they want to visually focus on. We’ll need to use that to our advantage and stretch it as far as we possibly can.
Of course, we can’t simply say “because it’s on screen, it becomes the new target” since multiple targets may be on screen at once. We’ll need a way to prioritize those targets and pick the one that we think makes the most sense to the player. Our saving grace is that player’s don’t tend to manipulate their camera randomly. They typically want to put whatever object is their current focus directly in the center of the screen. Further, the player will usually try to align their focus relative to their character’s position on the screen. Typically this will manifest as the player trying to get their character to “look” directly at their current focus.
Now we have a hint of where to start. We should probably prioritize objects near the center of the screen, and to some extent objects that appear to be “in front” of the player character. This is the design space where we have to decide what parts of the screen are “higher” or “lower” priority. Solutions will vary from designer to game to player and back. Personally, I’m going to mimic what FF14 does so I don’t have to reinvent the wheel.
Prioritization Strategies
I decided to implement 3 forms of target selection in my system:
Mouse cursor targeting: If the player clicks on a target, the system will select that target.
Scan targeting: There are inputs for “left scan” and “right scan” that result in selecting a target that is in that same relative direction. If no target is currently selected, it will select whatever target appears to be nearest to the center of the screen.
Auto targeting: There are two generic inputs for “next” and “previous” target. Targets that appear to be closer to the player character’s frontal cone of vision have higher priority, while those further away have lower priority.
In my opinion, this gives just enough control to the player to select their desired target. You can interweave using the “scan” and “auto” targeting freely to jump between different prioritization styles as well. The mouse targeting is pretty straightforward, but I’ll cover what the others do in a little more detail. But first I need to describe how these strategies actually determine where a targetable object is on the screen.
Positional Screen Casting
When an object enters the camera view, what’s really happening is that the object’s 3D position has been determined to fall within the 2D image the game is rendered to. Keep in mind that the entire visuals of the game are captured on a 2D screen (not considering VR implementations). So the 3D world is actually displayed as a 2D image of pixels and colors and magic.
Because the player uses the screen itself as their targeting implement, it doesn’t make much sense to use virtual 3D positions to determine anything. We’ll instead use the screen itself as our sort of sandbox to play around in. This means we need to project an object’s 3D positions onto the screen space, converting vertices in 3D world space to 2D screen space.
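In Godot, that projection is a single call on the active Camera3D:

var camera := get_viewport().get_camera_3d()
var screen_pos: Vector2 = camera.unproject_position(world_pos)  # world_pos: any Vector3 in world space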
Defining Screen Space Priority
The camera gives us a 2D space that defines the limits of our game view. If the game is in fullscreen, it’s basically the resolution of the screen in pixels. If it’s windowed, it’s just the window bounds.
To define what parts of that space get priority, we need to define “sections” of that window. For example, let’s do a mental exercise. Take your screen view and split it into even quarters by mentally drawing a plus ‘+’ shape into your screen. You get four corners of the screen divided by that shape. So then you could say the top-left gets highest priority, top-right gets next priority, and so on.
To determine if an object falls into any given space, we’ll then use the bounds of each of those inner sections of the window as a priority order. If at least some small part of the object is within that partial screen space, we’ll consider it to be of relative priority to that space. So if an object sits within the top-left and top-right sections, we’ll just consider it to be part of the top-left section since it has higher priority.
Extrapolate this concept further and you can basically draw any shape onto the screen to define some partial screen space. You can also add as many as you want to have more layers of priority. This essentially represents the entirety of the design side of prioritization. Now we need the technical part: how do we determine what part(s) of the screen an object is in?
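As a quick sketch, the top-left quarter from the mental exercise above could be expressed as a screen-space polygon like so:

var screen_size: Vector2 = get_viewport().get_visible_rect().size
var half := screen_size * 0.5
# Highest-priority section: the top-left quarter of the screen.
var top_left := PackedVector2Array([
    Vector2.ZERO,
    Vector2(half.x, 0.0),
    half,
    Vector2(0.0, half.y),
])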
AABB-CCasting and PPain
I hope your linear algebra is up to snuff, because mine sure isn’t. In order to convert 3D data into 2D data we have a few steps to overcome first. I’ll skip (some of) the trials and tribulations that got me to these solutions, just be ready for math.
Every object in 3D space is made up of a whole bunch of vertices. These are defined by the mesh of the object. We could try and cast each of those to the screen space, but some objects will have a lot of vertices. To loop through every vertex of every potential target just to see if it might be within a single polygon of partial screen space could get expensive.
AABB arrives to save the day! Every object also has an AABB, which is only 8 vertices. It’s also an estimate of that object’s position, but by representing the extreme bounds of the object we can get a pretty good idea of where it sits. There is just one problem: AABBs have no idea what “world space” is. They are only aware of their own local orientation.
To resolve this, we need matrix math. In Godot, each object also keeps track of where the object is in “global” or “world” space - including translation, rotation, and scale. So we can use that to basically convert our AABB’s local space into global space, giving us a box of vertices that actually represent where the player views that object to be.
Warning: Matrix math order matters!
Here is where one of my biggest headaches came into play. I tried what felt like a million iterations of this math and followed every online resource I could find trying to get the effect I wanted. Every time I came up with some weird nonsense.
An object’s transform matrix defines how it’s positioned in 3D space. Godot helpfully allows you to simply “multiply” this matrix by an AABB to convert it to match that transform’s space. When I first wrote this logic, I wrote the matrix math backwards! I had to step back and remind myself of how matrix math works, particularly in relation to 3D computing.
To convert one transform into another’s relative space, Godot lets you “multiply” them together like you would two numbers.
var my_matrix = matrix_one * matrix_two
The tricky part here is that the first operand needs to be your desired end space and the second needs to be the one you are trying to convert to that space.
So I had the following:
var my_matrix = aabb_transform * global_transform
This is basically saying “convert this global transform matrix to this AABB’s local space”. But I wanted the opposite: I wanted to convert the AABB to global space!
I needed to swap it for:
var my_matrix = global_transform * aabb_transform
That’s it really! The AABB of the object I’m considering has now been transformed into global space, and I can take each vertex and project it to screen space. I was just starting to put the pieces together reviewing some actual linear algebra matrix theory when I found the part of the Godot docs that calls this out directly. Scroll down a little from this link to find a little blue box with a warning about matrix math order: https://docs.godotengine.org/en/stable/tutorials/math/matrices_and_transforms.html#applying-transforms-onto-transforms
Godot makes this easy via its Camera logic with a simple function call, so I loop over the 8 AABB vertices and convert them to screen space. If one of them falls into the polygon of space I’ve arbitrarily defined, then I’ll consider the object to have been “detected” within that polygon of space.
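Putting the whole check together - a sketch using unproject_position() for the projection and Geometry2D.is_point_in_polygon() for the polygon test:

func is_detected(target: Node3D, mesh: MeshInstance3D, polygon: PackedVector2Array, camera: Camera3D) -> bool:
    # Convert the mesh's local AABB into global space (order matters!).
    var aabb: AABB = target.global_transform * mesh.get_aabb()
    for i in 8:
        var vertex := aabb.get_endpoint(i)
        # unproject_position() is meaningless for points behind the camera.
        if camera.is_position_behind(vertex):
            continue
        if Geometry2D.is_point_in_polygon(camera.unproject_position(vertex), polygon):
            return true
    return false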
OK, math is done now. Let’s cover the final component and how it ties everything together.
Target Detector
A Target Detector does exactly what it sounds like: it detects targets. The default behavior is to cast object positions to the screen - this is where the AABB math above is applied. But you can also override the logic to have it detect objects in whatever way you want.
The Manager is tracking (effectively every physics tick) whether or not a Targetable object has entered/exited the screen. So the Detector will retrieve that list and scan through each object. If an object’s AABB has a screen-projected vertex that falls inside this Detector’s screen-aligned polygon, then the object is considered “detected”.
I opted to also do this check every frame to keep an accurate and consistent list running for each detector. I was worried that if I only checked this right when the player tried to select a target, there was a chance that the game would experience a tiny amount of lag when it ran all the detection logic. So I went with the slightly more expensive but constant method of checking every frame.
This is part of why I wanted the optimizations I mentioned above and is a big part of why I’m using AABB vertices in particular. Godot is well optimized for processing AABBs and other matrix math like this, so it felt like a reasonable solution for now. I figure it could always be adjusted later if need be, but for now it runs great!
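A sketch of the Detector doing that per-frame scan, reusing the is_detected() check from above (the mesh property on Targetable is an assumption of mine):

class_name TargetDetector
extends Node

var polygon: PackedVector2Array  # this detector's slice of the screen
var detected: Array[Targetable] = []

func _physics_process(_delta: float) -> void:
    detected.clear()
    var camera := get_viewport().get_camera_3d()
    for target in TargetingManager.on_screen_targets:
        if is_detected(target, target.mesh, polygon, camera):
            detected.append(target)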
Detector usage
This implementation actually makes for a surprisingly unified solution for any targeting logic. The Scan and Auto targeting logic simply have several Detectors defined, each with their own polygon of screen space. The targeting logic then organizes the objects tracked by those Detectors. If a Detector has multiple targets in its list, the higher-level logic will decide how to prioritize that sub-list before prioritizing the remaining Detector targets.
For the mouse cursor targeting, I just use a regular Detector. It defines a polygon of space (vaguely a circle) and the targeting logic just moves it along with the mouse cursor. So it isn’t static on the screen, but it behaves exactly the same otherwise.
Scan targeting uses several vertical box shapes that span the vertical height of the screen. One sits in the center, and then 3 on each side to check various parts of the left and right sides of the screen.
Auto targeting first uses a small, wide cone vaguely “in front” of the player character. Next it checks a broad space that covers most of the area “behind” the player and to their sides - this is to check for anything that might be right on top of the player even if they aren’t facing it directly (e.g. if an enemy sneaks up from behind). Finally it checks progressively taller and narrower cones “in front” of the player again to find stuff further and further away.
Limitation
There is one big limitation of this implementation, particularly around using the AABB. Because the AABB only represents 8 points at the extreme top and bottom of an object, any object that is fairly tall or very close to the screen may not have its AABB points inside a given polygon. So if an object is right up in the camera and clearly covering the entire screen, its AABB points will be both above and below the camera view - therefore the points aren’t detected by anything that relies on screen space and the object can’t be targeted at all.
Good news though! There are many ways to resolve this. I opted to only use the vertices of the AABB to determine its projected screen position. You could just as easily calculate the edges of the box and project those instead. That would cover more space generally and shouldn’t be too computationally expensive.
You can also simply override that logic, who needs screen space anyway? Set up your Detector to use whatever logic you want really. This already has to be done to some extent when Detectors are tracking multiple objects at once. The Detectors don’t determine the prioritization of those objects - the higher targeting logic does. So you can play with changing both the Detector behavior and the targeting logic as a whole to create various combinations.
In my solution, the mouse cursor logic first checks that little shape for objects, but if it doesn’t find any it will simply select whichever object’s projected position is closest to the mouse cursor. The other targeting systems will prioritize objects closest to the player in 3D space - exactly what I said we shouldn’t do earlier on. Sometimes you gotta break the rules to make something great!
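That cursor fallback is cheap to sketch too - just compare each on-screen object’s projected position against the mouse position:

func closest_to_cursor(camera: Camera3D, mouse_pos: Vector2) -> Targetable:
    var best: Targetable = null
    var best_dist := INF
    for target in TargetingManager.on_screen_targets:
        if camera.is_position_behind(target.global_position):
            continue
        var dist := camera.unproject_position(target.global_position).distance_to(mouse_pos)
        if dist < best_dist:
            best_dist = dist
            best = target
    return best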
Wrap up
OK, let’s put it all together with a somewhat neat bow if you skew your eyes and squint a little:
Objects that can be targeted are given a Targetable component.
The Targetable component will manage a visual Indicator and wire up the object to inform the Manager when it enters the screen.
The Manager keeps track of a list of all Targetable objects that are currently on the screen, and are therefore valid targets that could be selected.
The player object has a Targeting System that responds to player input. It then will execute various forms of targeting logic based on what input was received.
This system is also what tracks the “current” player target, which is needed for some of the targeting logic when determining what the “next” or “previous” target should be.
Targeting Logic maintains a set of Target Detector objects that will track targets. This logic defines the priority of those targets in whatever way it sees fit. When the player system engages this logic, it will decide what relevant target should be the new “current” target.
Targeting Detectors simply define how a target is detected. By default, they draw a polygon onto the screen and check if any valid targets (from the Manager) are visually positioned within that polygon.
Woof, got my first article done and it was a technical deep dive. Maybe I’ll try something lighter next time? Probably not, we’ll see.
