For the purpose of this essay, ‘attention’ refers to selective attention. Loosely, this is needed when we are exposed to more information than we can process at once, so a procedure is required to attend selectively to one set of information while ignoring others. Hence selective attention plays an important role in information processing in humans and animals.
The amount of attention paid to a stimulus varies along a continuum, so its effectiveness as a conditioned stimulus (CS) will also vary with how much it is attended to. Pavlov noted that the attention animals pay to stimuli can vary, and measured this through what he called the ‘orienting response’. This is the response to a stimulus, usually when it is first presented, that allows an animal to attend to it fully. He noted that new stimuli would produce this response and reasoned that it was an adaptive reflex that allowed animals to investigate any changes to their environment. He postulated that the size of the orienting response was therefore a direct measure of how much attention the animal was paying to that particular stimulus. Hence how conditionable a CS is, or how associable it is with an unconditioned stimulus (US), is a reflection of how well it is attended to.
When a stimulus is presented over and over to an animal, it becomes less novel and the animal therefore pays less attention to it. The strength of the orienting response decreases, which is evidence of habituation. With reference to conditioning, learning is shown experimentally to be faster when the stimulus presented is new to the animals and, we assume, therefore better attended to. When a stimulus is more familiar, we assume it is less attended to, as we see a decline in the orienting response. This effect is called ‘latent inhibition’ and reflects a loss of attention to the CS.
This is a common assumption made in both the Mackintosh and Pearce-Hall theories of attention; however, they differ on what they believe determines change in which stimulus we attend to and by how much.
Mackintosh’s theory (1975) is that animals will pay more attention to stimuli that in their experience prove to be good predictors of important events. For animals, this might be a stimulus which predicts the imminent arrival of food (such as a smell) or of a painful event (such as an electric shock). This has developed biologically, as an animal that quickly detects a signal for food is more likely to survive than an animal that reacts more slowly. As we claimed earlier, stimuli which are better attended to are more conditionable, hence animals will learn more readily about such predictive stimuli. This can also be taken to mean that stimuli with high associative strength will be more attended to by animals. However, as we said earlier, we cannot process the information from many stimuli at once, so if several stimuli that signal important events are presented, the conditioned stimulus that is the best predictor of the ‘important’ unconditioned stimulus will be the one attended to fully.
Mackintosh’s theory also provides a different account of blocking from other theories such as Rescorla-Wagner. The Rescorla-Wagner model holds that blocking occurs because the US is already fully predicted by the first CS, so it cannot support new associations with a CS added later. Mackintosh disagreed and suggested that learning about the added stimulus is blocked simply because animals come to ignore the second, later stimulus during compound conditioning. Mackintosh actually rejects Rescorla-Wagner’s account of compound conditioning entirely. Instead, he advocates an equation for the growth in the associative strength of a CS, here called A:
ΔV_A = α_A (λ − V_A)
This states that conditioning with any CS on any trial is unaffected by the associative properties of the other stimuli that accompany it. In this equation, V is the current associative strength of the CS, λ is the maximum associative strength the US can support, and α represents how much the stimulus is attended to, so the greater the value of α, the more quickly conditioning will occur. Its value lies between 0 and 1, depending on whether the stimulus is a good predictor of the US compared with the other stimuli present. It follows from the theory that in compound conditioning, conditioning will be normal on the first trial with an added CS, but as that CS is increasingly ignored, blocking becomes evident.
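To make the role of α in this equation concrete, the following is a minimal sketch that applies the update trial by trial for two stimuli, one well attended to and one largely ignored. The function name, the specific α values (0.5 and 0.1), λ = 1 and the number of trials are all illustrative assumptions rather than values from Mackintosh (1975). The sketch also holds α fixed, whereas in the theory α itself changes with experience.

```python
def mackintosh_acquisition(alpha, lam=1.0, trials=10, v0=0.0):
    """Return the associative strength V after each trial.

    Applies delta V = alpha * (lam - V) on every trial.
    alpha is held constant here, which is a simplifying assumption:
    the theory lets attention change as the animal learns.
    """
    v = v0
    history = []
    for _ in range(trials):
        v += alpha * (lam - v)  # growth is scaled by attention to this CS
        history.append(v)
    return history


# Hypothetical attention values for a well attended and an ignored CS.
attended = mackintosh_acquisition(alpha=0.5)
ignored = mackintosh_acquisition(alpha=0.1)

for trial in (1, 5, 10):
    print(f"trial {trial:2d}: attended V = {attended[trial - 1]:.3f}, "
          f"ignored V = {ignored[trial - 1]:.3f}")
```

With these assumed values the attended stimulus approaches λ within a few trials while the ignored one lags well behind, which is the sense in which a low α produces slow conditioning.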
Selective association is the idea that some CS-US relationships are learned about more easily and readily than others. It is no longer assumed that any CS and US paired together will be learned about equally effectively. For example, Garcia and Koelling (1966) found that rats more readily associate illness with taste than with any audio-visual stimulus. However, the reverse was found when the association was with an electric shock instead of illness. It is possible this occurs because some stimuli are more relevant to particular events than others; for instance, an association between taste and illness is relevant for checking which foods are poisonous. This suggests that animals work out which stimulus is the best predictor of important events on the basis of evolutionary processes and/or previous experience. One experiment that supports this is Gemberling and Domjan (1982). Rats were conditioned at only 24 hours old; they were either made ill by an injection of lithium chloride or given electric shocks, and were put in a cardboard box. They learned to associate illness with consumption of saccharin but not with being in the box, whereas they learned to associate the electric shocks with being in the box and not with consumption. Since they cannot have learned any other relevant associations in 24 hours of life, it can be assumed that the rats are genetically or biologically predisposed to make some associations more readily than others.
The Pearce-Hall theory of attention makes a very different claim about what factors determine whether an animal pays attention to a stimulus or not. It is related to the view of authors such as LaBerge and Samuels (1974) that people have two modes of attention that can operate simultaneously, as opposed to only being able to attend fully to one stimulus at a time, as in the Mackintosh theory. In this model, one of these modes of attention is directed primarily towards new stimuli, towards learning how the stimulus interacts with the environment and towards assessing its consequences. This attention requires conscious control and is of limited capacity. It is referred to as ‘controlled’ or ‘deliberate’ attention. One example of this in humans is driving: when it is a novel process, it requires all of our conscious attention to control the processing of information, so there is little spare capacity for any other cognitive task. However, as driving becomes more familiar we begin to do it more automatically, leaving more processing capacity free to attend to other things.
The Pearce-Hall theory (1980) fits in with this, as it claims that animals only need to attend fully to a stimulus when learning about it initially. This accounts for Pavlovian conditioning by allowing attention to be paid to a stimulus while new associations are being formed, and allowing a CS to be ignored and no longer learned about once the association has been formed. From this point on, any reactions will be automatic and controlled processing is not needed. Controlled processing, and therefore attention, is directed towards stimuli that need to be learned about. In order to assess whether further learning needs to happen, we can measure how surprising the US that follows the CS is to the animal. If the US is predictable and no surprise is elicited, then attention to that CS is no longer needed. A formula can also represent this theory of attention:
α_{n+1} = |λ_n − V_n|
Here ‘n’ is the trial number in a series of CS-US pairings, and the absolute difference between λ and V expresses how surprising the US is on each trial. When the US is surprising, attention to the CS will be high on the next trial. As the CS becomes a better predictor of the US through learned associations, the difference between λ and V shrinks and attention to the CS is reduced.
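The decline in attention described by this formula can be sketched with a short simulation of a single CS paired with a US. The formula in the text gives only the rule for α, so the sketch needs some rule for how V grows. It assumes the increment on each trial is salience × α × λ, and it leaves out any inhibitory learning. The function name, the salience of 0.2, the starting α of 1 for a novel CS and λ = 1 are illustrative assumptions, not values taken from Pearce and Hall (1980).

```python
def pearce_hall(trials=12, lam=1.0, salience=0.2, alpha0=1.0):
    """Simulate attention (alpha) and associative strength (V) for one CS.

    Attention rule from the text: alpha on trial n+1 = |lam_n - V_n|.
    Learning rule (an assumption of this sketch): V grows by
    salience * alpha * lam on each trial.
    """
    v, alpha = 0.0, alpha0
    alphas, strengths = [], []
    for _ in range(trials):
        alphas.append(alpha)          # attention paid on this trial
        surprise = abs(lam - v)       # how surprising the US is on this trial
        v += salience * alpha * lam   # assumed growth in associative strength
        strengths.append(v)
        alpha = surprise              # sets attention for the next trial
    return alphas, strengths


alphas, strengths = pearce_hall()
for n, (a, v) in enumerate(zip(alphas, strengths), start=1):
    print(f"trial {n:2d}: alpha = {a:.3f}, V after trial = {v:.3f}")
```

Under these assumptions attention stays high while the US is still surprising and then falls towards zero as V approaches λ, which mirrors the claim that a well predicted US no longer commands attention to its CS.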
When applied to compound conditioning, the theory postulates that attention to each element depends directly on how well the US is predicted by all the stimuli in the compound. V is therefore determined by the combined associative strength of all stimuli in the compound on each trial. However, the Pearce-Hall theory makes the same statements as the Mackintosh theory when it comes to blocking. Both stipulate that the novelty of a new stimulus on the first trial means it is more attended to, and therefore the associations made will be stronger. After this, however, the US will be unsurprising, so from then on attention is no longer paid and few further changes will be made to the associative strength between the CS and the US.
Another difference between the two theories concerns latent inhibition of a CS. The Pearce-Hall theory predicts that it should be possible to obtain latent inhibition by repeatedly pairing the CS with a US, as well as by presenting the CS alone (Hall & Pearce, 1979). Learning of the association did occur when this was tested experimentally on rats. The theory also predicts that playing a tone as a CS sometimes before a shock and sometimes alone teaches the rats that the tone is not always an accurate predictor of the shock. Hence, if they have been conditioned to expect the shock, attention can be restored and conditioning can happen again, reversing the effects of latent inhibition. The Mackintosh theory predicts that pre-training rats to associate a tone with a weak shock will help them to learn more quickly to associate the tone with a stronger shock, since the tone would fit the description of the best available predictor of the event. However, experimentally this does not happen.
There are weaknesses in both theories. A main weakness of Pearce-Hall is its claim that once a stimulus is ignored, attention can only be restored to it by a surprising event, which is not the case. Hence this theory too fails to explain all experimental phenomena.
In conclusion, neither theory can fully explain the experimental findings on attention. I would argue that we need both, since there seems to be truth in each; an amalgamation of the two is perhaps what is actually needed to explain attention in animal learning. One such theory might be that of LePelley (2004), which incorporates the rules of attention from both theories. It makes the same assumptions about changes in associative strength as Rescorla-Wagner, with the added idea that blocking is due to an interaction between the loss of effectiveness of a no longer surprising US and a loss of attention and associability. However, for this theory to be sufficient as an attentional model of learning in animals, it would also have to take into account that conditioning does not happen automatically whenever a CS and US are paired, and that prior experience with a CS can affect its conditionability. Changes in the properties of the CS and US would also have to be accounted for. Hence any future theory would have to take all of these things into account if it were to be successful.
Bouton, M. E. (2007) ‘Learning and Behavior’, Sinauer Associates, Chapter 4.
Pearce, J. M. (2008) ‘Animal Learning and Cognition’, 3rd Edition, Chapter 3.
Pearce, J. M. & Mackintosh, N. J. (2010) ‘Two theories of attention: A review and a possible integration’, in C. Mitchell and M. E. LePelley (Eds.) Attention and Learning (pp. 11-39).
Hall, G. & Pearce, J. M. (1979) ‘Latent inhibition of a CS during CS–US pairings’, Journal of Experimental Psychology: Animal Behavior Processes, 5(1), 31-42.