There are many simple, good interpretations of argmax: it returns the argument $x$ at which the function $f(x)$ is maximum. See the NumPy code below.

import numpy as np
np.argmax([3, 2, 1])
#=> 0, the index of the maximum value (here 3, at index 0)
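To make "return the $x$ where $f(x)$ is maximum" concrete, here is a small sketch that evaluates a function on a grid of candidate arguments and uses argmax to pick the best one (the function and grid are made up for illustration):

```python
import numpy as np

# argmax over a function: return the x at which f(x) is largest,
# approximated by evaluating f on a grid of candidate x values.
xs = np.linspace(-2, 2, 401)     # candidate arguments
f = lambda x: -(x - 1) ** 2      # f peaks at x = 1
best_x = xs[np.argmax(f(xs))]    # the grid point with the largest f
print(best_x)                    # close to 1.0
```

Note that `np.argmax` gives the *index* of the maximum; indexing back into `xs` turns that index into the argument itself, which is what the math notation $\arg\max_x f(x)$ means.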

But when we read papers and books, argmax usually shows up in more complex formulas that are not easy to understand if you are new to machine learning. Here are two examples.

The formula below comes from the maximum likelihood section of chapter 5 of the Deep Learning book:

\[\underset{\theta}{\arg \max} \prod_{i=1}^m P_{model}(x^{(i)};\theta)\]

In this formula, argmax returns $\theta$, the weights for which the probability under the model is maximum. Here the arguments of the function are the data points $x^{(i)}$ and the parameters $\theta$, and the function is the product of $P_{model}$ over all $m$ training examples.
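A tiny brute-force sketch may help. Assume (purely for illustration) a Bernoulli coin model, where $\theta$ is the probability of heads and $P_{model}(x^{(i)};\theta)$ is $\theta$ for heads and $1-\theta$ for tails; the argmax over a grid of candidate $\theta$ values then recovers the maximum likelihood estimate:

```python
import numpy as np

# Observed flips x^(i): 1 = heads, 0 = tails (made-up data)
data = np.array([1, 1, 1, 0, 1, 0, 1, 1])
thetas = np.linspace(0.01, 0.99, 99)   # candidate parameters

# likelihood(theta) = prod_i P_model(x^(i); theta) for a Bernoulli model
def likelihood(theta):
    return np.prod(np.where(data == 1, theta, 1 - theta))

# argmax over theta: the parameter whose likelihood is largest
best_theta = thetas[np.argmax([likelihood(t) for t in thetas])]
print(best_theta)   # close to 6/8 = 0.75, the fraction of heads
```

Real training replaces this grid search with gradient-based optimization (usually on the log of the product), but the argmax being computed is the same.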

Another example, from the language translation section of the sequence models course in the Coursera deep learning specialization:

\[\underset{y}{\arg \max} \prod_{t=1}^{T_y} P(y^{<t>}|x,y^{<1>},...,y^{<t-1>})\]

Here $x$ is the sentence before translation and $y$ is the sentence after translation, with $y^{<t>}$ being its $t$-th word. For example, if we want to translate 没有母牛关 to "there is no cow level", then $x$ is 没有母牛关, $y$ is "there is no cow level", $y^{<1>}$ is "there", $y^{<2>}$ is "is", $y^{<3>}$ is "no", and so on.

In this example, argmax returns a sentence $y$, the one that makes the product $\prod P$ maximum. Each factor in that product is the probability of the next word given the source sentence and the already translated part. For example, given the sentence 没有母牛关 and the already translated part "there is no", the word "cow" makes $P(y^{<4>}|x,y^{<1>},y^{<2>},y^{<3>})$ maximum. Picking the most likely word at every step like this (greedy search) often produces a good sentence such as "there is no cow level", though strictly speaking the per-step maxima do not always maximize the whole product, which is why translation decoders usually use approximations such as beam search.
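The greedy decoding described above can be sketched with a toy conditional probability table. The vocabulary and the probabilities below are entirely made up for illustration; a real model would compute them with a neural network conditioned on $x$:

```python
# Toy greedy decoder: at each step t, pick the word y^<t> that
# maximizes P(y^<t> | x, y^<1..t-1>). The table below plays the role
# of the model's conditional distribution (hypothetical numbers).
probs = {
    (): {"there": 0.6, "no": 0.2, "cow": 0.2},
    ("there",): {"is": 0.7, "no": 0.3},
    ("there", "is"): {"no": 0.5, "cow": 0.3, "<eos>": 0.2},
    ("there", "is", "no"): {"cow": 0.8, "<eos>": 0.2},
    ("there", "is", "no", "cow"): {"level": 0.9, "<eos>": 0.1},
    ("there", "is", "no", "cow", "level"): {"<eos>": 1.0},
}

sentence = []
while True:
    dist = probs[tuple(sentence)]
    word = max(dist, key=dist.get)   # argmax over the next word
    if word == "<eos>":              # stop at end-of-sentence
        break
    sentence.append(word)

print(" ".join(sentence))   # -> there is no cow level
```

Because each step conditions on the words chosen so far, a greedy choice early on can rule out a better sentence later; beam search keeps several candidate prefixes alive to reduce that risk.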