In continuous word representation, we want to learn a low-dimensional vector for each word. These vectors then serve as input to a neural network, so the representations and the network parameters are trained jointly. To represent a window of $$k$$ words $$[w_1, \dots, w_k]$$, we concatenate the representation of each word and get:
$$ x = [C(w_1)^\top, \dots, C(w_k)^\top] $$
We learn these representations by gradient descent (a first-order method, in contrast to second-order methods such as Newton's method). Both the neural network parameters and each representation $$C(w)$$ appearing in the window are updated within a gradient step:
$$C(w) \leftarrow C(w) - \alpha \nabla_{C(w)} \ell$$
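The lookup, concatenation, and update steps above can be sketched in NumPy. This is a minimal illustration, not a full language model: the vocabulary size, embedding dimension, window, and the gradient `grad_x` (which would come from backpropagation through the network) are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim, k = 10, 4, 3            # hypothetical sizes
alpha = 0.1                              # learning rate
C = rng.normal(size=(vocab_size, dim))   # one row per word: C(w)

def window_input(word_ids):
    """Concatenate the representations of a k-word window into one vector x."""
    return np.concatenate([C[w] for w in word_ids])

window = [2, 5, 7]                       # hypothetical word indices w_1..w_k
x = window_input(window)                 # x has shape (k * dim,)

# Stand-in for the loss gradient w.r.t. x, normally computed by
# backpropagating the loss l through the network.
grad_x = np.ones_like(x)

# Gradient step: each C(w) in the window receives its slice of grad_x,
# i.e. C(w) <- C(w) - alpha * grad_{C(w)} l.
for i, w in enumerate(window):
    C[w] -= alpha * grad_x[i * dim:(i + 1) * dim]
```

Note that only the rows of `C` for words that appear in the current window are touched, which is what makes this update cheap even for large vocabularies.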