Abstract: While continuous word embeddings are gaining popularity, current models are based solely on linear contexts. In this work, we generalize the skip-gram model with negative sampling introduced by Mikolov et al. to include arbitrary contexts. In particular, we perform experiments with dependency-based contexts, and show that they produce markedly different embeddings. The dependency-based embeddings are less topical and exhibit more functional similarity than the original skip-gram embeddings.
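To make the contrast between linear and dependency-based contexts concrete, the following is a minimal sketch, not code from the paper: the function names are illustrative, and the dependency arcs are hand-written for one toy sentence rather than produced by a parser. It shows how the (word, context) pairs fed to skip-gram differ under the two context definitions, with dependency contexts typed by the relation and its inverse.

```python
def linear_contexts(tokens, window=2):
    """Yield (word, context) pairs from a symmetric window, as in skip-gram."""
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (word, tokens[j])

def dependency_contexts(arcs):
    """Yield (word, context) pairs from labeled dependency arcs.

    Each arc is (head, label, modifier); the context carries the relation
    label, and the modifier also sees the head via the inverse relation.
    """
    for head, label, mod in arcs:
        yield (head, f"{mod}/{label}")       # head's context: its modifier
        yield (mod, f"{head}/{label}-1")     # modifier's context: its head (inverse)

tokens = "australian scientist discovers star with telescope".split()
# Hand-written arcs for illustration only (not parser output); the
# preposition is collapsed into the relation label.
arcs = [
    ("scientist", "amod", "australian"),
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),
]

print(list(linear_contexts(tokens)))
print(list(dependency_contexts(arcs)))
```

Under the linear definition, "discovers" shares contexts with nearby but syntactically unrelated words like "australian"; under the dependency definition, its contexts are exactly its syntactic arguments, which is one intuition for why the resulting embeddings are less topical and more functional.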