N++2: Proprietary Neural Network Software

Made for deep learning, N++2 is a multi-threaded simulator for large artificial neural networks with millions of weights. N++2 is written in C/C++ and can be optimized for Intel, RISC, and embedded CPUs. During training, it exploits the multi-core architecture of modern CPUs by using several threads to propagate multiple training patterns in parallel. By relying on CBLAS calls where appropriate, N++2 can also benefit from a CPU’s Single Instruction Multiple Data (SIMD) capabilities.
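
Propagating several training patterns in parallel can be sketched as data parallelism over the pattern set: the patterns are split into chunks, each thread runs the forward and backward pass for its chunk, and the partial gradients are summed. The sketch below is a minimal illustration under stated assumptions: it uses a single linear layer with squared error to stay short, and the function names `chunk_gradient` and `parallel_gradient` are hypothetical, not part of N++2 or its Python wrappers.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_gradient(W, patterns):
    """Sum of dE/dW over (input, target) patterns for one linear layer
    y_k = sum_i x_i * W[i][k] with squared error E = 0.5 * sum_k (y_k - t_k)**2."""
    n_in, n_out = len(W), len(W[0])
    grad = [[0.0] * n_out for _ in range(n_in)]
    for x, t in patterns:
        # Forward pass for one pattern.
        y = [sum(x[i] * W[i][k] for i in range(n_in)) for k in range(n_out)]
        err = [y[k] - t[k] for k in range(n_out)]
        # Backward pass: dE/dW[i][k] = x_i * err_k, accumulated over patterns.
        for i in range(n_in):
            for k in range(n_out):
                grad[i][k] += x[i] * err[k]
    return grad


def parallel_gradient(W, patterns, n_threads=4):
    """Split the patterns into n_threads chunks, propagate each chunk in its own
    thread, and add up the partial gradients (batch learning)."""
    chunks = [patterns[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda chunk: chunk_gradient(W, chunk), chunks))
    total = [[0.0] * len(W[0]) for _ in range(len(W))]
    for grad in partials:
        for i, row in enumerate(grad):
            for k, value in enumerate(row):
                total[i][k] += value
    return total
```

Because the batch gradient is a plain sum over patterns, splitting the work across threads gives the same result as a single thread, up to floating-point rounding. In CPython the global interpreter lock keeps this pure-Python version from actually running faster; it only shows the decomposition that a native C/C++ core can spread across CPU cores, with the inner dot products being the natural place for CBLAS calls.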

N++2 can speed up the training of “simple” multi-layer perceptrons with only one or two hidden layers and a few hundred connections. Its main purpose, however, is to simulate huge neural networks with many hidden layers and up to millions of weights in a “deep learning” setting. For this purpose, N++2 offers additional facilities for constructing symmetric autoencoder networks, for easy layer-wise pretraining, and for arranging neurons in two-dimensional layers (helpful for processing images). Besides fully connected layer types, N++2 also comes with an efficient implementation of sparse connection structures, including receptive fields and shared weights. Thus, N++2 can also simulate LeCun’s convolutional neural networks and combine these sparse techniques with deep learning and layer-wise pretraining.
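
Receptive fields and shared weights can be illustrated with a small two-dimensional layer: each output neuron is connected only to a k x k patch of the input, and all output neurons reuse the same k x k weights, which is the pattern behind a convolutional layer. This is a minimal pure-Python sketch with hypothetical names (`shared_weight_layer`, `connection_count`); it omits the activation function and makes no claim about how N++2 stores sparse connections internally.

```python
def shared_weight_layer(image, kernel, bias=0.0):
    """2-D layer in which neuron (i, j) sees only the k x k receptive field whose
    top-left corner is input pixel (i, j). All neurons share the same kernel
    weights and bias ("valid" cross-correlation, activation function omitted)."""
    h, w, k = len(image), len(image[0]), len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            s = bias
            # Only the k * k inputs inside the receptive field are connected.
            for di in range(k):
                for dj in range(k):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


def connection_count(h, w, k):
    """Compare an h x w input feeding a shared-weight layer with a fully
    connected layer that has the same number of output neurons."""
    n_out = (h - k + 1) * (w - k + 1)
    return {
        "fully_connected_weights": n_out * h * w,  # every output sees every input
        "sparse_connections": n_out * k * k,       # every output sees k * k inputs
        "shared_free_weights": k * k,              # one kernel shared by all outputs
    }
```

For a 28 x 28 input and a 5 x 5 kernel, `connection_count(28, 28, 5)` gives 451,584 weights for a fully connected layer with the same 24 x 24 outputs, versus 14,400 sparse connections that share only 25 free weights.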

N++2 uses Martin Riedmiller’s Resilient Propagation (RProp) for fast and reliable training of neural networks. On a system with two quad-core CPUs and two threads per core (16 threads in parallel), N++2 is about 27 times faster on real-world deep neural networks (not just benchmarks) than non-parallel simulator cores.
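
The core idea of RProp is that every weight has its own step size, adapted only from the sign of its partial derivative: the step grows while the sign stays the same and shrinks when the sign flips, and the weight moves by that step against the sign of the gradient. The sketch below implements the simple iRprop- variant of Igel and Hüsken, which drops the weight backtracking of the original algorithm; it is not taken from N++2, the parameter values are commonly used defaults, and `irprop_minus_step` is a hypothetical name. RProp is a batch method, so `grads` is meant to be the gradient summed over all training patterns.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def irprop_minus_step(weights, grads, prev_grads, steps,
                      eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One iRprop- update. Each weight has its own step size, adapted from the
    signs of its current and previous partial derivatives only.
    Returns (new_weights, gradients_to_remember, new_steps)."""
    new_weights, remembered, new_steps = [], [], []
    for w, g, g_prev, step in zip(weights, grads, prev_grads, steps):
        if g * g_prev > 0:
            # Same sign as last time: accelerate along this direction.
            step = min(step * eta_plus, step_max)
        elif g * g_prev < 0:
            # Sign flipped: the last step jumped over a minimum, so shrink it
            # and (iRprop-) skip this update by forgetting the gradient.
            step = max(step * eta_minus, step_min)
            g = 0.0
        new_weights.append(w - sign(g) * step)
        remembered.append(g)
        new_steps.append(step)
    return new_weights, remembered, new_steps


# Usage: minimize E(w) = sum_i (w_i - target_i)**2, whose gradient is 2 * (w - target).
target = [3.0, -2.0, 0.5]
weights = [0.0, 0.0, 0.0]
prev_grads = [0.0] * len(weights)
steps = [0.1] * len(weights)  # initial step size for every weight
for _ in range(100):
    grads = [2.0 * (w - t) for w, t in zip(weights, target)]
    weights, prev_grads, steps = irprop_minus_step(weights, grads, prev_grads, steps)
print(weights)
```

Because only the sign of the derivative is used, the update is insensitive to the gradient's magnitude, which is one reason RProp needs little tuning of a learning rate.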

We have Python wrappers for the simulator core, so it is easy to use, and we also have a Theano-based deep learning pipeline in place for developing new network architectures and researching new training algorithms. Once this research has converged, we implement the new findings and architectures in the C++ simulator core for optimized speed and maximum platform compatibility.