As an emerging branch of machine learning, deep learning has proven highly effective at tackling complex learning problems. However, as network sizes grow to meet the demands of real applications, it becomes increasingly difficult to build high-performance neural network implementations. In this study, we propose the deep learning accelerator unit (DLAU), a scalable accelerator architecture for large-scale deep neural networks that uses a field-programmable gate array (FPGA) as the hardware prototype, to achieve both higher performance and lower power consumption. The DLAU accelerator employs tiling techniques to exploit locality in deep learning applications and uses three pipelined processing units to improve throughput. Experiments conducted on a state-of-the-art Xilinx FPGA board show that the DLAU accelerator achieves speeds comparable to those of Intel Core2 CPUs.
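To make the tiling idea concrete, the sketch below shows a tiled matrix-vector product, the core operation of a fully connected layer, processed one weight-matrix tile at a time so that only a small block of weights must be resident in on-chip buffers at any moment. The function name, tile size, and dimensions are illustrative assumptions, not details taken from the DLAU design.

```python
def tiled_matvec(weights, x, tile=4):
    """Compute weights @ x block by block to exploit data locality.

    Each (row-tile, column-tile) block of the weight matrix is processed
    in turn, mimicking an accelerator that streams one tile of weights
    into limited on-chip memory, accumulates partial sums, then moves on.
    """
    rows = len(weights)
    cols = len(x)
    y = [0.0] * rows
    for r0 in range(0, rows, tile):          # iterate over row tiles
        for c0 in range(0, cols, tile):      # iterate over column tiles
            # Only this tile of weights is "fetched" at once; partial
            # results for each output element are accumulated across tiles.
            for r in range(r0, min(r0 + tile, rows)):
                acc = 0.0
                for c in range(c0, min(c0 + tile, cols)):
                    acc += weights[r][c] * x[c]
                y[r] += acc
    return y
```

The final result is identical to an untiled matrix-vector product; only the order of memory accesses changes, which is what allows an accelerator with small buffers to handle weight matrices far larger than its on-chip storage.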