Massively parallel processor arrays have been shown to be an effective choice for image processing tasks. More recently, state-of-the-art processor arrays have been used for real-time machine vision tasks, such as intelligent transport system applications or video processing in mobile applications, providing a much more powerful solution than a conventional processor. A number of Single Instruction Multiple Data (SIMD) processor arrays have been implemented on FPGAs, which are particularly suited to such architectures because both are arrays of fine-grained logic elements. In this work, we propose an FPGA implementation of a processor array in which the processing elements (PEs) are as small as possible while still providing local memory sufficient for processing greyscale images. The PE is then replicated to form an array. A 32 × 32 PE array with four-neighbour connectivity is implemented on a Xilinx Virtex 5 XC5VLX50 FPGA, and the design can be scaled up on a larger FPGA. The processor array operates at a frequency of 96 MHz and executes a peak of 98.3 giga operations per second (GOPS), counted as bit-serial operations. A binary edge detection algorithm is performed in 52.08 ns; uploading and downloading a binary image for the 32 × 32 array takes an additional 687.5 ns. Sobel edge detection of an 8-bit greyscale image is performed in 5.33 µs; uploading and downloading an 8-bit greyscale image for the 32 × 32 array takes 5.36 µs. As larger FPGAs become available, array sizes comparable to those of state-of-the-art custom-designed ICs can be implemented on FPGAs.
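The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that each PE completes one bit-serial operation per clock cycle, which is how the peak GOPS figure appears to be counted. It also converts the reported execution times into clock cycles at 96 MHz. The cycle counts are inferred here from the reported times and are not stated in the text; all variable and function names are illustrative.

```python
# Sanity check of the headline figures (a sketch, not the authors' own calculation).

ARRAY_SIDE = 32      # 32 x 32 PE array, as reported
CLOCK_HZ = 96e6      # 96 MHz operating frequency, as reported

num_pes = ARRAY_SIDE * ARRAY_SIDE

# Assumption: one bit-serial operation per PE per clock cycle.
peak_gops = num_pes * CLOCK_HZ / 1e9


def to_cycles(seconds):
    """Convert an execution time in seconds to clock cycles at CLOCK_HZ."""
    return seconds * CLOCK_HZ


# Execution times as reported in the text, in seconds.
reported_times = {
    "binary edge detection": 52.08e-9,
    "binary image upload + download": 687.5e-9,
    "Sobel edge detection (8-bit)": 5.33e-6,
    "8-bit image upload + download": 5.36e-6,
}

# Inferred cycle counts, rounded to the nearest whole cycle.
inferred_cycles = {name: round(to_cycles(t)) for name, t in reported_times.items()}

print(f"PEs: {num_pes}, peak throughput: {peak_gops:.1f} GOPS")
for name, n in inferred_cycles.items():
    print(f"{name}: about {n} cycles")
```

Under these assumptions the peak figure agrees with the reported 98.3 GOPS. The inferred cycle counts also suggest that, for both binary and 8-bit images, image transfer takes at least as long as the edge detection itself.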