In this paper, we present an automated learning environment for developing control policies directly on the hardware of a modular legged robot. The environment supports the reinforcement learning process in two ways: it computes rewards using a vision-based tracking system, and it returns the robot to its initial position using a resetting mechanism. We employ two state-of-the-art deep reinforcement learning (DRL) algorithms, Trust Region Policy Optimization (TRPO) and Deep Deterministic Policy Gradient (DDPG), to train neural network policies for simple rowing and crawling motions. Using the developed environment, we demonstrate that both algorithms can effectively learn policies for simple locomotion skills despite highly stochastic hardware and environments. We further accelerate learning by transferring policies learned on a single-legged configuration to multi-legged configurations.