We consider the problem of load flattening for an electric vehicle (EV) fleet and use model-free reinforcement learning (RL) to coordinate the fleet's aggregate charging demand. In this paper, we define a new Markov Decision Process (MDP) formulation that has linear space and time complexity. We thus improve upon the earlier state of the art, reducing training time by 30% in our case study while maintaining similar improvements over both a business-as-usual and a heuristic charging scheduling baseline. More specifically, we (i) define new state-action representations and cost functions, (ii) design experiments based on real-world EV charging session data to study the impact of different MDP formulations, (iii) learn RL-based control policies using the fitted Q-iteration (FQI) algorithm and evaluate them on unseen test data, and (iv) study the impact of input parameters (e.g., training sample size) on the performance and optimization of the RL-based control policy (by evaluating performance after each iteration of the FQI algorithm). RL policies learned with our proposed MDP formulations improve the performance of charging demand coordination by 40-50% compared to a business-as-usual policy (that charges each EV fully upon arrival) and by 20-30% compared to a heuristic policy (that uniformly spreads each individual EV's charging over time).
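To make the two baseline policies concrete, the following minimal Python sketch computes the aggregate load each one produces on a toy fleet. It is an illustration under stated assumptions, not the formulation used in this paper: time is divided into equal slots, every EV charges at a normalized maximum power of 1 per slot, and the flatness of the load is scored by the sum of squared aggregate load, which is an assumed stand-in for the cost function. The names `Session`, `bau_load`, `uniform_load`, and `flattening_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    arrival: int    # first slot in which the EV is connected
    departure: int  # first slot in which the EV is no longer connected
    energy: float   # energy needed, in units of (max power x one slot)


def bau_load(sessions, horizon):
    """Business-as-usual: charge at full power from arrival until full."""
    load = [0.0] * horizon
    for s in sessions:
        remaining = s.energy
        for t in range(s.arrival, min(s.departure, horizon)):
            if remaining <= 0:
                break
            power = min(1.0, remaining)  # assumed normalized max power of 1
            load[t] += power
            remaining -= power
    return load


def uniform_load(sessions, horizon):
    """Heuristic: spread each EV's demand evenly over its connection time."""
    load = [0.0] * horizon
    for s in sessions:
        end = min(s.departure, horizon)
        stay = end - s.arrival
        if stay <= 0:
            continue
        power = min(1.0, s.energy / stay)
        for t in range(s.arrival, end):
            load[t] += power
    return load


def flattening_cost(load):
    """Assumed flatness score: sum of squared aggregate load (lower is flatter)."""
    return sum(x * x for x in load)


# Toy fleet: two EVs with overlapping connection windows (made-up values).
sessions = [Session(arrival=0, departure=4, energy=2.0),
            Session(arrival=1, departure=5, energy=2.0)]
bau = bau_load(sessions, horizon=6)
uni = uniform_load(sessions, horizon=6)
print(bau, flattening_cost(bau))  # [1.0, 2.0, 1.0, 0.0, 0.0, 0.0] 6.0
print(uni, flattening_cost(uni))  # [0.5, 1.0, 1.0, 1.0, 0.5, 0.0] 3.5
```

On this toy example the uniform heuristic already yields a flatter aggregate load than business-as-usual. A learned RL policy would aim to lower such a cost further by coordinating across EVs rather than scheduling each one in isolation.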