Software-defined networking (SDN) provides programmability to networking devices, which makes network management easier than in the traditional networking paradigm. Distributed controllers in SDN are a promising approach to scaling the network and provide better quality of experience in today's emerging Internet. In SDN, each switch is managed by its associated controller, and the controllers' loads change according to the flow requests from the switches. As a consequence, uneven load distribution among the controllers becomes a problem. Dynamically migrating switches from an overutilized controller to an underutilized one can balance the loads between controllers; this is termed switch migration (SM). Several SM approaches have been studied with iterative optimization schemes that select the optimal switch-to-controller mapping for migration. Reinforcement learning (RL) has also been adopted for intelligent decision making in the SM domain. However, these schemes assume a limited number of candidate migration actions, which makes it difficult for RL agents to finely tune action selection based on the current state. In the SM domain, such a limited set of migration cases makes it hard to find the optimal switch-to-controller pair during training. Therefore, we propose NAF-LB, a DRL-based SM scheme with dynamic weights in the utilization model for SM decision making. With NAF-LB, detection of the controllers' load-imbalance status is optimized, and a switch can choose a more suitable controller from the candidate controller set. To train the SM problem with fine-grained action selection, we utilize the normalized advantage function (NAF).
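The fine-grained action selection that NAF enables rests on decomposing the Q-function into a state value and a quadratic advantage term, Q(s, a) = V(s) − ½ (a − μ(s))ᵀ P(s) (a − μ(s)), so that Q is maximized exactly at a = μ(s). A minimal NumPy sketch of this decomposition follows; the function and variable names are illustrative only and do not come from the paper:

```python
import numpy as np

def naf_q_value(state_value, action, mu, L):
    """Illustrative NAF decomposition: Q(s,a) = V(s) + A(s,a), with
    A(s,a) = -0.5 * (a - mu)^T P (a - mu) and P = L L^T guaranteed
    positive semi-definite, so the advantage is never positive and
    Q is maximized at action = mu."""
    P = L @ L.T                      # positive semi-definite precision matrix
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return state_value + advantage
```

In a full NAF agent, μ(s), V(s), and the lower-triangular L(s) are outputs of a neural network; this closed-form maximizer is what allows continuous (fine-grained) actions without a separate actor network.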
We verified the performance of the proposed algorithm in simulated SDN environments, and the results show that NAF-LB can detect the utilization status of each resource while achieving better load distribution among the controllers. Our proposed scheme improves the load-balancing results by up to 10%–11% over a comparative scheme with coarse-grained action selection in the simulated scenarios.