Did you shuffle the training data after every training epoch?
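
In case it helps, here is a minimal sketch of what per-epoch shuffling could look like in plain Python. The function name `epoch_batches` and its parameters (`samples`, `batch_size`, `num_epochs`, `seed`) are hypothetical and not taken from your code; adapt them to whatever data loader you are using.

```python
import random


def epoch_batches(samples, batch_size, num_epochs, seed=0):
    """Yield (epoch, batch) pairs, reshuffling the sample order every epoch.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    order = list(range(len(samples)))  # indices into the dataset
    for epoch in range(num_epochs):
        rng.shuffle(order)             # draw a new permutation for this epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield epoch, [samples[i] for i in idx]


if __name__ == "__main__":
    data = list(range(10))
    for epoch, batch in epoch_batches(data, batch_size=4, num_epochs=2):
        print(epoch, batch)
```

Each epoch still visits every sample exactly once; only the order, and therefore the composition of each mini-batch, changes between epochs.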