Thanks for your excellent work~
Actually, I reimplemented this structure in TensorFlow and ran into a problem. I found that logits and topic_addition can have quite different scales: for example, the logits range over roughly -20 to 20, while topic_addition stays within roughly -1 to 1. I am not sure whether adding them directly still works as intended at such different scales. When I inspect the training process, the whole network seems to rely much more on the logits. Is something wrong?
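
To make the concern concrete, here is a minimal sketch in plain Python. It is not the actual model code: the numbers are made up purely to mimic the two ranges I described, and `softmax` and `argmax` are small helpers I wrote for this illustration. It compares the softmax of the logits alone with the softmax after adding topic_addition.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)

# Hypothetical values, chosen only to mimic the ranges in the question.
logits = [18.0, -5.0, 12.0, -20.0]       # roughly within [-20, 20]
topic_addition = [-1.0, 0.8, 1.0, 0.3]   # roughly within [-1, 1]

# Direct element-wise addition, as in the structure being discussed.
combined = [l + t for l, t in zip(logits, topic_addition)]

p_logits = softmax(logits)
p_combined = softmax(combined)

# Largest change in any single probability caused by topic_addition.
max_shift = max(abs(a - b) for a, b in zip(p_logits, p_combined))

print("top class without topic_addition:", argmax(p_logits))
print("top class with topic_addition:   ", argmax(p_combined))
print("largest probability shift:       ", max_shift)
```

With these made-up values the predicted class does not change and the probabilities move only slightly, which matches what I see during training: the smaller-scale term has little influence on the final distribution.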