Thanks for sharing your work. I tried to run the DrQA notebook, which has excellent descriptions, by the way. I spun up an Azure ML Standard_NC6 instance (6 cores, 56 GB RAM, 380 GB disk, 1 x NVIDIA Tesla K80 GPU) to see if I could replicate the results you list after 5 epochs, but I get terrible performance. I suspect that your training used a different set of hyperparameters.
The notebook contains the following:
HIDDEN_DIM = 128
EMB_DIM = 300
NUM_LAYERS = 3
NUM_DIRECTIONS = 2
DROPOUT = 0.3
optimizer = torch.optim.Adamax(model.parameters())
I suspect you might have used a learning rate different from Adamax's default? Hope that you still remember something about the configuration :)
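For reference, PyTorch's Adamax defaults to lr=2e-3, so with the line above that is what the notebook trains with. If you used something else, I'd guess it was passed explicitly like this (the 1e-3 value here is just a hypothetical example, not from the notebook):

```python
import torch

# Stand-in model just to have parameters to optimize
model = torch.nn.Linear(300, 128)

# Adamax with an explicit learning rate instead of the 2e-3 default
# (1e-3 is a hypothetical value for illustration)
optimizer = torch.optim.Adamax(model.parameters(), lr=1e-3)

print(optimizer.defaults["lr"])  # 0.001
```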