Uploading large batches of data is painful: there is no good way to queue data for upload, and because of the complexity of the JSON validation that runs before each ES insertion, 300 records take roughly 5 minutes to upload.
There are at least a few limits on queuing large amounts of data:
- The front-end can only hold a limited amount of data in memory for uploading.
- The backend rejects request bodies over roughly 1 MB (the exact limit is unconfirmed), so right now the front-end parses the file into ~1 MB chunks to send to the backend.
- On the prod server, if there are too many simultaneous requests, the multiprocessing queue can get mixed up and the same record can be inserted multiple times into the index.
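The ~1 MB chunking the front-end does could be sketched like this. This is an illustrative sketch, not the actual front-end code: the batching threshold is an assumption mirroring the ~1 MB limit described above, and the real limit should be confirmed against the backend's configuration.

```python
import json


def chunk_records(records, max_bytes=1_000_000):
    """Split records into batches whose total serialized size stays
    under max_bytes (a stand-in for the backend's ~1 MB request limit)."""
    batch, size = [], 0
    for rec in records:
        rec_size = len(json.dumps(rec).encode("utf-8"))
        # Start a new batch if adding this record would exceed the limit.
        if batch and size + rec_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch
```

A single record larger than `max_bytes` would still go out as its own batch here; the real front-end would need to decide how to handle that case.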
Ideally, we could queue a batch of records and let the upload run overnight. That may mean moving away from the front-end interface, but the multiprocessing duplicate-insert problem would remain.
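One way to defuse the duplicate-insert problem, regardless of where the queue lives, is to make inserts idempotent by deriving the document ID from the record's content. This is a suggestion, not the current behavior: if the workers passed such an ID as the Elasticsearch `_id`, a record that gets re-queued would overwrite itself instead of creating a second document.

```python
import hashlib
import json


def record_id(record):
    """Derive a stable document ID from a record's content, so that
    inserting the same record twice targets the same document."""
    # sort_keys makes the ID insensitive to key ordering in the source JSON.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

This only deduplicates byte-for-byte-identical records; two records that differ in any field would still both be inserted.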