Streamline upload process on the backend #75

@flaneuse

Description

Uploading large batches of data is a pain: there's no good way to queue data for upload, and because of the complexity of the .json validation before ES insertion, 300 records take ~5 min to upload.

There are at least a few limits to queuing large amounts of data:

  1. The front-end has a limit on how much data it can hold in memory for uploading.
  2. The backend can only accept about 1 MB (I think) per request before it complains; as a result, the front-end currently parses the file into ~1 MB chunks to send to the backend.
  3. On the prod server, if there are too many simultaneous requests, the multiprocessing queue can get mixed up and the same record can be inserted into the index multiple times.
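For reference, the ~1 MB chunking described in point 2 can be sketched as a greedy packer that sizes each batch by its serialized JSON. This is a minimal sketch, not the actual front-end code; the 1 MB limit and the record shape are assumptions from this issue.

```python
import json

MAX_CHUNK_BYTES = 1_000_000  # assumed ~1 MB backend request limit


def chunk_records(records, max_bytes=MAX_CHUNK_BYTES):
    """Greedily pack records into batches whose summed JSON size stays under max_bytes."""
    batch, size = [], 0
    for record in records:
        record_bytes = len(json.dumps(record).encode("utf-8"))
        if batch and size + record_bytes > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_bytes
    if batch:
        yield batch


# Hypothetical records, just to exercise the packer.
records = [{"id": i, "payload": "x" * 100} for i in range(50_000)]
chunks = list(chunk_records(records))
```

Each chunk can then be POSTed to the backend as a separate request, keeping every request under the size the backend will accept.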

Ideally, we could queue a bunch of records and let the upload run overnight. This may involve moving away from the front-end interface, but we'd still have the problem of the multiprocessing queue inserting duplicates.
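One way to make the duplicate-insert race harmless, regardless of how the queue is fixed, is to derive a deterministic document `_id` from each record's content, so a second insert of the same record overwrites rather than duplicates. A minimal sketch, assuming the Elasticsearch Python client's `helpers.bulk` action format; the index name and record shape are hypothetical:

```python
import hashlib
import json


def record_id(record):
    """Deterministic ID from record content, so retried or racing
    inserts of the same record collide on _id instead of duplicating."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def to_bulk_actions(records, index="records"):
    """Yield bulk-index actions with an explicit _id for each record.

    With an explicit _id, Elasticsearch's index op is an idempotent
    overwrite, so re-running a batch cannot create duplicate documents.
    """
    for record in records:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": record_id(record),
            "_source": record,
        }
```

These actions can be fed to `elasticsearch.helpers.bulk`; key order doesn't matter because the ID is computed over a `sort_keys=True` serialization.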
