Traditional truth discovery approaches estimate the truth from a set of values provided by multiple sources, which can be conflicting and heterogeneous. This repository contains Nexus, a proof-of-concept solution that improves truth discovery by adding a reputation system.
To maximize compatibility across architectures, Nexus is deployed with Docker. Any installation method works: Docker Desktop is likely the most straightforward option, but installing through a package manager is also fine, as long as docker compose is available. More information on installing Docker can be found here.
- Build the Nexus image and launch the containers:

      docker compose up --build -d
After launching the containers, interaction with Nexus is done through the API available at http://localhost:8000/docs. There are two endpoints at the moment: a consolidate endpoint, which performs the truth discovery process, and a clear endpoint, which clears the database of the reputation system. To send a request to the API, you can use tools such as Postman.
- A `POST` request to the `consolidate/` endpoint has the following format:

      {
        "objects": [
          {
            "name": "name",
            "datatype": "string",
            "claims": [
              { "sourceId": "source1", "fact": "Thomas A. Anderson" },
              { "sourceId": "source2", "fact": "Thomas Anderson" },
              { "sourceId": "source3", "fact": "Tommy A. Anderson" },
              { "sourceId": "source4", "fact": "Anderson" }
            ]
          }
        ],
        "sources": []
      }

  - The `name` field is used to identify the value.
  - The `datatype` field is used to specify the type of the value (string, continuous, or categorical).
  - The `claims` list contains the values provided by the various sources.
    - Each entry in this list is a pair of `sourceId` and `fact`, corresponding to the identifier of the source providing the value, and the value itself, respectively.
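As an alternative to Postman, the request can also be sent from a short script. The following is a minimal sketch using only the Python standard library. It assumes the API root is `http://localhost:8000` and that the endpoint path is `/consolidate/`; `build_payload` and `consolidate` are hypothetical helper names introduced here for illustration and are not part of Nexus.

```python
import json
import urllib.request

# Assumption: API root inferred from the docs address (http://localhost:8000/docs).
BASE_URL = "http://localhost:8000"


def build_payload(name, datatype, claims):
    """Build a consolidate request body for a single object.

    `claims` is an iterable of (source_id, fact) pairs.
    """
    return {
        "objects": [
            {
                "name": name,
                "datatype": datatype,
                "claims": [
                    {"sourceId": source_id, "fact": fact}
                    for source_id, fact in claims
                ],
            }
        ],
        "sources": [],
    }


def consolidate(payload, base_url=BASE_URL):
    """POST the payload to the (assumed) /consolidate/ path and parse the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/consolidate/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same request as the example above.
payload = build_payload(
    "name",
    "string",
    [
        ("source1", "Thomas A. Anderson"),
        ("source2", "Thomas Anderson"),
        ("source3", "Tommy A. Anderson"),
        ("source4", "Anderson"),
    ],
)
```

With the containers running, `consolidate(payload)` should return the parsed response as a Python dictionary; if the deployment uses a different host, port, or path, adjust `BASE_URL` or the path accordingly.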
- The previous `POST` request returns the following response from Nexus:

      {
        "timestamp": "2025-02-04T15:29:22.613023",
        "objects": [
          {
            "name": "name",
            "claims": [
              { "fact": "Thomas A. Anderson", "confidence": 0.821, "sourceId": "source1" },
              { "fact": "Thomas Anderson", "confidence": 0.797, "sourceId": "source2" },
              { "fact": "Tommy A. Anderson", "confidence": 0.786, "sourceId": "source3" },
              { "fact": "Anderson", "confidence": 0.585, "sourceId": "source4" }
            ]
          }
        ],
        "sources": [
          {
            "sourceId": "source1",
            "reputation": 0.5833333333333333,
            "probabilities": [
              0.16666666666666666,
              0.16666666666666666,
              0.16666666666666666,
              0.16666666666666666,
              0.3333333333333333
            ],
            "ratings": [0, 0, 0, 0, 1]
          },
          {
            "sourceId": "source2",
            "reputation": 0.5416666666666666,
            "probabilities": [
              0.16666666666666666,
              0.16666666666666666,
              0.16666666666666666,
              0.3333333333333333,
              0.16666666666666666
            ],
            "ratings": [0, 0, 0, 1, 0]
          },
          {
            "sourceId": "source3",
            "reputation": 0.5416666666666666,
            "probabilities": [
              0.16666666666666666,
              0.16666666666666666,
              0.16666666666666666,
              0.3333333333333333,
              0.16666666666666666
            ],
            "ratings": [0, 0, 0, 1, 0]
          },
          {
            "sourceId": "source4",
            "reputation": 0.5,
            "probabilities": [
              0.16666666666666666,
              0.16666666666666666,
              0.3333333333333333,
              0.16666666666666666,
              0.16666666666666666
            ],
            "ratings": [0, 0, 1, 0, 0]
          }
        ]
      }

  - The response includes two important fields, `objects` and `sources`:
    - The `objects` field lists the values in descending order of confidence score; each entry includes the confidence score, the source, and the value itself.
    - The `sources` field includes information on each participating source's reputation, accumulated ratings, and multinomial probability vector.
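To illustrate how the response fields relate to each other, the sketch below picks the highest-confidence claim and recomputes each source's reputation from its ratings. The two formulas are assumptions inferred from the sample numbers only (one pseudo-count added to each of the five rating levels, and rating levels scored evenly between 0 and 1); they reproduce the values shown above but are not confirmed to be how Nexus computes them. All function names are hypothetical.

```python
def top_claim(obj):
    """Return the claim with the highest confidence for one response object."""
    return max(obj["claims"], key=lambda claim: claim["confidence"])


def smoothed_probabilities(ratings, prior=1.0):
    """Assumed rule: add `prior` pseudo-counts to every rating level, then normalise."""
    total = sum(ratings) + prior * len(ratings)
    return [(count + prior) / total for count in ratings]


def expected_reputation(probabilities):
    """Assumed rule: expected score with levels evenly spaced on [0, 1]."""
    top_level = len(probabilities) - 1
    return sum(p * level / top_level for level, p in enumerate(probabilities))


# Abbreviated copy of the sample response above.
sample_object = {
    "name": "name",
    "claims": [
        {"fact": "Thomas A. Anderson", "confidence": 0.821, "sourceId": "source1"},
        {"fact": "Thomas Anderson", "confidence": 0.797, "sourceId": "source2"},
        {"fact": "Tommy A. Anderson", "confidence": 0.786, "sourceId": "source3"},
        {"fact": "Anderson", "confidence": 0.585, "sourceId": "source4"},
    ],
}
sample_sources = [
    {"sourceId": "source1", "reputation": 0.5833333333333333, "ratings": [0, 0, 0, 0, 1]},
    {"sourceId": "source2", "reputation": 0.5416666666666666, "ratings": [0, 0, 0, 1, 0]},
    {"sourceId": "source3", "reputation": 0.5416666666666666, "ratings": [0, 0, 0, 1, 0]},
    {"sourceId": "source4", "reputation": 0.5, "ratings": [0, 0, 1, 0, 0]},
]

print(top_claim(sample_object)["fact"])  # Thomas A. Anderson

for source in sample_sources:
    probabilities = smoothed_probabilities(source["ratings"])
    recomputed = expected_reputation(probabilities)
    # Compare the recomputed value with the reputation reported by Nexus.
    print(source["sourceId"], round(recomputed, 4), round(source["reputation"], 4))
```

Under these assumptions, a source whose single rating falls in a higher level ends up with a higher reputation, which matches the ordering of `source1` through `source4` in the sample.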
- Author information is omitted at this stage to preserve anonymity.