
Commit 952668f

Improve deployment documentation
1 parent da2cb14 commit 952668f

15 files changed

Lines changed: 162 additions & 138 deletions

File tree

docs/deployment/allocation.md

Lines changed: 68 additions & 49 deletions
Large diffs are not rendered by default.

docs/deployment/cloud.md

Lines changed: 32 additions & 24 deletions
Original file line number · Diff line number · Diff line change
@@ -1,46 +1,54 @@
1-
# Starting HQ without shared file system
1+
# Starting HQ without a shared filesystem
22

3-
On system without shared file system, all what is needed is to distribute access file (`access.json`) to clients and workers.
4-
This file contains address and port where server is running and secret keys.
5-
By default, client and worker search for `access.json` in `$HOME/.hq-server`.
3+
By default, HyperQueue assumes the existence of a shared filesystem, which it uses to exchange metadata required to connect servers and workers and to run various HQ commands.
64

7-
## Generate access file in advance
5+
On systems without a shared filesystem, you will have to distribute an *access file* (`access.json`) to clients and workers.
6+
This file contains the address and port where the server is running, and also secret keys required for encrypted communication.
87

9-
In many cases you, we want to generate an access file in advance before any server is started;
10-
moreover, we do not want to regenerate secret keys in every start of server,
11-
because we do not want to redistribute access when server is restarted.
8+
## Sharing the access file
129

13-
To solve this, an access file can be generated in advance by command "generate-access", e.g.:
10+
After you start a server, you can find its `access.json` file in the `$HOME/.hq-server/hq-current` directory. You can then copy it to a different filesystem using a method of your choosing, and configure clients and workers to use that file.
1411

15-
```commandline
12+
By default, clients and workers search for the `access.json` file in the `$HOME/.hq-server` directory, but you can override that using the `--server-dir` argument, which is available for all `hq` CLI commands. If you moved the `access.json` file into a directory called `/home/foo/hq-access` on the worker's node, you should start the worker like this:
13+
14+
```bash
15+
$ hq --server-dir=/home/foo/hq-access worker start
16+
```
17+
18+
!!! tip
19+
20+
You can also configure the server directory using an [environment variable](./server.md#server-directory).
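As a sketch, assuming the environment variable is named `HQ_SERVER_DIR` (this name is an assumption; check the linked page, and if it differs, prefer the `--server-dir` flag):

```bash
# Hypothetical variable name; see the linked server page
$ export HQ_SERVER_DIR=/home/foo/hq-access
$ hq worker start
```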
21+
22+
## Generate an access file in advance
23+
24+
In some cases you might want to generate the access file in advance, before the server is started, and let the server, clients and workers use that access file. This can be useful so that you don't have to redistribute the access file to client/worker nodes every time the server restarts, which could be cumbersome.
25+
26+
To achieve this, an access file can be generated in advance by the `generate-access` command:
27+
28+
```bash
1629
$ hq server generate-access myaccess.json --client-port=6789 --worker-port=1234
1730
```
1831

19-
This generates `myaccess.json` that contains generates keys and host information.
32+
This generates a `myaccess.json` file that contains generated keys and host information.
2033

2134
The server can be later started with this configuration as follows:
2235

23-
```commandline
36+
```bash
2437
$ hq server start --access-file=myaccess.json
2538
```
2639

27-
Note: That server still generates and manages "own" `access.json` in the server directory path.
28-
For connecting clients and workers you can use both, `myaccess.json` or newly generated `access.json`, they are same.
40+
Clients and workers should load the pre-generated access file in the same way as described [above](#sharing-the-access-file). However, you will have to rename the generated file to `access.json`, because clients and workers look it up by that exact name in the provided server directory.
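For example, mirroring the worker setup shown earlier (`/mydirectory` is a stand-in path):

```bash
$ mv myaccess.json /mydirectory/access.json
$ hq --server-dir=/mydirectory worker start
```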
2941

30-
Example of starting a worker from `myaccess.json`
42+
!!! note
43+
44+
The server will still generate and manage its own `access.json` in the server directory path, even if you provide your own access file. These files are the same, so you can use either when connecting clients and workers.
3145

32-
```commandline
33-
$ mv myaccess.json /mydirectory/access.json
34-
$ hq --server-dir=/mydirectory worker start
35-
```
3646

3747
## Splitting access for client and workers
3848

39-
Access file contains two secret keys and two points to connect, for clients and for workers.
40-
This information can be divided into two separate files,
41-
containing only information needed only by clients or only by workers.
49+
The default access file contains two secret keys and two TCP/IP addresses, one for clients and one for workers. This metadata can be divided into two separate files, each containing only the information needed by clients or by workers.
4250

43-
```commandline
51+
```bash
4452
$ hq server generate-access full.json --client-file=client.json --worker-file=worker.json --client-port=6789 --worker-port=1234
4553
```
4654

@@ -56,6 +64,6 @@ For starting server (`hq server start --access-file=...`) you have to use `full.
5664

5765
You can use the following command to configure different hostnames under which the server is visible to workers and clients.
5866

59-
```commandline
67+
```bash
6068
hq server generate-access full.json --worker-host=<WORKER_HOST> --client-host=<CLIENT_HOST> ...
6169
```

docs/deployment/index.md

Lines changed: 2 additions & 0 deletions
@@ -16,3 +16,5 @@ not required. A common use-case is to start the server on a login of an HPC syst
1616
[comment]: <> (TODO: describe scheduler)
1717

1818
Learn more about deploying [server](server.md) and the [workers](worker.md).
19+
20+
There is also a third component that we call the **client**: users of HyperQueue who invoke various `hq` commands to communicate with the server.

docs/deployment/server.md

Lines changed: 11 additions & 16 deletions
@@ -44,19 +44,16 @@ $ hq --server-dir=foo worker start
4444
$ hq worker start &
4545
```
4646

47-
!!! important
48-
49-
When you start the server, it will create a new subdirectory in the server directory, which will store the data of the current running instance. It will also create a symlink `hq-current` which will point to the currently active
50-
subdirectory.
51-
Using this approach, you can start a server using the same server directory multiple times without overwriting data
52-
of the previous runs.
53-
5447
!!! danger "Server directory access"
5548

5649
Encryption keys are stored in the server directory. Whoever has access to the server directory may submit jobs,
5750
connect workers to the server and decrypt communication between HyperQueue components. By default, the directory is
5851
only accessible by the user who started the server.
5952

53+
## Running multiple servers
54+
55+
When you start the server, it will create a new subdirectory in the server directory, which will store the data of the current running instance. It will also create a symlink `hq-current` which will point to the currently active subdirectory. Using this approach, you can start a server using the same server directory multiple times without overwriting data of the previous runs.
56+
6057
## Keeping the server alive
6158

6259
The server is supposed to be a long-lived component. If you shut it down, all workers will disconnect and all
@@ -98,7 +95,7 @@ have to be connected to the server after it restarts.
9895

9996
If the server crashes, the last few seconds of progress may be lost. For example,
10097
when a task is finished and the server crashes before the journal is written, then
101-
after resuming the server, the task will be not be computed after a server restart.
98+
after resuming the server, the task will be recomputed.
10299

103100
### Exporting journal events
104101

@@ -110,7 +107,7 @@ $ hq journal export <journal-path>
110107
```
111108

112109
The events will be read from the provided journal and printed to `stdout` encoded in JSON, one
113-
event per line (this corresponds to line-delimited JSON, i.e. [NDJSON](http://ndjson.org/)).
110+
event per line (this corresponds to line-delimited JSON, i.e. [JSON Lines](https://jsonlines.org/)).
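Because each line is a standalone JSON object, the export can be post-processed with ordinary line-oriented tools; for example, assuming `jq` is installed (the journal path is a placeholder):

```bash
$ hq journal export <journal-path> | jq -c .
```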
114111

115112
You can also directly stream events in real-time from the server using the following command:
116113

@@ -123,17 +120,15 @@ $ hq journal stream
123120
The JSON format of the journal events and their definition is currently unstable and can change
124121
with a new HyperQueue version.
125122

126-
### Pruning journal
123+
### Pruning the journal
127124

128-
Command `hq journal prune` removes all completed jobs and disconnected workers from the journal file.
125+
The `hq journal prune` command removes all completed jobs and disconnected workers from the journal file, in order to reduce its size on disk.
129126

130-
### Flushing journal
127+
### Flushing the journal
131128

132-
Command `hq journal flush` will force the server to flush the journal.
133-
It is mainly for the testing purpose or if you are going to `hq journal export` on
134-
a live journal (however, it is usually better to use `hq journal stream`).
129+
The `hq journal flush` command forces the server to flush the journal, so that the latest state is persisted to disk. It is mainly useful for testing or if you are going to run `hq journal export` while the server is running (however, it is usually better to use `hq journal stream`).
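A typical sequence combining the two commands might look like this (the journal path is a placeholder):

```bash
$ hq journal flush
$ hq journal export <journal-path>
```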
135130

136-
## Stopping server
131+
## Stopping the server
137132

138133
You can stop a running server with the following command:
139134

docs/deployment/worker.md

Lines changed: 20 additions & 20 deletions
@@ -1,4 +1,5 @@
1-
Workers connect to a running instance of a HyperQueue [server](server.md) and wait for task assignments. Once some task
1+
Workers manage the computational resources of a single computer (node) and use them to execute tasks submitted to HyperQueue.
2+
They connect to a running instance of a HyperQueue [server](server.md) and wait for task assignments. Once some task
23
is assigned to them, they will compute it and notify the server of its completion.
34

45
## Starting workers
@@ -7,8 +8,8 @@ HPC cluster. You can either use the automatic allocation system of HyperQueue to
78
workers manually.
89

910
### Automatic worker deployment (recommended)
10-
If you are using a job manager (PBS or Slurm) on an HPC cluster, the easiest way of deploying workers is to use
11-
[**Automatic allocation**](allocation.md). It is a component of HyperQueue that takes care of submitting PBS/Slurm jobs
11+
If you are using an allocation manager (PBS or Slurm) on an HPC cluster, the easiest way of deploying workers is to use
12+
[**Automatic allocation**](allocation.md). It is a component of HyperQueue that takes care of submitting PBS/Slurm allocations
1213
and spawning HyperQueue workers.
1314

1415
### Manual worker deployment
@@ -32,11 +33,11 @@ If you want to connect to a different server, use the `--server-dir` option.
3233

3334
However, if a shared filesystem is not available on your cluster, you can just copy the server directory from the
3435
server machine to the worker machine and access it from there. The worker machine still has to be able to initiate
35-
a TCP/IP connection to the server machine though.
36+
a TCP/IP connection to the server machine though. See [this page](./cloud.md) for more details.
3637

3738
#### Deploying a worker using PBS/Slurm
3839
If you want to manually start a worker using PBS or Slurm, simply use the corresponding submit command (`qsub` or `sbatch`)
39-
and run the `hq worker start` command inside the allocated job. If you want to start a worker on each allocated node,
40+
and run the `hq worker start` command inside the created allocation. If you want to start a worker on each allocated node,
4041
you can run this command on each node using e.g. `mpirun`.
4142

4243
Example submission script:
@@ -69,15 +70,15 @@ Example submission script:
6970
srun --overlap /<path-to-hyperqueue>/hq worker start --manager slurm
7071
```
7172

72-
The worker will try to automatically detect that it is started under a PBS/Slurm job, but you can also explicitly pass
73+
The worker will try to automatically detect that it is started under a PBS/Slurm allocation, but you can also explicitly pass
7374
the option `--manager <pbs/slurm>` to tell the worker that it should expect a specific environment.
7475

7576
#### Deploying a worker using SSH
7677

7778
If you have an OpenSSH-compatible `ssh` binary available in your environment, HQ can deploy workers to a set of hostnames using the `deploy-ssh` command:
7879

7980
```bash
80-
$ hq worker deploy-ssh <nodefile> <worker-args>
81+
$ hq worker deploy-ssh <nodefile> <worker-start-args>
8182
```
8283

8384
To use this command, you need to prepare a *hostfile*, which should contain a set of lines describing individual hostnames on which you want to deploy the workers:
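For instance, a hostfile with three workers might look like this (the hostnames are purely illustrative):

```
node1.my-cluster
node2.my-cluster
node3.my-cluster
```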
@@ -109,13 +110,12 @@ $ hq worker stop <selector>
109110

110111
## Time limit
111112
HyperQueue workers are designed to be volatile, i.e. it is expected that they will be stopped from time to time, because
112-
they are often started inside PBS/Slurm jobs that have a limited duration.
113+
they are often started inside PBS/Slurm allocations that have a limited duration.
113114

114-
It is very useful for the workers to know how much remaining time ("lifetime") do they have until they will be stopped.
115+
It is useful for workers to know how much remaining time ("lifetime") they have before they are stopped.
115116
This duration is called the `Worker time limit`.
116117

117-
When a worker is started manually inside a PBS or Slurm job, it will automatically calculate the time limit from the job's
118-
metadata. If you want to set time limit for workers started outside of PBS/Slurm jobs or if you want to
118+
When a worker is started manually inside a PBS or Slurm allocation, it will automatically calculate the time limit from the metadata of the allocation. If you want to set a time limit for workers started outside of PBS/Slurm allocations or if you want to
119119
override the detected settings, you can use the `--time-limit=<DURATION>` option[^1] when starting the worker.
120120

121121
[^1]: You can use various [shortcuts](../cli/shortcuts.md#duration) for the duration value.
@@ -126,7 +126,7 @@ The time limit of a worker affects what tasks can be scheduled to it. For exampl
126126
will not be scheduled onto a worker that only has a remaining time limit of 5 minutes.
127127

128128
## Idle timeout
129-
When you deploy *HQ* workers inside a PBS or Slurm job, keeping the worker alive will drain resources from your
129+
When you deploy *HQ* workers inside a PBS or Slurm allocation, keeping the worker alive will drain resources from your
130130
accounting project (unless you use a free queue). If a worker has nothing to do, it might be better to terminate it
131131
sooner to avoid paying these costs for no reason.
132132

@@ -152,26 +152,30 @@ This value will be then used for each worker that does not explicitly specify it
152152
Each worker can be in one of the following states:
153153

154154
* **Running** Worker is running and is able to process tasks
155-
* **Connection lost** Worker lost connection to the server. Probably someone manually killed the worker or job walltime
156-
in its PBS/Slurm job was [reached](#time-limit).
155+
* **Connection lost** Worker lost connection to the server. Probably someone manually killed the worker or the walltime
156+
of its PBS/Slurm allocation was [reached](#time-limit).
157157
* **Heartbeat lost** Communication between server and worker was interrupted. It usually signifies a network problem or
158158
a hardware crash of the computational node.
159159
* **Stopped** Worker was [stopped](#stopping-workers).
160160
* **Idle timeout** Worker was terminated due to [Idle timeout](#idle-timeout).
161161

162162
### Lost connection to the server
163163

164-
The behavior of what should happen with a worker that lost its connection to the server is configured
164+
The behavior of what should happen when a worker loses its connection to the server is configured
165165
via `hq worker start --on-server-lost=<policy>`. You can select from two policies:
166166

167167
* `stop` - The worker immediately terminates and kills all currently running tasks.
168-
* `finish-running` - The worker does not start to execute any new tasks, but it tries to finish tasks
168+
* `finish-running` - The worker does not start executing any new tasks, but it tries to finish tasks
169169
that are already running. When all such tasks finish, the worker will terminate.
170170

171171
`stop` is the default policy when a worker is manually started by `hq worker start`.
172172
When a worker is started by the [automatic allocator](allocation.md), then `finish-running` is used
173173
as the default value.
174174

175+
## Worker groups
176+
177+
Each worker is a member of exactly one worker group. Groups are used to determine which workers are eligible to execute multi-node tasks. You can find more information about worker groups [here](../jobs/multinode.md#groups).
178+
175179
## Useful worker commands
176180
Here is a list of useful worker commands:
177181

@@ -188,7 +192,3 @@ If you also want to include workers that are offline (i.e. that have crashed or
188192
```bash
189193
$ hq worker info <worker-id>
190194
```
191-
192-
### Worker groups
193-
194-
Each worker is a member exactly of one group. Groups are used when multi-node tasks are used. See more [here](../jobs/multinode.md#groups)

docs/faq.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ about anything related to HyperQueue, feel free to ask on our [discussion forum]
5454
each with a single task.
5555

5656
HQ also supports [streaming](jobs/streaming.md) of task outputs into a single file.
57-
This avoids creating many small files for each task on a distributed file system, which improves
57+
This avoids creating many small files for each task on a distributed filesystem, which improves
5858
scaling.
5959

6060
??? question "Does HQ support multi-CPU tasks?"

docs/jobs/arrays.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ If `--array` defines an ID that exceeds the number of lines in the file (or the
110110

111111
For example:
112112

113-
```commandline
113+
```bash
114114
$ hq submit --each-line input.txt --array "2, 8-10"
115115
```
116116

docs/jobs/failure.md

Lines changed: 4 additions & 4 deletions
@@ -9,21 +9,21 @@ recompute only tasks with a specific status (e.g. failed tasks).
99
By combining the following commands, you can recompute only failed tasks. Let us assume that we want to recompute
1010
all failed tasks in job 5:
1111

12-
```commandline
12+
```bash
1313
$ hq submit --array=`hq job task-ids 5 --filter=failed` ./my-computation
1414
```
1515
It works as follows: the command `hq job task-ids 5 --filter=failed` returns the IDs of failed tasks of job `5`, and we pass
1616
them to the `--array` parameter, so that only tasks with the given IDs are started.
1717

1818
If we want to recompute all failed tasks and all canceled tasks we can do it as follows:
1919

20-
```commandline
20+
```bash
2121
$ hq submit --array=`hq job task-ids 5 --filter=failed,canceled` ./my-computation
2222
```
2323

2424
Note that it also works with `--each-line` or `--from-json`, i.e.:
2525

26-
```commandline
26+
```bash
2727
# Original computation
2828
$ hq submit --each-line=input.txt ./my-computation
2929

@@ -56,7 +56,7 @@ You can change this behavior with the `--max-fails=<X>` option of the `submit` c
5656
If specified, once more tasks than `X` tasks fail, the rest of the job's tasks that were not completed yet will be canceled.
5757

5858
For example:
59-
```commandline
59+
```bash
6060
$ hq submit --array 1-1000 --max-fails 5 ...
6161
```
6262
This will create a task array with `1000` tasks. Once `5` or more tasks fail, the remaining uncompleted tasks of the job

docs/jobs/jobfile.md

Lines changed: 2 additions & 2 deletions
@@ -22,13 +22,13 @@ command = ["sleep", "1"]
2222
Let us assume that we have named this file as ``myfile.toml``,
2323
then we can run the following command to submit a job:
2424

25-
```commandline
25+
```bash
2626
$ hq job submit-file myfile.toml
2727
```
2828

2929
The effect will be same as running:
3030

31-
```commandline
31+
```bash
3232
$ hq submit sleep 1
3333
```
3434

docs/jobs/jobs.md

Lines changed: 5 additions & 5 deletions
@@ -427,25 +427,25 @@ Here is a list of useful job commands:
427427

428428
### Display a summary table of all jobs
429429

430-
```commandline
430+
```bash
431431
$ hq job summary
432432
```
433433

434434
### Display information about a specific job
435435

436-
```commandline
436+
```bash
437437
$ hq job info <job-selector>
438438
```
439439

440440
### Display information about individual tasks (potentially across multiple jobs)
441441

442-
```commandline
442+
```bash
443443
$ hq task list <job-selector> [--task-status <status>] [--tasks <task-selector>]
444444
```
445445

446446
### Display job `stdout`/`stderr`
447447

448-
```commandline
448+
```bash
449449
$ hq job cat <job-id> [--tasks <task-selector>] <stdout/stderr>
450450
```
451451

@@ -456,7 +456,7 @@ worker. HyperQueue server remembers how many times were a task running while a w
456456
If the count reaches the limit, then the task is set to the failed state.
457457
By default, this limit is `5` but it can be changed as follows:
458458

459-
```commandline
459+
```bash
460460
$ hq submit --crash-limit=<NEWLIMIT> ...
461461
```
462462
