
WIP Support for the new riak_control#125

Closed
hmmr wants to merge 22 commits into OpenRiak:openriak-3.4 from TI-Tokyo:tiot/openriak-3.4/riak_control

@hmmr

@hmmr hmmr commented Mar 4, 2026

Contributor

This PR implements a set of new HTTP API endpoints for riak_control (external app):

  • /system_info, returning the Riak and OTP versions and uptime;
  • /cluster, emulating riak admin cluster commands;
  • /tictacaae, returning riak admin tictacaae treestatus;
  • /vnode, for a per-vnode backend status report (identical to the output of riak admin vnode-status);
  • /security, emulating parts of riak admin security (i.e., users, groups, grants, permissions).

Andriy Zavada added 12 commits February 18, 2026 21:49
* /system_info, returning riak, otp versions, uptime;
* /cluster, emulating `riak admin cluster` commands;
* /tictacaae, returning `riak admin tictacaae treestatus`;
* /vnode, for per-vnode backend status report (currently
  leveled only);
* /security, emulating parts of `riak admin security`
  (i.e., users, groups, grants, permissions).
now printing a JSON object
to work with json parsers that may choke on very large ints
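The last commit note refers to JSON parsers that choke on very large integers. JavaScript-based parsers decode every number as an IEEE-754 double, so integers above 2^53 - 1 silently lose precision. A minimal Python sketch of one common workaround, assuming such values may be sent as decimal strings; the helper and field names are hypothetical and not taken from the PR:

```python
import json

# Largest integer an IEEE-754 double represents exactly (JavaScript's Number.MAX_SAFE_INTEGER).
MAX_SAFE_INT = 2**53 - 1

def stringify_big_ints(value):
    """Recursively replace integers outside the safe range with decimal strings."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int in Python; leave true/false alone
    if isinstance(value, int):
        return str(value) if abs(value) > MAX_SAFE_INT else value
    if isinstance(value, dict):
        return {k: stringify_big_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_big_ints(v) for v in value]
    return value

# Hypothetical payload: a small counter stays numeric, a 64-bit hash becomes a string.
payload = {"uptime_ms": 12345, "segment_hash": 2**64 - 1}
print(json.dumps(stringify_big_ints(payload)))
```

The cost of this approach is that clients must know which fields may arrive as strings; the alternative is a client-side parser that supports arbitrary-precision integers.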
@martinsumner

Contributor

I have some general concerns here: about the proliferation of REST endpoints, the change of default security stance, and the exposure of those who do not control security through "riak security".

I have an alternative suggestion as an approach, to have a single eval endpoint, that:

  • Is disabled by default, unless configured to be enabled in riak.conf (perhaps by not adding the route at startup, or having an initial check in the forbidden callback).
  • Is split into three categories by URL parsing: monitoring; configuration; security.
  • Has a URI of /eval///?encoded_args=<b64_encoded_args>.
  • Defines, in the REST endpoint module, lists of Mod/Fun/Arity tuples specifying the supported evaluations for each of the three categories.
  • Has three independent controls in riak.conf, to further enable the monitoring/configuration/security category of the eval endpoint.
  • Returns the output of evaluated functions as a binary payload (probably using term_to_binary/1).
  • Should have additional security on top of the configuration controls to restrict access by user, using riak security.
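The allowlist part of this proposal can be modelled in a few lines. A minimal Python sketch, assuming the global switch and the three per-category riak.conf controls are plain booleans and that arity is simply the number of decoded arguments; the real check would live in the Erlang REST module, and the per-user riak security check from the last bullet is left out. Only riak_kv_util:profile_riak/1 comes from this thread; the setting names are hypothetical:

```python
from typing import Dict, List, Set, Tuple

MFA = Tuple[str, str, int]

# Hypothetical allowlists, one per category; anything not listed is refused.
ALLOWED: Dict[str, Set[MFA]] = {
    "monitoring": {("riak_kv_util", "profile_riak", 1)},
    "configuration": set(),
    "security": set(),
}

# Stand-ins for the global riak.conf switch and the three per-category controls.
EVAL_ENABLED = False
CATEGORY_ENABLED = {"monitoring": False, "configuration": False, "security": False}

def is_permitted(category: str, module: str, function: str, args: List[object],
                 eval_enabled: bool = EVAL_ENABLED,
                 category_enabled: Dict[str, bool] = CATEGORY_ENABLED) -> bool:
    """True only if eval and the category are enabled and the M/F/A is allowlisted."""
    if not eval_enabled or not category_enabled.get(category, False):
        return False
    return (module, function, len(args)) in ALLOWED.get(category, set())
```

Every path defaults to refusal, which matches the "secure by default" aim: a missing category, a disabled switch, or an unlisted arity all return False.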

The aim here is:

  • to make everyone secure by default, even when riak security is not enabled.
  • to allow for riak control or other observability tools to expand what they support (e.g. access to riak_kv_util:profile_riak/1, rekon functions, microstate accounting, logger changes etc), simply by adding a new M/F/A tuple to the appropriate list, rather than a new endpoint.

Obviously this creates a requirement for the observability tool to handle term_to_binary/binary_to_term - but I know this has been done in python before to evolve Riak services, so I guess it should be possible to do in elm as well?
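Decoding term_to_binary/1 output outside Erlang is mostly a matter of walking tagged bytes. A minimal Python sketch that handles only a small subset of the Erlang external term format (small and 32-bit integers, atoms, tuples, lists, Erlang strings, binaries); floats, big integers, maps, pids and compressed terms are deliberately not covered. Atoms are returned as Python str and binaries as bytes, which is an illustrative choice, not an established mapping. Depending on the OTP release, atoms may arrive as ATOM_EXT or as one of the UTF-8 atom tags, so both are accepted:

```python
import struct

def binary_to_term(data: bytes):
    """Decode a small subset of Erlang's external term format."""
    if not data or data[0] != 131:
        raise ValueError("missing external term format version byte (131)")
    term, offset = _decode(data, 1)
    if offset != len(data):
        raise ValueError("trailing bytes after term")
    return term

def _decode_seq(data: bytes, i: int, count: int):
    items = []
    for _ in range(count):
        item, i = _decode(data, i)
        items.append(item)
    return items, i

def _decode(data: bytes, i: int):
    tag = data[i]
    i += 1
    if tag == 97:   # SMALL_INTEGER_EXT: unsigned 8-bit
        return data[i], i + 1
    if tag == 98:   # INTEGER_EXT: signed 32-bit big-endian
        return struct.unpack(">i", data[i:i + 4])[0], i + 4
    if tag in (100, 118):  # ATOM_EXT (Latin-1) / ATOM_UTF8_EXT: 16-bit length
        n = struct.unpack(">H", data[i:i + 2])[0]
        encoding = "latin-1" if tag == 100 else "utf-8"
        return data[i + 2:i + 2 + n].decode(encoding), i + 2 + n
    if tag == 119:  # SMALL_ATOM_UTF8_EXT: 8-bit length
        n = data[i]
        return data[i + 1:i + 1 + n].decode("utf-8"), i + 1 + n
    if tag == 104:  # SMALL_TUPLE_EXT: 8-bit arity
        items, i = _decode_seq(data, i + 1, data[i])
        return tuple(items), i
    if tag == 105:  # LARGE_TUPLE_EXT: 32-bit arity
        arity = struct.unpack(">I", data[i:i + 4])[0]
        items, i = _decode_seq(data, i + 4, arity)
        return tuple(items), i
    if tag == 106:  # NIL_EXT: the empty list
        return [], i
    if tag == 107:  # STRING_EXT: list of bytes with a 16-bit length
        n = struct.unpack(">H", data[i:i + 2])[0]
        return list(data[i + 2:i + 2 + n]), i + 2 + n
    if tag == 108:  # LIST_EXT: 32-bit length, elements, then a tail term
        n = struct.unpack(">I", data[i:i + 4])[0]
        items, i = _decode_seq(data, i + 4, n)
        tail, i = _decode(data, i)
        if tail != []:
            raise ValueError("improper lists are not supported")
        return items, i
    if tag == 109:  # BINARY_EXT: 32-bit length
        n = struct.unpack(">I", data[i:i + 4])[0]
        return bytes(data[i + 4:i + 4 + n]), i + 4 + n
    raise ValueError(f"unsupported tag {tag}")
```

For example, the bytes of {ok, <<"hi">>} decode to the tuple ("ok", b"hi"). An Elm client would need an equivalent decoder over the same tags.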

Perhaps I'm being selfish here, but especially if there are other observability tools planned, I don't want to have a flood of changes to riak_kv to support lots of new REST endpoints. We have riak eval at the moment for generalised console commands, so why not have the equivalent (with appropriate security settings) for generalised observability commands?

I also think the single endpoint with the proposed URL scheme would make life much easier for those who audit and restrict access to Riak via proxy policies.

@martinsumner martinsumner moved this from Todo to In Progress in OpenRiak 3.4.2 Mar 6, 2026
@hmmr

hmmr commented Mar 9, 2026

Contributor Author

I agree on all points except this one: execution of arbitrary Erlang code in the fashion of riak eval. The reasoning is that whilst allowing this functionality via riak eval makes sense from security point of view (if someone has ssh access to the box running riak who shouldn't, admins have a much bigger problem already), allowing this via public http would require extremely tight security on that endpoint. In other words, there should be a defined set of requests, all POSTs to a single endpoint (AWS-style), with request parameters and responses in JSON bodies. Not sure we should consider XML for this purpose.

In its current, primitive form, users who will have access to riak from riak_control have to be created in advance with riak admin security add-user, and have a riak_control permission granted. This can only be done by operators with ssh access to the riak host. While this is practical, I was wondering whether such riak_control users should even be in the same class as other users created in and existing in riak security. How would the three levels of access be implemented: as new permissions granted via riak security add-grant, or in some other way? Specifically asking @WarpEngineer to weigh in on this matter. Also, is basic auth adequate and sufficient?

@WarpEngineer

Member

I have concerns about eval. Using eval in any language for anything is considered very bad security practice and should be avoided unless absolutely necessary. If we want to reduce the number of endpoints then we can use query parameters or use maps in the body, or some other idea. The more secure option is to add endpoints for different functions, even though it may increase the number of endpoints. We can use common code on the backend to reduce code duplication so multiple endpoints will call the same backend functions, and this will keep the attack surface minimal. If we want to try to keep this as REST-y as we can then that does mean more endpoints, not fewer.

For the authentication parts, normally there would be a 'super' admin that can't be deleted or modified that's able to 'fix' anything that others broke. In our case, logging onto the server and using the CLI solves this issue.

I didn't look through the diff in detail yet, but I did skim it and I don't see any additional grants added. I believe we only have a few grants for buckets (riak_core.) and a few for objects (riak_kv.). So I'm not sure how this UI is going to be protected. Are we going to add new grants to allow this kind of system administration? Something like riak_core.admin or riak_control.add_user or something? I don't think we should allow any user in the system to access this. I suggest a separate grant for each endpoint at a minimum. (Note, however, that these grants must not impact the CLI 'super' admin functions)
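The two suggestions in this comment (more endpoints over shared backend code, and a separate grant per endpoint) combine naturally in a routing table. A minimal Python sketch, assuming hypothetical endpoint paths and grant names (riak_control.vnode_status and so on are illustrative, not existing Riak grants); a real implementation would sit in the Erlang webmachine resources:

```python
from functools import partial
from typing import Callable, Dict, Set, Tuple

def vnode_status(node: str, preflists: str) -> dict:
    """Hypothetical shared backend function; a real one would query the vnodes on `node`."""
    return {"node": node, "preflists": preflists}

# Each endpoint pairs its own (hypothetical) grant with a call into shared backend code.
ROUTES: Dict[str, Tuple[str, Callable[..., dict]]] = {
    "/vnode": ("riak_control.vnode_status", vnode_status),
    "/vnode/primary": ("riak_control.vnode_status_primary",
                       partial(vnode_status, preflists="primary")),
}

def handle(path: str, grants: Set[str], **params) -> Tuple[int, dict]:
    """Return an (HTTP status, body) pair for a request to `path`."""
    route = ROUTES.get(path)
    if route is None:
        return 404, {}
    required_grant, backend = route
    if required_grant not in grants:
        return 403, {}
    return 200, backend(**params)
```

Both endpoints reach the same backend function, so adding an endpoint adds a grant and a routing entry but no duplicated logic.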

@WarpEngineer

Member

I found it:

Res = riak_core_security:check_permission(
          {"riak_kv.riak_control"}, Security),

This is good. I still suggest we have more fine-grained controls though.

@martinsumner

Contributor

Quoting @hmmr: "I agree on all points except this one: execution of arbitrary Erlang code in the fashion of riak eval. The reasoning is that whilst allowing this functionality via riak eval makes sense from security point of view (if someone has ssh access to the box running riak who shouldn't, admins have a much bigger problem already), allowing this via public http would require extremely tight security on that endpoint. In other words, there should be a defined set of requests, all POSTs to a single endpoint (AWS-style), with request parameters and responses in JSON bodies. Not sure we should consider XML for this purpose."

My suggestion was that the eval API would be constrained by three lists of {Module, Function, Arity} tuples, against which the request would be matched (with separate lists for monitoring, configuration and security so that these can be specifically enabled/disabled). So I don't think I was suggesting "execution of arbitrary Erlang code" ... which I hope means we're in agreement in principle, although how to implement the restrictions on eval can still be debated.

@martinsumner

Contributor

Linking issue #139, as we will need to fix it before further extending the number of route definitions.

@hmmr hmmr force-pushed the tiot/openriak-3.4/riak_control branch from 59db078 to 343aa61 March 17, 2026 19:44
@hmmr

hmmr commented Mar 18, 2026

Contributor Author

Progress report.

I have combined all requests under a single endpoint, /ctl. The full list of requests can be seen here.

An example request:

{
    "action": "GetVnodeStatus",
    "params": {"node": "dev2@127.0.0.1", "preflists": "all"}
}

A response will be a JSON object with a "result" key containing the structured data, or an "error" key with an error string.
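From a client's point of view, the request/response contract described above reduces to one POST builder and one response check. A minimal Python sketch using only the standard library, assuming the node's HTTP listener is at Riak's default port 8098 and that "error" takes precedence if both keys were ever present; the function names are hypothetical:

```python
import json
import urllib.request

def build_ctl_request(base_url: str, action: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the single /ctl endpoint."""
    body = json.dumps({"action": action, "params": params}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ctl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_ctl_response(raw: bytes):
    """Return the "result" value, or raise if the response carries an "error" string."""
    doc = json.loads(raw)
    if "error" in doc:
        raise RuntimeError(doc["error"])
    return doc["result"]

req = build_ctl_request("http://127.0.0.1:8098", "GetVnodeStatus",
                        {"node": "dev2@127.0.0.1", "preflists": "all"})
# Sending it would be: parse_ctl_response(urllib.request.urlopen(req).read())
```

Because every request shares one URL and method, a proxy or audit layer only needs to inspect the "action" field to apply policy.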

All this will be documented in QuickDocs.

Regarding permissions, I can think of the following:

  • cluster observer, with access to ClusterGetStatus but not ClusterStage* or ClusterCommit or NodeRestart, and also to NodeGetConfig and VnodeGetStatus and TictacaaeGetStatus.
  • cluster operator, with full access to all Cluster* requests as well as NodePutConfig and NodeRestart;
  • security, with access to all Security* requests.
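The three classes above can be expressed as action-name patterns. A minimal Python sketch that transcribes the list literally, using shell-style wildcards for the "Cluster*" and "Security*" families; the short class names and the ClusterStageJoin action used in the test are hypothetical, and whether the operator class should also imply the observer actions is not stated above, so it is not assumed:

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Action-name patterns per permission class, transcribed from the list above.
PERMISSION_CLASSES = {
    "observer": ["ClusterGetStatus", "NodeGetConfig", "VnodeGetStatus", "TictacaaeGetStatus"],
    "operator": ["Cluster*", "NodePutConfig", "NodeRestart"],
    "security": ["Security*"],
}

def allowed(user_classes: Iterable[str], action: str) -> bool:
    """True if any of the user's classes has a pattern matching the action name."""
    return any(fnmatchcase(action, pattern)
               for cls in user_classes
               for pattern in PERMISSION_CLASSES.get(cls, []))
```

Keeping the classes as data rather than code means a new request type only needs a matching name (or a new list entry) to land in the right class.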

Ideas, suggestions (specifically around permission classes and granularity)?

@martinsumner

Contributor

The permission classes and granularity seem correct to me. All seems good at this stage.

As mentioned before, we should be able to control this without enabling security, i.e. block use of the API, or restrict its use based on the high-level permission classes.

@hmmr

hmmr commented Mar 18, 2026

Copy link
Copy Markdown
Contributor Author

Indeed, bolting riak_control-specific users and permissions onto existing riak security infrastructure was too much of an ad-hoc solution. I propose the following:

  • riak_control API is enabled/disabled in riak.conf. It will be disabled by default.
  • When enabled, operators will create the original superadmin user via riak admin control create-superadmin NAME and supply the password interactively.
  • With these credentials, operators will be able to access the API at /ctl and, in particular, create additional users with more restricted permission sets. These users will reside in a namespace that does not overlap with the users currently managed by the riak admin security subsystem.
  • Riak Control API and users and permissions will thus be totally independent from riak security users and permissions, and will not require riak security enabled.
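A user store independent of riak security, as proposed above, needs little more than salted password hashes and a bootstrap rule for the superadmin. A minimal Python sketch, assuming PBKDF2-SHA256 for hashing, HTTP Basic credentials (the scheme asked about earlier in the thread), and that only the superadmin may add users; the class and method names are hypothetical, and cluster-wide propagation (one of the open questions) is not modelled:

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Tuple

class ControlUserStore:
    """Hypothetical riak_control user store, kept apart from riak security's users."""

    def __init__(self, iterations: int = 600_000):
        self.iterations = iterations  # PBKDF2 work factor; illustrative value
        # name -> (salt, derived key, is_superadmin)
        self._users: Dict[str, Tuple[bytes, bytes, bool]] = {}

    def _derive(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self.iterations)

    def _add(self, name: str, password: str, superadmin: bool) -> None:
        if name in self._users:
            raise ValueError(f"user {name!r} already exists")
        salt = os.urandom(16)
        self._users[name] = (salt, self._derive(password, salt), superadmin)

    def create_superadmin(self, name: str, password: str) -> None:
        # Assumption: the CLI command bootstraps exactly one superadmin into an empty store.
        if self._users:
            raise ValueError("store already initialised")
        self._add(name, password, superadmin=True)

    def add_user(self, acting_user: str, name: str, password: str) -> None:
        # Assumption: only the (already authenticated) superadmin may create API users.
        if not self._users.get(acting_user, (b"", b"", False))[2]:
            raise PermissionError("only the superadmin can add users")
        self._add(name, password, superadmin=False)

    def authenticate_basic(self, authorization_header: str) -> str:
        """Check an HTTP Basic Authorization header value; return the user name or raise."""
        scheme, _, encoded = authorization_header.partition(" ")
        if scheme.lower() != "basic":
            raise PermissionError("unsupported auth scheme")
        name, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
        entry = self._users.get(name)
        # Constant-time comparison of the derived keys.
        if entry is None or not hmac.compare_digest(entry[1], self._derive(password, entry[0])):
            raise PermissionError("bad credentials")
        return name
```

Basic auth sends the password on every request, so in this sketch it would only be acceptable behind TLS.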

Questions I have:

  • Should the superadmin and other riak_control users be propagated to other nodes in a cluster, or stay local to the node they were created on?
  • Do we need groups in riak_control?

@pjaclark

pjaclark commented Mar 20, 2026

Member

My thoughts:

  • Permissions at group level.
  • Users in groups.
  • As many groups as wanted.
  • Groups cannot contain groups.
  • Track/log what each user does for auditing purposes.
  • User is member of one group only:
    • Done for simplicity.
    • Avoids having to flatten a complex security model for every REST API call.
    • Avoids having an out of date cache.
  • Secure users with a unique and private generated API key against their username.
  • Make new API compatible with OpenAPI 3.2
  • Be able to reset/disable a managing user via a CLI command
  • Propagate users and groups to all nodes in a cluster but NOT to other clusters.
    • If you have rights for one node, you should have them on all nodes in that cluster.
    • Can't think of a reason NOT to do this.
    • You can turn the control endpoints on/off via the config on any node, so if you want to limit access you have an easy method

E.g. peter@example.org is a member of the group monitoring, and monitoring can view stats but nothing else. Therefore I can see stats of all nodes in the cluster.
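The group model in this comment (permissions only on groups, exactly one group per user, no nested groups, every decision logged) keeps each check to two lookups. A minimal Python sketch using the peter@example.org example, assuming hypothetical permission strings such as "stats.view" and an in-memory audit list standing in for a real audit log; cluster-wide propagation is not modelled:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ControlAccessModel:
    """Hypothetical group model: permissions live on groups, each user is in exactly one group."""
    group_permissions: Dict[str, Set[str]] = field(default_factory=dict)
    user_group: Dict[str, str] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def add_group(self, group: str, permissions: Set[str]) -> None:
        # Groups hold only permission strings, so a group cannot contain another group.
        self.group_permissions[group] = set(permissions)

    def assign(self, user: str, group: str) -> None:
        if group not in self.group_permissions:
            raise KeyError(f"unknown group {group!r}")
        self.user_group[user] = group  # replaces any previous group: one group per user

    def check(self, user: str, permission: str) -> bool:
        # Two dictionary lookups per request: nothing to flatten, nothing to cache.
        group = self.user_group.get(user)
        allowed = group is not None and permission in self.group_permissions[group]
        self.audit_log.append((user, permission, allowed))  # record every decision
        return allowed

model = ControlAccessModel()
model.add_group("monitoring", {"stats.view"})
model.assign("peter@example.org", "monitoring")
```

With this setup, model.check("peter@example.org", "stats.view") is True and any other permission is refused, and both outcomes appear in the audit log.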

@martinsumner

martinsumner commented Mar 20, 2026

Contributor

I was thinking of something much simpler. By "high-level permission classes" I just meant enabling/disabling the individual groups (operations, monitoring, security).

The Riak Control API is enabled/disabled in config as you suggest (default disabled). The three categories are then controlled via riak.conf (i.e. each either enabled or disabled; security will be disabled by default and the other two enabled, assuming you enable the global control).

I don't think there should be a need for a user unless you have security enabled, i.e. if you want granular controls, and to restrict to peer IPs etc., that is what riak security is for. There should be special permission groups for the three categories, but nothing different from how riak security works at present.

If you don't want to enable security you just have the ability to enable/disable globally, and by category (e.g. operations, monitoring and security). If you don't want to enable security, it is because you have trust built into your environment - so managing super-users etc is overhead.

Security is complicated enough in Riak, I don't think we should add a new layer. We just need to ensure that no-one is compromised by default.

If you really, really want to add an extra layer of protection on top of the enable/disable and existing security controls, then add support for an X-Riak-Cookie header on this API, and require a hash of the Erlang cookie to be passed (the thing that, if you know it, you could use to connect anyway).
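The X-Riak-Cookie idea amounts to comparing a header against a digest of the node's Erlang cookie. A minimal Python sketch, assuming SHA-256 with hex encoding (the comment does not specify a hash) and using "riak" as an example cookie value; the function names are hypothetical:

```python
import hashlib
import hmac
from typing import Mapping

def cookie_digest(erlang_cookie: str) -> str:
    """Hex-encoded SHA-256 of the Erlang cookie (SHA-256 is an assumed choice)."""
    return hashlib.sha256(erlang_cookie.encode("utf-8")).hexdigest()

def check_cookie_header(headers: Mapping[str, str], erlang_cookie: str) -> bool:
    """Accept the request only if X-Riak-Cookie carries the expected digest."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    supplied = {name.lower(): value for name, value in headers.items()}.get("x-riak-cookie")
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.strip().lower().encode("utf-8"),
                               cookie_digest(erlang_cookie).encode("utf-8"))
```

An unsalted digest is effectively a static shared secret: anyone who observes it can replay it. That fits the trust level described in the comment (knowing the cookie already grants access), but the header should only travel over TLS.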

@hmmr hmmr force-pushed the tiot/openriak-3.4/riak_control branch from 50a1dbf to 490d974 March 23, 2026 22:10
@martinsumner

Contributor

Following discussion on 24th March.

Some things are agreed:

  • disabled by default;
  • the three categories of services within the API (monitoring, administration and security admin).

Some things had majority support:

  • The API to be a separate dependency, with the WM callback modules existing in that dependency rather than in riak_kv.

There was then an outstanding decision to be taken on one of three options going forward:

  • a) Riak Control to require some middleware which will apply end-user AAA controls, with the high-level model for riak security within Riak for the API aligned with other services (based on authenticating the system, not the user). As a third-party service, the middleware would not require the test and assurance of the OpenRiak development team.
  • b) This change to depend on a change making riak security enablement per-listener, with the Riak Control API available only via a dedicated listener which must have riak security enabled. This requires testing the non-functional constraints of the existing security implementation (overheads of an expanding count of users and the complexity of user/group relationships), with those constraints either fixed or warned about in the documentation.
  • c) The dependency which includes the API to include its own security apparatus independent of riak security, which either now or in the future will be expanded to support single sign on.

The primary trade-off is between the functional need for enterprise-grade security features in Riak Control, and the risk that the project to introduce Control may grow in complexity. This complexity will not just be in the development overheads but in finding resources to assure that work, and the necessary testing, in time for a Riak 4.0 release.

@pjaclark to discuss with @hmmr - and report back next week, on both the preferred choice and the resourcing of a plan for the development, test and assurance of that choice.

@hmmr

hmmr commented Mar 25, 2026

Contributor Author

Option (c) is more appealing to me: independent pool of users, and independence of whether the standard riak security is enabled.

I even fancy a separate webmachine instance serving /ctl endpoint, and moving it from riak_kv over to riak_api.

@martinsumner

Contributor

One option we may have with the new SilverMachine prototype is simplified "drop-in" API support.

Essentially we will have a config item in riak.conf

extended_api_modules = riak_kv_wm_console|60,riak_kv_wm_console_static|65

The number is the priority of the module when matching on routes. If this config is set, on startup we will look for the module in some folder, and code-load it, and then add the module/priority to the routes.

The module in its match_route/3 callback says what URLs to match on, and then governs its behaviour through its other callback functions (defined in riak_api_web_handler).

This will allow for additional APIs to be added to deployed Riak nodes - they don't need to be added to the package. So that something like Riak Control can be added over a Riak installation, rather than having to be an integrated part of Riak.
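The extended_api_modules setting shown above is a comma-separated list of module|priority pairs. A minimal Python sketch of parsing it into an ordered route list, assuming that lower numbers are matched first (the comment does not say which direction wins) and ignoring the code-loading step; the function name is hypothetical:

```python
from typing import List, Tuple

def parse_extended_api_modules(value: str) -> List[Tuple[str, int]]:
    """Parse "mod|prio,mod|prio" into (module, priority) pairs, lowest priority first."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        module, sep, priority = item.partition("|")
        if not sep or not module.strip():
            raise ValueError(f"expected module|priority, got {item!r}")
        entries.append((module.strip(), int(priority)))  # int() raises on a bad number
    # sorted() is stable, so equal priorities keep their config-file order.
    return sorted(entries, key=lambda entry: entry[1])

routes = parse_extended_api_modules("riak_kv_wm_console|60,riak_kv_wm_console_static|65")
```

Here routes is [("riak_kv_wm_console", 60), ("riak_kv_wm_console_static", 65)]; each module's match_route/3 callback would then be consulted in that order.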

To view WIP of SilverMachine - OpenRiak/riak_api@openriak-3.4...nhse-o40-orkv.i141-silvermachine

@hmmr hmmr closed this May 14, 2026
@hmmr hmmr deleted the tiot/openriak-3.4/riak_control branch May 14, 2026 02:21
@github-project-automation github-project-automation Bot moved this from In Progress to Done in OpenRiak 3.4.2 May 14, 2026
@hmmr

hmmr commented Jun 1, 2026

Contributor Author

Following discussions, a new approach is proposed in which we formally define and expose a set of AWS-style HTTP API requests, served by a separate app, riak_admin_api (rather than squeezing those into riak_kv_wm_* bunch).

A new PR to supersede this one is #152.
