Implement process info collection on AIX#2028
Conversation
shirou
left a comment
There was a problem hiding this comment.
Thank you for your contribution. This is a clean, well-tested implementation. The code follows project conventions and the test coverage is thorough. A few items to address:
-
Struct definitions should use
cgo -godefs: In this project, kernel structs are defined intypes_<platform>.goas input tocgo -godefs, which generates the corresponding Go structs (see types_darwin.go, types_freebsd.go, types_openbsd go, etc.). Hand-written structs risk padding/alignment mismatches with the actual memory layout — especially since you noted that "the fields are not ordered the same way in the doc than their actual memory layout." While perhaps your implementation has been tested on a real machine, thecgo -godefsapproach gives us a stronger guarantee going forward. -
Could you create a process/types_aix.go and generate the structs from the AIX headers? Something like:
//go:build ignore
/*
#include <sys/procfs.h>
*/
import "C"
type psinfo C.struct_psinfo
type lwpSinfo C.struct_lwpsinfo
type prTimestruc64 C.struct_timestruc64
Then run go tool cgo -godefs types_aix.go on your AIX 7.3 host to produce the generated output. Since you already have an AIX test environment, this should be straightforward.
2. Verify golang.org/x/sys/unix symbols on AIX: Adding aix to the build tag in process_posix.go pulls in symbols like Stat_t, Lstat, SIGSTOP, SIGCONT, etc. Please verify these all compile on AIX.
3. uint64 → int32/uint32 casts: The psinfo struct uses uint64 for PIDs and UIDs, but the return types are int32/uint32. These are safe in practice on AIX, but a brief comment noting this assumption would be helpful.
4. Minor: Consider renaming nullTerminatedBytes → nullTerminatedString since it returns a string.
|
👋 @shirou Thanks for the comments, I agree !
I didn't know about that, that's super convenient ! |
### What does this PR do? Add live process monitoring support for AIX: - Add `pkg/process/procutil/process_aix.go`: implements the `Probe` interface by reading `/proc/<pid>/psinfo` directly (world-readable 448-byte binary, no elevated privileges required). The psinfo struct layout is vendored from the DataDog/gopsutil fork — there is no `DataDog/gopsutil` import. - Add `pkg/process/checks/system_info_aix.go`: `CollectSystemInfo` implementation for AIX via gopsutil v4. - Guard `fp.Stats.CtxSwitches` and `fp.CtxSwitches` nil dereferences in `process.go` and `process_rt.go` (AIX does not provide context switch data). - Exclude AIX from the generic `process_fallback.go` and the Linux-specific `system_info.go`. ### Motivation Follow-up to #49052 and #49108. Cherry-picked and adapted from #47041. ### Describe how you validated your changes Cherry-picked from #47041, which has been tested manually on AIX. ### Additional Notes The process collection logic was originally added in the `DataDog/gopsutil` fork. Since that fork is no longer used, the logic has been vendored directly into the agent. I'm working on upstreaming the logic to shirou/gopsutil but it's still WIP shirou/gopsutil#2028 Co-authored-by: pierre.gimalac <pierre.gimalac@datadoghq.com>
|
👋 Hey @shirou, would you mind giving this PR another review whenever you have time ? Thanks ! |
shirou
left a comment
There was a problem hiding this comment.
Thanks for the well-structured PR — tests and the cgo -godefs reproduction recipe are very nice. A few things I'd like fixed before merging:
Required
-
Sort
ChildrenWithContext(andProcessesWithContext) by PID for parity with the other ports — seeprocess_linux.go:369,process_freebsd.go:305. -
Don't bypass
NewProcessWithContextinProcesses/Children.: The comment explains skipping thePidExistscheck, butNewProcessWithContextalso seedsp.createTime, whichIsRunningWithContextuses to detect PID recycling. As written, AIXProcessobjects from these helpers silently lose that guarantee. Please either callNewProcessWithContextlike the other ports, or seedcreateTimeexplicitly from thepsinfoyou already read. -
readPidsFromDiris currently duplicated inprocess_linux.go:1167andprocess_solaris.go:278. Rather than adding a third copy, please move it to a shared location (e.g.process_posix.go, which now includes aix) and drop the linux/solaris copies. Happy to take that consolidation in this PR.
Please also address
-
Move the "raw
pr_nice" and "combined user+system in.User,.System=0" caveats into the godoc onNiceWithContext/TimesWithContextso API consumers see them. -
RlimitWithContext→RlimitUsageWithContext→ErrNotImplementedErroradds nothing while both are unimplemented; just return the error directly. -
The "psinfo stores PIDs/UIDs/GIDs as uint64; safe to truncate..." comment appears three times — once at the top of the file is enough.
Repeated readPsinfo opens per call (Name + Ppid + Times + MemoryInfo = 4 reads of the same file) is consistent with other ports, so feel free to leave a TODO and address later.
Thanks again for testing on AIX 7.3 and documenting the limitations so clearly.
This PR implements basic process info collection on AIX, reading from the
/proc/<pid>/psinfofile.psinfodoesn't provide all the information that we need but it's readable by any user so this is a good start, its content is described in https://www.ibm.com/docs/en/aix/7.1.0?topic=files-proc-file.Note that the fields are not ordered the same way in the doc than their actual memory layout.
The implementation has been tested on an AIX 7.3 host and compared to
psoutput.Known limitations (on top of many functions just not being implemented):
ps(psseems to apply a formula but not always the same depending on processes, so returning raw value was simpler)pr_sizefield is (slightly) bigger compared tops vsz, probably includes more types of memory thanpsdoesThis can be improved by reading other files but requires same user or running as root, and/or using perfstat with CGO, but starting with only
psinfomakes the PR much simpler.