feat: searchable entry links — name-enriched links metadata + raw links.json (#143 #389) #390

Merged: drernie merged 11 commits into main from benchling-note-parser on Jun 15, 2026

@drernie drernie commented Jun 15, 2026

What

When packaging a Benchling entry, the packager now makes the objects an entry references searchable by their human-readable name, and records the raw discovery separately. This is the top-line use case from the 2026-06-15 customer call: "show me all the experiments where QB-2743.1 was used."

Two artifacts, with a hard split between curated/searchable and raw facts:

1. links — curated, searchable (promoted into entry.json)

entry.json is the package's metadata_uri, so anything in it is queryable in Elasticsearch. We add a flat links array, one entry per referenced object, four fields each with one job:

"links": [
  { "type": "custom_entity", "id": "bfi_xCUXNVyG", "name": "QB-2743.1", "slug": "qb-2743-1" },
  { "type": "dna_sequence",  "id": "seq_asQya4lk", "name": null,        "slug": "example-seq" }
]
  • type, id — free; id supports downstream linking.
  • name — authoritative Benchling display name via best-effort get_by_id, or null when the lookup fails/isn't supported. The only search target, so every non-null links.name is a real name.
  • slug — lossy token parsed from the webURL, for eyeballing/debugging only. Never searched as a name, never folded into name.

Search scopes to links.name; a bare keyword still matches the metadata.
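
To make the search contract concrete, here is a minimal sketch in plain Python. It is not the Elasticsearch query itself (the index mapping isn't shown in this PR); `entries_using` and the package keys are hypothetical. It only illustrates that `name` is the sole match target and that `slug` is never consulted.

```python
from typing import Any


def entries_using(name: str, packages: dict[str, dict[str, Any]]) -> list[str]:
    """Return the keys of packages whose entry.json links carry exactly this name."""
    hits = []
    for key, entry_json in packages.items():
        links = entry_json.get("links") or []
        # Only `name` is a search target; `slug` is deliberately ignored.
        if any(link.get("name") == name for link in links):
            hits.append(key)
    return hits


# Hypothetical in-memory stand-in for indexed entry.json documents.
packages = {
    "pkg-a": {"links": [{"type": "custom_entity", "id": "bfi_xCUXNVyG",
                         "name": "QB-2743.1", "slug": "qb-2743-1"}]},
    "pkg-b": {"links": [{"type": "dna_sequence", "id": "seq_asQya4lk",
                         "name": None, "slug": "example-seq"}]},
}

print(entries_using("QB-2743.1", packages))
```

A link whose lookup failed (`name: null`) simply never matches a name search, which is the intended trade-off: no false positives from slugs.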

2. links.json — raw facts (renamed from references.json)

Raw discovery of what the entry points at — id/type/web_url per link, plus entities and results_tables rows — so a re-parse or a changed classification never needs a re-fetch. Drops the derived classifications (category/fetchable/eventable/disposition): those are recomputed from type in code at runtime, not frozen to disk. schema_version bumped to 2.
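
As a rough illustration of the "raw facts only" rule, the sketch below reduces a v1-style link record to the three persisted keys. `to_raw_link` is a hypothetical helper, and the v1 record shape is assumed from the earlier commit in this PR that added `{category, fetchable, eventable, disposition}`; the real writer may be structured differently.

```python
from typing import Any

# The only per-link keys links.json v2 persists, per the description above.
RAW_LINK_KEYS = ("id", "type", "web_url")


def to_raw_link(link: dict[str, Any]) -> dict[str, Any]:
    """Keep raw facts; drop anything that can be recomputed from `type`."""
    return {key: link.get(key) for key in RAW_LINK_KEYS}


# Assumed v1-style record carrying derived classifications.
v1_link = {
    "id": "bfi_xCUXNVyG",
    "type": "custom_entity",
    "web_url": None,  # some link types carry no webURL
    "category": "entity",
    "fetchable": True,
    "eventable": True,
    "disposition": "nest_or_standalone",
}

print(to_raw_link(v1_link))
```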

Why name needs an API call (verified)

A note link is only {id, type, webURL} — no name. The webURL trailing segment is a lossy slug, confirmed against test/openapi.yaml:

| webURL | slug | record name | lost |
| --- | --- | --- | --- |
| …/bfi-xCUXNVyG-sbn000/edit | sbn000 | sBN000 | case |
| …/seq_bhuDUw9D-test-oligo-abc/edit | test-oligo-abc | Example DNA Oligo | slug ≠ name |

QB-2743.1 would appear as qb-2743-1 (case and the "." are lost), so a name search would miss it. The exact name must therefore come from the API. Name resolution is best-effort and never raises; it requires the app to be a registry/project collaborator (a setup requirement, not a code change).
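
The loss can be demonstrated with a small sketch. `lossy_slug` is a hypothetical stand-in: Benchling's actual slug rule is not documented in this PR, so the lowercase-and-flatten-punctuation behavior is an assumption chosen to reproduce the examples above.

```python
import re


def lossy_slug(name: str) -> str:
    """Assumed slug rule: lowercase, then collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


# Distinct names collapse to the same slug, so the slug cannot be inverted.
for candidate in ("QB-2743.1", "qb-2743-1", "QB 2743 1", "sBN000"):
    print(candidate, "->", lossy_slug(candidate))
```

Because several different names map to one slug, no parser over the webURL can recover which name the record actually has.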

Discovery layer (docker/src/entry_references.py, pure — no API calls)

  • summarize_references(entry) → the raw links.json payload.
  • link_metadata(entry) → the curated {type, id, name: None, slug} skeleton (caller fills name).
  • slug_from_web_url(web_url, id) → strict slug parse (returns None unless an {id}- prefix is matched).
  • classify_links / extract_entity_references / extract_results_tables cover the full 18-token EntryLink enum.
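
For readers without the diff open, a strict slug parse along the lines described for `slug_from_web_url` might look like the sketch below. The body is an assumption (the real implementation lives in docker/src/entry_references.py and isn't reproduced in this conversation), and the URL in the example is made up.

```python
from typing import Optional
from urllib.parse import urlparse


def slug_from_web_url(web_url: Optional[str], link_id: Optional[str]) -> Optional[str]:
    """Return the text after '{id}-' in a path segment, or None if no segment matches."""
    if not web_url or not link_id:
        return None
    prefix = f"{link_id}-"
    for segment in urlparse(web_url).path.split("/"):
        # Strict: the segment must start with the exact id and carry a non-empty slug.
        if segment.startswith(prefix) and len(segment) > len(prefix):
            return segment[len(prefix):]
    return None


# Hypothetical URL; only the '{id}-{slug}' segment shape matters here.
url = "https://example.benchling.com/acme/f/lib_demo/seq_bhuDUw9D-test-oligo-abc/edit"
print(slug_from_web_url(url, "seq_bhuDUw9D"))
print(slug_from_web_url(url, "seq_other"))
```

Returning `None` on any mismatch keeps the slug a debugging aid only: a wrong guess is never written into `slug`, and nothing is ever written into `name`.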

Name enrichment lives in the caller (EntryPackager._enrich_link_names) via a type → SDK service map (entities first, then inventory and entries — the types flagged on the call); unmapped types keep name=null.
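
The best-effort contract can be sketched as follows. This is not `EntryPackager._enrich_link_names` itself: `enrich_link_names`, the `lookups` map, and the fake lookup functions are hypothetical, and no real Benchling SDK calls are used. The sketch only shows the shape of "mapped type, try once, swallow any failure, otherwise leave `name` as `None`".

```python
from typing import Any, Callable, Optional

Lookup = Callable[[str], Optional[str]]


def enrich_link_names(
    links: list[dict[str, Any]], lookups: dict[str, Lookup]
) -> list[dict[str, Any]]:
    """Fill `name` per link via a type -> lookup map; never raises."""
    enriched = []
    for link in links:
        name = None
        lookup = lookups.get(link.get("type", ""))
        if lookup is not None and link.get("id"):
            try:
                name = lookup(link["id"])
            except Exception:
                # Best-effort: permission errors, 404s, etc. leave name as None.
                name = None
        enriched.append({**link, "name": name})
    return enriched


def fake_entity_lookup(entity_id: str) -> str:
    return {"bfi_xCUXNVyG": "QB-2743.1"}[entity_id]


def failing_lookup(_: str) -> str:
    raise PermissionError("app is not a collaborator")


links = [
    {"type": "custom_entity", "id": "bfi_xCUXNVyG", "name": None, "slug": "qb-2743-1"},
    {"type": "dna_sequence", "id": "seq_asQya4lk", "name": None, "slug": "example-seq"},
    {"type": "sql_dashboard", "id": "dash_1", "name": None, "slug": None},
]
lookups = {"custom_entity": fake_entity_lookup, "dna_sequence": failing_lookup}
print([link["name"] for link in enrich_link_names(links, lookups)])
```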

Tests

Parser + packager suites updated for the v2 shape; added coverage for slug_from_web_url, link_metadata, and the entry.json.links / links.json split. Full suite green (443). black + isort + pyright clean.

Refs #143 #389

🤖 Generated with Claude Code

)

Pure parser over a Benchling entry dict that surfaces the objects an entry
points at, in one place for both upcoming features:

- extract_entity_references(): entity IDs from days[].notes[].links[]
  (filtered to entity types, dropping non-entity links like sql_dashboard)
  and from entity-link fields; deduped by ID. -> #143 entity packaging.
- extract_results_tables(): results_table notes carrying assayResultSchemaId.
  -> #68/#69 assay results.
- extract_note_links(): low-level all-links primitive.

No Benchling API calls and no behavior change -- nothing consumes it yet, so
it lands independently of either feature. 13 unit tests; black/isort/pyright
clean.

Refs #143 #68

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
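
A minimal sketch of the traversal this commit describes, assuming the entry dict nests links as days[].notes[].links[]. The helper names `iter_note_links` and `entity_ids` are illustrative, not the module's API, and the entity type set is the corrected one from a later commit in this PR.

```python
from typing import Any, Iterator

# Entity link types as corrected later in this PR.
ENTITY_LINK_TYPES = {"custom_entity", "dna_sequence", "aa_sequence", "batch"}


def iter_note_links(entry: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every link dict under days[].notes[].links[], tolerating missing keys."""
    for day in entry.get("days") or []:
        for note in day.get("notes") or []:
            for link in note.get("links") or []:
                if isinstance(link, dict):
                    yield link


def entity_ids(entry: dict[str, Any]) -> list[str]:
    """Entity IDs in first-seen order, deduplicated, skipping empty IDs."""
    seen: dict[str, None] = {}
    for link in iter_note_links(entry):
        link_id = link.get("id")
        if link.get("type") in ENTITY_LINK_TYPES and link_id:
            seen.setdefault(link_id, None)
    return list(seen)


entry = {"days": [{"notes": [
    {"links": [{"type": "custom_entity", "id": "bfi_xCUXNVyG"},
               {"type": "sql_dashboard", "id": "dash_1"}]},
    {"links": [{"type": "custom_entity", "id": "bfi_xCUXNVyG"},
               {"type": "dna_sequence", "id": ""}]},
    {"type": "text"},  # note without links
]}]}
print(entity_ids(entry))
```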
@drernie drernie requested a review from Copilot June 15, 2026 17:30
Comment thread docker/src/entry_references.py Outdated
Comment thread docker/src/entry_references.py Outdated
Comment thread docker/src/entry_references.py Outdated

Copilot AI left a comment

Pull request overview

Adds a new pure-Python discovery module to extract (a) entity references and (b) assay results table references from a Benchling entry payload, with accompanying unit tests, intended to be reused by upcoming packaging/features (#143, #68/#69).

Changes:

  • Introduces src.entry_references with helpers and dataclasses for extracting note links, entity references, and results-table references.
  • Adds unit tests validating filtering, deduplication, field-shape handling, and defensive behavior on missing keys.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docker/src/entry_references.py | New extraction utilities for entity links and results-table references from entry dicts. |
| docker/tests/test_entry_references.py | New unit tests covering key extraction behaviors and defensive parsing. |

Comment thread docker/src/entry_references.py Outdated
Comment thread docker/src/entry_references.py
Comment thread docker/tests/test_entry_references.py
Comment thread docker/tests/test_entry_references.py
Generalize the discovery layer from entities-only to the full EntryLink enum
(18 types from test/openapi.yaml), per #389.

- classify_links(entry): surfaces ALL note links, each labeled with a
  LinkCategory (entity/inventory/reference/metadata/not_packageable/uncertain/
  external/unknown). Consumers filter, e.g. `r.is_packageable`. Unknown/future
  types surface as UNKNOWN rather than being silently dropped.
- LINK_TYPE_CATEGORY: type -> category for all 18 tokens; PACKAGEABLE_CATEGORIES.
- Fix entity set: ENTITY_LINK_TYPES now {custom_entity, dna_sequence,
  aa_sequence, batch}. Adds `batch` (a real registry entity, was missed);
  drops dna_oligo/rna_oligo (NOT EntryLink types -- can't appear as note links).
- spec/entry-link-types.json: human-facing reference map (category, packageable,
  id prefix, GET endpoint, webhook events) for all 18 types, plus the
  not-inline-linkable resources. A test asserts its categories match the module
  so it can't drift.

26 unit tests; black/isort/pyright clean.

Refs #143 #389 #68

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
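
The "unknown types surface rather than disappear" behavior could be sketched like this. The category names come from the commit message; the map is deliberately partial, since only the four entity tokens are spelled out here, so the remaining 14 tokens would fall through to UNKNOWN in this sketch even though the real LINK_TYPE_CATEGORY covers them. `categorize` is a hypothetical helper.

```python
from enum import Enum


class LinkCategory(str, Enum):
    ENTITY = "entity"
    INVENTORY = "inventory"
    REFERENCE = "reference"
    METADATA = "metadata"
    NOT_PACKAGEABLE = "not_packageable"
    UNCERTAIN = "uncertain"
    EXTERNAL = "external"
    UNKNOWN = "unknown"


# Partial map: only the entity tokens named in the commit message.
LINK_TYPE_CATEGORY = {
    token: LinkCategory.ENTITY
    for token in ("custom_entity", "dna_sequence", "aa_sequence", "batch")
}


def categorize(link_type: str) -> LinkCategory:
    """Label a link type; unmapped or future types become UNKNOWN, never dropped."""
    return LINK_TYPE_CATEGORY.get(link_type, LinkCategory.UNKNOWN)


for token in ("batch", "dna_oligo", "some_future_type"):
    print(token, "->", categorize(token).value)
```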
@drernie drernie changed the title from "feat: entry-reference extractor (shared discovery for #143 + #68/#69)" to "feat: entry-reference discovery layer — full EntryLink type map (#143 #389 #68)" on Jun 15, 2026
When packaging an entry, write a references.json alongside entry.json listing
the Benchling objects the entry points at (entities, classified links, results
tables), discovered from the entry's note links and fields. No records are
fetched -- discovery only.

- entry_references.summarize_references(entry): JSON-serializable payload
  ({schema_version, entities, links, results_tables}); REFERENCES_SCHEMA_VERSION.
- entry_packager._create_metadata_files: emit references.json + document it in
  the package README.

Review fixes (Greptile + Copilot):
- Drop empty-string entity IDs in _field_value_ids, matching the note-link guard.
- Narrow RESULTS_TABLE_NOTE_TYPES to {"results_table"} -- the only type carrying
  assayResultSchemaId; avoids latently capturing generic/registration tables.
- Modernize typing (dict/list/tuple) on the 3.11+ codebase.
- Remove committed spec/entry-link-types.json (relocated to the project's
  scripts/ as a research artifact) and its drift-guard test.

Full suite green (437).

Refs #143 #389 #68

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@drernie drernie force-pushed the benchling-note-parser branch from 41916bd to 26bbfb1 Compare June 15, 2026 17:46
@drernie drernie changed the title from "feat: entry-reference discovery layer — full EntryLink type map (#143 #389 #68)" to "feat: entry-reference discovery + references.json in each package (#143 #389 #68)" on Jun 15, 2026
@drernie drernie requested a review from Copilot June 15, 2026 17:50
Confirms _field_value_ids drops empty strings (single + isMulti list),
matching the note-link guard. Requested in review.

Refs #143

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
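
The guard being tested can be sketched as below. `field_value_ids` is an illustrative stand-in for `_field_value_ids`; it assumes an entity-link field's value is either a single string or, for isMulti fields, a list of strings.

```python
from typing import Any


def field_value_ids(value: Any) -> list[str]:
    """Normalize a single or multi field value to a list of non-empty string IDs."""
    candidates = value if isinstance(value, list) else [value]
    # Drop None, non-strings, and empty strings, matching the note-link guard.
    return [item for item in candidates if isinstance(item, str) and item]


print(field_value_ids("bfi_a"))
print(field_value_ids(["bfi_a", "", "bfi_b"]))
print(field_value_ids(""))
```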

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread docker/src/entry_references.py Outdated
Comment thread docker/src/entry_packager.py Outdated
drernie and others added 3 commits June 15, 2026 11:01
… + disposition

`packageable` conflated two questions: can a record be fetched, vs. should it
live in its own package or nested in the entry. Split into orthogonal axes and a
derived disposition.

- FETCHABLE_CATEGORIES (GET-by-id exists) + EVENTABLE_CATEGORIES (own webhooks,
  can arrive independent of an entry) + CATEGORY_DISPOSITION.
- LinkRef.is_fetchable / is_eventable / disposition (replaces is_packageable).
- references.json links now carry {category, fetchable, eventable, disposition}.

disposition makes explicit that nest-vs-standalone is a genuine product decision
ONLY for entities (fetchable AND eventable -> nest_or_standalone). Non-entities
are forced: inventory -> nest (no events); entry/request/workflow -> link (own
package); metadata -> pointer; dashboards/external -> skip.

Project artifact scripts/entry-link-types.json updated to match (not in repo).

Refs #143 #389

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
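
One way to picture the two axes and the derived disposition is the sketch below. The disposition values come from the commit message; the category assignments marked as assumptions (entry/request/workflow as "reference", dashboards as "not_packageable") and the partial FETCHABLE/EVENTABLE sets are guesses for illustration, not the module's actual constants.

```python
from typing import Optional

# Partial, assumed sets: entities are fetchable and eventable; inventory has no events.
FETCHABLE = {"entity", "inventory"}
EVENTABLE = {"entity"}

CATEGORY_DISPOSITION = {
    "entity": "nest_or_standalone",  # the only genuine product decision
    "inventory": "nest",
    "reference": "link",             # assumption: entry/request/workflow
    "metadata": "pointer",
    "not_packageable": "skip",       # assumption: dashboards
    "external": "skip",
}


def describe(category: str) -> dict[str, object]:
    """Derive the per-link fields this commit adds to references.json."""
    disposition: Optional[str] = CATEGORY_DISPOSITION.get(category)
    return {
        "category": category,
        "fetchable": category in FETCHABLE,
        "eventable": category in EVENTABLE,
        "disposition": disposition,
    }


for category in ("entity", "inventory", "external"):
    print(describe(category))
```

Under these assumptions, `nest_or_standalone` appears only where both axes are true, which is the point the commit message makes about entities.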
…ent)

Promote a curated `links` array into entry.json (the package's metadata_uri)
so packages are searchable by the human-readable name of the entities/objects
an entry references — the top-line use case from the 2026-06-15 call ("show me
all experiments where QB-2743.1 was used").

Each curated link carries four fields, each with one job:
  - type, id  — free; id supports downstream linking
  - name      — authoritative Benchling display name via best-effort GET-by-id,
                or null when the lookup fails/isn't supported (never a slug)
  - slug      — lossy token parsed from the webURL, for eyeballing/debugging only

Verified the human name is NOT recoverable from the webURL: the trailing
segment is a lowercased, punctuation-flattened slug (sBN000 -> sbn000), so the
exact name must come from the API. Name resolution is best-effort and never
raises — it requires the app to be a registry/project collaborator.

Also rename references.json -> links.json and reduce it to raw facts only
(id/type/web_url + entities + results_tables). Derived classifications
(category/fetchable/eventable/disposition) are no longer persisted; they are
recomputed from type in code, so the raw archive stays reprocessable and a
future classification change needs no re-fetch. Schema bumped to v2.

Refs #143 #389

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@drernie drernie changed the title from "feat: entry-reference discovery + references.json in each package (#143 #389 #68)" to "feat: searchable entry links — name-enriched links metadata + raw links.json (#143 #389)" on Jun 15, 2026
drernie and others added 4 commits June 15, 2026 13:05
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Note links carry a required type plus optional id (external link has none)
and optional webURL (e.g. location has none). Addresses PR #390 review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@drernie drernie merged commit c335356 into main Jun 15, 2026
3 checks passed
@drernie drernie deleted the benchling-note-parser branch June 15, 2026 21:39
2 participants