Add semantic search workshop#172

Open
keshawillz wants to merge 1 commit into
oracle-devrel:mainfrom
keshawillz:add-semantic-search-workshop

@keshawillz

Summary

A 10-minute hands-on workshop that builds semantic search over real GitHub issues using Oracle AI Database 26ai, langchain-oracledb, and GitHub Codespaces. The notebook pulls 15 recent issues from oracle/python-oracledb, embeds them with sentence-transformers, runs similarity and hybrid-filter queries, and shows the raw VECTOR_DISTANCE and JSON_VALUE SQL that runs underneath the LangChain abstraction.

The exact question it answers

How do I use Oracle AI Database and langchain-oracledb to find semantically similar text records, and how do I combine vector similarity with metadata filters like status and labels?
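To make the second half of that question concrete, here is a minimal sketch of the kind of SQL that combines a vector ranking with a metadata filter. It is not the notebook's actual query. The table name `github_issues`, the column names `text`, `metadata`, and `embedding`, and the bind names `:query_vec` and `:state` are all assumptions made for illustration.

```python
import re


def build_hybrid_query(table: str, top_k: int = 5) -> str:
    """Build SQL that ranks rows by cosine distance, restricted by a metadata filter.

    Assumed (hypothetical) schema: text CLOB, metadata JSON, embedding VECTOR.
    """
    # Table names cannot be passed as bind variables, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")

    return (
        "SELECT text,\n"
        "       JSON_VALUE(metadata, '$.state') AS state,\n"
        "       VECTOR_DISTANCE(embedding, :query_vec, COSINE) AS distance\n"
        f"FROM {table}\n"
        # The metadata filter narrows the candidate rows ...
        "WHERE JSON_VALUE(metadata, '$.state') = :state\n"
        # ... and the vector distance orders whatever survives the filter.
        "ORDER BY distance\n"
        f"FETCH FIRST {top_k} ROWS ONLY"
    )


sql = build_hybrid_query("github_issues", top_k=3)
print(sql)
```

The query vector and the state value stay as bind variables, so only the validated table name and the integer row limit are interpolated into the string. With python-oracledb, the query embedding would typically be bound as an `array.array` of floats, though the notebook may do this differently.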

Target audience

Intermediate Python developers (familiar with REST APIs and basic SQL, new to vector search).

How to run

Follow the sparse-checkout instructions in the workshop README to pull just this folder, install dependencies from .devcontainer/requirements.txt, set up FreeSQL credentials in a .env file (template provided), open notebook.ipynb, and run all cells. The notebook runs end to end in well under a minute.
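For reviewers who want to sanity-check their .env before running the notebook, a rough sketch of a fail-fast credential check follows. The key names `ORACLE_USER`, `ORACLE_PASSWORD`, and `ORACLE_DSN` are hypothetical placeholders; use whatever names .env.example actually defines. The tiny parser is a stand-in for a proper dotenv loader and handles only simple `KEY=value` lines.

```python
# Hypothetical key names; substitute the ones from .env.example.
REQUIRED_KEYS = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Made-up sample content, not real credentials.
sample = """
# FreeSQL credentials
ORACLE_USER=demo_user
ORACLE_PASSWORD="not-a-real-password"
ORACLE_DSN=
"""

config = parse_env(sample)
print(missing_keys(config))
```

Running the check against the real file (for example `parse_env(open(".env").read())`) reports unset values before any connection attempt is made.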

What's in the PR

| File | Purpose |
| --- | --- |
| workshops/semantic-search-github-issues/notebook.ipynb | 10-cell executable notebook covering connection, ingestion, similarity search, keyword comparison, hybrid filter, and raw SQL |
| workshops/semantic-search-github-issues/README.md | What you will build, sparse-checkout setup, workshop files, stack, cell-by-cell summary, related workshops |
| workshops/semantic-search-github-issues/.devcontainer/devcontainer.json | Dev container configuration |
| workshops/semantic-search-github-issues/.devcontainer/requirements.txt | Python dependencies (pinned minimums) |
| workshops/semantic-search-github-issues/.devcontainer/cache_model.py | Pre-caches all-MiniLM-L6-v2 at container build time |
| workshops/semantic-search-github-issues/.env.example | Credential template (FreeSQL user, password, TCPS DSN) |
| workshops/semantic-search-github-issues/.gitignore | Excludes .env from commits |

What the workshop covers

  1. Connection setup with python-oracledb thin mode, credentials loaded from .env
  2. Pulling live issue data from the public GitHub REST API
  3. Shaping issues into LangChain Document objects with metadata
  4. Embedding generation with HuggingFaceEmbeddings
  5. One-call vector store creation via OracleVS.from_documents()
  6. Semantic similarity search (similarity_search)
  7. Keyword search comparison (returns zero matches, demonstrates the gap)
  8. Hybrid filter (vector similarity + state=open metadata filter)
  9. Raw SQL equivalent using VECTOR_DISTANCE and JSON_VALUE to show the underlying database operations
  10. Cleanup (drop demo table)
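Steps 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: it uses plain dicts where the notebook builds LangChain Document objects, the helper names `issue_to_record` and `shape_issues` are hypothetical, and the sample payload is made up. The field names (`title`, `body`, `state`, `labels`, `html_url`, `pull_request`) follow the GitHub REST API issue schema.

```python
from typing import Any


def issue_to_record(issue: dict[str, Any]) -> dict[str, Any]:
    """Turn one GitHub issue payload into text plus filterable metadata."""
    body = issue.get("body") or ""  # the API returns null for an empty body
    return {
        # Title and body together form the text that gets embedded.
        "page_content": f"{issue['title']}\n\n{body}".strip(),
        # Metadata is what a hybrid filter such as state=open matches against.
        "metadata": {
            "number": issue["number"],
            "state": issue["state"],
            "labels": [label["name"] for label in issue.get("labels", [])],
            "url": issue["html_url"],
        },
    }


def shape_issues(issues: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop pull requests (the issues endpoint returns them too) and shape the rest."""
    return [issue_to_record(i) for i in issues if "pull_request" not in i]


# Made-up sample payloads for illustration only.
sample_issues = [
    {
        "number": 1,
        "title": "Connection times out",
        "body": "Thin mode hangs on connect.",
        "state": "open",
        "labels": [{"name": "bug"}],
        "html_url": "https://example.invalid/issues/1",
    },
    {
        "number": 2,
        "title": "Add docs",
        "body": None,
        "state": "closed",
        "labels": [],
        "html_url": "https://example.invalid/issues/2",
        "pull_request": {},
    },
]

records = shape_issues(sample_issues)
print(records[0]["metadata"])
```

Each record maps directly onto `Document(page_content=..., metadata=...)` before being handed to the vector store.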

Sponsorship disclosure

This workshop was developed in partnership with Oracle via Freeman & Forrest. The disclosure is included in the workshop's README and in the accompanying YouTube video description.

OCA

Signed and approved under Kesha Williams (KeshaS@comcast.net). All commits in this PR are signed off with the matching email.

Open questions for reviewers

One thing worth confirming during review:

  1. Workshop folder placement. I placed this under /workshops/ since the bundle (devcontainer, README, notebook, env example) matches the workshop pattern described in the repo README. If it fits better elsewhere, happy to restructure.

Signed-off-by: Kesha Williams <KeshaS@comcast.net>
@oracle-contributor-agreement (bot) added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Jun 10, 2026