Download tables issue 473 #508

Open: djtfmartin wants to merge 1 commit into dev from dowload_tables_issue_473

djtfmartin commented May 13, 2026

djtfmartin requested a review from Copilot May 13, 2026 15:17
djtfmartin force-pushed the dowload_tables_issue_473 branch 2 times, most recently from 371572d to 8bc4595, May 13, 2026 15:23

Copilot AI left a comment

Pull request overview

This PR integrates the download-tables (now org.gbif.download) artifacts into the occurrence build to eliminate duplicated table/term utilities and remove circular dependencies with pipelines. It also removes now-redundant modules and code paths, including the legacy HDFS table definitions, the Spark table builder, and the dataset deletion CLI pieces.

Changes:

  • Replace in-repo term/table utilities with org.gbif.download:hdfs-tables and org.gbif.download:term-utils (and update imports accordingly).
  • Remove redundant modules/code: occurrence-hdfs-table, occurrence-table-build-spark, maven-extension-avsc-schema-generator, and dataset deletion CLI classes/scripts.
  • Adjust Maven module versions and dependency versions; update CI pipeline to drop table-build Docker steps.
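
The import update in the first bullet can be sketched as a small line rewriter. This is an illustration only and not part of the PR: the class `TermUtilsImportMigration` and its methods are hypothetical, and the old package `org.gbif.occurrence.common` is an assumption inferred from the path of the deleted in-repo tests. Only the new name `org.gbif.terms.utils.TermUtils` appears in the review itself.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating the import change; not part of the PR. */
final class TermUtilsImportMigration {

  // Assumed old location, inferred from the deleted occurrence-common test path.
  static final String OLD_CLASS = "org.gbif.occurrence.common.TermUtils";
  // New location as named in the review's file summary.
  static final String NEW_CLASS = "org.gbif.terms.utils.TermUtils";

  // Matches "import " or "import static " followed by the old class name as a whole word.
  private static final Pattern OLD_IMPORT =
      Pattern.compile("^(\\s*import\\s+(?:static\\s+)?)" + Pattern.quote(OLD_CLASS) + "\\b");

  /** Rewrites one source line; lines that do not import the old class are returned unchanged. */
  static String rewriteLine(String line) {
    return OLD_IMPORT.matcher(line).replaceFirst("$1" + Matcher.quoteReplacement(NEW_CLASS));
  }

  /** Applies the rewrite to every line of a source file. */
  static List<String> rewrite(List<String> lines) {
    return lines.stream().map(TermUtilsImportMigration::rewriteLine).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> source = List.of(
        "import org.gbif.occurrence.common.TermUtils;",
        "import static org.gbif.occurrence.common.TermUtils.*;",
        "import java.util.List;");
    rewrite(source).forEach(System.out::println);
  }
}
```

The word-boundary check keeps the rewrite from touching similarly named classes in the old package, which matters when a module still contains other `org.gbif.occurrence.common` imports.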

Reviewed changes

Copilot reviewed 85 out of 86 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pom.xml | Switch dependency management to org.gbif.download artifacts, remove modules, adjust versions and distribution management. |
| occurrence-ws/src/main/java/org/gbif/occurrence/ws/resources/TermResource.java | Use external org.gbif.terms.utils.TermUtils. |
| occurrence-ws/src/main/java/org/gbif/occurrence/ws/resources/OccurrenceDownloadDescribeResource.java | Use external org.gbif.terms.utils.TermUtils. |
| occurrence-ws/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| occurrence-ws-client/pom.xml | Align parent version with root version change. |
| occurrence-trino-udf/pom.xml | Apply version suffix update. |
| occurrence-table-build-spark/src/test/resources/backfill-example.yml | Remove table-build spark test resource. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/TableBackfillConfiguration.java | Remove redundant Spark table backfill configuration. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/HdfsSnapshotAction.java | Remove redundant Spark/HDFS snapshot action. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/DatasetUpdate.java | Remove redundant dataset partition update Spark utility. |
| occurrence-table-build-spark/README.md | Remove documentation for deleted module. |
| occurrence-table-build-spark/pom.xml | Remove deleted module’s Maven build. |
| occurrence-table-build-spark/docker/Dockerfile | Remove Docker packaging for deleted module. |
| occurrence-spark-udf/pom.xml | Align parent version with root version change. |
| occurrence-search/pom.xml | Align parent version with root version change. |
| occurrence-registry-sync/pom.xml | Align parent version with root version change. |
| occurrence-persistence/pom.xml | Align parent version with root version change. |
| occurrence-mail/pom.xml | Align parent version with root version change. |
| occurrence-integration-tests/src/test/java/org/gbif/occurrence/ws/it/TermResourceIT.java | Update TermUtils import to external package. |
| occurrence-integration-tests/pom.xml | Align parent version with root version change. |
| occurrence-heatmaps/pom.xml | Align parent version with root version change. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/OccurrenceAvroHdfsTableDefinitionTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/ExtensionTableTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/EventTableDefinitionTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/DownloadTermsTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/main/resources/table/reference-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/multimedia-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/image-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/identifier-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/identification-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-trial-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-trait-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-score-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-accession-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/amplification-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/OccurrenceHDFSTableDefinition.java | Remove deleted legacy HDFS table definition. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/OccurrenceAvroHdfsTableDefinition.java | Remove deleted legacy HDFS avro schema generator. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/InitializableField.java | Remove deleted legacy schema model. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/HiveDataTypes.java | Remove deleted legacy hive typing logic. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/HiveColumns.java | Remove deleted legacy hive column utilities. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/Field.java | Remove deleted legacy schema model. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/ExtensionTable.java | Remove deleted legacy extension table helper. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/ExtensionSchemasLoader.java | Remove deleted legacy schema loader. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventHDFSTableDefinition.java | Remove deleted legacy event table definition. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventDownloadTerms.java | Remove deleted legacy event download terms. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventAvroHdfsTableDefinition.java | Remove deleted legacy event avro schema generator. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/DownloadTerms.java | Remove deleted legacy download terms. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/AvroDataTypes.java | Remove deleted legacy avro typing utilities. |
| occurrence-hdfs-table/pom.xml | Remove deleted legacy module build. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/VerbatimSearchHitConverter.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/SearchHitOccurrenceConverter.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/OccurrenceEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/event/SearchHitEventConverter.java | Update EventTermUtils/TermUtils imports to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/event/EventEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/BaseEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/pom.xml | Swap dependency from occurrence-common to org.gbif.download:term-utils. |
| occurrence-download/src/test/java/org/gbif/occurrence/download/query/TestDownloadHeaders.java | Update EventTermUtils/TermUtils imports to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/util/HeadersFileUtil.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/Queries.java | Update TermUtils static import/package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/ParquetSchemaQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/ParquetQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/HiveQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/EventsHiveQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/AvroSchemaQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/AvroQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/OccurrenceMapReader.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/dwca/archive/DwcArchiveUtils.java | Reorder imports; update EventTermUtils/TermUtils to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/dwca/akka/DownloadDwcaActor.java | Update TermUtils import to external package. |
| occurrence-download/pom.xml | Add term-utils/hdfs-tables, replace occurrence-hdfs-table dependency. |
| occurrence-download-service/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| occurrence-download-launcher/pom.xml | Align parent version with root version change. |
| occurrence-common/src/test/java/org/gbif/occurrence/common/TermUtilsTest.java | Remove tests for deleted in-repo TermUtils. |
| occurrence-common/src/test/java/org/gbif/occurrence/common/EventTermUtilsTest.java | Remove tests for deleted in-repo EventTermUtils. |
| occurrence-common/src/main/java/org/gbif/occurrence/common/HiveColumnsUtils.java | Replace array detection with local isHiveArray and switch to external TermUtils. |
| occurrence-common/src/main/java/org/gbif/occurrence/common/EventTermUtils.java | Remove in-repo EventTermUtils (migrated to external package). |
| occurrence-common/pom.xml | Add dependency on org.gbif.download:term-utils. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterService.java | Remove redundant dataset deletion service. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterConfiguration.java | Remove redundant dataset deletion configuration. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterCommand.java | Remove redundant CLI command entry point. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterCallback.java | Remove redundant dataset deletion callback. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/common/EsHelper.java | Remove ES helper used only by deleted deleter flow. |
| occurrence-cli/pom.xml | Align parent version with root version change. |
| maven-extension-avsc-schema-generator/src/test/java/org/gbif/pipelines/maven/XmlToAvscGeneratorMojoTest.java | Remove tests for deleted Maven plugin module. |
| maven-extension-avsc-schema-generator/src/main/java/org/gbif/pipelines/maven/XmlToAvscGeneratorMojo.java | Remove deleted Maven plugin implementation. |
| maven-extension-avsc-schema-generator/pom.xml | Remove deleted Maven plugin module build. |
| Jenkinsfile | Remove table-build Docker build/release stages. |
| event-ws/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| build/occurrence-table-build-spark-docker-build.sh | Remove Spark table-build Docker build script. |

Comments suppressed due to low confidence (3)

pom.xml:14

  • The project version is set to 1.1.26-473-SNAPSHOT, which changes the Maven coordinates for every module in this build. If this PR is intended to be merged, this nonstandard version suffix will likely break downstream builds that expect the normal snapshot/release versioning; consider reverting to the standard snapshot version and using the CI/build metadata (or branch builds) to distinguish issue-specific builds instead.

pom.xml:464

  • maven-extension-avsc-schema-generator is still listed as an internal dependency (version ${project.version}), but the module has been removed from the reactor in this PR. This leaves a dangling internal dependency entry; remove it (or replace it with an external artifact) to avoid confusion and potential resolution failures if it gets referenced later.

occurrence-common/src/main/java/org/gbif/occurrence/common/HiveColumnsUtils.java:62

  • HiveColumnsUtils.getHiveType() now uses a hard-coded isHiveArray() list, while other parts of the download description logic (e.g. OccurrenceDownloadDescribeResource.fieldDelimiter) use SQLColumnsUtils.isSQLArray(term) to detect array terms. Using two independent sources of truth can lead to inconsistent type vs delimiter output; consider reusing SQLColumnsUtils.isSQLArray here (or otherwise centralizing the array-term logic in one place).

    } else if(TermUtils.isVocabulary(term)) {
      if (TermUtils.isArray(term)) {
        return "STRUCT<concepts: ARRAY<STRING>,lineage: ARRAY<STRING>>";
      } else {
        return "STRUCT<concept: STRING,lineage: ARRAY<STRING>>";
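
The suggestion to centralize the array-term logic can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: the class `ArrayTermSketch`, the placeholder term names, the simplified type strings, and the `;` delimiter are assumptions chosen for illustration, and plain strings stand in for the real term objects. It does not reproduce `HiveColumnsUtils` or `SQLColumnsUtils`.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: one array-term predicate shared by type and delimiter logic. */
final class ArrayTermSketch {

  // Placeholder term names; the real code works with term objects, not strings.
  private static final Set<String> ARRAY_TERMS = Set.of("exampleArrayTerm");

  // Single source of truth for "is this term stored as an array?".
  static final Predicate<String> IS_ARRAY = ARRAY_TERMS::contains;

  /** Column type derived from the shared predicate (type strings are illustrative). */
  static String hiveType(String term) {
    return IS_ARRAY.test(term) ? "ARRAY<STRING>" : "STRING";
  }

  /** Delimiter derived from the same predicate; ";" is an assumed placeholder value. */
  static Optional<String> fieldDelimiter(String term) {
    return IS_ARRAY.test(term) ? Optional.of(";") : Optional.empty();
  }

  public static void main(String[] args) {
    for (String term : new String[] {"exampleArrayTerm", "exampleScalarTerm"}) {
      System.out.println(term + " -> " + hiveType(term) + ", delimiter=" + fieldDelimiter(term));
    }
  }
}
```

Because both methods consult `IS_ARRAY`, a term cannot be reported as an array type without also receiving a delimiter, which is the inconsistency the review comment warns about.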

Comment thread: pom.xml
Comment thread: occurrence-download/pom.xml (outdated)
djtfmartin force-pushed the dowload_tables_issue_473 branch 2 times, most recently from 9ecc374 to 4958200, May 13, 2026 16:08
djtfmartin force-pushed the dowload_tables_issue_473 branch from 323b329 to 7d88330, May 13, 2026 16:11
djtfmartin requested a review from marcos-lg May 13, 2026 16:12
4 participants