Download tables issue 473 #508
Open
djtfmartin wants to merge 1 commit into
371572d to 8bc4595
Pull request overview
This PR integrates the download-tables (now org.gbif.download) artifacts into the occurrence build to eliminate duplicated table/term utilities and remove circular dependencies with pipelines. It also removes now-redundant modules and codepaths, including the legacy HDFS table definitions, Spark table builder, and the dataset deletion CLI pieces.
Changes:
- Replace in-repo term/table utilities with `org.gbif.download:hdfs-tables` and `org.gbif.download:term-utils` (and update imports accordingly).
- Remove redundant modules/code: `occurrence-hdfs-table`, `occurrence-table-build-spark`, `maven-extension-avsc-schema-generator`, and dataset deletion CLI classes/scripts.
- Adjust Maven module versions and dependency versions; update CI pipeline to drop table-build Docker steps.
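The import updates in the first change are largely mechanical. As a rough sketch only, the rewrite could be expressed as below. `ImportRewriter` is a hypothetical helper and not part of this PR; the old package is inferred from the deleted `occurrence-common` paths, the new `TermUtils` package is taken from the per-file summary, and the target package for `EventTermUtils` is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of this PR): maps old in-repo imports to the external package. */
final class ImportRewriter {

  // Old fully qualified name -> new fully qualified name.
  // The EventTermUtils target is an assumption; only the TermUtils package appears in the summary.
  private static final Map<String, String> MOVED = new LinkedHashMap<>();

  static {
    MOVED.put("org.gbif.occurrence.common.TermUtils", "org.gbif.terms.utils.TermUtils");
    MOVED.put("org.gbif.occurrence.common.EventTermUtils", "org.gbif.terms.utils.EventTermUtils");
  }

  /** Returns the line with a moved class import rewritten, or the line unchanged. */
  static String rewrite(String line) {
    if (!line.trim().startsWith("import ")) {
      return line; // only import statements are touched
    }
    for (Map.Entry<String, String> moved : MOVED.entrySet()) {
      String oldName = moved.getKey();
      int at = line.indexOf(oldName);
      if (at < 0) {
        continue;
      }
      int end = at + oldName.length();
      char next = end < line.length() ? line.charAt(end) : ' ';
      // ';' is a plain class import, '.' is a static member import.
      // Anything else (e.g. TermUtilsTest) is a different class and is left alone.
      if (next == ';' || next == '.') {
        return line.substring(0, at) + moved.getValue() + line.substring(end);
      }
    }
    return line;
  }
}
```

Checking the character after the match keeps the rewrite from touching classes that merely share a prefix with a moved class.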
Reviewed changes
Copilot reviewed 85 out of 86 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Switch dependency management to org.gbif.download artifacts, remove modules, adjust versions and distribution management. |
| occurrence-ws/src/main/java/org/gbif/occurrence/ws/resources/TermResource.java | Use external org.gbif.terms.utils.TermUtils. |
| occurrence-ws/src/main/java/org/gbif/occurrence/ws/resources/OccurrenceDownloadDescribeResource.java | Use external org.gbif.terms.utils.TermUtils. |
| occurrence-ws/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| occurrence-ws-client/pom.xml | Align parent version with root version change. |
| occurrence-trino-udf/pom.xml | Apply version suffix update. |
| occurrence-table-build-spark/src/test/resources/backfill-example.yml | Remove table-build spark test resource. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/TableBackfillConfiguration.java | Remove redundant Spark table backfill configuration. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/HdfsSnapshotAction.java | Remove redundant Spark/HDFS snapshot action. |
| occurrence-table-build-spark/src/main/java/org/gbif/occurrence/table/backfill/DatasetUpdate.java | Remove redundant dataset partition update Spark utility. |
| occurrence-table-build-spark/README.md | Remove documentation for deleted module. |
| occurrence-table-build-spark/pom.xml | Remove deleted module’s Maven build. |
| occurrence-table-build-spark/docker/Dockerfile | Remove Docker packaging for deleted module. |
| occurrence-spark-udf/pom.xml | Align parent version with root version change. |
| occurrence-search/pom.xml | Align parent version with root version change. |
| occurrence-registry-sync/pom.xml | Align parent version with root version change. |
| occurrence-persistence/pom.xml | Align parent version with root version change. |
| occurrence-mail/pom.xml | Align parent version with root version change. |
| occurrence-integration-tests/src/test/java/org/gbif/occurrence/ws/it/TermResourceIT.java | Update TermUtils import to external package. |
| occurrence-integration-tests/pom.xml | Align parent version with root version change. |
| occurrence-heatmaps/pom.xml | Align parent version with root version change. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/OccurrenceAvroHdfsTableDefinitionTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/ExtensionTableTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/EventTableDefinitionTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/test/java/org/gbif/occurrence/download/hive/DownloadTermsTest.java | Remove tests for deleted legacy HDFS table module. |
| occurrence-hdfs-table/src/main/resources/table/reference-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/multimedia-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/image-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/identifier-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/identification-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-trial-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-trait-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-measurement-score-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/germplasm-accession-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/resources/table/amplification-table.avsc | Remove generated schemas from deleted legacy module. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/OccurrenceHDFSTableDefinition.java | Remove deleted legacy HDFS table definition. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/OccurrenceAvroHdfsTableDefinition.java | Remove deleted legacy HDFS avro schema generator. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/InitializableField.java | Remove deleted legacy schema model. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/HiveDataTypes.java | Remove deleted legacy hive typing logic. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/HiveColumns.java | Remove deleted legacy hive column utilities. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/Field.java | Remove deleted legacy schema model. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/ExtensionTable.java | Remove deleted legacy extension table helper. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/ExtensionSchemasLoader.java | Remove deleted legacy schema loader. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventHDFSTableDefinition.java | Remove deleted legacy event table definition. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventDownloadTerms.java | Remove deleted legacy event download terms. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/EventAvroHdfsTableDefinition.java | Remove deleted legacy event avro schema generator. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/DownloadTerms.java | Remove deleted legacy download terms. |
| occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/AvroDataTypes.java | Remove deleted legacy avro typing utilities. |
| occurrence-hdfs-table/pom.xml | Remove deleted legacy module build. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/VerbatimSearchHitConverter.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/SearchHitOccurrenceConverter.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/occurrence/OccurrenceEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/event/SearchHitEventConverter.java | Update EventTermUtils/TermUtils imports to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/event/EventEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/src/main/java/org/gbif/search/es/BaseEsField.java | Update TermUtils import to external package. |
| occurrence-es-mapping/pom.xml | Swap dependency from occurrence-common to org.gbif.download:term-utils. |
| occurrence-download/src/test/java/org/gbif/occurrence/download/query/TestDownloadHeaders.java | Update EventTermUtils/TermUtils imports to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/util/HeadersFileUtil.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/Queries.java | Update TermUtils static import/package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/ParquetSchemaQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/ParquetQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/HiveQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/EventsHiveQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/AvroSchemaQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/hive/AvroQueries.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/OccurrenceMapReader.java | Update TermUtils import to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/dwca/archive/DwcArchiveUtils.java | Reorder imports; update EventTermUtils/TermUtils to external package. |
| occurrence-download/src/main/java/org/gbif/occurrence/download/file/dwca/akka/DownloadDwcaActor.java | Update TermUtils import to external package. |
| occurrence-download/pom.xml | Add term-utils/hdfs-tables, replace occurrence-hdfs-table dependency. |
| occurrence-download-service/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| occurrence-download-launcher/pom.xml | Align parent version with root version change. |
| occurrence-common/src/test/java/org/gbif/occurrence/common/TermUtilsTest.java | Remove tests for deleted in-repo TermUtils. |
| occurrence-common/src/test/java/org/gbif/occurrence/common/EventTermUtilsTest.java | Remove tests for deleted in-repo EventTermUtils. |
| occurrence-common/src/main/java/org/gbif/occurrence/common/HiveColumnsUtils.java | Replace array detection with local isHiveArray and switch to external TermUtils. |
| occurrence-common/src/main/java/org/gbif/occurrence/common/EventTermUtils.java | Remove in-repo EventTermUtils (migrated to external package). |
| occurrence-common/pom.xml | Add dependency on org.gbif.download:term-utils. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterService.java | Remove redundant dataset deletion service. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterConfiguration.java | Remove redundant dataset deletion configuration. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterCommand.java | Remove redundant CLI command entry point. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/dataset/EsDatasetDeleterCallback.java | Remove redundant dataset deletion callback. |
| occurrence-cli/src/main/java/org/gbif/occurrence/cli/common/EsHelper.java | Remove ES helper used only by deleted deleter flow. |
| occurrence-cli/pom.xml | Align parent version with root version change. |
| maven-extension-avsc-schema-generator/src/test/java/org/gbif/pipelines/maven/XmlToAvscGeneratorMojoTest.java | Remove tests for deleted Maven plugin module. |
| maven-extension-avsc-schema-generator/src/main/java/org/gbif/pipelines/maven/XmlToAvscGeneratorMojo.java | Remove deleted Maven plugin implementation. |
| maven-extension-avsc-schema-generator/pom.xml | Remove deleted Maven plugin module build. |
| Jenkinsfile | Remove table-build Docker build/release stages. |
| event-ws/pom.xml | Replace occurrence-hdfs-table dependency with org.gbif.download:hdfs-tables. |
| build/occurrence-table-build-spark-docker-build.sh | Remove Spark table-build Docker build script. |
Comments suppressed due to low confidence (3)
pom.xml:14
- The project version is set to `1.1.26-473-SNAPSHOT`, which changes the Maven coordinates for every module in this build. If this PR is intended to be merged, this nonstandard version suffix will likely break downstream builds that expect the normal snapshot/release versioning; consider reverting to the standard snapshot version and using the CI/build metadata (or branch builds) to distinguish issue-specific builds instead.
pom.xml:464
- `maven-extension-avsc-schema-generator` is still listed as an internal dependency (version `${project.version}`), but the module has been removed from the reactor in this PR. This leaves a dangling internal dependency entry; remove it (or replace it with an external artifact) to avoid confusion and potential resolution failures if it gets referenced later.
occurrence-common/src/main/java/org/gbif/occurrence/common/HiveColumnsUtils.java:62
- `HiveColumnsUtils.getHiveType()` now uses a hard-coded `isHiveArray()` list, while other parts of the download description logic (e.g. `OccurrenceDownloadDescribeResource.fieldDelimiter`) use `SQLColumnsUtils.isSQLArray(term)` to detect array terms. Using two independent sources of truth can lead to inconsistent type vs delimiter output; consider reusing `SQLColumnsUtils.isSQLArray` here (or otherwise centralizing the array-term logic in one place).
} else if(TermUtils.isVocabulary(term)) {
if (TermUtils.isArray(term)) {
return "STRUCT<concepts: ARRAY<STRING>,lineage: ARRAY<STRING>>";
} else {
return "STRUCT<concept: STRING,lineage: ARRAY<STRING>>";
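The centralization suggested in the last comment could look roughly like the sketch below. Every name in it is hypothetical (`ArrayTermSketch`, the placeholder term names, the `;` delimiter) and stands in for the real `HiveColumnsUtils` and `SQLColumnsUtils` logic, which is not reproduced here. The point is only that the type and the delimiter are both derived from a single predicate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: one array-term predicate feeding both type and delimiter output. */
final class ArrayTermSketch {

  // Placeholder term names for illustration only; the real array terms live elsewhere.
  private static final Set<String> ARRAY_TERMS =
      new HashSet<>(Arrays.asList("exampleListTermA", "exampleListTermB"));

  // The single source of truth for "is this term an array?".
  static final Predicate<String> IS_ARRAY = ARRAY_TERMS::contains;

  /** Column type derived from the shared predicate. */
  static String hiveType(String term) {
    return IS_ARRAY.test(term) ? "ARRAY<STRING>" : "STRING";
  }

  /** Delimiter derived from the same predicate; the ";" value is an assumption. */
  static Optional<String> fieldDelimiter(String term) {
    return IS_ARRAY.test(term) ? Optional.of(";") : Optional.empty();
  }
}
```

Because both methods consult `IS_ARRAY`, a term cannot be reported as an array type without a delimiter, or the reverse.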
9ecc374 to 4958200
Removed unused code
323b329 to 7d88330