Enhancement: Mapping source column names to parquet-compatible destination column names

In the current ingestion pipeline of the CF.Cumulus project, data sources such as SQL Server allow column names that contain spaces. The Parquet destination does not support column names with spaces, so these columns need a different name on the destination side.

**Proposed Solution:**
Introduce a new column in the `[ingest].[Attributes]` table to allow users to specify a destination column name for each source column. This enhancement would function as follows:

- Add a new nullable column, e.g., `AttributeTargetName`, to the `[ingest].[Attributes]` metadata table, which tracks column mappings.
- Allow users to define `AttributeTargetName` for each source column. This field can be populated with a Parquet-compatible name (e.g., replacing spaces with underscores or removing spaces entirely).
- If `AttributeTargetName` is left blank or null, the system will default to using the original source column name, ensuring backward compatibility and making this a non-breaking change.
- Modify the `[ingest].[GetDatasetPayload]` stored procedure so that the TabularTranslator JSON expression it generates uses `AttributeTargetName` as the sink column name when writing to Parquet files. If `AttributeTargetName` is blank or NULL, the `AttributeName` column is used instead.
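
The fallback rule and the resulting mapping can be sketched as follows. This is an illustration only, written in Python rather than T-SQL, and it is not the stored procedure's actual implementation. The helper names `resolve_target_name` and `build_translator` and the sample rows are hypothetical. The output follows the explicit `mappings` form of a Data Factory `TabularTranslator` (`source.name` to `sink.name`).

```python
import json
from typing import Iterable, Optional, Tuple


def resolve_target_name(attribute_name: str, attribute_target_name: Optional[str]) -> str:
    """Return the destination column name (hypothetical helper).

    Falls back to the source name when the target is NULL or blank,
    which keeps existing metadata rows working unchanged.
    """
    if attribute_target_name is None or attribute_target_name.strip() == "":
        return attribute_name
    return attribute_target_name


def build_translator(attributes: Iterable[Tuple[str, Optional[str]]]) -> dict:
    """Build a TabularTranslator-style mapping from (AttributeName, AttributeTargetName) rows."""
    mappings = [
        {
            "source": {"name": name},
            "sink": {"name": resolve_target_name(name, target)},
        }
        for name, target in attributes
    ]
    return {"type": "TabularTranslator", "mappings": mappings}


# Sample rows (assumed data): one explicit target, one NULL, one blank.
attributes = [
    ("Order ID", "Order_ID"),
    ("Amount", None),
    ("Region", ""),
]

translator = build_translator(attributes)
print(json.dumps(translator, indent=2))
```

Rows without a target name map to themselves, so datasets that never populate `AttributeTargetName` produce the same mapping as before the change.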

Enhancement: Mapping source column names to parquet-compatible destination column names #122
