Data Architect (AI & ML Products)

Job description

Job Title: Data Architect (AI & ML Products)

Experience: 5+ Years

Location: Mumbai

Job Summary:

We are hiring a Product Data Architect to own the data architecture and data foundations for a portfolio of strategic AI/ML products. This role focuses on designing and building product-grade data pipelines, curated data layers, and fit-for-purpose data stores that power analytics and AI/ML use cases. You will define data models, data contracts, quality and governance controls, and access patterns to ensure product teams can reliably consume high-quality data at scale.

Key Responsibilities

  • Design the end-to-end data architecture for assigned products: ingestion → transformation → curated layers → serving/consumption.
  • Build and oversee data pipelines (batch/stream where needed), including orchestration, error handling, recovery, and performance optimization.
  • Define product-level data models (conceptual/logical/physical), including dimensional models, canonical entities, and domain schemas.
  • Establish data contracts with upstream/downstream systems and product services (schemas, SLAs, validation rules, versioning).
  • Implement and enforce data quality and observability: checks, anomaly detection, freshness/completeness, reconciliation, and alerting.
  • Define master/reference data needs and harmonization approaches for product-specific domains.
  • Ensure secure and compliant data handling: access control, PII masking/redaction, encryption standards, retention, and auditability.
  • Partner with Data Engineering, ML/AI teams, and Product/Tech leads to enable use cases such as forecasting, pricing optimization, RAG/knowledge bases, and experimentation.
  • Evaluate and recommend data tooling choices at the product layer (e.g., transformations, orchestration, streaming, serving stores) aligned to scalability and cost.

Must-Have Skill Sets:

1)  Product Data Architecture & Modelling

  • Strong experience designing product-oriented data architectures: domain boundaries, source-to-consumption flows, and curated layers.
  • Expertise in data modelling (dimensional, normalized, hybrid) and defining canonical datasets for product use cases.
  • Ability to design data products: clearly defined datasets with ownership, contracts, documentation, and usage SLAs.

2)  Data Pipeline & Lake/Lakehouse Design

  • Hands-on architecture of data pipelines (batch and near-real-time): ingestion, transformation, orchestration, and serving.
  • Strong understanding of data lake/lakehouse patterns: bronze/silver/gold, CDC-based ingestion, incremental processing, partitioning, and compaction strategies.
  • Ability to define scalable approaches for data integration from enterprise systems (ERP/CRM/MarTech/R&D/LIMS/manufacturing systems, files, APIs, event streams).

3)  Data Quality, Governance & Observability

  • Proven capability to implement data quality frameworks: validation rules, anomaly detection, reconciliation, and completeness/accuracy checks.
  • Strong understanding of metadata, lineage, and cataloging, and of how to make data discoverable and trustworthy.
  • Experience defining and enforcing data access controls: classification, role-based access, masking/tokenization, auditability.

4) Performance, Reliability & Cost-Aware Design

  • Expertise in designing performant datasets and pipelines: partitioning, clustering, indexing, query optimization, and workload management.
  • Ability to define operational standards for pipelines: retries, idempotency, backfills, monitoring, alerting, and incident response.
  • Cost/performance tradeoff thinking for storage and compute (especially for large-scale transformation workloads).

5) Integration with AI/Analytics Consumption

  • Strong understanding of downstream needs for BI/analytics, ML feature engineering, and AI applications (including GenAI/RAG where relevant).
  • Ability to shape datasets for consumption: feature-ready tables, semantic layers, and curated marts for product teams.

6) Cross-Functional Delivery & Stakeholder Management

  • Ability to work closely with product teams, data engineers, platform teams, and security/compliance to deliver on product timelines.
  • Strong documentation and communication: data lineage, source mapping, data dictionaries, and pipeline runbooks.

Good-to-Have Skill Sets:

  • Experience with modern orchestration and transformation tooling (e.g., Airflow/Prefect, dbt or equivalents).
  • Familiarity with one or more ecosystems commonly used in enterprise data platforms (e.g., Spark/Databricks, Snowflake/BigQuery, Delta/Iceberg/Hudi).
  • Exposure to master data management, reference data management, and consent/PII governance programs.
  • Domain exposure in CPG/FMCG, pricing/revenue management, marketing/media analytics, supply chain forecasting, or R&D systems.

Qualifications

  • Bachelor’s/Master’s in Computer Science, Engineering, or related discipline (or equivalent practical experience).
  • 5+ years of experience in roles such as Data Architect, Analytics Architect, Data Engineering Lead, or Data Platform Architect with demonstrable ownership of data models and pipeline architectures for business-critical products.

JOB CODE : SKILLMS-115
