Primary Responsibilities & Qualifications:
- Lead data engineering activities by working closely with various teams and team members.
- Extensive software development experience using Agile development life cycle processes and tools.
- Strong understanding of data engineering concepts and best practices.
- Proficiency in SQL and experience with data modeling techniques.
- Familiarity with AWS services, particularly Redshift, S3, and Glue.
- Knowledge of ETL (Extract, Transform, Load) processes and tools.
- Excellent problem-solving and troubleshooting skills.
- Strong communication skills to collaborate with cross-functional teams.
Data Warehouse Modeling:
- Design and implement data models for the data warehouse.
- Create and maintain data schemas, tables, and relationships.
- Optimize data models for query performance and storage efficiency.
- Ensure data integrity and enforce data quality standards.
Data Ingestion:
- Develop and maintain data ingestion pipelines.
- Extract data from various sources (databases, APIs, logs, etc.).
- Transform and clean data as needed before loading it into Redshift.
- Schedule and automate data ingestion processes.
- Monitor and optimize data ingestion performance.
AWS Redshift:
- Set up and configure Redshift clusters based on workload requirements.
- Tune and optimize query performance through sort key and distribution key strategies.
- Monitor and manage Redshift performance, including workload management and query optimization.
- Implement security measures and access controls for Redshift.
- Ensure high availability and disaster recovery for Redshift clusters.
ETL (Extract, Transform, Load):
- Develop ETL workflows using AWS Glue, Apache Spark, or other relevant tools.
- Transform and enrich data during the ETL process to meet business requirements.
- Handle schema evolution and data versioning in ETL pipelines.
- Monitor ETL job performance and troubleshoot issues.
- Implement data lineage and metadata management.
Data Governance and Compliance:
- Implement data governance practices, including data lineage, data cataloging, and data documentation.
- Ensure compliance with data privacy and security regulations (e.g., GDPR).
- Implement data retention policies and archiving strategies.
Automation and Monitoring:
- Implement automation scripts and tools for managing data pipelines and workflows.
- Set up monitoring and alerting for data pipeline failures and performance issues.
- Conduct regular health checks and capacity planning for the data warehouse.
Documentation and Collaboration:
- Maintain clear and up-to-date documentation for data processes, pipelines, and data models.
- Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and deliver actionable insights.
Performance Tuning and Optimization:
- Continuously optimize data warehouse performance through query tuning and resource management.
- Implement Redshift best practices for workload management.
- Identify and resolve bottlenecks in data pipelines and ETL processes.
Scalability and Cost Management:
- Ensure the data warehouse infrastructure scales effectively to handle growing data volumes.
- Monitor and manage costs associated with Redshift and other AWS services.
- Implement cost-saving strategies without compromising performance.
Additional Qualifications:
- Good knowledge of cybersecurity: penetration testing, DDoS attack prevention, TLS, PKI, etc.
- Experience with application lifecycle management, DevOps, and CI/CD.
- Experience in designing big data applications.
- Self-driven, highly motivated, and organized, with strong analytical thinking and problem-solving skills and the ability to work on multiple projects and function in a team environment.