Key information
Big Data Engineer
Position: Not specified
Start: As soon as possible
End: Not specified
Location: Toronto, Canada
Method of collaboration: Project only
Hourly rate: Not specified
Latest update: Jun 7, 2024
Task description and requirements
Responsibilities
Design, develop, and maintain data pipelines using Hadoop Ecosystem, Apache Spark, and PySpark.
Implement data processing workflows to handle large volumes of structured and unstructured data.
Integrate data from various sources and formats into the big data platform.
Ensure data quality, integrity, and consistency across different data sources.
Optimize data processing jobs for performance and scalability.
Troubleshoot and resolve performance issues related to data processing.
Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
Communicate technical concepts and results effectively to non-technical stakeholders.
Create and maintain documentation for data pipelines, processes, and workflows.
Follow industry best practices for big data engineering and data governance.
Stay updated with the latest trends and technologies in big data engineering.
Propose and implement improvements to existing data processing frameworks and systems.
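The pipeline duties above (ingest, transform, quality checks, load) follow a standard extract-transform-load shape. As a toy illustration only, here is that shape in plain Python; in the actual role these steps would run as PySpark jobs against HDFS or cloud storage, and every name and field below is hypothetical:

```python
import csv
import io
import json

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw CSV rows into dicts (stand-in for reading from HDFS/S3)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize types and drop rows failing a basic quality check."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])  # data consistency: enforce numeric amounts
        except (KeyError, ValueError):
            continue  # quality gate: skip malformed records
        out.append({"user": row["user"].strip().lower(), "amount": amount})
    return out

def load(rows: list[dict]) -> str:
    """Load: serialize to JSON lines (stand-in for writing to a warehouse table)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)

raw = "user,amount\nAlice,10.5\nBob,not_a_number\nCarol,3\n"
result = load(transform(extract(raw)))  # Bob's malformed row is filtered out
```

The same extract/transform/load boundaries carry over directly to a Spark job, where each stage becomes a DataFrame operation.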
Requirements
Proficiency in Hadoop, Apache Spark, and PySpark.
Strong programming skills in Python and/or Java.
Experience with data warehousing solutions and ETL processes.
Knowledge of SQL and NoSQL databases.
Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud) is a plus.
Excellent problem-solving and analytical skills.
Strong communication and teamwork abilities.
Ability to work in a fast-paced and dynamic environment.
Nice to have
Experience with real-time data processing frameworks (e.g., Apache Kafka, Spark Streaming).
Knowledge of data visualization tools and techniques.
Certification in big data technologies or cloud platforms.
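The real-time frameworks listed above (Kafka, Spark Streaming) are most often used for windowed aggregations over event streams. As a minimal, hedged sketch of that idea in plain Python — the function name and event data are hypothetical, and a real deployment would use Spark Structured Streaming over a Kafka topic:

```python
from collections import defaultdict

def tumbling_window_sums(events, window_seconds=60):
    """Group (epoch_seconds, value) events into fixed-size windows and sum values.

    A toy model of the aggregation a streaming job would perform continuously;
    here the whole stream is processed in one batch for clarity.
    """
    sums = defaultdict(float)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)  # bucket event by window start time
        sums[window_start] += value
    return dict(sums)

# Hypothetical event stream: (timestamp, value) pairs spanning three windows.
events = [(0, 1.0), (30, 2.0), (61, 5.0), (125, 4.0)]
windows = tumbling_window_sums(events, window_seconds=60)
```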