About Fusemachines
Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in 4 countries (Nepal, United States, Canada, and Dominican Republic) and more than 400 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.
About the role:
This is a remote consulting position responsible for designing, building, maintaining and optimizing the infrastructure required for data integration (batch and real-time), storage (including databases and data modeling), processing, and analytics (BI, visualization and advanced analytics) using Microsoft Azure in the Media Industry (advertising, marketing and public relations).
The Data Engineer works closely with cross-functional teams supporting business objectives, serves as an Azure Cloud solutions subject matter expert (SME) on business logic, and collaborates with the Solutioning team on solution design.
Qualification & Experience
- Must have a full-time Bachelor's degree in Computer Science or similar from a top tier school.
- At least 5 years of experience in data engineering, ETL development and database management, with strong expertise in Azure; experience in the Media Industry preferred.
- 5+ years of experience with Azure DevOps, Azure Cloud Platform, and other hyperscalers.
- Proven experience as a data engineer delivering projects with data and analytics tools and technologies.
- Experience with delivering on business application data requirements.
- Expertise in SQL and building ETL pipelines, including experience with: Azure Data Factory, Azure Databricks, Azure Stream Analytics, Azure Event Hubs.
- Expertise in databases and data warehousing, including experience with: Kimball Methodology, Azure SQL, Azure Synapse.
- Experience with programming languages, including experience with one or more of the following: Python, Java, Scala.
- Experience working on a Scrum Team in an Agile environment and following CI/CD principles.
- The following certifications:
  - Microsoft Certified: Azure Fundamentals
  - Microsoft Certified: Azure Data Engineer Associate
  - Databricks Certified Associate Developer for Apache Spark
  - Databricks Certified Data Engineer Associate
Required skills/Competencies
- Strong programming skills in one or more languages such as Python or Scala, and proficiency in writing efficient, optimized code for data integration, storage, processing and manipulation.
- Strong understanding and experience with SQL and writing advanced SQL queries, database design, and optimization techniques.
- Strong understanding of the software development lifecycle (SDLC), including Agile methodologies.
- Strong knowledge of SDLC tools and technologies such as Azure DevOps, including project management software (Jira, Azure Boards or similar), source code management (GitHub, Azure Repos or similar), CI/CD systems (Azure DevOps, GitHub Actions, or similar) and binary repository managers.
- Strong understanding of DevOps principles, including continuous integration, continuous delivery (CI/CD), infrastructure as code (IaC), configuration management, automated testing and cost management.
- Deep knowledge of cloud computing, specifically of Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Azure Stream Analytics, Azure Event Hubs, Power BI, Azure DevOps, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.
- Solid understanding of data modeling and database design principles, concepts and techniques, including the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.
- Experience with NoSQL databases, including Azure Cosmos DB.
- Skilled in data integration from different sources such as APIs, databases, flat files, and event streams.
- Strong experience in designing and implementing Data Warehousing solutions in Azure with Azure Synapse Analytics.
- Strong experience in designing and implementing efficient ELT/ETL tools, frameworks and processes in Azure and with open source solutions, including the ability to develop custom integration solutions as needed.
- Strong experience with scalable and distributed data processing technologies such as Spark/PySpark and Azure Event Hubs to handle large volumes of data.
- Strong experience in Orchestration using technologies like Apache Airflow.
- Excellent analytical and problem-solving skills to identify and address technical issues, performance bottlenecks, and system failures.
- Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
- Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
- Good understanding of BI solutions including Power BI.
- Excellent written and verbal communication skills to collaborate with clients and cross-functional teams, including data architects, DevOps engineers, data analysts, data scientists, developers and operations teams, and the ability to think strategically and work with multiple stakeholders and audiences.
- Ability to document processes, procedures, and deployment configurations.
- Ability to understand business scenarios and issues and explain them to business stakeholders.
- Strong leadership skills, with a willingness to lead, create ideas, and be assertive.
- Understanding of Azure security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.
- Ability to implement security controls and best practices within data and analytics solutions, including working knowledge of common cloud security vulnerabilities and ways to mitigate them.
- A willingness to stay updated with the latest Azure services, Data Engineering trends, and best practices in the field.
- Must be a well-organized team player, comfortable working in a fast-paced environment, and able to prioritize effectively.
- Ability to work and learn independently to answer questions, while maintaining keen attention to detail and quality.
- Commitment to agility, continuous learning, and ability to adapt to changing business needs.
Responsibilities
- Design, develop, optimize and maintain high-performance, large-scale data architectures following best practices, and build reliable data pipelines using Azure Data Factory (ADF), Azure Databricks, and Azure Synapse Analytics to support ETL/ELT processes into data warehouses.
- Support data integration (batch and real-time), storage, processing, and infrastructure and ensure scalability, reliability, and performance of data systems.
- Manage and optimize database systems, including building and updating stored procedures and functions.
- Manage structured and unstructured data within Azure Data Lake Storage (ADLS) and Azure SQL Database.
- Design and implement database schemas, dashboard schemas, and data pipelines to support business requirements.
- Develop real-time data processing solutions using Azure Stream Analytics, Azure Event Hubs and Azure Functions.
- Build batch processing systems for large datasets using Azure Data Factory and Azure Databricks.
- Serve as an SME on business logic and ensure its accuracy by verifying it during code reviews.
- Conduct code reviews (PySpark, Scala, and Spark SQL) to ensure code quality, performance, and adherence to coding standards.
- Collaborate with the Solutioning team to design and implement data solutions that align with business objectives.
- Monitor and troubleshoot data pipelines, database performance issues, and data quality problems.
- Provide mentorship and guidance to junior data engineers and foster their professional growth and development.
- Collaborate with cross-functional teams (Product, Engineering, Data Scientists, Analysts) to drive discovery and requirements gathering for data management and business analytics efforts.
- Understand data requirements and translate them into effective solutions.
- Continuously evaluate and implement new technologies and tools. Promote the development of reusable components.
- Design, implement, and maintain data governance solutions. Manage cataloging, lineage, data quality (enhancing data accuracy), and governance frameworks. Implement data validation and quality assurance processes. Lead data security and privacy efforts.
- Cover descriptive, diagnostic, predictive, and prescriptive analytics requirements.
- Actively participate in Agile ceremonies, working on a Scrum team to deliver on projects and initiatives that impact all functional areas of the organization.
- Contribute to continuous improvement activities.
- Stay updated on market trends, emerging technologies and best practices in data engineering to continuously improve the data infrastructure and processes.
Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.