PYSPARK DSA | 4 TO 8 YEARS | PAN INDIA

Job Description

Bachelor’s or master’s degree in computer science, engineering, or a related field
5-8 years of experience in data engineering and machine learning
Extensive experience with Python programming, including Pandas, NumPy, and various libraries for data manipulation and analysis
Hands-on experience with Spark architecture, including DataFrame operations, lazy evaluation, and UDFs
Strong understanding of DevOps principles and experience with tools like Docker, Kubernetes, and Jenkins
Familiarity with machine learning libraries such as Spark MLlib or Azure ML is a plus
Previous experience in product-oriented organizations or Tier 1 companies is preferred

Utilize advanced SQL techniques such as joins with in-line views and self-joins for data manipulation and extraction
Write Python code with a solid command of the fundamentals, including but not limited to variables, functions, loops, conditions, and various data structures
Implement object-oriented programming concepts including polymorphism, abstract classes, and interfaces for code modularity and reusability
Develop and maintain data pipelines on Spark, with a working understanding of nodes, clusters, lazy evaluation, and the DAG execution model
Perform DataFrame operations and write user-defined functions (UDFs) for efficient data processing in Spark
Integrate APIs for data retrieval and interaction with external systems
Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions
Apply best practices in software development and data engineering to ensure the scalability, reliability, and performance of ML pipelines

Ref: 1795085

Posted on: Apr 16, 2024

Experience level: Experienced

Contract Type: Permanent

Location: Bangalore, KA, IN

Department: Big Data & Analytics
