Details:
- Fluency in more than one programming language (Python, R, C++, Java, Scala).
- Proficient in data cleaning, bucketing, and blending, and in managing the phases of turning unstructured data into structured data.
- Experience with relational SQL and NoSQL databases, including Postgres, Cassandra, MongoDB, and Redis.
- Production-level experience with Elasticsearch, Solr, or Lucene.
- Working knowledge of message queuing, stream processing, and highly scalable "big data" data stores.
- Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
- Experience with data pipeline and workflow management tools such as Azkaban, Luigi, and Airflow (a minimal Airflow sketch follows this list).
- Experience with stream-processing systems such as Apache Storm, Spark Streaming, Apache Gobblin, and Apache NiFi (a streaming sketch also follows below).
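
For illustration, a minimal sketch of the kind of workflow these tools manage, written as an Airflow DAG; the DAG id, task names, and daily schedule are hypothetical:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: pull raw records from a source system.
        print("extracting raw data")

    def transform():
        # Placeholder: clean and structure the extracted records.
        print("transforming data")

    # Hypothetical daily pipeline: extract, then transform.
    with DAG(
        dag_id="example_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task  # transform runs only after extract succeeds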
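
Similarly, a minimal Spark Structured Streaming sketch (the standard streaming word count; it assumes a text stream served on localhost port 9999 purely for demonstration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("example_stream").getOrCreate()

    # Read a stream of lines from a local socket (assumed test source).
    lines = (spark.readStream.format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # Split each line into words and keep a running count per word.
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # Write the running counts to the console until stopped.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()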
Minimum Qualifications:
- 2-4 years of professional work experience
- Strong communication and interpersonal skills
Skills Required:
MongoDB, Apache Storm, Azkaban, Elasticsearch, Python
Roles:
- Identify, analyze, and interpret patterns in data sets.
- Write efficient code to automate, analyze, and structure data.
- Analyze results using statistical techniques.
- Create and maintain optimal and robust data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources (a minimal ETL sketch follows this list).
- Partner with Data Science and Engineering teams to solve problems and identify trends and opportunities.
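
For illustration, a minimal extract-transform-load sketch in Python; the input file, connection string, and database/collection names are all hypothetical, and it assumes a local MongoDB instance reachable via pymongo:

    import csv

    from pymongo import MongoClient

    def extract(path):
        # Read raw rows from a CSV source file.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Normalize field names and drop rows that are entirely empty.
        return [
            {key.strip().lower(): value.strip() for key, value in row.items()}
            for row in rows
            if any(value.strip() for value in row.values())
        ]

    def load(docs):
        # Insert the cleaned documents into a (hypothetical) MongoDB collection.
        client = MongoClient("mongodb://localhost:27017")
        client.example_db.events.insert_many(docs)

    if __name__ == "__main__":
        load(transform(extract("events.csv")))  # hypothetical input file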