Job Description
Project Timeline
- Start Date: Immediate
- Duration: 5-6 weeks
- Commitment: Part-time, ~20 hours/week (flexible)
- Location: Fully remote
Key Responsibilities
- Evaluate AI-generated data analyses for quality, correctness, and clarity.
- Understand dataset context and apply statistical analysis and modeling to both specific and open-ended prompts.
- Design prompts and create clear, detailed rubrics for reward modeling and evaluation.
- Produce gold-standard responses: data visualizations, explanatory text, and executable Python notebooks.
- Translate data-science reasoning and decisions into gradable criteria for AI agents (see the illustrative sketch after this list).
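For illustration only, here is a minimal sketch of how a gradable rubric criterion could be encoded for automated scoring. This is not the project's actual tooling: the Criterion structure, field names, and checks are hypothetical, and it assumes graded analyses arrive as structured Python dictionaries.

```python
# Illustrative sketch only: one way a gradable rubric could be encoded.
# All names here are hypothetical, not a description of this role's tooling.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    description: str
    weight: float
    check: Callable[[dict], bool]  # takes a structured analysis summary

def score(criteria: list[Criterion], analysis: dict) -> float:
    """Weighted fraction of criteria the analysis satisfies."""
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if c.check(analysis))
    return earned / total if total else 0.0

# Example rubric for a hypothesis-testing prompt (hypothetical fields).
rubric = [
    Criterion("stat_test", "Chooses an appropriate test", 2.0,
              lambda a: a.get("test") in {"t-test", "mann-whitney"}),
    Criterion("reports_p", "Reports a p-value", 1.0,
              lambda a: "p_value" in a),
    Criterion("effect_size", "Reports an effect size, not just significance", 1.0,
              lambda a: "effect_size" in a),
]

if __name__ == "__main__":
    submission = {"test": "t-test", "p_value": 0.03}
    print(f"score = {score(rubric, submission):.2f}")  # 0.75
]
```

Encoding criteria this way keeps each check independently testable and lets weights reflect the relative importance of each analytical decision.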
Required Qualifications
- Education / Experience: Bachelor's degree in Data Science, Statistics, Computer Science, or a related field, or equivalent practical experience.
- Professional Experience: 2+ years in data science, analytics, or closely related industry roles involving hands-on analysis and model evaluation.
- Programming & Tooling: Proficiency in Python and core data libraries (pandas, numpy, scikit-learn). Comfortable authoring reproducible Jupyter/IPython notebooks and delivering executable code.
- Data & Statistical Skills: Solid grounding in statistical analysis, hypothesis testing, experimental design (including A/B testing), and data visualization practices.
- Communication & Evaluation: Excellent analytical writing skills; ability to convert data-science reasoning into clear, gradable rubrics and actionable feedback for AI outputs.
Preferred Qualifications
- Advanced degree (MS/PhD) in a relevant field.
- Familiarity with additional libraries and tools such as matplotlib, seaborn, plotly, statsmodels, and experiment-analysis packages.
- Experience with SQL and working with relational/columnar databases; familiarity with data pipelines and ETL concepts.
- Experience with ML frameworks and workflows (TensorFlow, PyTorch, model evaluation metrics, hyperparameter tuning) and basic knowledge of deployment/containerization (Docker) and version control (Git).
- Experience designing evaluation protocols, annotation projects, human-in-the-loop labeling, reward modeling, or conducting model audits and bias/fairness assessments.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and experience working with large datasets.
- Experience with prompt engineering or evaluating LLM-generated data analyses.
Examples of Acceptable Experience
- Delivered end-to-end analyses in Jupyter notebooks that included EDA, modeling, visualization, and reproducible code for stakeholders.
- Built or contributed to evaluation rubrics and gold-label datasets for model validation, human evaluation, or reward-model training.
- Conducted A/B tests or controlled experiments and summarized statistical significance and practical implications in clear reports (a minimal sketch follows this list).
- Reviewed and quality-checked ML model outputs, identified failure modes, and recommended corrective actions or metric changes.
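For context, a minimal sketch of the kind of A/B-test summary described above, using scipy's Welch t-test. The data are simulated, and the metric, sample sizes, and effect are invented for the example.

```python
# Illustrative only: the kind of A/B-test summary this role evaluates.
# Data and parameters are made up for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=500)    # baseline metric
treatment = rng.normal(loc=10.3, scale=2.0, size=500)  # variant metric

# Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
lift = treatment.mean() - control.mean()
# Cohen's d pairs statistical significance with practical impact.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
cohens_d = lift / pooled_sd

print(f"lift={lift:.3f}, p={p_value:.4f}, d={cohens_d:.2f}")
```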
Job Tags
Immediate start, Remote work, Flexible hours