Job Description
 Project Timeline 
-  Start Date:  Immediate 
-  Duration:  5–6 weeks
 -  Commitment:  Part-time, ~20 hours/week (flexible)
 -  Schedule:  Fully remote 
 
 Key Responsibilities 
-  Evaluate AI-generated data analyses for quality, correctness, and clarity. 
 -  Understand dataset context and apply statistical analysis and modeling for both specific and open-ended prompts. 
 -  Design prompts and create clear, detailed rubrics for reward modeling and evaluation. 
 -  Produce gold-standard responses: data visualizations, explanatory text, and executable Python notebooks. 
 -  Translate data-science reasoning and decisions into gradable criteria for AI agents. 
 
Required Qualifications
-  Education / Experience:  Bachelor's degree in Data Science, Statistics, Computer Science, or a related field, or equivalent practical experience. 
-  Professional Experience:  2+ years in industry data science, analytics, or closely related roles involving hands-on analysis and model evaluation.
-  Programming / Tooling:  Proficiency in Python and core data libraries (pandas, numpy, scikit-learn). Comfortable authoring reproducible Jupyter/IPython notebooks and delivering executable code.
-  Data / Statistical Skills:  Solid grounding in statistical analysis, hypothesis testing, experimental design (including A/B testing), and data visualization practices.
-  Communication / Evaluation:  Excellent analytical writing skills; ability to convert data-science reasoning into clear, gradable rubrics and actionable feedback for AI outputs.
 
 Preferred Qualifications 
-  Advanced degree (MS/PhD) in a relevant field. 
 -  Familiarity with additional libraries and tools such as matplotlib, seaborn, plotly, statsmodels, and experiment-analysis packages. 
-  Experience with SQL and working with relational/columnar databases; familiarity with data pipelines and ETL concepts.
-  Experience with ML frameworks and workflows (TensorFlow, PyTorch, model evaluation metrics, hyperparameter tuning) and basic knowledge of deployment/containerization (Docker) and version control (Git).
 -  Experience designing evaluation protocols, annotation projects, human-in-the-loop labeling, reward modeling, or conducting model audits and bias/fairness assessments. 
-  Familiarity with cloud platforms (AWS, GCP, or Azure) and experience working with large datasets.
-  Experience with prompt engineering or evaluating LLM-generated data analyses.
 
 Examples of Acceptable Experience 
-  Delivered end-to-end analyses in Jupyter notebooks that included EDA, modeling, visualization, and reproducible code for stakeholders. 
 -  Built or contributed to evaluation rubrics and gold-label datasets for model validation, human evaluation, or reward-model training. 
 -  Conducted A/B tests or controlled experiments and summarized statistical significance and practical implications in clear reports. 
 -  Reviewed and quality-checked ML model outputs, identified failure modes, and recommended corrective actions or metric changes. 
 
				 