Summer Bootcamp: Python, AI, and Agentic Empirical Research

Detailed Syllabus Beyond the Poster

4-Day Summer Bootcamp: University-Style Full Syllabus

This syllabus is separate from the poster and written with the density of a short university course. Each day has lecture topics, lab work, key questions, an assignment, and a deliverable. The syllabus adds detailed coverage of LLM principles, of CNN, LSTM, GRU, seq2seq, attention, and Transformer architectures, and of their limits in empirical research.

Course Information

Course Type: 4-day intensive bootcamp. For learners who want a systematic path through Python, AI, causal inference, and agentic empirical research.
Structure: Lecture + code + empirical case + project output. Each day is organized around one concrete research deliverable.
Core Tools: Python / pandas / sklearn / PyTorch / Stata / R / StatsPAI / Codex. The course emphasizes cross-checking across tools, not dependence on one software stack.
Materials: Example data, code templates, task briefings, replication checklists. All materials support the final reproducible research package.
Audience: Advanced undergraduates, graduate students, junior faculty, and empirical researchers. Especially useful for learners who have research questions but no automation workflow.

Learning Objectives

Build a reusable Python empirical-research project template for data, code, tables, figures, logs, and documentation.
Master the basic ML workflow: problem definition, feature construction, train/validation/test splits, metrics, overfitting diagnosis, and model interpretation.
Understand the architectures of CNNs, RNNs, LSTMs, GRUs, seq2seq models, attention, Transformers, and large language models, rather than memorizing model names.
Explain how LLM pretraining, instruction tuning, alignment, RAG, function calling, MCP (Model Context Protocol), and tool use enter research workflows.
Separate prediction tasks from causal-identification tasks and learn when RCT, DID, PSM, IV, RD, synthetic control, DML, and causal forests are appropriate.
Turn a natural-language research task into an executable agent plan that leaves auditable files, logs, and outputs.
Practice publication-grade habits: data dictionaries, sample restrictions, robustness checks, interpretation, replication packages, and human review checkpoints.

Prerequisites

At least one course in statistics, econometrics, or empirical research methods is recommended.
Fluency in Python is not required, but students should be ready to install environments, run notebooks, and read basic code.
Students are encouraged to bring one research question, dataset, or paper they want to replicate.
Learners with Stata or R experience can bring existing do-files or R scripts for comparison with Python workflows.
The course assumes one principle: AI can accelerate research, but it cannot replace researcher judgment about identification, data quality, and claims.

Module Schedule

The four days move from data workflow to AI modeling, causal identification, and agent automation. Each day produces a research component that can be saved, rerun, and extended.

Day 1

Python Programming, Data Engineering, and Reproducible Project Structure


The first day moves research work from manual operations to scripted workflows. The goal is not a syntax sprint; it is a reusable project skeleton for future empirical research.

Lecture Topics

Python environment: conda/venv, pip, requirements files, Jupyter, VS Code, command line, and Git.
Core syntax: variables, lists, dictionaries, functions, modules, exception handling, paths, and file I/O.
Data access: CSV/Excel, APIs, requests, BeautifulSoup, webpage structure, and scraping limits.
pandas: missing values, duplicates, type conversion, merge, concat, groupby, pivot, reshape, and time variables.
Visualization and outputs: matplotlib, seaborn, descriptive tables, figure export, and logs.
Reproducible structure: raw data stays untouched, processed data is generated, scripts rerun, and outputs are traceable.
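The reproducible-structure principle above can be made concrete with a small provenance log. The sketch below is one possible layout, not a required template: it uses only the Python standard library, a hypothetical folder list, a hypothetical outcome column `y`, and hypothetical helpers `make_skeleton` and `clean_step`. Each step records input and output SHA-256 hashes and row counts, so a silent sample change shows up in the log, and the script refuses to continue if the raw file was altered.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical folder layout; adapt the names to your own project.
SUBDIRS = ["data/raw", "data/processed", "code", "output/tables", "output/figures", "logs"]


def sha256(path: Path) -> str:
    """Hash a file in chunks so large raw files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_skeleton(root: Path) -> None:
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)


def clean_step(root: Path, raw_name: str, out_name: str) -> dict:
    """Drop exact duplicates and rows with missing y, write processed CSV, log provenance."""
    raw = root / "data" / "raw" / raw_name
    out = root / "data" / "processed" / out_name
    raw_hash = sha256(raw)

    with raw.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    seen, kept = set(), []
    for row in rows:
        key = tuple(row.values())
        if key in seen or row.get("y", "") == "":
            continue
        seen.add(key)
        kept.append(row)

    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    # Raw data is read-only in this workflow: fail loudly if it changed.
    if sha256(raw) != raw_hash:
        raise RuntimeError(f"{raw} was modified during cleaning")

    entry = {
        "step": "clean",
        "input": raw.relative_to(root).as_posix(),
        "input_sha256": raw_hash,
        "output": out.relative_to(root).as_posix(),
        "output_sha256": sha256(out),
        "rows_in": len(rows),
        "rows_out": len(kept),
    }
    # One JSON line per step, so the log doubles as a provenance record.
    with (root / "logs" / "manifest.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


# Demo in a throwaway directory with a tiny hypothetical raw file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_skeleton(root)
    (root / "data/raw/survey.csv").write_text("id,y\n1,3.2\n1,3.2\n2,\n3,4.0\n")
    entry = clean_step(root, "survey.csv", "survey_clean.csv")
    manifest_lines = (root / "logs" / "manifest.jsonl").read_text().splitlines()

print(entry["rows_in"], entry["rows_out"])  # 4 2
```

In pandas the same cleaning would typically be `drop_duplicates()` followed by `dropna(subset=["y"])`; the standard-library version simply keeps the sketch dependency-free.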

In-Class Lab

Start from one raw dataset and finish cleaning, variable construction, descriptive statistics, and a first figure.
Move successful notebook exploration into scripts and document how to rerun the project in a README.
Use Git to save the full change from raw data to output table.

Key Questions

What does "reproducible" mean in practice, and do failures usually come from data, paths, dependencies, random seeds, or manual steps?
Why do agents need clear file structure, explicit data dictionaries, and stable output formats?
When should a workflow remain in Stata/R, and when should it move to Python?

Assignment

Organize one research project directory and write the data source, variable notes, run order, output files, and unresolved problems.

Module Deliverable

A rerunnable data-cleaning and descriptive-analysis pipeline.

Day 2

Machine Learning, Deep Learning, and Large Language Model Principles


The second day explains how AI models learn from data. It starts with sklearn-style ML and moves into neural networks, CNN, LSTM, GRU, seq2seq, attention, Transformers, and LLMs.

Lecture Topics

ML tasks: supervised learning, unsupervised learning, classification, regression, clustering, dimensionality reduction, and anomaly detection.
Modeling workflow: feature engineering, train/validation/test splits, cross-validation, loss functions, optimizers, regularization, and metrics.
Neural-network foundations: linear layers, activations, backpropagation, gradient descent, batches, epochs, and learning rates.
CNN: kernels, padding, stride, pooling, feature maps, receptive fields, and parameter sharing.
RNN/LSTM/GRU: hidden states, gates, long-range dependencies, vanishing gradients, and sequence prediction.
seq2seq: encoder-decoder, teacher forcing, beam search, and input-output sequence generation.
Attention: Q/K/V, scaled dot-product attention, cross-attention, and the difference between using attention weights to audit a model and treating them as explanations.
Transformer: tokenization, embeddings, positional encoding, multi-head self-attention, FFN, residual connections, LayerNorm, and stacked blocks.
LLMs: next-token pretraining, instruction tuning, alignment, RAG, function calling, tool calling, and multimodal extension.
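The train/validation/test logic and overfitting diagnosis in the modeling-workflow topic can be illustrated without any ML library. This minimal sketch uses NumPy polynomial fits on simulated data as a stand-in for models of increasing complexity; the data-generating process, split fractions, candidate degrees, and the helper names `split_indices` and `mse` are arbitrary assumptions. Complexity is chosen on the validation set, and the test set is touched once at the end.

```python
import numpy as np


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices once, then carve out train, validation, and test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test, n_val = int(n * test_frac), int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


# Simulated data: a smooth nonlinear signal plus noise (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=200)
train, val, test = split_indices(len(x))

# Polynomial degree plays the role of model complexity.
results = {}
for degree in [1, 3, 5, 8, 12]:
    coefs = np.polyfit(x[train], y[train], deg=degree)
    results[degree] = (
        mse(y[train], np.polyval(coefs, x[train])),  # training error
        mse(y[val], np.polyval(coefs, x[val])),      # validation error
    )

# Choose complexity on validation data only, then report test error once.
best = min(results, key=lambda d: results[d][1])
final = np.polyfit(x[train], y[train], deg=best)
test_mse = mse(y[test], np.polyval(final, x[test]))

for degree, (tr, va) in results.items():
    print(f"degree {degree:2d}: train {tr:.3f}  validation {va:.3f}")
print(f"selected degree {best}, test MSE {test_mse:.3f}")
```

Training error can only fall as the degree rises, because each polynomial nests the lower-degree ones; validation error is what reveals overfitting. With scikit-learn the same pattern would usually rely on `train_test_split` and `cross_val_score` from `sklearn.model_selection`.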

In-Class Lab

Train or run a sklearn baseline, a simple neural model, and an LLM prompt/RAG approach on the same text or table task.
Draw the information flow for CNN, LSTM/GRU, seq2seq, and Transformers, marking input, parameter sharing, memory, and output.
Ask an LLM to explain a variable and generate code for it, then test that code's reliability by running it.

Key Questions

How do sample size, computation, interpretability, and verification cost change as model complexity rises?
Why did Transformers replace many recurrent designs, and why do they still hallucinate?
In research use, which information must come from external evidence rather than model memory?

Assignment

Complete a model-choice memo comparing classical ML, deep learning, and LLM approaches for one research task, including data needs, validation strategy, risks, and recommendation.

Module Deliverable

An interpretable ML/LLM mini-experiment and a written model-choice note.

Day 3

Causal Inference, Econometrics, and Machine Learning


The third day places predictive modeling inside a causal-identification framework. ML helps with high-dimensional controls and heterogeneity, but credible causal claims still require a clear counterfactual and explicit identifying assumptions.

Lecture Topics

Causal foundations: potential outcomes, ATE/ATT, counterfactuals, selection bias, common support, and SUTVA.
Experiments and quasi-experiments: RCT, DID, event studies, PSM, IV, RD, synthetic control, synthetic DID (SDID), and generalized synthetic control (GSC).
ML for causal inference: DML, causal forests, heterogeneous treatment effects, and nuisance functions.
Applied modeling: fixed effects, clustered standard errors, sample restrictions, winsorization, balance checks, and parallel trends.
Robustness: alternative variables, alternative samples, placebo tests, sensitivity analysis, mechanisms, and heterogeneity.
Result communication: reading regression tables, explaining economic magnitude, and avoiding overstated causal language.
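For the 2x2 case, the DID estimate is a difference of four cell means, and it coincides exactly with the interaction coefficient in an OLS regression of the outcome on treatment, post, and their product. The sketch below checks this on simulated two-period data with an assumed true effect of 2.0; `did_2x2` and `did_ols` are hypothetical helper names, and the sketch omits fixed effects, covariates, and clustered standard errors, which the applied-modeling topics add.

```python
import numpy as np


def did_2x2(y, treated, post):
    """Classic 2x2 DID: (treated post - treated pre) - (control post - control pre)."""
    def cell(t, p):
        return y[(treated == t) & (post == p)].mean()
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def did_ols(y, treated, post):
    """OLS of y on a constant, treated, post, and treated x post."""
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated two-period panel; the true effect (2.0) is an assumption of the simulation.
rng = np.random.default_rng(0)
n_units, true_effect = 400, 2.0
treated = np.repeat((np.arange(n_units) < n_units // 2).astype(float), 2)  # half the units treated
post = np.tile([0.0, 1.0], n_units)                                        # each unit: pre, then post
y = (1.0 + 0.5 * treated + 0.3 * post + true_effect * treated * post
     + rng.normal(scale=0.5, size=2 * n_units))

did_means = did_2x2(y, treated, post)
beta = did_ols(y, treated, post)
print(f"DID from cell means: {did_means:.4f}")
print(f"OLS interaction:     {beta[3]:.4f}")
```

The equality holds because the regression is saturated: four parameters fit four cell means exactly, so the interaction term absorbs the double difference.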

In-Class Lab

Run a DID or event-study specification on a policy-evaluation dataset and produce a publication-style table.
Cross-check one core result between Python and Stata or R, comparing sample size, coefficients, standard errors, and fixed effects.
Use DML or causal forests for high-dimensional control or heterogeneity exploration, then explain why they do not replace identification.
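The lab's point that DML does not replace identification is easier to see in code. Below is a minimal sketch of the partially linear model with cross-fitting in its partialling-out form: residualize both outcome and treatment on the controls using out-of-fold predictions, then regress residual on residual. To stay dependency-free it uses OLS as the nuisance learner, whereas applied DML would normally use flexible learners such as lasso or random forests; `dml_plr`, `ols_fit_predict`, and the simulated data are hypothetical. The estimate is causal only if all confounders are in `X`, which is an identification assumption the algorithm cannot check.

```python
import numpy as np


def with_const(X):
    return np.column_stack([np.ones(len(X)), X])


def ols_fit_predict(X_train, y_train, X_test):
    """Linear nuisance learner: fit on one set of folds, predict on the held-out fold."""
    beta, *_ = np.linalg.lstsq(with_const(X_train), y_train, rcond=None)
    return with_const(X_test) @ beta


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Partialling-out DML for y = theta * d + g(X) + e with K-fold cross-fitting."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # balanced random folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Out-of-fold residuals: nuisance models never see the rows they residualize.
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = (y_res - theta * d_res) * d_res  # score evaluated at theta
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Simulated data where treatment depends on observed controls (true theta = 0.5).
rng = np.random.default_rng(1)
n, p, true_theta = 2000, 20, 0.5
X = rng.normal(size=(n, p))
d = X @ rng.normal(scale=0.5, size=p) + rng.normal(size=n)
y = true_theta * d + X @ rng.normal(size=p) + rng.normal(size=n)

naive = (d @ y) / (d @ d)  # ignores the controls, so it is generally biased here
theta_hat, se_hat = dml_plr(y, d, X)
print(f"naive slope {naive:.3f}; DML {theta_hat:.3f} (se {se_hat:.3f})")
```

If a confounder were left out of `X`, the same code would run without complaint and return a biased estimate, which is exactly why the design argument has to come first.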

Key Questions

Does causal identification come from data, institutional background, model specification, or an ML algorithm?
How should a researcher choose among DID, IV, RD, PSM, and synthetic control?
Which robustness checks are informative, and which are mechanical table-stacking?

Assignment

Write an identification memo for your own question: treatment, outcome, sample, counterfactual, identifying assumptions, main model, and three robustness checks.

Module Deliverable

A model specification, result table, and robustness plan for a full empirical case.

Day 4

Agentic Empirical Research, MCP, and Paper-Automation Workflow


The fourth day connects the previous three days to an agent workflow. The goal is not to replace the researcher, but to make AI act like an auditable research assistant for file reading, code execution, result checking, drafting, and replication packaging.

Lecture Topics

Agent architecture: task planning, tool selection, context management, memory, feedback loops, and error recovery.
Tool use: function calling, MCP, filesystem, Python, Stata, R, databases, browsers, and StatsPAI.
Research decomposition: data audit, variable construction, descriptive statistics, model estimation, figures, and writing.
Auditable automation: provenance, logs, commands, random seeds, intermediate files, version control, and human checkpoints.
Paper workflow: literature summaries, research-design discussion, result interpretation, limitations, appendix, and replication package.
Failure modes: hallucination, false citation, data leakage, overfitting, path errors, silent sample changes, and irreproducible outputs.
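One way to make an agent call tools instead of guessing is to route every model-proposed action through a registry that executes only known functions and logs each call, including failures. The sketch below is hypothetical throughout: the tool names, the toy dataset, and the JSON call format (a `name` plus an `arguments` object, similar in spirit to function-calling APIs) are assumptions, and the "model output" is a hard-coded list rather than a real LLM or MCP client.

```python
import json
import statistics
from typing import Any, Callable

# Hypothetical in-memory dataset standing in for a real data file.
DATA = {"wage": [12.0, 15.5, 9.8, 20.1], "age": [25, 31, 22, 45]}
TOOLS: dict[str, Callable[..., Any]] = {}


def register(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Add a function to the tool registry under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@register
def list_columns() -> list:
    return sorted(DATA)


@register
def summarize(column: str) -> dict:
    values = DATA[column]  # raises KeyError for unknown columns
    return {"n": len(values), "mean": statistics.mean(values)}


def dispatch(call_json: str, log: list) -> Any:
    """Execute one model-proposed call and log it; never invent a result."""
    call = json.loads(call_json)
    name, args = call["name"], call.get("arguments", {})
    entry = {"tool": name, "arguments": args}
    result = None
    if name not in TOOLS:
        entry["error"] = f"unknown tool: {name}"
    else:
        try:
            result = TOOLS[name](**args)
            entry["result"] = result
        except Exception as exc:  # record the failure so the agent or a human can react
            entry["error"] = repr(exc)
    log.append(entry)
    return result


# Hard-coded stand-ins for what an LLM might propose.
proposed_calls = [
    '{"name": "list_columns", "arguments": {}}',
    '{"name": "summarize", "arguments": {"column": "wage"}}',
    '{"name": "summarize", "arguments": {"column": "income"}}',
    '{"name": "run_regression", "arguments": {}}',
]
log: list = []
for proposed in proposed_calls:
    dispatch(proposed, log)
print(json.dumps(log, indent=2))
```

The last two calls fail on purpose: a missing column and a tool that does not exist both end up as logged errors rather than fabricated numbers, which is the behavior a human checkpoint should look for.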

In-Class Lab

Rewrite one natural-language request as an agent briefing with goal, data, variables, constraints, outputs, and validation steps.
Have an agent run data-analysis code while saving logs, output tables, errors, and draft interpretation.
Review the agent output manually for sample size, variable definitions, model specification, statistical significance, and wording.
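The briefing fields in the first lab item map naturally onto a small typed record that refuses to serialize until every field is filled. This is a sketch with a hypothetical `AgentBriefing` class and made-up example values; the field names simply mirror the lab description.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentBriefing:
    """Hypothetical briefing record; field names mirror the lab description."""
    goal: str
    data: str
    variables: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    validation: list = field(default_factory=list)

    def missing_fields(self) -> list:
        """Names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"briefing incomplete, missing: {missing}")
        return json.dumps(asdict(self), indent=2)


brief = AgentBriefing(
    goal="Estimate the effect of a hypothetical training subsidy on firm employment with a DID design.",
    data="data/processed/firm_panel.csv",
    variables={"emp": "employees at year end", "treat": "eligible for subsidy", "post": "year after policy start"},
    constraints=["never modify data/raw", "cluster standard errors by firm"],
    outputs=["output/tables/did_main.csv", "logs/run_log.txt"],
)
missing_before = brief.missing_fields()
print(missing_before)  # ['validation']: the brief cannot be handed off yet

brief.validation = ["row count matches the data dictionary", "rerun from raw data reproduces did_main.csv"]
payload = brief.to_json()
print(payload)
```

Requiring `validation` before the brief can be serialized makes the checking step part of the task definition rather than an afterthought.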

Key Questions

How do we make an agent call tools instead of guessing?
How should an automated research workflow be paused, resumed, and rolled back?
Which writing steps can be automated, and which must remain researcher judgment?

Assignment

Complete an agent-assisted research package: task brief, code, execution logs, result table, figure, report draft, and human-review checklist.

Module Deliverable

An auditable agentic workflow from data to first report draft.

Model Architecture Track

This track places large language models in the broader history of deep learning: local feature extraction, sequence memory, input-output generation, attention, Transformers, and tool-using LLM agents.

CNN: Convolutional Neural Networks

CNNs answer the question of how to recognize stable local patterns. Instead of connecting every input to every output, they share filters over local windows to detect repeated structures in images, spatial grids, or local text patterns.

Architecture Anatomy

Convolution layers slide kernels over inputs to learn edges, shapes, local phrases, or spatial adjacency.
Padding and stride control boundary information, feature-map size, and compression speed.
Pooling compresses local information and makes features more robust to small translations.
Feature maps and receptive fields explain how much of the original input a unit can see.
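The sliding-kernel, padding, stride, and pooling ideas above fit in a few lines of NumPy. This is an illustrative single-channel sketch with hypothetical helpers `conv2d` and `max_pool`, not how libraries implement convolution efficiently; like most deep-learning libraries it computes cross-correlation (the kernel is not flipped). The output side length follows floor((n + 2p - k) / s) + 1 for input size n, padding p, kernel size k, and stride s.

```python
import numpy as np


def conv2d(x, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation with zero padding and stride."""
    x = np.pad(x, padding)  # zero-pad every side by `padding` cells
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)  # the same weights are reused at every position
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling; leftover rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# A 6x6 "image" with a vertical edge, and a kernel that responds to left-to-right increases.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

fmap = conv2d(image, edge_kernel)
print(fmap)                                                   # every row equals [0, 3, 3, 0]
print(conv2d(image, edge_kernel, padding=1).shape)            # (6, 6)
print(conv2d(image, edge_kernel, stride=2, padding=1).shape)  # (3, 3)
print(max_pool(fmap))                                         # all entries equal 3
```

Each output cell sees a 3x3 patch of the input; stacking a second 3x3 layer with stride 1 would give each unit a 5x5 receptive field.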

Research Use

Useful for remote-sensing images, night lights, street views, geographic grids, document images, contract layouts, and local textual features.

Limits and Risks

CNNs are less natural for long-range dependency and complex semantics, so they are often combined with sequence models, attention, or pretrained models.

Lab Connection

Compare hand-crafted features, CNN features, and pretrained embeddings on a small classification task, with attention to overfitting and interpretability.

RNN / LSTM / GRU Sequence Models

The RNN family answers how a model can read a sequence while carrying memory forward. LSTMs and GRUs add gates that mitigate rapid forgetting and vanishing gradients, the main reasons plain RNNs fail on long-range dependencies.

Architecture Anatomy

RNN hidden states pass information from one time step to the next.
LSTM input, forget, and output gates decide what to write, retain, and expose.
GRU update and reset gates provide a lighter memory-control design.
Bidirectional sequence models use both left and right context for labeling and classification.
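The gate descriptions above correspond to the standard LSTM update, sketched here for one layer in NumPy with hypothetical helpers `lstm_step` and `run_lstm`. Weights are random placeholders, and the stacking order of the four gate blocks (input, forget, candidate, output) is a convention chosen for this sketch. The key line is `c_t = f * c_prev + i * g`: the cell state is updated additively, which is what helps gradients survive over long sequences.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory to retain
    g = np.tanh(z[2 * H:3 * H])  # candidate content
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much memory to expose
    c_t = f * c_prev + i * g     # additive memory update
    h_t = o * np.tanh(c_t)       # exposed hidden state
    return h_t, c_t


def run_lstm(xs, W, U, b, H):
    """Unroll the cell over a (T, D) sequence, starting from zero state."""
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        hs.append(h)
    return np.stack(hs), c


rng = np.random.default_rng(0)
D, H, T = 3, 4, 5  # input size, hidden size, sequence length (arbitrary)
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
xs = rng.normal(size=(T, D))
hs, c_last = run_lstm(xs, W, U, b, H)
print(hs.shape)  # (5, 4): one hidden state per time step
```

A GRU follows the same pattern with two gates (update and reset) and no separate cell state, which is why the text calls it the lighter design.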

Research Use

Useful for firm trajectories, financial time series, user behavior sequences, policy-text evolution, and event histories.

Limits and Risks

Recurrent models process tokens one step at a time, so they train slowly and parallelize poorly on long text; Transformers are often better for long or complex dependency patterns.

Lab Connection

Use panel or text sequences to compare lagged features, LSTM, and GRU models on out-of-sample performance and explanation cost.

seq2seq and Encoder-Decoder

seq2seq maps an input sequence to an output sequence that may have a different length. It was an early core framework for machine translation and summarization, and the same encoder-decoder pattern underlies Q&A, code generation, and research-report generation.

Architecture Anatomy

The encoder turns input text, code, or variable notes into contextual representations.
The decoder generates output step by step using prior generated tokens and context.
Teacher forcing stabilizes training by feeding the true previous token during learning.
Greedy search and beam search trade speed, quality, and diversity at inference time.
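The trade-off between greedy and beam search shows up even with a tiny hand-written next-token table. In this sketch the probability table is a hypothetical stand-in for a trained decoder, and the beam search is a simplified variant without length normalization. Greedy decoding commits to the locally best first token and ends with sequence probability 0.24, while a beam of width 2 keeps the second option alive and finds a sequence with probability 0.36.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities standing in for a trained decoder.
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"A": 0.4, "B": 0.3, EOS: 0.3},
    ("B",): {"A": 0.05, "B": 0.05, EOS: 0.9},
}


def next_probs(prefix):
    """Return P(next token | prefix); unseen prefixes end the sequence."""
    return TABLE.get(tuple(prefix), {EOS: 1.0})


def greedy(max_len=5):
    seq, logp = [], 0.0
    while len(seq) < max_len:
        probs = next_probs(seq)
        tok = max(probs, key=probs.get)  # commit to the locally best token
        seq.append(tok)
        logp += math.log(probs[tok])
        if tok == EOS:
            break
    return seq, logp


def beam_search(beam_width=2, max_len=5):
    beams, finished = [([], 0.0)], []
    for _ in range(max_len):
        candidates = [
            (seq + [tok], logp + math.log(p))
            for seq, logp in beams
            for tok, p in next_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:  # keep only the top-scoring hypotheses
            (finished if seq[-1] == EOS else beams).append((seq, logp))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])


g_seq, g_logp = greedy()
b_seq, b_logp = beam_search()
print(g_seq, round(math.exp(g_logp), 2))  # ['A', 'A', '<eos>'] 0.24
print(b_seq, round(math.exp(b_logp), 2))  # ['B', '<eos>'] 0.36
```

Larger beams cost more decoding time and, without length normalization, tend to favor shorter outputs; that is the speed, quality, and diversity trade-off in practice.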

Research Use

Explains how natural-language research tasks become code, how regression tables become prose, and how literature paragraphs become summaries.

Limits and Risks

Early seq2seq models compress the entire input into one fixed-length vector; this bottleneck makes long inputs difficult to handle without attention.

Lab Connection

Convert a variable definition or empirical task into structured JSON or Python pseudocode, then inspect how generation errors arise.

Attention Mechanism

Attention answers which parts of the input the model should consult at the current step. Query, key, and value vectors compute relevance weights so the model can dynamically select information.

Architecture Anatomy

Q/K/V: the query asks the current question, keys describe candidate positions, and values carry the information.
Scaled dot-product attention computes similarity weights and weighted sums.
Cross-attention lets the output side read from the input side during generation.
Attention weights can support auditing, but they are not the same as causal explanations.
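Scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. The NumPy sketch below implements that formula with an optional boolean mask; the helper name `attention`, the shapes, the random inputs, and the causal-mask example are illustrative assumptions, and for brevity Q, K, and V are raw inputs rather than learned projections. The returned weights are what people inspect when they use attention for auditing: a weight shows where the model looked, not why its output is correct.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V, mask=None):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output (n_q, d_v), weights (n_q, n_k)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # relevance of each key to each query
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # masked positions get (near) zero weight
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)

# Self-attention over 4 positions with a causal mask (position t sees positions <= t).
x = rng.normal(size=(4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
out_self, w_self = attention(x, x, x, mask=causal)

# Cross-attention: 2 decoder queries read from 5 encoder positions.
dec_q, enc = rng.normal(size=(2, 8)), rng.normal(size=(5, 8))
out_cross, w_cross = attention(dec_q, enc, enc)
print(np.round(w_self, 2))
```

In cross-attention the queries come from the output side and the keys and values from the input side, which is the "output reads from input" relationship described above.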

Research Use

Useful for connecting policy text, paper paragraphs, variable descriptions, interviews, and multi-source evidence.

Limits and Risks

Attention is not reliable causal evidence; long context adds cost, noise, and retrieval errors.

Lab Connection

Use policy text to locate treatment definitions, timing, and sample restrictions, then convert highlighted spans into auditable evidence.

Transformer Architecture

Transformers replace recurrence with self-attention, which processes all positions in parallel and uses multiple attention heads to capture different relationships at once. Modern large language models are built on this architecture.

Architecture Anatomy

Tokenization and embeddings turn text, code, and symbols into vectors.
Positional encoding adds order information, which self-attention alone does not capture.
Multi-head self-attention learns dependencies in several representation spaces.
Feed-forward networks, residual connections, and LayerNorm support nonlinear expression, stable training, and deep stacks.
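The components listed above can be wired into a single encoder block in NumPy. This sketch makes several simplifying assumptions: one attention head instead of several, post-LayerNorm ordering, no learnable LayerNorm scale and shift, random weights, and made-up token ids in place of a real tokenizer; `positional_encoding` and `encoder_block` are hypothetical helper names. The final check shows why positional encoding is needed: without it, permuting the input tokens merely permutes the outputs.

```python
import numpy as np


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones (d_model assumed even)."""
    pos = np.arange(seq_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def layer_norm(x, eps=1e-5):
    """Normalize each position's vector; learnable scale and shift are omitted."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def encoder_block(x, p):
    """Single-head, post-LayerNorm block: LN(x + Attn(x)), then LN(h + FFN(h))."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v @ p["Wo"]
    h = layer_norm(x + attn)                                          # residual + LayerNorm
    ffn = np.maximum(0.0, h @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]  # position-wise FFN (ReLU)
    return layer_norm(h + ffn)                                        # residual + LayerNorm


rng = np.random.default_rng(0)
vocab, d_model, d_ff = 50, 16, 32
emb = rng.normal(size=(vocab, d_model))  # token embedding table
p = {name: rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
     for name in ["Wq", "Wk", "Wv", "Wo"]}
p.update(W1=rng.normal(scale=d_model ** -0.5, size=(d_model, d_ff)), b1=np.zeros(d_ff),
         W2=rng.normal(scale=d_ff ** -0.5, size=(d_ff, d_model)), b2=np.zeros(d_model))

token_ids = np.array([3, 17, 8, 42, 5])  # pretend tokenizer output
x = emb[token_ids] + positional_encoding(len(token_ids), d_model)
out = encoder_block(x, p)

# Without positional encoding, permuting the tokens just permutes the outputs.
perm = np.array([4, 2, 0, 3, 1])
x_no_pos = emb[token_ids]
print(np.allclose(encoder_block(x_no_pos[perm], p), encoder_block(x_no_pos, p)[perm]))  # True
```

A full LLM stacks many such blocks, uses multiple heads per block, and (for generation) adds a causal mask so each position only attends to earlier ones.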

Research Use

Used for paper reading, policy-text encoding, summarization, code generation, table interpretation, retrieval-augmented Q&A, and agent planning.

Limits and Risks

Transformers learn statistical associations and task patterns; they do not automatically guarantee factual accuracy, valid identification, or runnable code.

Lab Connection

Dissect one LLM answer: how the prompt is tokenized, how the response is generated, and which parts require RAG, code execution, and human confirmation.

Large Language Models and Agents

LLMs learn language, code, and knowledge patterns through next-token pretraining, then become task-facing through instruction tuning, preference alignment, RAG, function calling, and tool use.

Architecture Anatomy

Pretraining learns language distributions, commonsense associations, and code patterns from large corpora.
Instruction tuning and alignment improve task following, step-by-step explanation, and output safety.
RAG retrieves external sources before generation.
Tool calling and MCP let models invoke Python, Stata, R, databases, browsers, or StatsPAI instead of only writing text.
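RAG retrieves external sources before generation. The sketch below shows only the retrieval and prompt-assembly half, using a bag-of-words cosine retriever; real systems typically use learned embeddings and a vector index, and the corpus, document ids, prompt wording, and helpers `retrieve` and `build_prompt` here are hypothetical. No model is called: the output is the prompt that would be sent, with source ids attached so a reviewer can check each claim against its source.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for policy documents and a codebook.
DOCS = {
    "policy_2019": "The subsidy took effect in March 2019 and applied to firms with fewer than 50 employees.",
    "codebook": "Variable emp is the number of employees at year end; variable treat equals 1 for eligible firms.",
    "fieldwork": "Survey design and fieldwork timeline.",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())  # missing tokens count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Rank documents by bag-of-words cosine similarity; keep the top k with any overlap."""
    q = Counter(tokenize(query))
    scored = sorted(((cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in docs.items()),
                    reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, docs, k=2):
    """Put retrieved sources, with ids, in front of the question so answers can be checked."""
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(query, docs, k))
    return ("Answer using only the sources below and cite source ids. "
            "If the sources do not contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {query}")


question = "When did the subsidy take effect and which firms were eligible?"
print(build_prompt(question, DOCS))
```

Keeping retrieval as a separate, inspectable step means a wrong answer can be traced either to a retrieval miss or to the model misreading a retrieved source.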

Research Use

Supports topic development, literature reading, coding, debugging, table interpretation, report drafting, replication packaging, and multi-tool workflow coordination.

Limits and Risks

LLMs may hallucinate, misread data, omit identification assumptions, or produce irreproducible conclusions. Logs, code execution, citations, tests, and human review are required.

Lab Connection

Turn a natural-language empirical request into an agent plan where every step leaves files, logs, outputs, and human checkpoints.

Assignments and Assessment

Code and Data Workflow (25%)

Assesses project structure, cleaning scripts, outputs, README, and rerun instructions.

AI Architecture and Model Choice (25%)

Submit architecture notes for CNN/LSTM/GRU/seq2seq/attention/Transformer/LLM and one model-choice memo.

Causal Identification Memo (25%)

Submit a research question, identifying assumptions, main model, robustness plan, and interpretation draft.

Agent Research Package (25%)

Submit a task briefing, execution records, outputs, report draft, and human-review checklist.

Final Deliverables

A reusable Python empirical-research project template.
An AI architecture map covering CNN, LSTM, GRU, seq2seq, attention, Transformers, and large language models.
A causal-inference empirical-case draft with main model, robustness plan, and interpretation.
An agent-assisted research workflow with natural-language tasking, tool calls, code execution, result verification, and human checkpoints.

Course Norms

The course encourages AI use, but all AI-generated code, prose, and claims must be verified through execution, citations, data checks, or human review.
Reproducibility is the first standard: every result should be regenerable from raw data and scripts.
The course discourages unexplained model stacking. Every model choice needs a data structure, research goal, validation metric, and failure-risk explanation.
Final outputs can become a replication package, course project, research-assistant workflow, or technical appendix for an empirical paper.
