Function Reference
load_raw_data(path=None)
Purpose: Load the raw Broadway CSV.
Args:
- path: Optional path to a CSV file. If omitted, the packaged dataset is used.
Returns:
- pandas.DataFrame: The original raw dataset.
Example:
from broadway_insights import load_raw_data
raw_df = load_raw_data()
clean_broadway_data(df)
Purpose: Clean malformed fields and derive analysis-ready columns such as seat counts and run metrics.
Args:
- df: Raw pandas.DataFrame loaded from the Broadway CSV.
Returns:
- pandas.DataFrame: Cleaned weekly dataset.
Example:
from broadway_insights import clean_broadway_data, load_raw_data
clean_df = clean_broadway_data(load_raw_data())
load_clean_data(path=None)
Purpose: Load and clean the Broadway data in one step.
Args:
- path: Optional custom CSV path.
Returns:
- pandas.DataFrame: Cleaned weekly dataset.
Example:
from broadway_insights import load_clean_data
df = load_clean_data()
summarize_show_runs(df)
Purpose: Aggregate weekly records to one row per show.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- pandas.DataFrame: Show-level summary with run window, revenue, ticket, and theater metrics.
Example:
from broadway_insights import load_clean_data, summarize_show_runs
summary = summarize_show_runs(load_clean_data())
analyze_award_weekly_revenue(df)
Purpose: Compare tracked run length and average weekly revenue between Tony-flagged and non-flagged shows.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- dict[str, float]: Summary statistics for run length and revenue differences.
Example:
from broadway_insights import analyze_award_weekly_revenue, load_clean_data
results = analyze_award_weekly_revenue(load_clean_data())
analyze_theater_size_vs_gross(df)
Purpose: Estimate the relationship between theater size and weekly gross revenue.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- dict[str, float]: Correlations and a simple linear trend estimate.
Example:
from broadway_insights import analyze_theater_size_vs_gross, load_clean_data
results = analyze_theater_size_vs_gross(load_clean_data())
analyze_award_vs_theater_size(df)
Purpose: Compare average theater size for Tony-flagged and non-flagged shows.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- dict[str, float]: Group averages and the difference in average theater size.
Example:
from broadway_insights import analyze_award_vs_theater_size, load_clean_data
results = analyze_award_vs_theater_size(load_clean_data())
build_revenue_scatter(df)
Purpose: Create an interactive scatterplot of theater size versus weekly gross revenue.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- plotly.graph_objects.Figure: Interactive scatterplot.
Example:
from broadway_insights import build_revenue_scatter, load_clean_data
fig = build_revenue_scatter(load_clean_data())
build_run_length_boxplot(df)
Purpose: Visualize tracked run-length differences between Tony-flagged and non-flagged shows.
Args:
- df: Cleaned weekly Broadway dataframe.
Returns:
- plotly.graph_objects.Figure: Interactive boxplot.
Example:
from broadway_insights import build_run_length_boxplot, load_clean_data
fig = build_run_length_boxplot(load_clean_data())