Dataset Gallery: Miscellaneous

FREE

Movies 2000-2018 academy_awards

Movies dataset from 2000 to 2018 used to predict 2019 Academy Awards

oscars oscars_20190216

1.2 MB 119 fields / 1235 instances

237

FREE

diabetes petersen

I plan to delete this...

25.6 KB 9 fields / 768 instances

146

FREE

winequality-red | Training (80%) uwcse180

winequality-red | Training (80%)

org_auto_project_erhong

65.8 KB 12 fields / 1279 instances

123

FREE

whiskies' dataset jarcosr

Data on flavor of 86 single malt scotch whiskies

3.0 KB 13 fields / 86 instances

69

FREE

stumbleupon_evergreen jarcosr

Opinions about the company

2.1 MB 27 fields / 724 instances

51

FREE

Factory predictive maintenance jarcosr

This dataset contains 8.648 instances each on them with 31 descriptive variables from an industrial machine with some failures in it

4.2 MB 31 fields / 8648 instances

41

FREE

Movies (Oscars 2001-2025) academy_awards

This is the dataset used to predict the 2025 Academy Award winners. It contains 299 data fields on 1,377 movies produced between 2001 and 2025.

oscars oscars_20250224

3.3 MB 299 fields / 1377 instances

38

FREE

customer_sales_churn_small jarcosr

This dataset contains sales and behavioral data for 10,000 customers, primarily designed for churn analysis and customer segmentation. It includes metrics tracked over several weeks (labeled W1 through W5) to capture trends in purchasing behavior.

Key Components of the Dataset: Customer Identification: CUSTOMER_ID uniquely identifies each record.

Sales Metrics: Includes total sales (Total_Sale), standard deviation of sales (STD_Sales), and average purchase value (APV). Note that many sales values appear to be log-transformed or normalized.

Temporal Tracking (Weekly Data): Detailed breakdown for weeks 1–5, including:

Visits: Number of visits per week (W1_Visits, etc.).

Sales Performance: Minimum, maximum, and standard deviation of sales per week (e.g., W3_Max_Sale, W4_Min_Sale).

Activity Flags: Binary indicators for whether a customer was active in specific weeks (week_1, week_2, etc.).

Recency: Days_since_last_visit tracks how long it has been since the customer's last interaction.

Customer Segmentation:

Value-based: Categorizes customers as High_value, Low_value, or Regular.

Engagement-based: Categorizes customers by visit frequency (e.g., Frequent_Visitors, Rare_Visitors).

Target Variable:

CHURN: A binary indicator (0 or 1) typically used to predict if a customer will stop using the service.

2.8 MB 36 fields / 10000 instances

15

FREE

market_basket jarcosr

The market_basket.csv dataset is a transactional dataset typically used for Market Basket Analysis (identifying relationships between items purchased together).

Dataset Overview: Format: It contains 9,835 rows and a single column titled products.

Content: Each row represents an individual transaction (a "basket").

Data Structure: Each entry is a comma-separated list of items bought in that specific transaction.

479.5 KB 1 fields / 9835 instances

15

COMPANY

PRODUCT

BUSINESS

TRAINING

GALLERY

License

Embed this resource in your web site

COMPANY

PRODUCT

BUSINESS

TRAINING

GALLERY