Dataset Gallery: Higher Education & Scientific Research

FREE

Letter Image Recognition Data czuriaga

Description

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15. We typically train on the first 16,000 items and then use the resulting model to predict the letter category for the remaining 4,000. See the article cited in the source for more details.

Attribute Information:
letter - capital letter (26 values from A to Z)
x-box - horizontal position of box (integer)
y-box - vertical position of box (integer)
width - width of box (integer)
high - height of box (integer)
onpix - total # on pixels (integer)
x-bar - mean x of on pixels in box (integer)
y-bar - mean y of on pixels in box (integer)
x2bar - mean x variance (integer)
y2bar - mean y variance (integer)
xybar - mean x y correlation (integer)
x2ybr - mean of x * x * y (integer)
xy2br - mean of x * y * y (integer)
x-ege - mean edge count left to right (integer)
xegvy - correlation of x-ege with y (integer)
y-ege - mean edge count bottom to top (integer)
yegvx - correlation of y-ege with x (integer)
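
The train/test convention described above (first 16,000 items for training, remaining 4,000 for prediction) can be sketched as follows. This is a minimal sketch, not the gallery's own tooling: it assumes the dataset has been exported to a CSV file with no header row, the letter in the first column, and the 16 integer attributes after it. The helper names load_rows, split_train_test and to_xy are hypothetical.

```python
import csv


def load_rows(path):
    """Read every non-empty row of a CSV file (assumed to have no header row)."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]


def split_train_test(rows, n_train=16000):
    """Split in file order: the first n_train rows train, the rest are held out."""
    return rows[:n_train], rows[n_train:]


def to_xy(rows):
    """Separate the letter label (first column) from the 16 integer attributes."""
    features = [[int(value) for value in row[1:]] for row in rows]
    labels = [row[0] for row in rows]
    return features, labels


# Tiny in-memory stand-in for the real file, just to show the shapes involved.
demo_rows = [["A"] + ["3"] * 16, ["B"] + ["7"] * 16, ["C"] + ["15"] * 16]
demo_train, demo_test = split_train_test(demo_rows, n_train=2)
demo_x, demo_y = to_xy(demo_train)
```

With the real file, calling split_train_test(load_rows(path)) with the default n_train would give the 16,000 / 4,000 split. The split is by position rather than random, which matches the description but assumes the file order has not been changed.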

Source

Letter Image Recognition Data from Delve datasets and UCI

Letters Recognition

696.0 KB 17 fields / 20000 instances

123

FREE

eCommerce train czuriaga

eCommerce problem from Tatvic blog

Train file: 2133 rows

Source

Dataset at Tatvic blog post

eCommerce Train

444.8 KB 21 fields / 2133 instances

26

FREE

American Colleges and Universities czuriaga

American Colleges and Universities

Source

mathforum.org datasets

Colleges Universities Education

132.6 KB 23 fields / 1302 instances

22

FREE

Higgs Boson Machine Learning Challenge czuriaga

Kaggle Higgs Boson Machine Learning Challenge

Source

Kaggle Challenge

Kaggle Challenge Higgs

52.7 MB 33 fields / 250000 instances

22

FREE

eCommerce test czuriaga

eCommerce problem from Tatvic blog

Test file: 425 rows

Source

Dataset at Tatvic blog post

eCommerce Test

87.0 KB 20 fields / 425 instances

9

FREE

US Public Libraries czuriaga

Public Libraries in USA

Source

National Center for Education Statistics

USA Libraries

4.0 MB 79 fields / 8844 instances

7

FREE

See Click Predict Fix Test czuriaga

See Click Predict Fix from Kaggle competition.

Test file

18.9 MB 15 fields / 149575 instances

6

FREE

Ringnorm dataset czuriaga

This is an implementation of Leo Breiman's ringnorm example[1]. It is a 20-dimensional, 2-class classification example. Each class is drawn from a multivariate normal distribution.

Class 1 has mean zero and covariance 4 times the identity.

Class 2 has mean (a, a, ..., a) and unit covariance, where a = 2/sqrt(20).

Breiman reports the theoretical expected misclassification rate as 1.3%. He used 300 training examples with CART and found an error of 21.4%.
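
The two class distributions described above can be sampled with a few lines of Python. This is a sketch under stated assumptions, not the code that produced the hosted file: the function name make_ringnorm is hypothetical, the classes are drawn with equal probability (the description does not state the class priors), and the seed is arbitrary.

```python
import math
import random


def make_ringnorm(n_samples, n_features=20, seed=0):
    """Draw (features, label) pairs following the ringnorm description."""
    rng = random.Random(seed)
    a = 2.0 / math.sqrt(n_features)  # a = 2/sqrt(20) when n_features is 20
    rows = []
    for _ in range(n_samples):
        label = rng.choice((1, 2))  # assumption: equal class priors
        if label == 1:
            # Class 1: mean zero, covariance 4*I, so each coordinate has std 2.
            x = [rng.gauss(0.0, 2.0) for _ in range(n_features)]
        else:
            # Class 2: mean (a, ..., a), unit covariance, so each coordinate has std 1.
            x = [rng.gauss(a, 1.0) for _ in range(n_features)]
        rows.append((x, label))
    return rows


sample = make_ringnorm(1000)
```

Because both covariances are multiples of the identity, each coordinate can be drawn independently, which is why no matrix library is needed here.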

Source:
Ringnorm dataset

Ringnorm artificial historical

1.1 MB 21 fields / 7400 instances

6

FREE

Top 100 Private Colleges 2003 czuriaga

Top 100 Private Colleges 2003

Source

mathforum.org datasets

Education College Top 100

10.9 KB 18 fields / 100 instances

4

