This dataset aims to improve credit scoring by predicting the probability that a borrower will experience financial distress in the next two years. The goal is to build a model that borrowers can use to help make the best financial decisions.
• 150,000 borrowers
Dataset structure:
• ID: ID of the borrower.
• SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse (Type: Y/N)
• RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of credit, excluding real estate and installment debt such as car loans, divided by the sum of credit limits (Type: percentage)
• Age: Age of borrower in years (Type: integer)
• NumberOfTime30-59DaysPastDueNotWorse: Number of times borrower has been 30-59 days past due but no worse in the last 2 years (Type: integer)
• DebtRatio: Monthly debt payments, alimony, and living costs divided by monthly gross income (Type: percentage)
• MonthlyIncome: Monthly income (Type: real)
• NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) (Type: integer)
• NumberOfTimes90DaysLate: Number of times borrower has been 90 days or more past due (Type: integer)
• NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home equity lines of credit (Type: integer)
• NumberOfTime60-89DaysPastDueNotWorse: Number of times borrower has been 60-89 days past due but no worse in the last 2 years (Type: integer)
• NumberOfDependents: Number of dependents in the family, excluding the borrower (spouse, children etc.) (Type: integer)
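The two ratio fields can be illustrated with a small worked example. The sketch below only restates the field definitions as arithmetic; the function names and the sample figures are hypothetical and are not taken from the dataset.

```python
def revolving_utilization(balances, credit_limits):
    """Total revolving balance divided by the sum of credit limits."""
    total_limit = sum(credit_limits)
    if total_limit == 0:
        raise ValueError("sum of credit limits must be positive")
    return sum(balances) / total_limit


def debt_ratio(debt_payments, alimony, living_costs, gross_income):
    """Monthly obligations divided by monthly gross income."""
    if gross_income <= 0:
        raise ValueError("monthly gross income must be positive")
    return (debt_payments + alimony + living_costs) / gross_income


# Hypothetical borrower: two credit cards and one personal line of credit.
utilization = revolving_utilization([1200, 300, 0], [5000, 2000, 3000])
ratio = debt_ratio(debt_payments=800, alimony=0, living_costs=1200,
                   gross_income=5000)
print(f"utilization={utilization:.2f}, debt ratio={ratio:.2f}")
```

Both values are fractions, so a utilization of 0.15 corresponds to 15% of the available credit being used.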
This is a simplified dataset aimed at predicting inventory demand based on historical sales data. The objective is to forecast the demand for a product in a given week at a particular store. The dataset consists of 9 weeks of sales transactions in Mexico.
Every week, delivery trucks deliver products to the vendors. Each transaction consists of sales and returns. Returns are products that went unsold and expired. The demand for a product in a given week is defined as this week's sales minus next week's returns.
Things to note:
The adjusted demand (Demanda_uni_equil) is always >= 0, since demand should be either 0 or a positive value. Venta_uni_hoy - Dev_uni_proxima is sometimes negative because return records sometimes carry over a few weeks.
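The relationship between sales, returns, and adjusted demand can be sketched as follows. This is a minimal illustration that assumes negative differences are simply clipped to zero, which is consistent with the note above but is not confirmed as the exact procedure used to build the dataset. The function names and sample rows are hypothetical.

```python
def raw_demand(venta_uni_hoy, dev_uni_proxima):
    """Units sold this week minus units returned next week."""
    return venta_uni_hoy - dev_uni_proxima


def adjusted_demand(venta_uni_hoy, dev_uni_proxima):
    """Raw demand clipped at zero (assumed rule, see lead-in)."""
    return max(0, raw_demand(venta_uni_hoy, dev_uni_proxima))


# Hypothetical (Venta_uni_hoy, Dev_uni_proxima) pairs. The last row models
# returns carried over from earlier weeks, giving a negative raw value.
rows = [(10, 2), (5, 0), (3, 7)]

raw = [raw_demand(sales, returns) for sales, returns in rows]
adjusted = [adjusted_demand(sales, returns) for sales, returns in rows]
print(raw)
print(adjusted)
```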
Data fields:
• Semana: Week number (from Thursday to Wednesday)
• Agencia_ID: Sales Depot ID
• Town: Town of the Agencia
• State: State of the Agencia
• Canal_ID: Sales Channel ID
• Ruta_SAK: Route ID (Several routes = Sales Depot)
• Cliente_ID: Client ID
• NombreCliente: Client name
• Producto_ID: Product ID
• NombreProducto: Product name
• Venta_uni_hoy: Sales unit this week (integer)
• Venta_hoy: Sales this week (unit: pesos)
• Dev_uni_proxima: Returns unit next week (integer)
• Dev_proxima: Returns next week (unit: pesos)
• Demanda_uni_equil: Adjusted Demand (integer) (This is the target you will predict)
This dataset is aimed at classifying Android applications as malware or benign based on the permissions they use.
Each analyzed application is represented by a binary vector of permissions {1 = used, 0 = not used}. The samples are also labeled by "Type": 1 for malware and 0 for non-malware.
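The binary encoding can be sketched as below. The permission names are real Android permissions, but the choice of these four, their order, and the helper name `encode_permissions` are assumptions made for illustration; the dataset's actual permission columns are not listed here.

```python
# Hypothetical, shortened permission vocabulary; the order fixes the
# position of each permission in the output vector.
KNOWN_PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",
]


def encode_permissions(used, known=KNOWN_PERMISSIONS):
    """Return a 0/1 list: 1 if the app uses the permission, else 0."""
    used = set(used)
    return [1 if permission in used else 0 for permission in known]


# Hypothetical application that uses two of the known permissions.
app_permissions = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
vector = encode_permissions(app_permissions)

# A sample pairs the feature vector with its label (1 malware, 0 non-malware).
sample = {"features": vector, "Type": 1}
print(sample)
```

Permissions outside the known vocabulary are ignored by this sketch, so every application maps to a vector of the same length.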
Source: Kaggle. Originally from the following paper:
Urcuqui, C., & Navarro, A. (2016, April). Machine learning classifiers for android malware analysis. In Communications and Computing (COLCOM), 2016 IEEE Colombian Conference on (pp. 1-6). IEEE.