top of page

Using mySQL to perform preliminary EDA and to query data

Updated: Jan 31, 2021


library(RMySQL)

sqlQuery <- function (query) {
  # creating DB connection object with RMysql package
  DB <- dbConnect(MySQL(), 
                  user='TTTTT', 
                  password = 'TTTTT', 
                  dbname = 'kickstarter', 
                  host = '127.0.0.1')

  # send Query to obtain result set
  rs <- dbSendQuery(DB, query)

  # get elements from result sets and convert to dataframe
  result <- fetch(rs, -1)

  # close db connection
  dbDisconnect(DB)

  # return the dataframereturn(result)
}
cons <- dbListConnections(MySQL()) 
for(con in cons) 
  dbDisconnect(con)

First, we need to check how many tables in the database.


sqlQuery("
   SHOW tables;      
         ")
##   Tables_in_kickstarter
## 1              campaign
## 2              category
## 3               country
## 4              currency
## 5          sub_category

There are fives tables, namely: campaign, category, country, currency, and sub_category Check the content of individual table.

sqlQuery("
   SHOW COLUMNS 
   FROM campaign
         ")
##              Field     Type Null Key Default Extra
## 1               id   bigint   NO PRI    <NA>      
## 2             name     text  YES        <NA>      
## 3  sub_category_id   bigint  YES MUL    <NA>      
## 4       country_id   bigint  YES MUL    <NA>      
## 5      currency_id   bigint  YES MUL    <NA>      
## 6         launched datetime  YES        <NA>      
## 7         deadline datetime  YES        <NA>      
## 8             goal   double  YES        <NA>      
## 9          pledged   double  YES        <NA>      
## 10         backers   bigint  YES        <NA>      
## 11         outcome     text  YES        <NA>

The description of each column as follow

ID: unique project

ID name: project name

sub_category_id: what industry/category was the project in?

country_id: id number of country of origin

currency_id: currency funding was given in

launched: date fundraising began

deadline: when target amount must be raised by

goal: desired amount of funding

pledged: how much was promised (whether or not the goal was reached)

backers: how many people contributed to the campaign?

outcome: the status of project (successful/failed/…)?


sqlQuery("
   SHOW COLUMNS 
   FROM category
         ")
##   Field   Type Null Key Default Extra
## 1    id bigint   NO PRI    <NA>      
## 2  name   text  YES        <NA>

There are only two columns : the id of category and the category name.

sqlQuery("
   SHOW COLUMNS 
   FROM country;
         ")
##   Field   Type Null Key Default Extra
## 1    id bigint   NO PRI    <NA>      
## 2  name   text  YES        <NA>

There are only two columns : the id of country and the country name.

sqlQuery("
   SHOW COLUMNS 
   FROM currency;
         ")
##   Field   Type Null Key Default Extra
## 1    id bigint   NO PRI    <NA>      
## 2  name   text  YES        <NA>

There are also only two columns : the id of currency and the currency name


Exploring each table in data base

How many records in the table campaign?


sqlQuery("
        SELECT COUNT(*)
        FROM campaign;
        ")
##   COUNT(*)
## 1    15000
sqlQuery("
        SELECT COUNT(id)
        FROM campaign;
        ")

15000 id, because id is primary key => not replicate The table campaign contains 15000 rows. To display the first 15 rows

sqlQuery("
      SELECT *
      FROM campaign
      LIMIT 15;
      ")
##    id                                                        name
## 1   1                                                    Ragdolls
## 2   2                                                  One To Ten
## 3   3  Future Gun - A short film about kids and advanced hardware
## 4   4 Fourth Wave Apparel—Vintage Fashion for the Modern Feminist
## 5   5                                          The Rich Lifestyle
## 6   6                                              Perils Galore!
## 7   7               In a Dark Dark House - Directed by Larry Moss
## 8   8                                            CHARLIEFOXTROTNY
## 9   9                              Fat Englishmen - Beer and Sumo
## 10 10    bear elements old-fashioned shave and beard care for men
## 11 11                                           DigIndustrial Art
## 12 12               Expanding Farm to service the local Food Bank
## 13 13                       Church Hive. Church hiring made easy!
## 14 14                          The Visual Media Directory Project
## 15 15             Too Sweet - The Not So Serious Side to Diabetes
##    sub_category_id country_id currency_id            launched
## 1               23          2           2 2013-04-25 00:00:00
## 2               47          1           1 2015-11-16 00:00:00
## 3               24          2           2 2013-08-28 00:00:00
## 4               52          2           2 2014-07-22 00:00:00
## 5                9          2           2 2016-08-24 00:00:00
## 6               27          2           2 2012-09-25 00:00:00
## 7               20          2           2 2014-07-01 00:00:00
## 8               23          2           2 2014-05-21 00:00:00
## 9               54          2           2 2015-03-03 00:00:00
## 10              12          2           2 2016-04-14 00:00:00
## 11              97          2           2 2014-06-19 00:00:00
## 12             107          2           2 2016-06-13 00:00:00
## 13              40          2           2 2016-09-01 00:00:00
## 14              26          2           2 2012-03-07 00:00:00
## 15              10          2           2 2014-01-16 00:00:00
##               deadline       goal pledged backers    outcome
## 1  2013-05-25 00:00:00   15000.00   20.00       3     failed
## 2  2015-12-16 00:00:00     223.68  413.81      23 successful
## 3  2013-09-27 00:00:00    5000.00 1497.00      28     failed
## 4  2014-08-21 00:00:00    6000.00 8795.00     218 successful
## 5  2016-09-28 00:00:00 2000000.00    2.00       2     failed
## 6  2012-11-01 00:00:00    2500.00  230.00       5     failed
## 7  2014-08-15 00:00:00   55000.00 9316.00      45     failed
## 8  2014-06-20 00:00:00    6000.00  161.00       4     failed
## 9  2015-04-02 00:00:00   50000.00    0.00       0     failed
## 10 2016-05-14 00:00:00   12000.00  306.00      11     failed
## 11 2014-07-09 00:00:00   30000.00  111.00       4     failed
## 12 2016-07-13 00:00:00    2500.00  405.00       5     failed
## 13 2016-10-01 00:00:00   16800.00  101.00       2     failed
## 14 2012-05-06 00:00:00   10000.00    2.00       2     failed
## 15 2014-02-15 00:00:00    2500.00 2525.00       6 successful

This table contains country_id and currency_id, the country_id can be found in table country. the id of currency appears in the table campaign and the table country.

How many subcategory, country, and currency ?


sqlQuery("
      SELECT 
        COUNT(DISTINCT sub_category_id)  as subidnb, 
        COUNT(DISTINCT country_id) as countrynb,
        COUNT(DISTINCT currency_id) as currencynb
      FROM campaign;
      ")
##   subidnb countrynb currencynb
## 1     157        22         13

launched date and deadline date (datetime)


sqlQuery("
      SELECT launched AS launched_date,
             deadline AS deadline
      FROM campaign
      ORDER BY launched_date DESC
      LIMIT 10;
      ")
##          launched_date            deadline
## 1  2018-01-02 00:00:00 2018-03-03 00:00:00
## 2  2018-01-01 00:00:00 2018-01-31 00:00:00
## 3  2018-01-01 00:00:00 2018-02-11 00:00:00
## 4  2018-01-01 00:00:00 2018-01-31 00:00:00
## 5  2017-12-30 00:00:00 2018-01-29 00:00:00
## 6  2017-12-29 00:00:00 2018-01-28 00:00:00
## 7  2017-12-29 00:00:00 2018-01-28 00:00:00
## 8  2017-12-28 00:00:00 2018-01-29 00:00:00
## 9  2017-12-27 00:00:00 2018-01-26 00:00:00
## 10 2017-12-27 00:00:00 2018-01-31 00:00:00

There is data for 2017 and 2018.

sqlQuery("
      SELECT launched AS launched_date,
             deadline AS deadline
      FROM campaign
      ORDER BY launched_date ASC
      LIMIT 10;
      ")
##          launched_date            deadline
## 1  2009-05-06 00:00:00 2009-08-04 00:00:00
## 2  2009-05-07 00:00:00 2009-07-02 00:00:00
## 3  2009-05-10 00:00:00 2009-06-16 00:00:00
## 4  2009-05-12 00:00:00 2009-05-22 00:00:00
## 5  2009-06-04 00:00:00 2009-07-16 00:00:00
## 6  2009-06-05 00:00:00 2009-09-02 00:00:00
## 7  2009-06-20 00:00:00 2009-07-01 00:00:00
## 8  2009-06-30 00:00:00 2009-07-16 00:00:00
## 9  2009-07-14 00:00:00 2009-09-05 00:00:00
## 10 2009-07-28 00:00:00 2009-10-02 00:00:00

Data covers periods from 2009-05-06 to 2018-01-02 The year span is between 2009 to 2018. To check

sqlQuery("
    SELECT DISTINCT(YEAR(launched)) as launchedyears
    FROM campaign
    ORDER BY launchedyears;")
##    launchedyears
## 1           2009
## 2           2010
## 3           2011
## 4           2012
## 5           2013
## 6           2014
## 7           2015
## 8           2016
## 9           2017
## 10          2018

Similarly, we can check the year span for deadline.

sqlQuery("
    SELECT DISTINCT(YEAR(deadline)) as deadlineyears
    FROM campaign
    ORDER BY deadlineyears;
 ")
##    deadlineyears
## 1           2009
## 2           2010
## 3           2011
## 4           2012
## 5           2013
## 6           2014
## 7           2015
## 8           2016
## 9           2017
## 10          2018

We should extract year, month, and day of month.


sqlQuery("
      SELECT COUNT(*)
      FROM campaign
      WHERE YEAR(launched) = YEAR(deadline);
      ")
##   COUNT(*)
## 1    13923

There are 13923 projects that the launched data and the deadline were in the same year.

sqlQuery("
      SELECT COUNT(*)
      FROM campaign
      WHERE YEAR(launched) <> YEAR(deadline);
      ")
##   COUNT(*)
## 1     1077

Therefore, there are 1077 projects whom the launched year and the deadline year were different.


sqlQuery("
SELECT id,  DATEDIFF(deadline, launched) as daynb, goal, pledged, backers, outcome
FROM campaign
ORDER BY daynb DESC
LIMIT 10
")
##       id daynb  goal  pledged backers    outcome
## 1   6082    92 50000  1025.00       2     failed
## 2   6407    91  1500  1900.00      40 successful
## 3  10511    91   185   196.00       9 successful
## 4  13590    91  5000   951.00      17     failed
## 5  14870    91 10000    15.00       3     failed
## 6   4637    91  7500  6030.00      26     failed
## 7  14111    91  1000  2597.68     122 successful
## 8    550    91  1300    80.00       3     failed
## 9   5931    91 10000 10049.99     131 successful
## 10  6448    91 55000  5780.00      27     failed

The max duration is 91-92 days.


sqlQuery("
SELECT id,  DATEDIFF(deadline, launched) as daynb, goal, pledged, backers, outcome
FROM campaign
ORDER BY daynb ASC
LIMIT 10
")
##       id daynb      goal  pledged backers    outcome
## 1   3936     1        20    20.00       1 successful
## 2  12640     1      8888  7670.00      86     failed
## 3   3233     1 100000000     0.00       0     failed
## 4   9672     1       100   320.00      47 successful
## 5   2260     1       500   500.00       7 successful
## 6  11246     1      8888  8385.00      71     failed
## 7   8311     2       400   464.66      34 successful
## 8    667     2       100 12701.00     101 successful
## 9   7947     3     25000     1.00       1     failed
## 10 10022     3      5000    26.00       2     failed

There some projects that launched and end at the same day Project 3233 had a goal of 100000000 and pledged 0 ==> failed.


sqlQuery("
      SELECT id, goal
      FROM campaign
      ORDER BY goal DESC
      LIMIT 10;
      ")
##       id      goal
## 1  12761 100000000
## 2   3233 100000000
## 3   7118 100000000
## 4    176 100000000
## 5   9944 100000000
## 6   8150  88767573
## 7  14319  44336067
## 8   6145  17708936
## 9   4574  11086826
## 10  3101  10000000

What is the minimum goal and the maximum goal?# the year span is between 2009 to 2018 it seems date the date column only contain date with the format yyyy-mm-dd (we have to change that).

sqlQuery("
    SELECT MIN(goal) as min_goal, 
           MAX(goal) as max_goal
    FROM campaign;
      ")
#   min_goal max_goal
## 1     0.75    1e+08

The minimum is 0.75 (we do not know what currency yet) The maximun goal is 100 000 000 We can find which currency.


sqlQuery("
   SELECT name, currency_id, MAX(goal)
   FROM campaign;
      ")
##       name currency_id MAX(goal)
## 1 Ragdolls           2     1e+08

There are five projects that had a goal of 1e+08 (currency id is 2).mm

sqlQuery("
   SELECT name
   FROM currency
   WHERE id =  2
      ")
##   name
## 1  USD

Therefore, the maximum goal was 1e+08 USD. This query will be easier once we join tables. The goal 0.75 might be mistake?

sqlQuery("
   SELECT name, currency_id, MIN(pledged)
   FROM campaign;
      ")
##       name currency_id MIN(pledged)
## 1 Ragdolls           2            0

There are some money for this project. it might be not a mistake

Similarly, we can find the maximum and the minimum of pledged.

sqlQuery("
    SELECT MIN(pledged) as min_pledged, 
           MAX(pledged) as max_pledged
    FROM campaign;
      ")
##   min_pledged max_pledged
## 1           0     5408917

There is project that did not get any money. The maximum pledged is 5408917.


Category table

sqlQuery("
   SELECT *
   FROM category
         ")
##    id         name
## 1   1   Publishing
## 2   2 Film & Video
## 3   3        Music
## 4   4         Food
## 5   5       Design
## 6   6       Crafts
## 7   7        Games
## 8   8       Comics
## 9   9      Fashion
## 10 10      Theater
## 11 11          Art
## 12 12  Photography
## 13 13   Technology
## 14 14        Dance
## 15 15   Journalism

There are 15 categories.


SubCategory table


sqlQuery("
   SELECT *
   FROM sub_category
         ")
##      id               name category_id
## 1     1             Poetry           1
## 2     2     Narrative Film           2
## 3     3              Music           3
## 4     4       Film & Video           2
## 5     5        Restaurants           4
## 6     6               Food           4
## 7     7             Drinks           4
## 8     8     Product Design           5
## 9     9        Documentary           2
## 10   10         Nonfiction           1
## 11   11         Indie Rock           3
## 12   12             Crafts           6
## 13   13              Games           7
## 14   14     Tabletop Games           7
## 15   15             Design           5
## 16   16        Comic Books           8
## 17   17          Art Books           1
## 18   18            Fashion           9
## 19   19      Childrenswear           9
## 20   20            Theater          10
## 21   21             Comics           8
## 22   22                DIY           6
## 23   23          Webseries           2
## 24   24          Animation           2
## 25   25        Food Trucks           4
## 26   26         Public Art          11
## 27   27       Illustration          11
## 28   28        Photography          12
## 29   29                Pop           3
## 30   30             People          12
## 31   31                Art          11
## 32   32             Family           2
## 33   33            Fiction           1
## 34   34        Accessories           9
## 35   35               Rock           3
## 36   36           Hardware          13
## 37   37           Software          13
## 38   38            Weaving           6
## 39   39            Gadgets          13
## 40   40                Web          13
## 41   41               Jazz           3
## 42   42      Ready-to-wear           9
## 43   43          Festivals          10
## 44   44        Video Games           7
## 45   45        Anthologies           8
## 46   46         Publishing           1
## 47   47             Shorts           2
## 48   48   Electronic Music           3
## 49   49   Radio & Podcasts           1
## 50   50               Apps          13
## 51   51          Cookbooks           4
## 52   52            Apparel           9
## 53   53              Metal           3
## 54   54             Comedy           2
## 55   55            Hip-Hop           3
## 56   56        Periodicals           1
## 57   57              Dance          14
## 58   58         Technology          13
## 59   59           Painting          11
## 60   60        World Music           3
## 61   61         Photobooks          12
## 62   62              Drama           2
## 63   63       Architecture           5
## 64   64        Young Adult           1
## 65   65              Latin           3
## 66   66       Mobile Games           7
## 67   67             Flight          13
## 68   68           Fine Art          12
## 69   69             Action           2
## 70   70      Playing Cards           7
## 71   71        Makerspaces          13
## 72   72               Punk           3
## 73   73          Thrillers           2
## 74   74   Children's Books           1
## 75   75              Audio          15
## 76   76    Performance Art          11
## 77   77           Ceramics          11
## 78   78              Vegan           4
## 79   79     Graphic Novels           8
## 80   80  Fabrication Tools          13
## 81   81       Performances          14
## 82   82          Sculpture          11
## 83   83              Sound          13
## 84   84         Stationery           6
## 85   85              Print          15
## 86   86   Farmer's Markets           4
## 87   87             Events           4
## 88   88    Classical Music           3
## 89   89     Graphic Design           5
## 90   90             Spaces           4
## 91   91     Country & Folk           3
## 92   92          Wearables          13
## 93   93        Mixed Media          11
## 94   94         Journalism          15
## 95   95     Movie Theaters           2
## 96   96            Animals          12
## 97   97        Digital Art          11
## 98   98             Horror           2
## 99   99           Knitting           6
## 100 100        Small Batch           4
## 101 101      Installations          11
## 102 102  Community Gardens           4
## 103 103    DIY Electronics          13
## 104 104         Embroidery           6
## 105 105   Camera Equipment          13
## 106 106            Jewelry           9
## 107 107              Farms           4
## 108 108     Conceptual Art          11
## 109 109            Fantasy           2
## 110 110          Webcomics           8
## 111 111       Experimental          10
## 112 112    Science Fiction           2
## 113 113            Puzzles           7
## 114 114                R&B           3
## 115 115       Music Videos           2
## 116 116          Calendars           1
## 117 117              Video          15
## 118 118              Plays          10
## 119 119              Blues           3
## 120 120              Bacon           4
## 121 121              Faith           3
## 122 122         Live Games           7
## 123 123        Woodworking           6
## 124 124             Places          12
## 125 125           Footwear           9
## 126 126        3D Printing          13
## 127 127           Academic           1
## 128 128              Zines           1
## 129 129            Musical          10
## 130 130          Workshops          14
## 131 131              Photo          15
## 132 132          Immersive          10
## 133 133        Letterpress           6
## 134 134    Gaming Hardware           7
## 135 135            Candles           6
## 136 136         Television           2
## 137 137  Space Exploration          13
## 138 138            Couture           9
## 139 139             Nature          12
## 140 140             Robots          13
## 141 141         Typography           5
## 142 142            Crochet           6
## 143 143       Translations           1
## 144 144           Textiles          11
## 145 145            Pottery           6
## 146 146 Interactive Design           5
## 147 147          Video Art          11
## 148 148             Quilts           6
## 149 149              Glass           6
## 150 150        Pet Fashion           9
## 151 151           Printing           6
## 152 152            Romance           2
## 153 153       Civic Design           5
## 154 154               Kids           3
## 155 155  Literary Journals           1
## 156 156          Taxidermy           6
## 157 157    Literary Spaces           1
## 158 158           Chiptune           3
## 159 159        Residencies          14

There are 159 subcategories.


(To be continued)



Recent Posts

See All

Comments


bottom of page