top of page

Using SQL to know which project tags are most frequently found on FLOSS projects

Updated: Feb 11, 2021


library(RMySQL)

For information on FLOSS project (https://en.wikipedia.org/wiki/Free_and_open-source_software)

sqlQuery <- function (query) {
  # creating DB connection object with RMysql package
  DB <- dbConnect(MySQL(), user='root', password = 'AAAAAA', dbname = 'test', host = '127.0.0.1')

  # send Query to obtain result set
  rs <- dbSendQuery(DB, query)

  # get elements from result sets and convert to dataframe
  result <- fetch(rs, -1)

  # close db connection
  dbDisconnect(DB)

  # return the dataframe
  return(result)
}

cons <- dbListConnections(MySQL()) 
for(con in cons) 
  dbDisconnect(con)
sqlQuery("
   SHOW tables;      
         ")
##    Tables_in_test
## 1 fc_project_tags

We can discover the total number of project-tag combinations, keeping in mind that a project can have multiple tags


sqlQuery("SELECT COUNT(*)
         FROM fc_project_tags;")
##   COUNT(*)
## 1   353400

We can calculate the total number of projects

sqlQuery("SELECT count(DISTINCT project_id)
          FROM fc_project_tags;")
##   count(DISTINCT project_id)
## 1                      46510


How many unique items are in our dataset?

sqlQuery("SELECT count(DISTINCT tag_name)
         FROM fc_project_tags;")
##   count(DISTINCT tag_name)
## 1                    11006
sqlQuery("SELECT tag_name, COUNT(project_id)
FROM fc_project_tags
GROUP BY 1 
HAVING COUNT(project_id) >= 2325
ORDER BY 2 DESC;")
##                  tag_name COUNT(project_id)
## 1                     GPL             21182
## 2                   POSIX             16875
## 3                   Linux             16288
## 4                       C             10292
## 5          OS Independent             10180
## 6    Software Development              9619
## 7                Internet              8100
## 8                 Windows              7572
## 9                    Java              6394
## 10                    Web              6267
## 11                English              6000
## 12                    C++              5893
## 13              Libraries              5740
## 14                    PHP              5452
## 15                   Unix              5100
## 16               Mac OS X              4823
## 17             multimedia              4815
## 18         Communications              4450
## 19                   Perl              4244
## 20                 Python              4191
## 21                   LGPL              3525
## 22              Utilities              3298
## 23        Dynamic Content              3199
## 24                  GPLv3              2876
## 25             Networking              2821
## 26 Scientific/Engineering              2679
## 27    Games/Entertainment              2528
## 28                    BSD              2497
## 29    Desktop Environment              2335






Recent Posts

See All

Comentarios


bottom of page