xlegoman has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
workmad3 has joined #rubygems
workmad3 has quit [Ping timeout: 248 seconds]
swills has quit [Ping timeout: 255 seconds]
Zaid has joined #rubygems
<Zaid>
hi
<Zaid>
Is there a way to get all the gems that are using github, using the rubygems.org API ?
swills has joined #rubygems
<Zaid>
hello
xlegoman has joined #rubygems
workmad3 has joined #rubygems
workmad3 has quit [Ping timeout: 240 seconds]
<havenwood>
Zaid: hi
<havenwood>
Zaid: you want the names of the gems using github for their source?
<Zaid>
the names and the urls
<Zaid>
yes
ur5us has quit [Remote host closed the connection]
ur5us has joined #rubygems
zaolin_ has quit [Ping timeout: 240 seconds]
zaolin has joined #rubygems
ur5us has quit [Ping timeout: 248 seconds]
ur5us has joined #rubygems
Zaid has quit [Ping timeout: 260 seconds]
xlegoman has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
workmad3 has joined #rubygems
sss has joined #rubygems
workmad3 has quit [Ping timeout: 248 seconds]
sss has quit [Client Quit]
Zaid has joined #rubygems
cliles has quit [Ping timeout: 268 seconds]
workmad3 has joined #rubygems
workmad3 has quit [Ping timeout: 248 seconds]
stan has joined #rubygems
<pombreda>
Zaid: easy, collect all the gem names through the marshalled index, then call the API for each one of them to get its metadata. Keep the ones where there is a GitHub pointer, typically focusing on the latest version
<pombreda>
many older versions may point to rubyforge, snif, snif ;)
ur5us has quit [Remote host closed the connection]
<pombreda>
Zaid, this is not a lot of code but you want to be polite and throttle when doing this FWIW
ur5us has joined #rubygems
<pombreda>
it may take a few days to complete at a polite pace, to avoid DDoS'ing rubygems.org by mistake
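The approach pombreda describes above can be sketched roughly as follows. This is a minimal illustration, not pombreda's script: it assumes the gem names have already been collected, that each gem's metadata comes from the rubygems.org v1 endpoint /api/v1/gems/NAME.json, and that a GitHub link, when present, sits in one of the source_code_uri, homepage_uri or bug_tracker_uri fields. The helper names (github_repo, fetch_metadata, collect) and the one-second default delay are made up for this sketch.

```python
import json
import time
import urllib.request
from urllib.parse import urlparse

# URI fields of the gem JSON that may hold a repository link.
# The lookup order is an assumption of this sketch.
URI_FIELDS = ("source_code_uri", "homepage_uri", "bug_tracker_uri")


def github_repo(meta):
    """Return 'https://github.com/owner/repo' if a URI field points to GitHub, else None."""
    for field in URI_FIELDS:
        parsed = urlparse(meta.get(field) or "")
        if parsed.netloc.lower() in ("github.com", "www.github.com"):
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2:
                repo = parts[1]
                if repo.endswith(".git"):
                    repo = repo[:-4]
                return f"https://github.com/{parts[0]}/{repo}"
    return None


def fetch_metadata(name):
    """Fetch the metadata of one gem from the rubygems.org v1 API."""
    url = f"https://rubygems.org/api/v1/gems/{name}.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def collect(names, fetch=fetch_metadata, delay=1.0, sleep=time.sleep):
    """Map gem name -> GitHub URL, pausing `delay` seconds between requests."""
    found = {}
    for i, name in enumerate(names):
        if i:
            sleep(delay)  # throttle: stay polite towards rubygems.org
        url = github_repo(fetch(name))
        if url:
            found[name] = url
    return found
```

Calling collect(["rake", "rails"]) would issue one request per gem with a pause in between; the fetch and sleep parameters exist only so the loop can be exercised without touching the network.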
<pombreda>
incidentally, I have a Python script that does just that, but is not slotted for public release until q1 2018
<pombreda>
you can look otherwise at the ruby code in librariesio and versioneye that do this data collection too
ur5us has quit [Ping timeout: 260 seconds]
<pombreda>
Zaid: out of curiosity, what's the purpose of all this?
<Zaid>
pombreda: I used the psql dump to get all the gems, but I'm trying to associate as many as possible with their GitHub projects
<Zaid>
(a research study)
<Zaid>
As ur5us said, there are some projects pointing to rubyforge, others are pointing to their websites ...
<pombreda>
Zaid, you could also correlate this with the whole set of GH repos, focusing on the gemspec files only
<pombreda>
e.g. you collect most gemspec files and parse them.
<pombreda>
then you cross ref that with the data collected above from rubygems
<pombreda>
where they intersect, you have your data point
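The cross-referencing idea above can be sketched like this. It is only an illustration under stated assumptions: the gemspec contents are already available as (repo, text) pairs, the gem name is set with a plain string literal such as s.name = "foo" (gemspecs that compute the name dynamically are skipped), and the helper names gemspec_name and cross_reference are hypothetical.

```python
import re

# Matches assignments such as `s.name = "foo"` or `spec.name = 'foo'`.
NAME_RE = re.compile(r"""^\s*\w+\.name\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def gemspec_name(text):
    """Return the literal gem name declared in a gemspec, or None."""
    match = NAME_RE.search(text)
    return match.group(1) if match else None


def cross_reference(rubygems_names, gemspecs):
    """Return {gem_name: [repo, ...]} for gemspec names that rubygems.org also knows.

    rubygems_names: iterable of gem names collected from rubygems.org
    gemspecs: iterable of (repo, gemspec_text) pairs collected from GitHub
    """
    known = set(rubygems_names)
    matches = {}
    for repo, text in gemspecs:
        name = gemspec_name(text)
        if name in known:
            matches.setdefault(name, []).append(repo)
    return matches
```

A gem name can map to several repositories (forks, mirrors), which is why the values are lists; picking the canonical one would need an extra rule that the discussion does not specify.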
<pombreda>
Zaid, all the gemspecs can be fetched from the GitHub data set on BigQuery ;)
<pombreda>
so easy too, but lots of data and moving parts!
ur5us has joined #rubygems
<Zaid>
does it need to go through all projects ?
<Zaid>
to get the gems projects
<pombreda>
sure thing
<pombreda>
Zaid: or you could focus on repos that have Ruby as a primary language only... but taking all repos with a gemspec at the root is easier and safer
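Selecting repos with a gemspec at the root could look like the sketch below. It assumes the file listing is available as (repo, path) pairs with forward-slash paths relative to the repository root; that input shape and the function name are assumptions made for illustration.

```python
def repos_with_root_gemspec(files):
    """Return the sorted repos that have a *.gemspec file at top level.

    files: iterable of (repo, path) pairs, path relative to the repo root.
    """
    return sorted({
        repo
        for repo, path in files
        if "/" not in path and path.endswith(".gemspec")
    })
```

Gemspecs vendored in subdirectories are ignored on purpose, since they usually belong to a dependency rather than to the repo itself.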
<pombreda>
Zaid, what is the research study about, if I may?
<Zaid>
about socio-technical factors impacting software ecosystems
<pombreda>
Zaid, that's a mouthful :P but interesting!
<pombreda>
Zaid, homepage? which uni?
<pombreda>
I maintain projects at http://www.aboutcode.org/ which are all about floss scanning and discovery hence my interest
<Zaid>
polytechnique montreal
<Zaid>
ah ok, thank you for your advice
<pombreda>
Zaid, please feel free to keep me posted!