cwahlers has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
dimitarvp has joined #ipfs
jhand has joined #ipfs
dignifiedquire has joined #ipfs
gah111 has quit [Remote host closed the connection]
<AntoineM[m]>
I've got a policy website of material from Congress, "the people's branch".
<AntoineM[m]>
It's a bit of a complicated publication process (they are not classified, but they also aren't distributed publicly by default, so aggregating them is an intensive process; recent legislation suggests that new ones will be public going forward). This WaPo article explains a bit more (ignore the false sense of competition it creates; most of us are posting them for open access).
<AntoineM[m]>
Regardless & as matter of principle I'd like to host these reports from the "people's branch" on IPFS. Which is why I am hoping to get your advice, in one quick ?
<AntoineM[m]>
Should I aim to upload just the 40,000 PDFs, or the site as well? (The search feature for www.crsreports.com requires a MongoDB.)
<SchrodingersScat>
uh oh, are you the next chelsea manning?
<AntoineM[m]>
No, they are not classified.
<AntoineM[m]>
As a matter of principle I'd like to host these reports from the "people's branch" on IPFS, which is why I'm hoping to get your advice on one quick question.
<AntoineM[m]>
Should I aim to upload just the 40,000 PDFs, or the site as well? (The search feature for www.crsreports.com requires a MongoDB.)
<r0kk3rz>
AntoineM[m]: if search is the main feature of the site, then just upload the pdfs
<AntoineM[m]>
Thanks! Will do
<whyrusleeping>
AntoineM[m]: yeah, add it all to ipfs :) You can open a thread for discussion on https://github.com/ipfs/archives
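For context, adding a whole directory of documents is a single recursive add; the directory name below is just a placeholder for wherever the PDFs live:
    # add every PDF under the directory and pin the result locally
    ipfs add -r --pin=true ./crs-reports/
    # the last line printed is the root hash of the directory --
    # that one hash is the address to share (or to post in the archives thread)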
spacebar_ has joined #ipfs
paroneayea has joined #ipfs
<paroneayea>
hello!
<paroneayea>
I'm trying to understand the IPFS protocol more fully
shepner[m] has joined #ipfs
<paroneayea>
am I right in that the usage of the DAG in IPFS doesn't require that every node basically have access to that whole DAG structure?
<paroneayea>
it's basically retrieval-on-demand of the DAG nodes and structure?
ylp has quit [Quit: Leaving.]
jungly has quit [Remote host closed the connection]
<lgierth>
paroneayea: that's correct
<paroneayea>
lgierth: thanks :)
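As a small illustration of that on-demand behaviour (using the CLI; <hash> stands for any CID reachable from your node):
    # fetch and list the links of a single DAG node, without pulling the rest of the graph
    ipfs object links <hash>
    # or view that one node as JSON
    ipfs dag get <hash>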
espadrine has quit [Ping timeout: 240 seconds]
tilgovi has joined #ipfs
spacebar_ has quit [Quit: spacebar_ pressed ESC]
galois_d_ has joined #ipfs
galois_dmz has quit [Ping timeout: 240 seconds]
spacebar_ has joined #ipfs
rodolf0 has quit [Quit: Leaving]
cwahlers has joined #ipfs
erictapen has quit [Ping timeout: 276 seconds]
erictapen has joined #ipfs
m0ns00n has quit [Remote host closed the connection]
Lymkwi has quit [Quit: "Tell 'em an invisible dude in the sky made the Universe, they'll believe ya. Tell 'em the paint's still wet, they'll always touch it to check."]
<lgierth>
we're looking into opening it up more but no conclusion yet -- it's tricky
pawn has quit [Remote host closed the connection]
<_mak>
if I send a pubsub message to the channel 'mychannel123', will everyone learn about the existence of this channel, or can only people who are already listening on the channel see it?
<vyzo>
_mak: only people who are listening will see it
<vyzo>
but the dht will know the channel exists
<vyzo>
your node will publish a provider record for rendezvous
<_mak>
vyzo: so that's security by obscurity right? I can have private channels by creating a hash that no one knows about
<vyzo>
no, it's not that
<paroneayea>
hm
<vyzo>
i mean it's not for security at all
<vyzo>
it's just not practical to implement it otherwise
<paroneayea>
is filecoin going to split the space in ipfs?
<_mak>
oh, I don't mean that was the intention, I mean that is the side effect
<vyzo>
nah
<vyzo>
there is no security gained
<vyzo>
there is a provider record in the DHT
<paroneayea>
or will the existing pseudoledger support going to stay?
<paroneayea>
oops, grammar
<vyzo>
people can find your channel if they know (or guess) its topic
<_mak>
yeah, but if the topic is a random hash of 50 chars
<_mak>
it will be hard for people to find it at random
<vyzo>
sure
<vyzo>
just be aware that there is no expectation of security here
<vyzo>
but if you make random hashes you can have some reasonable privacy
<_mak>
yeah sure, this is just a stopgap on the way to the auth implementation on pubsub
<vyzo>
(but no expectation :)
<_mak>
thanks mate :)
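A sketch of the hard-to-guess-topic approach discussed above; pubsub was experimental at the time, so the daemon has to run with --enable-pubsub-experiment, and the random-topic helper is just one possible choice:
    # pick a long random topic name that nobody is likely to guess
    TOPIC=$(openssl rand -hex 32)
    # subscriber (against a daemon started with --enable-pubsub-experiment)
    ipfs pubsub sub "$TOPIC"
    # publisher, from another node that knows the topic
    ipfs pubsub pub "$TOPIC" "hello"
    # a provider record for the topic still lands in the DHT, so treat this as privacy, not security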
upperdeck has quit [Read error: Connection reset by peer]
droman has joined #ipfs
droman has quit [Remote host closed the connection]
droman has joined #ipfs
droman has quit [Remote host closed the connection]
cblgh has quit [Quit: Lost terminal]
cblgh has joined #ipfs
upperdeck has joined #ipfs
stoopkid_ has joined #ipfs
krs93 has joined #ipfs
jmteoma has joined #ipfs
<paroneayea>
so is the filecoin thing a new thing because it's a cool new opportunity
<paroneayea>
or because the current mechanism is having abuse problems?
<paroneayea>
or both?
m0ns00n has joined #ipfs
<jmteoma>
I love IPFS and have been hacking up tooling on top of it since April. But TBH I have been having a hard time getting my friends involved, which matters because I don't know that many people; for a programmer, this is the first time I've been on IRC in a year or two. Anyway, I figure I ought to do something with IPFS other than just dreaming up apps etc. So I am mirroring arxiv now on a VM with 100GB storage.
jkilpatr has quit [Ping timeout: 248 seconds]
<jmteoma>
I am soon planning to mirror ubuntu packages using a database thing, but that's a ways off. The thing with something like ubuntu packages is that people want to search for them by package name and architecture. So I have made this database thing that is more about streams and searching than about key-value storage.
<jmteoma>
And now I am working on a package manager that is more about searching for blobs by metadata keys than what gx does, which is resolving dependencies referred to by hash.
<jmteoma>
It is my hobby. Idk, just wanted to share!
<jmteoma>
Question: how long are IPFS pubsub values retained on a peer? Is it like, you publish data and then a peer remembers the value in sequence forever, or just for a while? Or are they totally ephemeral? My system emits them at a configurable interval, so I can cope with ephemeral messages.
<jmteoma>
I know this sounds like a silly question because everyone has an idea of what pubsub is, but it's been gnawing at me because IPFS is all about permanence.
domanic has joined #ipfs
A124 has quit [Quit: '']
keorn has joined #ipfs
nulquen has joined #ipfs
galois_d_ has quit []
<r0kk3rz>
afaik pubsub is totally ephemeral
<r0kk3rz>
if you want to keep things, add the data to a block and pubsub the hash
<Magik6k>
jmteoma, pubsub is not persistent, its current use is for signalling and messaging other peers
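A minimal sketch of what r0kk3rz suggests: persist the payload as IPFS data first and only broadcast its hash over the ephemeral channel (the topic name below is arbitrary, and pubsub still needs the experimental daemon flag):
    # store the payload permanently in the local repo
    HASH=$(echo "my payload" | ipfs add -Q)
    # announce just the hash on the ephemeral pubsub topic
    ipfs pubsub pub mytopic "$HASH"
    # any subscriber can later fetch (and pin) the data at its own pace
    ipfs cat "$HASH"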
A124 has joined #ipfs
galois_dmz has joined #ipfs
galois_d_ has joined #ipfs
m0ns00n has quit [Quit: quit]
galois_dmz has quit [Ping timeout: 276 seconds]
stavros has joined #ipfs
<stavros>
hello
<stavros>
why is it that starting to pin a large object pegs both my server's CPUs?
<stavros>
shouldn't it be mostly network-bound?
<stavros>
hi daviddias !
<lgierth>
hashing
<lgierth>
it verifies the stuff actually hashes to what it says on the tin
<stavros>
ah
<lgierth>
and a bit of overhead for network encryption
<stavros>
does downloading happen in blocks, like with bittorrent, or in one large chunk?
<lgierth>
blocks
<stavros>
oh good, so it's resumable
<lgierth>
if you do `ipfs refs -r <hash>` you get all the blocks within
<stavros>
oh i see, thank you
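One ad-hoc way to see how much of a large object is already local, and hence how resumable the transfer is (this is a manual diagnostic, not an official progress report):
    # every block the object is made of
    ipfs refs -r <hash> | sort -u > wanted
    # blocks already present in the local repo
    ipfs refs local | sort -u > have
    # blocks still missing
    comm -23 wanted have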
<jmteoma>
Thanks Magik6k, r0kk3rz
<stavros>
lgierth: i assume the CPU usage only goes on as long as there is new data to download, right?
<stavros>
ie if some blocks are unavailable, it stops hashing
deltab has quit [Quit: Lost terminal]
<stavros>
can i see the pinning progress somewhere?
citizenErased has quit [Ping timeout: 260 seconds]
<jmteoma>
Briefly, it's designed to be a simple p2p database. No clustering or consensus - I'm leaving complex problems for the good folks at BigchainDB. Focussing on a simpler system that is easy to set up and use.
<lgierth>
stavros: yeah roughly -- there's a bit more activity after that, your node will tell other nodes what it has stored, to facilitate routing
jmteoma has quit [Quit: Ex-Chat]
<stavros>
i see, thanks
<stavros>
lgierth: jeez, this seems really slow, though... the directory on disk is growing at a rate of a few bytes per second
<stavros>
ah, kilobytes
<lgierth>
that indeed seems pretty slow -- there are a few bottlenecks that we have fixes for on the way, but maybe here it's just that your peer is slow?
<stavros>
but why is it pegging my CPU in that case?
<stavros>
shouldn't it be very very easy to hash this much data?
keorn has quit [Quit: Page closed]
citizenErased has quit [Ping timeout: 248 seconds]
<stavros>
also, i think my peer is on an AWS machine in the same region, but i'm not sure how many blocks that has pinned
nulquen has quit []
citizenErased has joined #ipfs
<stavros>
ah, the object seems to be done pinning
<aceluck>
jmteoma: Looks interesting!
<stavros>
cpu usage is still through the roof, though
<lgierth>
pegged meaning 1 core?
<stavros>
2
<stavros>
ie all of them :p
<lgierth>
:)
<lgierth>
sorry about that
<lgierth>
it's gonna get better
<lgierth>
what you can do is run with daemon --routing=dhtclient but then it won't serve to other nodes anymore
<stavros>
i hope so, because i've built a pinning service around this and it's kind of disrupting all the other things on the server :/
<stavros>
ah, no, it's a pinning service, so that's kind of its purpose :/
<lgierth>
got it
<stavros>
is there a way for me to see the pinning progress on something?
<lgierth>
well there's a couple of neat performance improvements, some of which should land in august
<stavros>
ah, fantastic
<aboodman>
lgierth: i'm sure you're probably all over this, but in noms, it turned out that the chunking hash was a bigger problem and bigger thing to optimize than the cryptohash
<lgierth>
stavros: pin add has a --progress flag
<stavros>
lgierth: hmm, i'm pinning using the http api, so i was hoping i would be able to trigger that externally (or, even better, programmatically)
edubai____ has quit [Quit: Connection closed for inactivity]
<lgierth>
stavros: ?progress=true i think
<lgierth>
the http api equals the cli
<stavros>
oh, how does that notify of progress? i didn't see it in the docs
<lgierth>
aboodman: yeah :/ optimizations coming there too. another tricky issue is having both reads and writes approach O(1) -- on a multi-TB repo adds get kinda slow :/
citizenErased has quit [Ping timeout: 240 seconds]
<stavros>
oh it shows that on the daemon's cli, hmm
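lgierth's suggestion, sketched out; the CLI flag definitely exists, while the exact HTTP behaviour (streaming intermediate Progress objects) can vary by go-ipfs version, so verify against your daemon:
    # CLI: shows a progress count while pinning
    ipfs pin add --progress <hash>
    # HTTP API mirrors the CLI flags; progress updates are written to the response as they happen
    curl "http://127.0.0.1:5001/api/v0/pin/add?arg=<hash>&progress=true"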
<lgierth>
aboodman: we've also played with and implemented rabin fingerprinting but that was a bit disappointing to be honest
<aboodman>
lgierth: we did a ton of work on making incremental mutations fast in noms, because we are trying to be a database, so it's much more common to do small mutations than big chunk writes.
<aboodman>
i bet some of that could be relevant to ipfs.
<lgierth>
mmh! Kubuxu whyrusleeping: speak with aboodman about add performance :):)
<aboodman>
like trying to really reduce the number of blocks that must be rewritten for small mutations.
<lgierth>
yeah word
<lgierth>
the problem in big repos right now is that it also checks to see whether it needs to write the block, and that slows it down additionally
<aboodman>
yeah, Has checks
<aboodman>
they kill you
<lgierth>
it's just a flat sharded directory, we have one or two experimental datastores in the works that should improve on that
<aboodman>
you guys should really investigate using nbs
<lgierth>
bug kubuxu about that ;) he's all over the datastores
<lgierth>
nice
<lgierth>
good stuff
<lgierth>
what does garbage collection do? remove stuff based on some per-block flag?
<lgierth>
because GC is something that badly needs improvements too -- right now it's stop-the-world
<aboodman>
yeah i think nbs would help you in a lot of ways. right now you hold a lock on the datastore, so only one process can access it.
<aboodman>
nbs supports concurrent access.
<Kubuxu>
we also have concurrent access, some datastores are marked threadsafe so they are not wrapped in a lock
<Kubuxu>
how much data did you test nbs with?
<daviddias>
Hi stavros o/
<stavros>
daviddias: looking forward to improvements to ipfs! right now it consumes so many resources i'm going to have to move it to its own server :/
<Kubuxu>
aboodman: nice trick re: Has calls, we use a bloom filter as one of the cache layers, initialization is expensive but it is worth it
<aboodman>
re GC - not currently implemented, but we already do periodic compaction, it's easy to imagine extending it to exclude unreachable chunks
bwerthmann has joined #ipfs
<lgierth>
we don't really have a root to go from for pinned blocks, but i suppose that'd not be too hard to add. take the output of `ipfs pin ls` and make an object of it
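One hypothetical way to build such a root today is to gather the pin roots under a single directory with the files API; nothing here is an official GC mechanism, just the kind of object lgierth is describing:
    # hashes of every recursively pinned root
    ipfs pin ls --type=recursive -q
    # wrap them all in one MFS directory so they hang off a single root
    ipfs files mkdir /pin-roots
    for h in $(ipfs pin ls --type=recursive -q); do
      ipfs files cp "/ipfs/$h" "/pin-roots/$h"
    done
    # the hash of that directory now covers every pin
    ipfs files stat --hash /pin-roots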
<aboodman>
re size - we use it regularly with 1TB datasets
<aboodman>
beyond that it goes into S3
<aboodman>
that's another interesting advantage potentially of using NBS ... if you want to run services that interact with IPFS, you can easily persist to S3 (or some other distributed block store)
<aboodman>
i realize that's somewhat heretical, but it might be nice to have as an option
<Kubuxu>
we had go-ds-s3 it might still work
<aboodman>
anyway, i don't really know enough about the internals of IPFS to have a strong opinion. I hope to learn more over time. Don't want to sound like I'm pitching something....
<lgierth>
the biggest non-testing repo we have access to at the moment is ~4 TB
<aboodman>
call it a hunch that it would be useful
<lgierth>
it's appreciated! :)
<Kubuxu>
currently we are working on badger support for go-ipfs
bwerthma1n has joined #ipfs
bwerthmann has quit [Ping timeout: 268 seconds]
<aboodman>
i'm not the main author of nbs ... my cofounder is, so he could speak more in depth. but the thing we realized w/ all the key/value stores is that they have a lot of complexity and expense because data is mutable.
<daviddias>
stavros: is that js-ipfs or go-ipfs? (maybe both?)
<aboodman>
if you need to support the fact that the value of a key can change, then you have to track which is the newest version of a key
<aboodman>
nbs takes advantage of the fact that for content-addressed systems, the value at a key can never change
<aboodman>
so its ok if the same value is in two different pages at one time, for example
<stavros>
daviddias: go-ipfs
<aboodman>
also, sorry, last thing lgierth -- we experimented w/ rabin and a bunch of other fingerprinting schemes. we found buzhash was the best tradeoff of speed and chunk distribution.