<ani>
How does IPFS deal with mis-aligned files and data-reuse? For example, if I have two identical text files and add the letter 'a' to the beginning of one of them, can IPFS intelligently shift the chunks or does it just move everything?
<Mateon1>
Well, with rabin chunking, yes, but that's not the default. By default, IPFS just splits every 256k bytes
<ani>
Mateon1: could it be enabled?
<ani>
Aside from performance, are there any trade-offs?
<Mateon1>
Yes, when adding a file, you can provide the chunker to use with `ipfs add -s rabin-65536` or rabin-16384-65536-524288. So, rabin-[avg] and rabin-[min]-[avg]-[max]
<Mateon1>
The sizes are in bytes
<Mateon1>
I'm pretty sure you can also just do -s rabin, but I'm not sure what the default is
<Mateon1>
I'm not aware of many tradeoffs. It can theoretically reduce the deduplication factor, but on average it should be better
matoro has quit [Ping timeout: 248 seconds]
<Mateon1>
Okay, -s rabin uses 256k as the average
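[note] The difference between the two chunkers discussed above can be sketched at toy scale. This is only an illustration, not go-ipfs's Rabin implementation: it assumes a simple polynomial rolling hash as a stand-in for a Rabin fingerprint, uses chunk sizes far smaller than 256k, and the names `fixed_chunks`, `cdc_chunks` and `shared_chunks` are hypothetical.

```python
import hashlib
import random

def fixed_chunks(data, size=64):
    """Split at fixed offsets, like a fixed-size chunker (scaled down)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, window=16, avg_bits=6, min_size=32, max_size=1024):
    """Split where a rolling hash of the last `window` bytes matches a bit pattern."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)   # weight of the byte leaving the window
    mask = (1 << avg_bits) - 1         # about one cut candidate per 2**avg_bits bytes
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod   # drop the oldest byte
        h = (h * base + byte) % mod                  # add the newest byte
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared_chunks(a, b):
    """Number of distinct chunk hashes that appear in both chunk lists."""
    digest = lambda c: hashlib.sha256(c).digest()
    return len({digest(c) for c in a} & {digest(c) for c in b})

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
shifted = b"a" + original   # same content with a one-byte prefix

print("fixed-size shared:", shared_chunks(fixed_chunks(original), fixed_chunks(shifted)))
print("content-defined shared:", shared_chunks(cdc_chunks(original), cdc_chunks(shifted)))
```

Because the cut points depend on the bytes in the window rather than on the offset, the content-defined boundaries should line up again shortly after the inserted prefix, while every fixed-size chunk is shifted by one byte.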
<SchrodingersScat>
sounds fancy
<SchrodingersScat>
so IPFS uses Rabin fingerprint chunking by default?
Oatmeal has quit [Read error: Connection reset by peer]
HostFat__ has quit [Read error: Connection reset by peer]
HostFat_ has joined #ipfs
<Kubuxu>
SchrodingersScat: no, by default it is flat 256KiB cunks
Oatmeal has joined #ipfs
<Kubuxu>
s/cunks/chunks
matoro has joined #ipfs
kvda has joined #ipfs
Oatmeal has quit [Quit: Suzie says, "TTFNs!"]
fleeky_ has joined #ipfs
<ani>
Wow, this is cool. Mateon1: /ipfs/QmZYoGqrVNCQEkkwSZpUJEbfX85Sk34nnvMhvg1MKUBb23 includes the entire works of shakespeare. /ipfs/QmPJsDAZdgsxK5pmvkkfffcFKQXCZYS1irzhoZsyxkY6nN is the same thing with a nice prefix.
<ani>
Notice the only difference is the first two blocks.
fleeky has quit [Ping timeout: 240 seconds]
<SchrodingersScat>
Kubuxu: my mother was a cunk
<SchrodingersScat>
Kubuxu: so you're saying it doesn't use Rabin by default?
<Kubuxu>
yes
<ani>
I think it should. It just effectively deduped files with different offsets.
<ani>
That's insane.
<ani>
I'm going to run some larger tests (20 gigs or so) on Climate Data. See how much we can dedupe :)
<ani>
Is there a good way to figure out how much it de-duped? Count repeats in the DAG?
<Kubuxu>
as with everything, it has its price (longer add times).
<Kubuxu>
yeah, count repeated blocks in the DAG and multiply (repeated-1)*size to get de-duped bytes
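[note] The counting rule above can be written down as a small function. This is a sketch under assumptions: the input is taken to be a list of `(block_hash, size_in_bytes)` pairs with one entry per reference found while walking the DAG, and collecting that list (for example from `ipfs refs -r` plus a per-block size lookup) is left out. The name `deduped_bytes` is hypothetical.

```python
from collections import Counter

def deduped_bytes(blocks):
    """Sum (repeats - 1) * size over every block hash in the reference list."""
    counts = Counter()
    sizes = {}
    for block_hash, size in blocks:
        counts[block_hash] += 1
        sizes[block_hash] = size
    return sum((n - 1) * sizes[h] for h, n in counts.items())

# Block "A" is referenced three times, so two copies of it were saved.
refs = [("A", 100), ("B", 50), ("A", 100), ("A", 100)]
print(deduped_bytes(refs))
```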
voldyman has quit [Quit: Connection closed for inactivity]
<jbenet>
ani: we can make special purpose (much more sophisticated) chunkers for large data sets. it does not have to be cunked by the basic chunker
<jbenet>
that would be useful, valuable work to do now
<ani>
jbenet: is there presently a better one?
<ani>
Also, is it based off of the /.ipfs folder or just the chunks of the file?
<jbenet>
not that i know of. this kind of thing is likely to be dataset dependent. it will require a bit of thought, but i expect it to be about 50-200 LOC
<jbenet>
and it should build on filestore and ipfs-pack
<SchrodingersScat>
so ipfs-pack doesn't require space in .ipfs/
<ani>
Last I recall it was for .torrent files in IPFS basically.
<ani>
Wait no
<ani>
you're right
Caterpillar has quit [Quit: You were not made to live as brutes, but to follow virtue and knowledge.]
<SchrodingersScat>
I'm normally a pessimist, when I'm right it's time to duck and cover :(
<ani>
IPFS-pack isn't building for me
palkeo has quit [Quit: Konversation terminated!]
<kpcyrd>
Kubuxu: hm, wouldn't that break de-dup when two independent people add the same file in different ways?
<Kubuxu>
it might
<Kubuxu>
I am not 100% sure how it works right now
<Kubuxu>
ah yeah, it would
<kythyria[m]>
It's unfortunate that for video at multiple quality levels, you probably need a custom _de_chunker too.
ygrek_ has quit [Ping timeout: 276 seconds]
Guest27309 has quit [Ping timeout: 240 seconds]
<kythyria[m]>
(though in that case at least some players can do the reassembly themselves, because you end up with something a lot like a DASH or HLS segmenter)
<dignifiedquire>
kumavis: just published rust-cid
<lgierth>
interesting
<lgierth>
if i wget --mirror a page with plenty of pdfs linked like file.php?q=something, the gateway can't deal with it
<lgierth>
it cuts off the ?query and then can't find the links :)
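[note] The mismatch described above follows from ordinary URL parsing, which can be shown with Python's standard library. This is only a sketch: the gateway address and `QmHash` are placeholders, and whether the gateway resolves a percent-encoded name correctly is an assumption that is not verified here.

```python
from urllib.parse import quote, urlsplit

# A mirrored file saved on disk under the literal name "file.php?q=something".
saved_name = "file.php?q=something"

# Linking to it verbatim: everything after "?" is parsed as the query, not the path.
link = "http://localhost:8080/ipfs/QmHash/" + saved_name
parts = urlsplit(link)
print(parts.path)    # the part a path-based lookup would see
print(parts.query)   # the part that gets cut off

# Percent-encoding the name keeps the "?" inside the path component.
encoded_link = "http://localhost:8080/ipfs/QmHash/" + quote(saved_name, safe="")
print(urlsplit(encoded_link).path)
```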
uktjames has joined #ipfs
<uktjames>
Hey, if I want to implement encryption in IPFS so that someone can view the actual document only after getting the decryption key from me, what would be a good way of doing this?
<lgierth>
encrypt the files before adding them
<uktjames>
encrypting the document pre-uploading via ipfs api? Is there any working example for this somewhere?
<uktjames>
then I would need to have a central db, that stores private key vs the ipfs hash of the encrypted file
<Mateon1>
lgierth: Yeah, wget isn't meant for archiving things like that. Either use tools people at archive.org use (let me look for some links), or use HTTrack like I do for xkcd.com
<lgierth>
ok good call
<lgierth>
i was about to say we should make the gateway not throw away the ?query, which would make things complicated down the road
<lgierth>
i have plans that involve the ?query on the gateway :)
<Mateon1>
Probably should be in an issue on ipfs/archives
<Mateon1>
Or in the readme
<lgierth>
PRs welcome :P
<Mateon1>
Note that these tools create WARC archive files
uktjames has quit [Quit: ChatZilla 0.9.93 [Firefox 51.0.1/20170125172221]]
<Mateon1>
So no dedup unless a specialized IPLD format is written (like for Ethereum blocks)
<Mateon1>
I wonder if it's possible to unpack a WARC to be usable
Boomerang has joined #ipfs
bastianilso has quit [Ping timeout: 255 seconds]
onabreak_ has joined #ipfs
onabreak has quit [Ping timeout: 260 seconds]
<victorbjelkholm>
lgierth: what about calling our jenkins workers proletarians instead?
Boomerang has quit [Ping timeout: 240 seconds]
Boomerang has joined #ipfs
bastianilso has joined #ipfs
<Kubuxu>
can we just say screw PC in this case?
<kythyria[m]>
Mateon1: Doesn't WARC contain more information than unixfs?
pfrazee has joined #ipfs
maxlath has quit [Ping timeout: 260 seconds]
tclass has joined #ipfs
tclass has quit [Ping timeout: 240 seconds]
Encrypt has joined #ipfs
maxlath has joined #ipfs
<Mateon1>
kythyria[m]: I honestly have no idea, as I learned of WARC and related tools this week
<lgierth>
victorbjelkholm: they're not selling their labor, so they're not strictly proletarians :)
<SchrodingersScat>
Mateon1: archiveteam uses WARC
<Mateon1>
SchrodingersScat: Yep, I know
<Mateon1>
Learned of it recently in HN comments
<SchrodingersScat>
the more you know :)
chris613 has joined #ipfs
<kythyria[m]>
warcprox looks pretty interesting.
<kythyria[m]>
Hm, does IPLD have a notion of weak references and the like? I could very much see wanting to override the pinner's idea of what's required, and given that, make it easy to hint "these are page prerequisites, pin them too even if you're being stingy"
wak-work has quit [Ping timeout: 255 seconds]
Mateon3 has joined #ipfs
Mateon1 has quit [Ping timeout: 256 seconds]
Mateon3 is now known as Mateon1
<Mateon1>
As far as I know, (recursive) pinning is not defined for arbitrary IPLD, only for unixfs, I might be wrong though
<lgierth>
it's defined for links
<lgierth>
and links are defined for ipld :)
<lgierth>
the stuff that comes out of ipld's tree()
<hannes[m]>
Is there a way to share a hash with others and let them add (not remove) files/hashes in it?
<ani>
Is it possible to limit how much data IPFS will store passively?
<ani>
I.e. enable GC?
<lgierth>
you can't currently impose a hard limit, but yes you can run periodic gc with a watermark
<ani>
Also, why is my 2015 Macbook Pro twice as fast at rabin as a digitalocean droplet with 3x the cores?
<lgierth>
ipfs daemon --enable-gc, combined with the Datastore.* config settings
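[note] The watermark idea can be sketched as simple arithmetic. This assumes the relevant settings are `Datastore.StorageMax` and `Datastore.StorageGCWatermark` (a percentage of `StorageMax`), and that a periodic GC pass runs once the repo size reaches that percentage. The exact comparison the daemon uses is not confirmed here, and the function names are hypothetical.

```python
def gc_threshold_bytes(storage_max_bytes, watermark_percent=90):
    """Repo size at which a periodic GC pass would be triggered (assumed rule)."""
    return storage_max_bytes * watermark_percent // 100

def should_gc(repo_size_bytes, storage_max_bytes, watermark_percent=90):
    """True once the repo has grown past the watermark."""
    return repo_size_bytes >= gc_threshold_bytes(storage_max_bytes, watermark_percent)

storage_max = 10 * 10**9   # e.g. a StorageMax of "10GB"
print(gc_threshold_bytes(storage_max))
print(should_gc(9_500_000_000, storage_max))
```

This is a soft target rather than a hard cap: the repo can still grow past `StorageMax` between GC passes.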
ion has joined #ipfs
s_kunk has quit [Ping timeout: 248 seconds]
kvda has joined #ipfs
<Voker57>
hannes[m]: I assume by hash you mean ipns, which is the only 'mutable' directory ipfs has. You can share the ipns key, but this would give everybody rw access
<Voker57>
otherwise, ipfs hashes are immutable, so if you add/rm files you get a different, unrelated hash
<stevenaleach>
Empty files in a directory seem to break things.
<stevenaleach>
Create a directory with a test file in it, then run `touch empty` to create an empty file and add the directory recursively: you get... nothing... instead of your directory. Delete the empty file and you get a directory with your test files in it.
dryajov has quit [Ping timeout: 240 seconds]
<ani>
`ipfs pin ls` is just hanging for me on Ubuntu 16.10
<ani>
no clue why
dryajov has joined #ipfs
<stevenaleach>
Is the no empty files thing a known limitation?