Bertl is now known as Bertl_zZ
proteusguy has joined #symbiflow
citypw has joined #symbiflow
Bertl_zZ is now known as Bertl
craigo has joined #symbiflow
proteusguy has quit [Ping timeout: 258 seconds]
proteusguy has joined #symbiflow
proteusguy has quit [Ping timeout: 245 seconds]
adjtm has quit [Ping timeout: 245 seconds]
craigo has quit [Ping timeout: 268 seconds]
citypw has quit [Remote host closed the connection]
freemint has joined #symbiflow
adjtm has joined #symbiflow
freemint has quit [Ping timeout: 250 seconds]
freemint has joined #symbiflow
freemint has quit [Remote host closed the connection]
freemint has joined #symbiflow
freemint has quit [Ping timeout: 245 seconds]
freemint has joined #symbiflow
adjtm has quit [Ping timeout: 258 seconds]
adjtm has joined #symbiflow
freemint has quit [Remote host closed the connection]
rvalles has quit [Ping timeout: 252 seconds]
adjtm has quit [Remote host closed the connection]
adjtm has joined #symbiflow
Bertl is now known as Bertl_oO
rvalles has joined #symbiflow
<mithro> duck2 set up some benchmarking for the VPR parsing code at https://github.com/duck2/vpr-rrgraph-benchmark
<tpb> Title: GitHub - duck2/vpr-rrgraph-benchmark (at github.com)
freemint has joined #symbiflow
<litghost> mithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnproto -> 12 seconds
<litghost> oph
<litghost> Given that capnproto doesn't really parse a lot, I'm guessing there is some headroom in the copying for improvement
<litghost> If not, the mmap -> in memory datastructures is the next step
<hzeller[m]> As long as the capnproto data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly, though that might need an abstraction of the access patterns first.
<litghost> hzeller: I agree there will be some overhead, but 12 seconds seems excessive.
<mithro> litghost / hzeller[m]: duck2 is a good person to ask
<hackerfoo> Assuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Average random 4k reads for my high end SSD are only ~50MB/s.
<hackerfoo> How big is the capnproto rr_graph?
<hackerfoo> I guess it shouldn't be random reads, though.
<duck2> the current copying code is mostly "translated" from the xml reading code. From my past callgrinds, I think the most time is taken by copying the edges, where the gap between the data representations is wide.
<litghost> duck2: Edges are something we should strongly consider specializing
<litghost> duck2: e.g. store the edges in a dense blob of ints/shorts
<hackerfoo> duck2: Can you try putting the rr_graph in a ramdisk?
<litghost> duck2: Because that data is basically just a giant 2D matrix
<duck2> hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s
freemint has quit [Read error: Connection reset by peer]
freemint has joined #symbiflow
<duck2> litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations.
<litghost> duck2: the allocation strategy of the edges is something that should be examined anyways
<litghost> duck2: now might be a good time
<duck2> litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate
<litghost> > " rr_graph read-only enough to try arena allocation"
<litghost> This is true when reading an rr graph
<litghost> It is completely read-only
<hackerfoo> You could try a simple bump allocator: `void *malloc(size_t size) { void *p = mem; mem += size; return p; }` and `void free(void *p) {}`
adjtm has quit [Ping timeout: 245 seconds]
<hackerfoo> Where `mem` points to a large chunk of preallocated memory, and see how much it speeds it up.
<hackerfoo> You might also want to check for overflow in `malloc`.
<hackerfoo> You can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended.
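Putting hackerfoo's pieces together, a bump allocator along those lines might look like the sketch below: one big anonymous `mmap` up front, aligned bump allocation with the overflow check he mentions, and a no-op `free`. The class name and sizes are illustrative.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Minimal bump-allocator sketch (POSIX): grab one large chunk with mmap,
// hand out aligned slices, and make individual frees no-ops; everything
// is released at once when the arena is destroyed.
class BumpArena {
    char* base_;
    size_t cap_;
    size_t off_ = 0;
public:
    explicit BumpArena(size_t cap) : cap_(cap) {
        base_ = static_cast<char*>(mmap(nullptr, cap, PROT_READ | PROT_WRITE,
                                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    }
    ~BumpArena() { munmap(base_, cap_); }

    void* alloc(size_t size) {
        size = (size + 15) & ~size_t{15};        // 16-byte align
        if (off_ + size > cap_) return nullptr;  // overflow check
        void* p = base_ + off_;
        off_ += size;
        return p;
    }
    void free(void*) {}  // no-op; the arena is freed en masse
};
```

A fixed-size mapping keeps the sketch simple; growing the arena (e.g. via `mremap` on Linux, or chaining chunks) would be the next step for unbounded sizes.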
<duck2> also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge
<hackerfoo> Yuck, strings.
<duck2> hackerfoo: is it possible to just override malloc like that?
<hackerfoo> duck2: In C, yeah. `malloc` is defined in stdlib.h: https://en.cppreference.com/w/c/memory/malloc
<tpb> Title: malloc - cppreference.com (at en.cppreference.com)
<hackerfoo> There's a way to replace `new` in C++ as well, but I haven't done it.
<litghost> hackerfoo: Overriding malloc like that is a bad idea in C++. duck2: Just replace the allocator for the relevant objects
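One hedged reading of "replace the allocator for the relevant objects": give just the hot type a class-level `operator new` that bumps through a preallocated pool, instead of touching global `malloc`/`new`. The `Edge` type and pool size below are made up for illustration, and pool exhaustion is deliberately not handled.

```cpp
#include <cstddef>
#include <new>

// Hypothetical hot object with a class-scoped bump allocator (C++17 for
// the inline static members). Overflow of the fixed pool is unchecked.
struct Edge {
    int src, sink;

    static inline char pool[1 << 20];
    static inline size_t pool_off = 0;

    static void* operator new(size_t size) {
        void* p = pool + pool_off;
        pool_off += (size + alignof(std::max_align_t) - 1) &
                    ~(alignof(std::max_align_t) - 1);
        return p;
    }
    static void operator delete(void*) {}  // freed en masse with the pool
};
```

Only `new Edge` goes through the pool; every other type keeps the normal allocator, which avoids the global-override pitfalls litghost warns about.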
<litghost> duck2: Also unordered_map is likely overkill, a flat_hash_map is likely superior given that we do not mutate the metadata during runtime
<tpb> Title: new expression - cppreference.com (at en.cppreference.com)
<litghost> duck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounded arena size
<duck2> litghost: also would be good if we don't create std::strings for every metadata string, they come as const char *s. can we provide a hash fn for them?
craigo has joined #symbiflow
<litghost> duck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy
<litghost> duck2: By default it is std::hash<T>, but it can be anything
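To make duck2's idea concrete: a map can be keyed directly on `const char*`, with the `Hash` and `KeyEqual` template parameters hashing and comparing by contents, so no `std::string` is ever constructed. This is a sketch under the assumption that the key pointers (e.g. into an mmapped file) outlive the map; the functor and alias names are made up.

```cpp
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hash a C string by contents (via string_view, no copy).
struct CStrHash {
    size_t operator()(const char* s) const {
        return std::hash<std::string_view>{}(std::string_view(s));
    }
};
// Compare by contents, not by pointer.
struct CStrEq {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};
// Illustrative metadata map keyed on borrowed C strings.
using MetaMap = std::unordered_map<const char*, int, CStrHash, CStrEq>;
```

The obvious caveat is ownership: the map stores raw pointers, so whatever buffer the keys point into must stay alive as long as the map does.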
<hackerfoo> If you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though.
<hackerfoo> If you dedup the strings, you can just use the pointer as a hash and for comparisons.
<hackerfoo> And then the strings just become numbers, essentially.
<litghost> And the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids
<litghost> e.g. a simple interning schema
<hackerfoo> ^ Yeah, that's more predictable than pointers, and you can put the sizes in the table.
<litghost> To be clear, I'm talking about the key to the std::unordered_map
<litghost> They are keys like "fasm_feature", "fasm_prefix", etc
<litghost> The data is basically line noise
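The vector<string> -> sort -> unique -> hand out ids pipeline litghost describes can be sketched in a few lines; the `Interner` name and two-phase build/lookup split are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Simple interning scheme: collect all keys once, deduplicate, and
// afterwards map each string to a small dense id by binary search.
struct Interner {
    std::vector<std::string> table;  // id -> string, sorted

    void build(std::vector<std::string> keys) {
        std::sort(keys.begin(), keys.end());
        keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
        table = std::move(keys);
    }
    // Returns the id, or -1 if the string was never interned.
    int32_t id_of(const std::string& s) const {
        auto it = std::lower_bound(table.begin(), table.end(), s);
        if (it != table.end() && *it == s)
            return static_cast<int32_t>(it - table.begin());
        return -1;
    }
};
```

With ~10 keys like "fasm_feature" and "fasm_prefix", every metadata key collapses to a small integer, which is what makes the id usable as both hash and comparison.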
<duck2> wouldn't assuming ~10 unique keys be too specialized to fasm? not that I'm aware of any other use of metadata but...
adjtm has joined #symbiflow
<litghost> duck2: You could do a fallback schema if the string intern map gets too big
<litghost> duck2: E.g. intern up to 100 strings, and then fallback to a straight forward hash
<litghost> duck2: And the key becomes a tagged union, either string intern pointer or std::string
<hackerfoo> duck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case
<litghost> duck2: I think it is safe to assume that most uses of metadata will use a limited number of identifying keys
<hackerfoo> Because there's no reason to implement the slow path if the fast path isn't fast enough.
<duck2> litghost: makes sense
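The tagged-union key litghost describes maps naturally onto `std::variant`: a small intern id in the common case, an owned `std::string` once the intern table is full. The function name, the 100-entry cutoff, and the convention of passing -1 for "not interned" are all illustrative.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical tagged-union metadata key: either an intern id (fast path)
// or a full std::string (fallback once the intern table overflows).
using MetaKey = std::variant<int32_t, std::string>;

constexpr int32_t kMaxInterned = 100;  // illustrative cutoff

MetaKey make_key(int32_t intern_id, const std::string& raw) {
    if (intern_id >= 0 && intern_id < kMaxInterned)
        return MetaKey(intern_id);        // fast path: small id
    return MetaKey(raw);                  // fallback: store the string itself
}
```

In the spirit of hackerfoo's comment, a first implementation could even leave the fallback branch as an assert until the fast path proves insufficient.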
lopsided98 has quit [Quit: Disconnected]
lopsided98 has joined #symbiflow
lopsided98 has quit [Client Quit]
lopsided98 has joined #symbiflow
freemint has quit [Quit: Leaving]
freemint has joined #symbiflow
freemint has quit [Ping timeout: 264 seconds]
<mithro> duck2: Post your end year report here please :-)
<tpb> Title: GSoC2019 - SymbiFlow - Final Report - Google Docs (at docs.google.com)
freemint has joined #symbiflow
tpb has quit [Remote host closed the connection]
tpb has joined #symbiflow