<litghost>
mithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnproto -> 12 seconds
<litghost>
oph
<litghost>
Given that capnproto doesn't really parse a lot, I'm guessing there is some headroom for improvement in the copying
<litghost>
If not, mmap -> in-memory data structures is the next step
<hzeller[m]>
As long as the capnproto data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly, though that might need an abstraction of the access patterns first.
<litghost>
hzeller: I agree there will be some overhead, but 12 seconds seems excessive.
<mithro>
litghost / hzeller[m]: duck2 is a good person to ask
<hackerfoo>
Assuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Average random 4k reads for my high-end SSD are only ~50MB/s.
<hackerfoo>
How big is the capnproto rr_graph?
<hackerfoo>
I guess it shouldn't be random reads, though.
<duck2>
the current copying code is mostly "translated" from the XML reading code. From my past callgrinds, I think most of the time is taken by copying the edges, where the gap between the data representations is wide.
<litghost>
duck2: Edges are something we should strongly consider specializing
<litghost>
duck2: e.g. store the edges in a dense blob of ints/shorts
<hackerfoo>
duck2: Can you try putting the rr_graph in a ramdisk?
<litghost>
duck2: Because that data is basically just a giant 2D matrix
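As a rough sketch of the "dense blob of ints/shorts" idea above: edges stored as a few parallel arrays of plain integers rather than per-node containers. The type and field names here are hypothetical, not VPR's actual representation.

```cpp
// Hypothetical flat edge table: one entry per edge, kept as parallel arrays
// (struct-of-arrays) so the whole graph's edges are a handful of dense allocations.
#include <cstddef>
#include <cstdint>
#include <vector>

struct EdgeTable {
    std::vector<int32_t> src_node;   // source rr_node id
    std::vector<int32_t> sink_node;  // sink rr_node id
    std::vector<int16_t> switch_id;  // switch index for the edge

    void reserve(std::size_t n_edges) {
        src_node.reserve(n_edges);
        sink_node.reserve(n_edges);
        switch_id.reserve(n_edges);
    }

    void add_edge(int32_t src, int32_t sink, int16_t sw) {
        src_node.push_back(src);
        sink_node.push_back(sink);
        switch_id.push_back(sw);
    }
};
```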
<duck2>
hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s
<duck2>
litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations.
<litghost>
duck2: the allocation strategy of the edges is something that should be examined anyways
<litghost>
duck2: now might be a good time
<duck2>
litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate
<litghost>
> " rr_graph read-only enough to try arena allocation"
<litghost>
This is true when reading an rr graph
<litghost>
It is completely read-only
<hackerfoo>
You could try a simple bump allocator: `void *malloc(size_t size) { void *p = mem; mem += size; return p; }` and `void free(void *p) {}`
<hackerfoo>
Where `mem` points to a large chunk of preallocated memory, and see how much it speeds it up.
<hackerfoo>
You might also want to check for overflow in `malloc`.
<hackerfoo>
You can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended.
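A minimal sketch of the bump allocator hackerfoo describes, assuming a fixed-size pool obtained up front; the class name, pool handling, and alignment choice are illustrative, and this is not VTR's chunk allocator.

```cpp
// Minimal bump allocator: hand out memory by advancing a pointer, free
// everything at once by discarding the pool. Fits a read-only rr_graph load.
#include <cassert>
#include <cstddef>
#include <cstdlib>

class BumpAllocator {
  public:
    explicit BumpAllocator(std::size_t pool_size)
        : base_(static_cast<char*>(std::malloc(pool_size))),
          next_(base_),
          end_(base_ + pool_size) {}

    ~BumpAllocator() { std::free(base_); }

    void* alloc(std::size_t size) {
        // Round up so subsequent allocations stay suitably aligned.
        size = (size + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        assert(next_ + size <= end_ && "bump allocator pool overflow");
        void* p = next_;
        next_ += size;
        return p;
    }

    // Individual frees are no-ops; the whole pool goes away in the destructor.
    void free(void*) {}

  private:
    char* base_;
    char* next_;
    char* end_;
};
```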
<duck2>
also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge
<hackerfoo>
Yuck, strings.
<duck2>
hackerfoo: is it possible to just override malloc like that?
<litghost>
duck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounded arena size
<duck2>
litghost: it would also be good if we didn't create std::strings for every metadata string; they come in as const char *s. can we provide a hash fn for them?
craigo has joined #symbiflow
<litghost>
duck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy
<litghost>
duck2: By default it is std::hash<T>, but it can be anything
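A sketch of what supplying those template parameters could look like for const char* keys, so no std::string is built per metadata key. It assumes C++17 std::string_view, and CStrHash/CStrEqual are hypothetical helper names, not anything in VPR.

```cpp
// Key an unordered_map directly on C strings via custom Hash and KeyEqual
// functors, avoiding a std::string construction per lookup/insert.
#include <cstring>
#include <string_view>
#include <unordered_map>
#include <vector>

struct CStrHash {
    std::size_t operator()(const char* s) const {
        return std::hash<std::string_view>{}(std::string_view(s));
    }
};

struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Caution: only the pointers are stored, so the pointed-to strings must
// outlive the map (e.g. they live in the mmap'd capnproto buffer).
using MetadataDict =
    std::unordered_map<const char*, std::vector<const char*>, CStrHash, CStrEqual>;
```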
<hackerfoo>
If you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though.
<hackerfoo>
If you dedup the strings, you can just use the pointer as a hash and for comparisons.
<hackerfoo>
And then the strings just become numbers, essentially.
<litghost>
And the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids
<litghost>
e.g. a simple interning scheme
<hackerfoo>
^ Yeah, that's more predictable than pointers, and you can put the sizes in the table.
<litghost>
To be clear, I'm talking about the key to the std::unordered_map
<litghost>
They are keys like "fasm_feature", "fasm_prefix", etc
<litghost>
The data is basically line noise
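A sketch of the vector<string> -> sort -> unique -> hand-out-ids idea for the small set of keys like "fasm_feature"; InternTable and KeyId are hypothetical names.

```cpp
// Collect every key string seen while scanning the graph, then sort, unique,
// and treat the position in the sorted vector as the key's small integer id.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using KeyId = uint16_t;

struct InternTable {
    std::vector<std::string> keys;  // sorted, unique

    static InternTable build(std::vector<std::string> all_keys) {
        std::sort(all_keys.begin(), all_keys.end());
        all_keys.erase(std::unique(all_keys.begin(), all_keys.end()), all_keys.end());
        return InternTable{std::move(all_keys)};
    }

    // Id of a key already present in the table, found by binary search.
    KeyId id_of(const std::string& key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        return static_cast<KeyId>(it - keys.begin());
    }

    const std::string& name_of(KeyId id) const { return keys[id]; }
};
```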
<duck2>
wouldn't assuming ~10 unique keys be too specialized to fasm? not that I'm aware of any other use of metadata, but...
<litghost>
duck2: You could do a fallback scheme if the string intern map gets too big
<litghost>
duck2: E.g. intern up to 100 strings, and then fall back to a straightforward hash
<litghost>
duck2: And the key becomes a tagged union, either string intern pointer or std::string
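One way the tagged-union key could look, sketched with std::variant (C++17): keys seen before the intern limit become small ids, anything past it falls back to an owned std::string. KeyInterner, MetadataKey, and kInternLimit are illustrative names; the 100-string limit is the one suggested above.

```cpp
// Tagged-union metadata key: either an interned id or a raw string for the
// overflow case once too many distinct keys have been seen.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>

constexpr std::size_t kInternLimit = 100;

using MetadataKey = std::variant<uint16_t, std::string>;  // interned id or plain string

struct KeyInterner {
    std::unordered_map<std::string, uint16_t> ids;

    MetadataKey key_for(const std::string& name) {
        auto it = ids.find(name);
        if (it != ids.end()) return MetadataKey(it->second);
        if (ids.size() < kInternLimit) {
            uint16_t id = static_cast<uint16_t>(ids.size());
            ids.emplace(name, id);
            return MetadataKey(id);
        }
        // Fallback path: too many distinct keys, store the string itself.
        return MetadataKey(name);
    }
};
```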
<hackerfoo>
duck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case
<litghost>
duck2: I think it is safe to assume that most uses of metadata will use a limited number of identifying keys
<hackerfoo>
Because there's no reason to implement the slow path if the fast path isn't fast enough.
<duck2>
litghost: makes sense
<mithro>
duck2: Post your end-of-year report here please :-)