<litghost>
mithro: Assuming I'm reading that correctly, duck2 XML -> 18 seconds, duck2 capnproto -> 12 seconds
<litghost>
oph
<litghost>
Given that capnproto doesn't really parse a lot, I'm guessing there is some headroom for improvement in the copying
<litghost>
If not, mmap -> in-memory data structures is the next step
<hzeller[m]>
As long as the capnproto data structure is copied to the local data structure, there will still be a lot of overhead. Ideally, we can use the capnproto structs directly, though that might need an abstraction of the access patterns first.
<litghost>
hzeller: I agree there will be some overhead, but 12 seconds seems excessive.
<mithro>
litghost / hzeller[m]: duck2 is a good person to ask
<hackerfoo>
Assuming the amount read >= peak memory usage, that's >= 160MB/s, which seems reasonable for random-ish access. Average random 4k reads for my high-end SSD are only ~50MB/s.
<hackerfoo>
How big is the capnproto rr_graph?
<hackerfoo>
I guess it shouldn't be random reads, though.
<duck2>
the current copying code is mostly "translated" from the XML reading code. From my past callgrinds, I think most of the time is taken by copying the edges, where the gap between the data representations is wide.
<litghost>
duck2: Edges are something we should strongly consider specializing
<litghost>
duck2: e.g. store the edges in a dense blob of ints/shorts
<hackerfoo>
duck2: Can you try putting the rr_graph in a ramdisk?
<litghost>
duck2: Because that data is basically just a giant 2D matrix
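As a rough sketch of the "dense blob of ints/shorts" idea above: edges stored as a few parallel arrays of plain integers rather than per-node containers. The type and field names here are hypothetical, not VPR's actual representation.

```cpp
// Hypothetical flat edge table: one entry per edge, kept as parallel arrays
// (struct-of-arrays) so the whole graph's edges are a handful of dense allocations.
#include <cstddef>
#include <cstdint>
#include <vector>

struct EdgeTable {
    std::vector<int32_t> src_node;   // source rr_node id
    std::vector<int32_t> sink_node;  // sink rr_node id
    std::vector<int16_t> switch_id;  // switch index for the edge

    void reserve(std::size_t n_edges) {
        src_node.reserve(n_edges);
        sink_node.reserve(n_edges);
        switch_id.reserve(n_edges);
    }

    void add_edge(int32_t src, int32_t sink, int16_t sw) {
        src_node.push_back(src);
        sink_node.push_back(sink);
        switch_id.push_back(sw);
    }
};
```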
<duck2>
hackerfoo: The cap'n proto graph is ~600MB. I do a warmup run in the benchmark, so the file should be in the cache when measuring 12s
<duck2>
litghost: in the file or in memory? Even if we store the edges as such in the file, we still need to do rr_node::add_edge or vpr::add_edge_metadata which do allocations.
<litghost>
duck2: the allocation strategy of the edges is something that should be examined anyways
<litghost>
duck2: now might be a good time
<duck2>
litghost: is the rr_graph read-only enough to try arena allocation? currently every node manages its small vector-ish of edges. I don't know how to deal with metadata, since it's not a simple vector. however currently every node&edge has a t_metadata_dict of its own, which takes an allocation to create and another allocation to populate
<litghost>
> " rr_graph read-only enough to try arena allocation"
<litghost>
This is true when reading an rr graph
<litghost>
It is completely read-only
<hackerfoo>
You could try a simple bump allocator: `void *malloc(size_t size) { void *p = mem; mem += size; return p; }` and `void free(void *p) {}`
<hackerfoo>
Where `mem` points to a large chunk of preallocated memory, and see how much it speeds it up.
<hackerfoo>
You might also want to check for overflow in `malloc`.
<hackerfoo>
You can use `mmap` to get a large chunk of RAM, and even resize it. `sbrk` is not recommended.
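A minimal sketch of the bump allocator hackerfoo describes, assuming a fixed-size pool obtained up front; the class name, pool handling, and alignment choice are illustrative, and this is not VTR's chunk allocator.

```cpp
// Minimal bump allocator: hand out memory by advancing a pointer, free
// everything at once by discarding the pool. Fits a read-only rr_graph load.
#include <cassert>
#include <cstddef>
#include <cstdlib>

class BumpAllocator {
  public:
    explicit BumpAllocator(std::size_t pool_size)
        : base_(static_cast<char*>(std::malloc(pool_size))),
          next_(base_),
          end_(base_ + pool_size) {}

    ~BumpAllocator() { std::free(base_); }

    void* alloc(std::size_t size) {
        // Round up so subsequent allocations stay suitably aligned.
        size = (size + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        assert(next_ + size <= end_ && "bump allocator pool overflow");
        void* p = next_;
        next_ += size;
        return p;
    }

    // Individual frees are no-ops; the whole pool goes away in the destructor.
    void free(void*) {}

  private:
    char* base_;
    char* next_;
    char* end_;
};
```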
<duck2>
also note that a t_metadata_dict is a std::unordered_map<std::string, std::vector<std::string>> and that's allocated for every edge
<hackerfoo>
Yuck, strings.
<duck2>
hackerfoo: is it possible to just override malloc like that?
<litghost>
duck2: VTR does provide a chunk malloc, which is approximately a bump allocator, but accommodates unbounded arena size
<duck2>
litghost: it would also be good if we didn't create std::strings for every metadata string; they come in as const char *s. can we provide a hash fn for them?
craigo has joined #symbiflow
<litghost>
duck2: Yes, you can define a hash fn, unordered_map has a "class Hash" template parameter that can point to a hash function class thingy
<litghost>
duck2: By default it is std::hash<T>, but it can be anything
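A sketch of what supplying those template parameters could look like for const char* keys, so no std::string is built per metadata key. It assumes C++17 std::string_view, and CStrHash/CStrEqual are hypothetical helper names, not anything in VPR.

```cpp
// Key an unordered_map directly on C strings via custom Hash and KeyEqual
// functors, avoiding a std::string construction per lookup/insert.
#include <cstring>
#include <string_view>
#include <unordered_map>
#include <vector>

struct CStrHash {
    std::size_t operator()(const char* s) const {
        return std::hash<std::string_view>{}(std::string_view(s));
    }
};

struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Caution: only the pointers are stored, so the pointed-to strings must
// outlive the map (e.g. they live in the mmap'd capnproto buffer).
using MetadataDict =
    std::unordered_map<const char*, std::vector<const char*>, CStrHash, CStrEqual>;
```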
<hackerfoo>
If you mmap a file, you can use char * + size and not allocate anything else. That's what I do for immutable strings. Probably not an option here, though.
<hackerfoo>
If you dedup the strings, you can just use the pointer as a hash and for comparisons.
<hackerfoo>
And then the strings just become numbers, essentially.
<litghost>
And the number of string keys is low (think order ~10), so it might also make sense to do vector<string> -> sort -> unique -> hand out ids
<litghost>
e.g. a simple interning scheme
<hackerfoo>
^ Yeah, that's more predictable than pointers, and you can put the sizes in the table.
<litghost>
To be clear, I'm talking about the key to the std::unordered_map
<litghost>
They are keys like "fasm_feature", "fasm_prefix", etc
<litghost>
The data is basically line noise
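A sketch of the vector<string> -> sort -> unique -> hand-out-ids idea for the small set of keys like "fasm_feature"; InternTable and KeyId are hypothetical names.

```cpp
// Collect every key string seen while scanning the graph, then sort, unique,
// and treat the position in the sorted vector as the key's small integer id.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using KeyId = uint16_t;

struct InternTable {
    std::vector<std::string> keys;  // sorted, unique

    static InternTable build(std::vector<std::string> all_keys) {
        std::sort(all_keys.begin(), all_keys.end());
        all_keys.erase(std::unique(all_keys.begin(), all_keys.end()), all_keys.end());
        return InternTable{std::move(all_keys)};
    }

    // Id of a key already present in the table, found by binary search.
    KeyId id_of(const std::string& key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        return static_cast<KeyId>(it - keys.begin());
    }

    const std::string& name_of(KeyId id) const { return keys[id]; }
};
```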
<duck2>
wouldn't assuming ~10 unique keys be too specialized to fasm? not that I'm aware of any other use of metadata, but...
<litghost>
duck2: You could do a fallback scheme if the string intern map gets too big
<litghost>
duck2: E.g. intern up to 100 strings, and then fall back to a straightforward hash
<litghost>
duck2: And the key becomes a tagged union, either string intern pointer or std::string
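One way the tagged-union key could look, sketched with std::variant (C++17): keys seen before the intern limit become small ids, anything past it falls back to an owned std::string. KeyInterner, MetadataKey, and kInternLimit are illustrative names; the 100-string limit is the one suggested above.

```cpp
// Tagged-union metadata key: either an interned id or a raw string for the
// overflow case once too many distinct keys have been seen.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>

constexpr std::size_t kInternLimit = 100;

using MetadataKey = std::variant<uint16_t, std::string>;  // interned id or plain string

struct KeyInterner {
    std::unordered_map<std::string, uint16_t> ids;

    MetadataKey key_for(const std::string& name) {
        auto it = ids.find(name);
        if (it != ids.end()) return MetadataKey(it->second);
        if (ids.size() < kInternLimit) {
            uint16_t id = static_cast<uint16_t>(ids.size());
            ids.emplace(name, id);
            return MetadataKey(id);
        }
        // Fallback path: too many distinct keys, store the string itself.
        return MetadataKey(name);
    }
};
```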
<hackerfoo>
duck2: And here's the initial implementation for the fallback: assert(false); /// TODO handle this case
<litghost>
duck2: I think it is safe to assume that most uses of metadata will use a limited number of identifying keys
<hackerfoo>
Because there's no reason to implement the slow path if the fast path isn't fast enough.
<duck2>
litghost: makes sense
<mithro>
duck2: Post your end-of-year report here please :-)