<cess11>
I would import it as one and then split in a subsequent step if I wanted to index on either of those two.
<tankf33der>
doing simpler example, lets see
<tankf33der>
just on several records
<cess11>
Usually it is a good idea to have a statistics object when handling large imports, a simple class with a couple of fields that get increased at every finished operation, like a row imported or changed object.
<cess11>
That way you can check in another REPL how far it has come.
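A minimal sketch of the statistics object cess11 describes. The class name `+Stats`, its fields and its methods are hypothetical, and this version only lives in memory. To watch it from another REPL it would have to be a DB object that is committed periodically, which is not shown here.

```picolisp
# Hypothetical in-memory statistics object: two counters, bumped per operation
(class +Stats)

(dm T ()
   (=: rows 0)        # rows imported so far
   (=: changed 0) )   # objects changed so far

(dm row> ()
   (inc (:: rows)) )

(dm changed> ()
   (inc (:: changed)) )

(setq *Stats (new '(+Stats)))
(row> *Stats)             # call once after every imported row
(get *Stats 'rows)        # read the counter
```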
<tankf33der>
not a problem for now
<tankf33der>
problem is date, time, total_amount :)
<Regenaxer>
If there are no relations, you don't need a DB. A flat file would do
<tankf33der>
and this is my first attempt
<Regenaxer>
good
<cess11>
I would start with a model covering the entire row and then copy over to another class of objects what I particularly want.
<tankf33der>
and how to handle it if i want to sum dollars in jul 2017?
<Regenaxer>
or did you mean, that just the *data* have no relations between them?
<cess11>
Using '(match '(@FirstField "," @SecondField ... allows for easy validation during import if that is needed.
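A small sketch of validating a line with `match`, assuming a hypothetical two-field line of the form `date,amount`. The function name `parseLine` and the pattern variables `@Date` and `@Amount` are made up for illustration.

```picolisp
# Returns the two fields as strings, or NIL if the line does not fit the pattern
(de parseLine (Str)
   (and
      (match '(@Date "," @Amount) (chop Str))
      (list (pack @Date) (pack @Amount)) ) )

(parseLine "2017-07-01,52.30")   # -> ("2017-07-01" "52.30")
(parseLine "garbage")            # -> NIL
```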
<Regenaxer>
But in pil you make indexes?
<tankf33der>
i meant no relations between them
<Regenaxer>
ok, good
<tankf33der>
only ranges of something
<Regenaxer>
I would (split (line) " " ",")
<Regenaxer>
and make index on date and time etc
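What the `split` call looks like on one line, assuming a hypothetical input of the form `date time,amount`. Here `chop` stands in for `(line)`, which also returns a list of characters.

```picolisp
# Split on both blank and comma, then pack each field back into a string
(mapcar pack
   (split (chop "2017-07-01 12:30:00,52.30") " " ",") )
# -> ("2017-07-01" "12:30:00" "52.30")
```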
<tankf33der>
maybe i want sum amount in jul 2017 where was pickup in airport
<tankf33der>
ok
<Regenaxer>
yes, good example
<beneroth>
Good morning Regenaxer, tankf33der, cess11
<Regenaxer>
For larger data sets pilog is better than 'collect'
<Regenaxer>
using select for more involved filtering
<tankf33der>
ha
<tankf33der>
i found +Time, perfect
<tankf33der>
i thought only +Date exists
<beneroth>
:)
<beneroth>
often it is useful to index them together using +Aux, e.g. (rel date (+Aux +Ref +Date) (time)) (rel time (+Time))
<beneroth>
querying is a bit tricky, using (aux) instead of (db)
<beneroth>
but allows date time ranges then
<tankf33der>
ok, i will try to remember all this, but here i want split date and time
<tankf33der>
will be different queries
<tankf33der>
so
<tankf33der>
(rel date (+Ref +Date))
<tankf33der>
i found i want non uniq index
<tankf33der>
right
<beneroth>
it's still two properties, and you could also index them separately
<tankf33der>
?
<beneroth>
yeah
<beneroth>
sounds good
<tankf33der>
ok
<beneroth>
with separate indexing, you can also make queries to see e.g. which time of the day most rides happen (independent of date, only working with time)
<beneroth>
I mean when you have (+Ref +Date) and (+Ref +Time)
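A sketch of the model with separate indexes, plus the "sum in Jul 2017" query asked about earlier. The class name `+Ride` is hypothetical, the query assumes an open database with imported rides, and `collect` is only reasonable for small samples (see Regenaxer's remark about pilog for larger data sets). Filtering by pickup location is not shown because no such field is defined here.

```picolisp
(class +Ride +Entity)
(rel date (+Ref +Date))      # non-unique index on the date
(rel time (+Ref +Time))      # non-unique index on the time of day
(rel amount (+Number) 2)     # fixed-point number, two decimals

# Sum of 'amount' over all rides dated 2017-07-01 .. 2017-07-31
(sum
   '((R) (get R 'amount))
   (collect 'date '+Ride (date 2017 7 1) (date 2017 7 31)) )
```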
<tankf33der>
i will do it like this, yes
<tankf33der>
i could draw a histogram :)
<tankf33der>
what about amount of dollars ?
<Regenaxer>
To extract the date for example, you can use ($dat "2017-01-09" "-")
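For illustration, `$dat` and its counterpart for times, `$tim`, each return a number, which `date` and `time` turn back into components:

```picolisp
(date ($dat "2017-01-09" "-"))   # -> (2017 1 9)
(time ($tim "12:30:45"))         # -> (12 30 45)
```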
<tankf33der>
(rel amount (+Number) 2)
<tankf33der>
?
<Regenaxer>
yes
<tankf33der>
ok then
<tankf33der>
i'm ready to import then
<cess11>
Converting to cents might be a good idea.
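A sketch of the cents conversion using `format` with a scale of 2, which matches the scale given to `+Number` above. The sample value is made up.

```picolisp
(format "52.30" 2)   # string -> scaled integer: 5230
(format 5230 2)      # scaled integer -> string: "52.30"
```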
<tankf33der>
tankf33der: don't forget to (commit) !
<tankf33der>
o
<tankf33der>
one more thing about pool and size
<Regenaxer>
In the beginning you can use the default (single file with size 2)
<Regenaxer>
ie no args to 'pool'
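A rough import sketch under several assumptions: "no args" is read here as `pool` with only a file name and no size arguments, the class `+Ride` and the files `rides.db` and `rides.csv` are hypothetical, and each input line is assumed to look like `2017-07-01 12:30:00,52.30`.

```picolisp
(class +Ride +Entity)
(rel date (+Ref +Date))
(rel time (+Ref +Time))
(rel amount (+Number) 2)

(pool "rides.db")                          # single file, default block size
(in "rides.csv"                            # hypothetical input file
   (until (eof)
      (let L (mapcar pack (split (line) " " ","))
         (new T '(+Ride)                   # T: create an external (DB) object
            'date ($dat (car L) "-")
            'time ($tim (cadr L))
            'amount (format (caddr L) 2) ) ) ) )
(commit)                                   # nothing is saved before this
```

With millions of rows a single final `(commit)` keeps every new object in memory until the end, so a real import would probably commit in batches.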
<tankf33der>
and when i will import all years ?
<Regenaxer>
Later you can more easily estimate the sizes
<tankf33der>
ok
<tankf33der>
i will ask
<Regenaxer>
yes
<tankf33der>
when import a year
<tankf33der>
1 month ~9M records
<Regenaxer>
yes, then sizes get important
<Regenaxer>
And import best done in single-user standalone