I would import it as one and then split in a subsequent step if I wanted to index on either of those two.
doing a simpler example, let's see
just on several records
Usually it is a good idea to have a statistics object when handling large imports: a simple class with a couple of fields that are incremented after every finished operation, such as an imported row or a changed object.
That way you can check in another REPL how far the import has progressed.
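A minimal sketch of such a statistics object. The names (+Stat, row>, skip>, report>, *Stat) are hypothetical and not from the discussion. The idea is that the import loop calls (row> *Stat) or (skip> *Stat) once per line, and (report> *Stat) prints the current counters.

```picolisp
# Hypothetical statistics class: two counters bumped during an import
(class +Stat)
# rows skipped

(dm T ()                          # constructor: start both counters at 0
   (=: rows 0)
   (=: skipped 0) )

(dm row> ()                       # call after each imported row
   (=: rows (inc (: rows))) )

(dm skip> ()                      # call after each rejected row
   (=: skipped (inc (: skipped))) )

(dm report> ()                    # print the current progress
   (prinl "imported: " (: rows) ", skipped: " (: skipped)) )

(setq *Stat (new '(+Stat)))
```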
not a problem for now
the problem is date, time, total_amount :)
If there are no relations, you don't need a DB. A flat file would do
and this is my first attempt
I would start with a model covering the entire row and then copy what I particularly want over to another class of objects.
and how do I handle it if I want the sum of dollars in Jul 2017?
or did you mean, that just the *data* have no relations between them?
Using '(match '(@FirstField "," @SecondField ... allows for easy validation during import if that is needed.
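A sketch of 'match'-based validation. It assumes a hypothetical, simplified row layout "DATE TIME,AMOUNT" (the real file has more columns), and parseRow is an illustrative name. Lines that do not fit the pattern return NIL, so they can be counted as skipped.

```picolisp
# Hypothetical layout: "DATE TIME,AMOUNT", e.g. "2017-07-01 00:06:25,12.50"
# Line is a list of characters, as returned by (line)
(de parseRow (Line)
   (when (match '(@Date " " @Time "," @Amount) Line)
      # Pattern matched: pack the character lists into strings
      (list (pack @Date) (pack @Time) (pack @Amount)) ) )

# A line that does not fit the pattern yields NIL
(parseRow (chop "garbage"))
```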
But in pil you make indexes?
I meant no relations between them
ok, good
only ranges of something
I would (split (line) " " ",")
and make indexes on date, time, etc.
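A sketch of the 'split' approach on the same hypothetical "DATE TIME,AMOUNT" layout. splitRow and readRows are illustrative names, and skipping a header line is an assumption about the file. Splitting at both " " and "," breaks the datetime field into separate date and time fields.

```picolisp
# Split at both " " and ",": "2017-07-01 00:06:25,12.50"
# becomes ("2017-07-01" "00:06:25" "12.50")
(de splitRow (Line)                    # Line: list of chars from (line)
   (mapcar pack (split Line " " ",")) )

# Collect the split rows of a file; assumes a header line to skip
(de readRows (File)
   (in File
      (line)                           # discard the header
      (make
         (until (eof)
            (link (splitRow (line))) ) ) ) )
```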
maybe I want the sum of amounts in Jul 2017 where the pickup was at the airport
yes, good example
Good morning Regenaxer, tankf33der, cess11
For larger data sets pilog is better than 'collect'
using 'select' for more involved filtering
I found +Time, perfect
I thought only +Date existed
xificurC has joined #picolisp
often it is useful to index them together using +Aux, e.g. (rel date (+Aux +Ref +Date) (time)) (rel time (+Time))
querying is a bit tricky, using (aux) instead of (db)
but it then allows date/time ranges
ok, I will try to remember all this, but here I want to split date and time
they will be different queries
(rel date (+Ref +Date))
I found I want a non-unique index
it's still two properties, and you could also index them separately
sounds good
with separate indexing, you can also make queries to see e.g. at which time of day most rides happen (independent of the date, working only with the time)
I mean when you have (+Ref +Date) and (+Ref +Time)
I will do it like this, yes
I could draw a histogram :)
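A sketch of that histogram. It assumes a hypothetical +Ride entity with separate (+Ref +Date) and (+Ref +Time) indexes, a throwaway database file, and a few made-up rides. Only the 'time' index is walked, so the date plays no role. hourHistogram is an illustrative name.

```picolisp
# Hypothetical entity with separate date and time indexes
(class +Ride +Entity)
(rel date (+Ref +Date))
(rel time (+Ref +Time))

(pool (tmp "rides.db"))                 # throwaway single-file DB

# Made-up sample rides, all on one day
(for Tm '("00:06:25" "08:10:00" "08:45:00" "23:40:00")
   (new T '(+Ride) 'date (date 2017 7 1) 'time ($tim Tm)) )
(commit)

# Rides per hour of day (element 1 = 00:xx ... element 24 = 23:xx),
# walking only the 'time' index, independent of the date
(de hourHistogram ()
   (let Hist (make (do 24 (link 0)))
      (for R (collect 'time '+Ride 0 86399)
         (inc (nth Hist (inc (/ (get R 'time) 3600)))) )
      Hist ) )

(println (hourHistogram))
```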
what about the amount of dollars?
To extract the date for example, you can use ($dat "2017-01-09" "-")
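A few conversions around '$dat'. The variable names are illustrative. '$tim' does the same for "HH:MM:SS" strings, and 'dat$'/'tim$' turn the numbers back into strings. 'date' builds the same day numbers directly, which is handy for a month range like Jul 2017.

```picolisp
# Dates become day numbers, times seconds since midnight
(setq
   D ($dat "2017-01-09" "-")
   Tm ($tim "00:06:25") )         # 6 * 60 + 25 = 385

(prinl (dat$ D "-"))              # back to 2017-01-09
(prinl (tim$ Tm T))               # back to a string; T includes seconds

# A month as an inclusive day-number range, e.g. for the date index
(setq
   From (date 2017 7 1)
   To (date 2017 7 31) )
```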
(rel amount (+Number) 2)
ok then
I'm ready to import then
Converting to cents might be a good idea.
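With (rel amount (+Number) 2) the stored value is a fixed-point integer with two decimals, which is effectively the amount in cents. 'format' with a scale argument converts in both directions. toCents and fromCents are hypothetical helper names.

```picolisp
# (rel amount (+Number) 2) stores a fixed-point integer with two
# decimals, so "12.50" is kept as 1250, i.e. cents
(de toCents (Str)
   (format Str 2) )        # "12.50" -> 1250, "7.8" -> 780

(de fromCents (Num)
   (format Num 2) )        # 1250 -> "12.50"
```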
tankf33der: don't forget to (commit)!
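Putting the pieces together gives a minimal standalone sketch. It uses a hypothetical +Ride entity and a single database file opened with no size args. The import ends with (commit), and the Jul 2017 sum is computed with 'collect' over the date index. importRows and sumAmount are illustrative names, and the rows are made-up samples. For the full data set, the pilog/select route suggested earlier would be the better fit.

```picolisp
# Hypothetical entity for one taxi ride
(class +Ride +Entity)
(rel date (+Ref +Date))      # non-unique index on the day number
(rel time (+Ref +Time))      # non-unique index on seconds since midnight
(rel amount (+Number) 2)     # fixed-point, two decimals (cents)

# Single database file with default sizes: no size args to 'pool'
(pool (tmp "rides.db"))

# Store one ride per row of ("DATE" "TIME" "AMOUNT") strings
(de importRows (Rows)
   (for Row Rows
      (new T '(+Ride)
         'date ($dat (car Row) "-")
         'time ($tim (cadr Row))
         'amount (format (caddr Row) 2) ) )
   (commit) )   # without this, the new objects are not written to the file

# Sum of 'amount' (in cents) over an inclusive date range
(de sumAmount (From To)
   (sum '((R) (get R 'amount))
      (collect 'date '+Ride From To) ) )

(importRows
   '(("2017-07-01" "00:06:25" "12.50")
     ("2017-07-15" "23:40:00" "7.80")
     ("2017-08-02" "08:10:00" "20.00") ) )

# prints: Jul 2017: 20.30
(prinl "Jul 2017: " (format (sumAmount (date 2017 7 1) (date 2017 7 31)) 2))
```

The sum comes back in cents; (format Sum 2) turns it into a dollar string.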
one more thing about pool and size
In the beginning you can use the default (single file with size 2)
i.e. no extra args to 'pool' besides the file name
and when I import all the years?
Later you can more easily estimate the sizes
I will ask
when I import a year
1 month ~9M records
yes, then sizes get important
And the import is best done in single-user standalone mode