
Re: dump, drop database then merge/aggregate




Ah, ok, if I understand correctly, within this context every record in the
one table _should_ have a unique identifier.  Please verify this is the
case, though: if, for example, the primary key is an auto increment, what I'm
going to suggest is not good and Really Bad Things will, not may, happen.
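A quick way to verify is something like the following (I'm only guessing at
the table name, substitute whatever the app actually uses):

    -- connect to the app's database first so DATABASE() resolves to it
    SELECT COLUMN_NAME, EXTRA
      FROM information_schema.COLUMNS
     WHERE TABLE_SCHEMA = DATABASE()
       AND TABLE_NAME   = 'questionnaire'   -- placeholder name
       AND COLUMN_KEY   = 'PRI';
    -- if EXTRA says 'auto_increment', stop right there: both boxes can hand
    -- out the same key values and any merge will collide.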

If you want to do this all in MySQL, and IFF the records are ensured to be
*globally unique*, then what I suggested previously would work but isn't
necessary (and is actually dangerous if global record uniqueness is not
definite).  You _could_ do a standard mysqldump (use flags to do data only,
no schema) and on the importing server it will insert the records; any
duplicate records will simply fail.  If there is a chance the records aren't
unique, or if you want to be extra super safe (a good idea anyway), you can
add triggers on the ingest server to ensure uniqueness/capture failures and
record them in another table for analysis, or perhaps even do immediate data
remediation (update the key) and then do the insert.
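As a rough sketch of that route (the database, table and column names below
are only placeholders, I don't know what the app actually calls them):

    # on the collecting box: one consistent snapshot, data only, no schema;
    # --insert-ignore makes duplicates on the other side a skipped row
    # instead of an aborted load
    mysqldump --single-transaction --no-create-info --insert-ignore \
        appdb questionnaire > questionnaire-$(date +%F).sql

    -- on the ingest server: a side table plus a trigger that notes any
    -- incoming row whose key already exists before IGNORE discards it
    CREATE TABLE import_conflicts (
      id      INT NOT NULL,                       -- same type as the real PK
      seen_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    DELIMITER //
    CREATE TRIGGER questionnaire_dup_log
    BEFORE INSERT ON questionnaire
    FOR EACH ROW
    BEGIN
      IF EXISTS (SELECT 1 FROM questionnaire WHERE id = NEW.id) THEN
        INSERT INTO import_conflicts (id) VALUES (NEW.id);
      END IF;
    END//
    DELIMITER ;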

Now, for me, using triggers or other business-logic-in-database features is
a code smell.  I loathe putting business logic in databases, as it tends to
be non-portable and hard to troubleshoot for the people behind me who expect
the logic to be in code.  Since you're having to script this behavior out
anyway, if it were me I would dump the data in the table to CSV or similar
using SELECT ... INTO OUTFILE rather than mysqldump, ship the file, and have
a small php script on cron or whatever ingest it, allowing your business
logic for data validation/etc. to be done in code (IMO where it belongs).
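Roughly like this (names and paths are only examples; SELECT ... INTO OUTFILE
needs the FILE privilege and writes on the database server's filesystem,
wherever secure_file_priv allows):

    # on the collecting box: dump the table to a flat file
    mysql appdb -e "SELECT * FROM questionnaire
                    INTO OUTFILE '/tmp/questionnaire-$(date +%F).csv'
                    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                    LINES TERMINATED BY '\n'"

    # ship it however you like
    scp /tmp/questionnaire-$(date +%F).csv ingest-box:/var/spool/questionnaire/

    # on the ingest box, the cron'd script (php or anything else) walks the
    # spool directory, validates every row in code, and only then inserts it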

S



On Mon, Feb 29, 2016 at 12:12 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:

> On 29/02/16 16:32, Steven Siebert wrote:
>
>> What level of control do you have on the remote end that is
>> collecting/dumping the data?  Can you specify the command/arguments used
>> to dump?  Is it possible to turn on binary logging and manually ship the
>> logs rather than shipping the dump, effectively doing replication
>> manually?
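>> (Roughly what I mean, if binary logging could be switched on over there;
>> the file names and paths below are only examples:
>>
>>    # on the collecting box, once log-bin is enabled: rotate, then ship
>>    # the finished binary log
>>    mysqladmin flush-logs
>>    scp /var/lib/mysql/binlog.000042 ingest-box:/tmp/
>>
>>    # on the box holding the merged copy, replay it
>>    mysqlbinlog /tmp/binlog.000042 | mysql appdb
>> )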
>>
> in an overview it's a simple php app, a form of a questionnaire that
> collects manual user input; the db backend is similarly simple, just one
> table.
> Yes, I can run the mysqldump command but nothing else; I do not have
> control over the mysql config nor the processes.
>
> It's one of those cases where for now it's too late and you are only
> thinking - ough... that remote box, if compromised, it would be good to
> have only a minimal set of data on it.
>
> So I can mysqldump whichever way would be best, and I'd have to insert
> ideally without replacing anything, instead aggregating, adding data.
> I think the developers took care of the uniqueness of the rows and
> constructed it in conformity with good design practices.
>
> What I'm only guessing at is: when I lock, dump and remove, then insert
> and aggregate, could there be problems with keys? And will there be no
> data loss during dump+removal?
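> (Roughly I mean something like this, all in one session so nothing can
> slip in between the dump and the delete; the table name is only an example:
>
>    mysql appdb -e "LOCK TABLES questionnaire WRITE;
>                    SELECT * FROM questionnaire INTO OUTFILE '/tmp/questionnaire.csv';
>                    DELETE FROM questionnaire;
>                    UNLOCK TABLES;"
> )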
>
> thanks for sharing your thoughts.
>
>
>> I agree with others, in general this approach smells like a bad idea.
>> However, updating data from a remote system in batch is quite common,
>> though often it's done at the application level, polling things like web
>> services, with perhaps some business logic to ensure integrity is
>> maintained.  Attempting to do it within the constructs of the database
>> itself is understandable, but there are risks when you don't add that
>> "layer" of logic to ensure state is exactly as you expect it during a
>> merge.
>>
>> At the risk of giving you too much rope to hang yourself: if you use the
>> --replace flag with mysqldump, it will convert all INSERT statements to
>> REPLACE, which on import will update or insert each record, effectively
>> "merging" the data.  This may be one approach you want to look at, but it
>> may not be appropriate depending on your specific situation.
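>> For example (leaving the schema out of the dump so only rows are touched;
>> the database and table names here are just placeholders):
>>
>>    mysqldump --single-transaction --no-create-info --replace \
>>        appdb questionnaire > questionnaire.sql
>>
>>    # then, on the box you're merging into:
>>    mysql appdb < questionnaire.sql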
>>
>> S
>>
>>
>>
>> On Mon, Feb 29, 2016 at 11:12 AM, lejeczek <peljasz@xxxxxxxxxxx> wrote:
>>
>> On 29/02/16 15:42, Gary Smith wrote:
>>>
>>> On 29/02/2016 15:30, lejeczek wrote:
>>>>
>>>> On 28/02/16 20:50, lejeczek wrote:
>>>>>
>>>>> fellow users, hopefully you experts too, could help...
>>>>>>
>>>>>> ...me to understand how, and what should be, the best practice to dump
>>>>>> a database, then drop it and merge the dumps...
>>>>>> What I'd like to do is something probably many have done and I wonder
>>>>>> how it's done best.
>>>>>> A box will be dumping a database (maybe? tables if it's better) then
>>>>>> dropping (purging the data) it, and on a different system that dump
>>>>>> will be inserted/aggregated into the same database.
>>>>>> It reminds me of a kind of incremental backup, except for the fact
>>>>>> that the source data will be dropped/purged on a regular basis, but
>>>>>> before each drop there is a dump which later will be used to sort of
>>>>>> reconstruct that same database.
>>>>>>
>>>>>> How do you recommend doing it? I'm guessing the trickiest bit might be
>>>>>> the reconstruction part: how to merge dumps safely, naturally, while
>>>>>> maintaining consistency & integrity?
>>>>>> Actual syntax, as code examples usually are, would be best.
>>>>>>
>>>>>> many thanks.
>>>>>>
>>>>>>
>>>>>> I guess dropping tables is not really what I should even consider -
>>>>>>
>>>>> should I just be deleting everything from the tables in order to remove
>>>>> the data?
>>>>> And if I was to use dumps of such a database (where data was first
>>>>> cleansed, then some data was collected) to merge data again, would it
>>>>> work and merge that newly collected data with what's already in the
>>>>> database?
>>>>>
>>>>> This sounds like a remarkably reliable way to ensure no data integrity.
>>>> What exactly are you trying to achieve? Would replication be the magic
>>>> word you're after?
>>>>
>>>> I realize this all might look rather like a bird fiddling with a worm
>>>>
>>> instead of a lion going for the quick kill. I replicate wherever I need
>>> and can;
>>> here I have very little control over one end.
>>> On that end with little control there is one simple database, whose data
>>> I'll need to remove on a regular basis; before removing I'll be dumping,
>>> and I need to use those dumps to add, merge, aggregate data into a
>>> database on the other end, like:
>>> today both databases are mirrored/identical
>>> tonight the awkward end will dump then remove all the data, then collect
>>> some and again, dump then remove
>>> and these dumps should reconstruct the database on the other box.
>>>
>>> Pointers on what to pay attention to, and how to test for consistency &
>>> integrity, would be of great help.
>>>
>>>
>>> Gary
>>>
>>>>