shell script merge two list and remove duplicates

shell script merge two list and remove duplicates 1 year, 10 months ago #255

  • rajeshkumar
You want all the records from list A, supplemented by those records from list B whose name does not already appear in list A. Mathematically this is:

A + B - { (w, v) in B | A already has a record named w }
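
To make the expression concrete, here is a minimal sketch that builds the result in exactly those two parts. It assumes each list is stored in a file with the name in the first field; the function names and the file arguments are hypothetical and not part of the original question.

```shell
#!/bin/bash
# records_only_in_b FILE_A FILE_B   (hypothetical helper)
# Prints the records of FILE_B whose name (first field) does not occur
# in FILE_A, i.e. B minus the records of B that A already covers.
records_only_in_b() {
    awk 'FILENAME == ARGV[1] { in_a[$1] = 1; next }  # first file: remember every name in A
         !($1 in in_a)                               # second file: keep names absent from A
        ' "$1" "$2"
}

# merge_lists FILE_A FILE_B   (hypothetical helper)
# Prints all of A, followed by the part of B that survives the subtraction.
merge_lists() {
    cat "$1"
    records_only_in_b "$1" "$2"
}
```

Called as merge_lists list_A.txt list_B.txt (file names assumed), this keeps the records in their original order rather than sorting them.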

There are many ways of accomplishing this, depending on what access you have to the databases and how efficient the solution needs to be.

* If you can modify DB1 (which holds A), download table B from DB2, upload it to DB1, and extract your data with the appropriate join.
* If you can't modify DB1, download both A and B and concatenate them into a single stream, with A followed by B. Sort the stream by the first field using a stable sort, so that for any name the record from A stays ahead of the record from B. Then process the stream one record at a time: duplicate names will be adjacent, so print the first record for each name and ignore the rest.
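
The "print the first, ignore the rest" step of the second approach can also be sketched without sorting at all, by remembering which names have already been printed. This is an alternative technique, not the method used in the script that follows, and it assumes the two lists are in files; the function name is made up for illustration.

```shell
#!/bin/bash
# merge_first_wins FILE_A FILE_B   (hypothetical helper)
# Reads FILE_A and then FILE_B as one stream and prints a record only the
# first time its name (first field) is seen. Input order is preserved.
merge_first_wins() {
    # seen[$1]++ evaluates to 0 the first time a name appears, so
    # !seen[$1]++ is true only for the first record carrying that name.
    awk '!seen[$1]++' "$1" "$2"
}
```

Because A is read before B, A's record wins whenever both lists contain the same name. The output is in input order; pipe it through sort if sorted output is needed.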

Here is a sample solution to your problem (starting with two lists of names/values):

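#!/bin/bash

A="Smith value1
Jones value2
Wilson value3"

B="Smith value10
Wilson value11
Fox value12
Brown value13"

PrevName="Not a valid name"
# A comes first in the stream. Sorting on the name only (-k1,1) with a
# stable sort (-s, available in GNU and BSD sort) keeps the record from A
# ahead of the record from B whenever both lists contain the same name.
printf '%s\n%s\n' "$A" "$B" | sort -s -k1,1 |
while read -r Name Value
do
    # Print only the first record of each run of identical names.
    if [ "$Name" != "$PrevName" ]; then
        echo "$Name $Value"
    fi
    PrevName="$Name"
done > outfile

Here is the output:

Brown value13
Fox value12
Jones value2
Smith value1
Wilson value3
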
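
As a quick sanity check on a merged file, the following sketch lists any name that still occurs more than once. The helper name is hypothetical; it only assumes the name is the first whitespace-separated field.

```shell
#!/bin/bash
# duplicate_names FILE   (hypothetical helper)
# Prints each name (first field) that occurs more than once in FILE.
duplicate_names() {
    # uniq -d prints one copy of every line that is repeated in sorted input.
    awk '{ print $1 }' "$1" | sort | uniq -d
}
```

Running duplicate_names outfile should print nothing when the merge has removed every duplicate.
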
Regards,
Rajesh Kumar
Build and Release Engineer
My Blog: community.scmgalaxy.com/pg/profile/rajeshkumar