Saturday, February 12, 2011

How can I extract and save text using Perl?

No extracted data is written to data2.txt. What is wrong with the code?

MyFile.txt

ex1,fx2,xx1
mm1,nn2,gg3
EX1,hh2,ff7

This is my desired output in data2.txt:

ex1,fx2,xx1
EX1,hh2,ff7


#! /DATA/PLUG/pvelasco/Softwares/PERLINUX/bin/perl -w

my $infile  ='My1.txt';
my $outfile ='data2.txt';

open IN,  '<', $infile  or die "Cant open $infile:$!";
open OUT, '>', $outfile or die "Cant open $outfile:$!";

while (<IN>) {   
  if (m/EX$HF|ex$HF/) {
    print OUT $_, "\n";      
    print $_;   
  }
}

close IN;
close OUT;
  • When I run your code with the input file named My1.txt instead of MyFile.txt, I get the desired output, except with extra empty lines. You can get rid of those by removing the , "\n" from the print statement.

    Shiel : Oh sorry I forgot to edit My1.txt. It should be MyFile.txt.
    From moritz
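
The empty lines come from the fact that each line read from a filehandle still carries its own newline, so appending "\n" writes two. A minimal sketch of that effect, assuming the two matching sample lines from the question and using an in-memory filehandle (an illustration choice, not part of the original script):

```perl
use strict;
use warnings;

# Two sample lines, read through an in-memory filehandle (no real file needed).
my $input = "ex1,fx2,xx1\nEX1,hh2,ff7\n";
open(my $in, '<', \$input) or die "Can't open in-memory input: $!";

my ($doubled, $plain) = ('', '');
while (my $line = <$in>) {
    $doubled .= $line . "\n";   # what  print OUT $_, "\n"  would write
    $plain   .= $line;          # what  print OUT $_        would write
}
close $in;

# Count the newline characters each variant would write.
my $doubled_newlines = ($doubled =~ tr/\n//);
my $plain_newlines   = ($plain   =~ tr/\n//);
print "with the extra newline: $doubled_newlines newlines\n";
print "without it:             $plain_newlines newlines\n";
```

The first variant ends up with twice as many newlines as input lines, which shows up as a blank line after every record in data2.txt.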
  • This regex makes no sense:

    m/EX$HF|ex$HF/
    

    Is $HF supposed to be a variable? What are you trying to match?

    Also, the second line in every Perl script you write should be:

    use strict;
    

    It will make Perl catch such mistakes and tell you about them, rather than silently ignoring them.

    Brad Gilbert : ... and the third should be `use warnings`.
    raldi : He already has -w on the first line.
    Brad Gilbert : Well why doesn't he just add -Mstrict to the first line?
    From raldi
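
To make the point about use strict concrete, here is a small sketch of what happens to the question's pattern with and without it. It assumes $HF was never declared anywhere (the question does not show a declaration), and it compiles the pattern through a string eval only so that the failure can be inspected instead of aborting the script:

```perl
use strict;
use warnings;

# With strict in force, the undeclared $HF is a compile-time error.
# String eval is used so the error lands in $@ rather than killing the script.
my $compiled = eval 'my $line = "ex1,fx2,xx1"; $line =~ m/EX$HF|ex$HF/; 1';
my $strict_error = $compiled ? '' : $@;
print "strict reports: $strict_error";

# Without strict, $HF is an undefined global that interpolates as an
# empty string, so the pattern silently becomes /EX|ex/.
my @matched;
{
    no strict 'vars';
    no warnings;
    for my $line ("ex1,fx2,xx1", "mm1,nn2,gg3", "EX1,hh2,ff7") {
        push @matched, $line if $line =~ m/EX$HF|ex$HF/;
    }
}
print "matched without strict: @matched\n";
```

This is probably why the original script appears to work once the file name is fixed: the pattern degrades to "contains EX or ex anywhere in the line", which happens to select the right rows in the sample data.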
  • while (<IN>) {
      if (m/^(EX|ex)\d.*/) {   
        print OUT "$_";      
        print $_;   
      }
    }
    
    Jouni K. Seppänen : Also, if you don't need the (debug?) output of all lines in the input file, you can reduce this to the one-liner perl -ne 'print if /^(EX|ex)\d/'
    John Ferguson : perl golf has its place, but I'd rather people put readable code into production.
    Brad Gilbert : This *is* simple enough to use a one-liner.
    From benPearce
  • Bleh! "use strict;" "use warnings;". Lexical-filehandles. Three-args-open.

  • What are you trying to do? Keep all the lines that start with EX? Using a regexp is overkill - you're much better off just checking the first two letters. In Python:

    from __future__ import with_statement

    class converter(object):
        def __init__(self, inFile, outFile):
            self.inFile, self.outFile = inFile, outFile

        def main(self):
            with open(self.inFile, 'r') as infsock:
                with open(self.outFile, 'w') as outfsock:
                    for line in infsock:
                        self.doReplace(line, outfsock)

        def doReplace(self, line, outsock):
            # Keep the line if its first two characters are "EX" in any case
            if line[:2].upper() == "EX":
                outsock.write(line)

    if __name__ == '__main__':
        import sys
        ZeConverter = converter(sys.argv[1], sys.argv[2])
        ZeConverter.main()

    Sub out doReplace if you need a different replacement method.

    From kanja
  • The filenames don't match.

    open(my $inhandle, '<', $infile)   or die "Can't open $infile: $!";
    open(my $outhandle, '>', $outfile) or die "Can't open $outfile: $!";
    
    while(my $line = <$inhandle>) {   
    
        # Assumes that ex, Ex, eX, EX all are valid first characters
        if($line =~ m{^ex}i) {         # or   if(lc(substr $line, 0 => 2) eq 'ex') {
            print { $outhandle } $line;      
            print $line;
        }
    }
    

    And yes, always always use strict;

    You could also chomp $line and (if using Perl 5.10 or later) say $line instead of print "$line\n".

    raldi : What are the braces for in this line? print { $outhandle } $line;
    draegtun : It helps avoid mistakes like print $outhandle, $line; (the comma means print won't recognise $outhandle as a filehandle). It's a recommendation from "Perl Best Practices" by Damian Conway.
    Brad Gilbert : I didn't realize that would work.
    From Berserk
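
Putting the suggestions from this answer together (strict, warnings, lexical filehandles, three-argument open, chomp plus say), a complete script might look like the sketch below. The sub name filter_ex_lines and the command-line handling are hypothetical additions for illustration; the matching rule is the case-insensitive ^ex test from the answer above.

```perl
use strict;
use warnings;
use feature 'say';   # say needs Perl 5.10 or later

# Copy every line of $infile that starts with "ex" (in any case) to $outfile
# and return how many lines were kept.
# filter_ex_lines is a hypothetical helper name, not from the thread.
sub filter_ex_lines {
    my ($infile, $outfile) = @_;

    open(my $inhandle,  '<', $infile)  or die "Can't open $infile: $!";
    open(my $outhandle, '>', $outfile) or die "Can't open $outfile: $!";

    my $kept = 0;
    while (my $line = <$inhandle>) {
        chomp $line;                    # drop the newline read from the file
        next unless $line =~ m{^ex}i;   # keep ex, Ex, eX and EX prefixes
        say {$outhandle} $line;         # say appends exactly one newline
        $kept++;
    }

    close $inhandle;
    close $outhandle or die "Can't close $outfile: $!";
    return $kept;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my $kept = filter_ex_lines(@ARGV);
    say "kept $kept line(s)";
}
```

Assuming the script is saved as filter.pl (a made-up name), it would be run as perl filter.pl MyFile.txt data2.txt. Wrapping the loop in a sub is only a design convenience; it keeps the filehandles scoped and makes the logic easy to call from a test.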
  • Sorry if this seems like stating the bleeding obvious, but what's wrong with

    grep -i ^ex < My1.txt > data2.txt
    

    ... or if you really want to do it in perl (and there's nothing wrong with that):

    perl -ne '/^ex/i && print' < My1.txt > data2.txt
    

    This assumes the purpose of the request is to find lines that start with EX, with case-insensitivity.

    From RET
