2017-09-12

Python: How to count by day without the timestamp

This is the format of my data:

[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml 

And here is my code. I am trying to print the number of lines per date:

# datecount.py
import sys, collections

# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] is optional and will be the file name

# set input based on number of arguments
if len(sys.argv) == 1:
    f = sys.stdin
elif len(sys.argv) == 2:
    try:
        f = open(sys.argv[1])
    except IOError:
        print("Cannot open " + sys.argv[1])
        sys.exit()
else:
    print("USAGE: python datecount.py [FILE]")
    sys.exit()

dateCounts = collections.Counter()
# for every line passed into the script
for line in f:
    # find indices of date section
    start = line.find("[")
    if start >= 0:
        end = line.find("]", start)
        # grab just the year (the last field inside the brackets)
        date = line[start+21:end]  # by YEAR
        dateCounts[date] += 1

# print top dates
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")

At the moment, the output is:

('2017', 738057)
('2016', 446204)
('2015', 9995)
('2014', 706)

But I want to count by date only, for example:

('May 02 2016', 128)
('May 03 2016', 105)
('May 04 2016', 99)

I thought about using a regular expression, but I don't know how. How can I get rid of the timestamp in the middle of the date?

Answers


We can get the expected result with the code below, which joins the month-and-day slice of the bracketed date with the year slice and skips the time in between. I hope this helps.

# datecount.py
import sys, collections

# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] is optional and will be the file name

# set input based on number of arguments
if len(sys.argv) == 1:
    f = sys.stdin
elif len(sys.argv) == 2:
    try:
        f = open(sys.argv[1])
    except IOError:
        print("Cannot open " + sys.argv[1])
        sys.exit()
else:
    print("USAGE: python datecount.py [FILE]")
    sys.exit()

dateCounts = collections.Counter()
# for every line passed into the script
for line in f:
    # find indices of date section
    start = line.find("[")
    if start >= 0:
        end = line.find("]", start)
        # grab month and day ("May 02") plus the year ("2016"), skipping the time;
        # both slice bounds are relative to start so a leading offset does not break it
        date = line[start+5:start+11] + ' ' + line[start+21:end]  # by date and YEAR
        dateCounts[date] += 1

# print top dates
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")
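
As an alternative to fixed slice offsets, the bracketed timestamp can be parsed with datetime.strptime and reduced to its date. This is only a sketch under assumptions: the helper name count_by_day and the sample lines are made up for illustration, the format string "%a %b %d %H:%M:%S %Y" is inferred from the single log line in the question, the default (English) locale is in effect, and the code targets Python 3.

```python
import collections
from datetime import datetime

# Assumed layout of the bracketed timestamp, e.g. "Mon May 02 15:38:50 2016"
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

def count_by_day(lines):
    """Count log lines per calendar day, ignoring the time of day."""
    counts = collections.Counter()
    for line in lines:
        start = line.find("[")
        end = line.find("]", start)
        if start < 0 or end < 0:
            continue  # no bracketed field on this line
        try:
            stamp = datetime.strptime(line[start + 1:end], STAMP_FORMAT)
        except ValueError:
            continue  # bracket content is not a timestamp in the assumed format
        counts[stamp.date()] += 1
    return counts

# Made-up sample lines in the format shown in the question
sample = [
    "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist",
    "[Mon May 02 16:01:12 2016] [error] [client XX.XX.XX.XX] File does not exist",
    "[Tue May 03 09:12:44 2016] [error] [client XX.XX.XX.XX] File does not exist",
    "a line without any timestamp",
]

counts = count_by_day(sample)
for day, n in sorted(counts.items()):
    print(day.strftime("%b %d %Y"), n)
```

Keying the counter on date objects rather than strings means the results can be sorted chronologically, which string keys such as 'May 02 2016' would not allow across months.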

Implemented with a regular expression:

import sys
import collections
import re

dateCounts = collections.Counter()
input_str = """
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
"""

# raw string so the backslashes reach the regex engine unchanged;
# the group captures the contents of the first bracketed field on each line
found = re.findall(r"\[(.*)\].*\[.*\].*\[.*\].*", input_str, re.MULTILINE)

for date in found:
    dateCounts[date] += 1

for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")

Output:

('Mon May 02 15:38:50 2016', 2) 
('Mon May 03 15:38:50 2017', 1)
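
Note that the keys above still contain the time of day, so lines from the same day at different times would be counted separately. A possible variation, sketched here under the assumption that every timestamp looks exactly like the sample (three-letter weekday and month, two-digit day, HH:MM:SS, four-digit year), captures only the month, day and year. The names DATE_RE and count_dates are hypothetical and not part of the original answer.

```python
import collections
import re

# Capture month, day and year; the weekday and HH:MM:SS are matched but not captured.
DATE_RE = re.compile(r"^\[\w{3} (\w{3}) (\d{2}) \d{2}:\d{2}:\d{2} (\d{4})\]")

def count_dates(lines):
    """Count lines per 'Mon DD YYYY' key, skipping lines that do not match."""
    counts = collections.Counter()
    for line in lines:
        m = DATE_RE.match(line)
        if m:
            counts[" ".join(m.groups())] += 1
    return counts

# The three sample lines used in the answer above
sample = [
    "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
]

counts = count_dates(sample)
for item in counts.most_common():
    print(item)
# ('May 02 2016', 2)
# ('May 03 2017', 1)
```

For a real log file, the same function could be fed the open file object directly, since iterating over a file yields one line at a time.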