I don't know how easy this is, since it uses some more advanced concepts such as generators, but it is at least robust and well documented. The actual code is at the bottom and is fairly concise.

The basic idea is that the function iter_delim_sets returns an iterator over (that is, a sequence of) tuples containing the line number, the set of indices in the "expected" string where the delimiter was found, and a similar set for the "actual" string. One such tuple is generated for each pair of (expected, actual) lines. These tuples are succinctly formalized as a collections.namedtuple type called DelimLocations.

The function analyze then simply returns higher-level information based on one such tuple, stored in a DelimAnalysis namedtuple. This is done using basic set algebra (intersection and difference).
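To make the set algebra concrete, here is a minimal sketch for a single pair of lines. It uses the delimiter positions of the first test pair from the docstring below; the variable names are only illustrative, and the snippet is written in Python 3 syntax.

```python
# Positions of '||' in "one||fish||two||fish" (expected)
# and in "red||fish||blue||fish" (actual).
expected = {3, 9, 14}
actual = {3, 9, 15}

correct = len(expected & actual)          # positions present in both strings
incorrect = len(actual - expected)        # positions present only in actual
count_diff = len(actual) - len(expected)  # surplus (or deficit) of delimiters

print(correct, incorrect, count_diff)     # prints: 2 1 0
```

These are the same three expressions that analyze evaluates for each DelimLocations tuple.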
"""Compare two sequences of strings.
Test data:

>>> from pprint import pprint
>>> delimiter = '||'
>>> expected = (
...     delimiter.join(("one", "fish", "two", "fish")),
...     delimiter.join(("red", "fish", "blue", "fish")),
...     delimiter.join(("I do not like them", "Sam I am")),
...     delimiter.join(("I do not like green eggs and ham.",)))
>>> actual = (
...     delimiter.join(("red", "fish", "blue", "fish")),
...     delimiter.join(("one", "fish", "two", "fish")),
...     delimiter.join(("I do not like spam", "Sam I am")),
...     delimiter.join(("I do not like", "green eggs and ham.")))

The results:

>>> pprint([analyze(v) for v in iter_delim_sets(delimiter, expected, actual)])
[DelimAnalysis(index=0, correct=2, incorrect=1, count_diff=0),
 DelimAnalysis(index=1, correct=2, incorrect=1, count_diff=0),
 DelimAnalysis(index=2, correct=1, incorrect=0, count_diff=0),
 DelimAnalysis(index=3, correct=0, incorrect=1, count_diff=1)]

What they mean:

>>> pprint(delim_analysis_doc)
(('index',
  ('The number of the lines from expected and actual',
   'used to perform this analysis.')),
 ('correct',
  ('The number of delimiter placements in ``actual``',
   'which were correctly placed.')),
 ('incorrect', ('The number of incorrect delimiters in ``actual``.',)),
 ('count_diff',
  ('The difference between the number of delimiters',
   'in ``expected`` and ``actual`` for this line.')))

And a trace of the processing stages:

>>> def dump_it(it):
...     '''Wraps an iterator in code that dumps its values to stdout.'''
...     for v in it:
...         print v
...         yield v
>>> for v in iter_delim_sets(delimiter,
...                          dump_it(expected), dump_it(actual)):
...     print v
...     print analyze(v)
...     print '======'
one||fish||two||fish
red||fish||blue||fish
DelimLocations(index=0, expected=set([9, 3, 14]), actual=set([9, 3, 15]))
DelimAnalysis(index=0, correct=2, incorrect=1, count_diff=0)
======
red||fish||blue||fish
one||fish||two||fish
DelimLocations(index=1, expected=set([9, 3, 15]), actual=set([9, 3, 14]))
DelimAnalysis(index=1, correct=2, incorrect=1, count_diff=0)
======
I do not like them||Sam I am
I do not like spam||Sam I am
DelimLocations(index=2, expected=set([18]), actual=set([18]))
DelimAnalysis(index=2, correct=1, incorrect=0, count_diff=0)
======
I do not like green eggs and ham.
I do not like||green eggs and ham.
DelimLocations(index=3, expected=set([]), actual=set([13]))
DelimAnalysis(index=3, correct=0, incorrect=1, count_diff=1)
======
"""
from collections import namedtuple


# Data types

## Here ``expected`` and ``actual`` are sets
DelimLocations = namedtuple('DelimLocations', 'index expected actual')

DelimAnalysis = namedtuple('DelimAnalysis',
                           'index correct incorrect count_diff')

## Explanation of the elements of DelimAnalysis.
## There's no real convenient way to add a docstring to a variable.
delim_analysis_doc = (
    ('index', ("The number of the lines from expected and actual",
               "used to perform this analysis.")),
    ('correct', ("The number of delimiter placements in ``actual``",
                 "which were correctly placed.")),
    ('incorrect', ("The number of incorrect delimiters in ``actual``.",)),
    ('count_diff', ("The difference between the number of delimiters",
                    "in ``expected`` and ``actual`` for this line.")))


# Actual functionality

def iter_delim_sets(delimiter, expected, actual):
    """Yields a DelimLocations tuple for each pair of strings.

    ``expected`` and ``actual`` are sequences of strings.
    """
    from re import escape, compile as compile_
    from itertools import count, izip

    index = count()
    re = compile_(escape(delimiter))

    def delimiter_locations(string):
        """Set of the locations of matches of ``re`` in ``string``."""
        return set(match.start() for match in re.finditer(string))

    string_pairs = izip(expected, actual)
    return (DelimLocations(index=index.next(),
                           expected=delimiter_locations(e),
                           actual=delimiter_locations(a))
            for e, a in string_pairs)


def analyze(locations):
    """Returns an analysis of a DelimLocations tuple.

    ``locations.expected`` and ``locations.actual`` are sets.
    """
    return DelimAnalysis(
        index=locations.index,
        correct=len(locations.expected & locations.actual),
        incorrect=len(locations.actual - locations.expected),
        count_diff=(len(locations.actual) - len(locations.expected)))
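The code above is written for Python 2 (print statements, itertools.izip, index.next()). As a rough sketch only, the same two functions might be adapted to Python 3 as follows. Using enumerate, the built-in zip, and a set comprehension are substitutions made here for illustration; they are not part of the original answer.

```python
import re
from collections import namedtuple

DelimLocations = namedtuple('DelimLocations', 'index expected actual')
DelimAnalysis = namedtuple('DelimAnalysis',
                           'index correct incorrect count_diff')


def iter_delim_sets(delimiter, expected, actual):
    """Yield a DelimLocations tuple for each pair of strings."""
    pattern = re.compile(re.escape(delimiter))

    def delimiter_locations(string):
        """Set of the start indices of the delimiter in ``string``."""
        return {match.start() for match in pattern.finditer(string)}

    # zip is lazy in Python 3, so it stands in for itertools.izip;
    # enumerate stands in for the itertools.count() / .next() pair.
    for index, (e, a) in enumerate(zip(expected, actual)):
        yield DelimLocations(index=index,
                             expected=delimiter_locations(e),
                             actual=delimiter_locations(a))


def analyze(locations):
    """Return a DelimAnalysis for one DelimLocations tuple."""
    return DelimAnalysis(
        index=locations.index,
        correct=len(locations.expected & locations.actual),
        incorrect=len(locations.actual - locations.expected),
        count_diff=len(locations.actual) - len(locations.expected))


# Two of the test pairs from the docstring above.
expected = ["one||fish||two||fish", "I do not like green eggs and ham."]
actual = ["red||fish||blue||fish", "I do not like||green eggs and ham."]
results = [analyze(v) for v in iter_delim_sets('||', expected, actual)]
for r in results:
    print(r)
```

Note that under Python 3 the doctest output in the docstring would also need updating, because sets are displayed with brace syntax instead of set([...]).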
To get more help, you should provide the exact output you expect for that example line. It is not clear when a '||' counts as correct (when all the surrounding words are equal? when just the preceding/following words are equal?). As for "you brilliant people", you certainly know how to flatter a programmer's ego :-p – tokland