在Python中如何将字符串截断为75个字符?

在JavaScript中是这样做的:

var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
var info = (data.length > 75) ? data.substring[0,75] + '..' : data;

当前回答

info = (data[:75] + '..') if len(data) > 75 else data

其他回答

这是一个函数,我把它作为一个新的String类的一部分…它允许添加后缀(如果字符串是修剪后的大小,并且添加它足够长-尽管你不需要强制绝对大小)

我在改变一些东西的过程中,所以有一些无用的逻辑成本(如果_truncate…例如),不再需要它,并且在顶部有一个return…

但是,它仍然是一个截断数据的好函数……

##
## Truncate characters of a string after _len'nth char, if necessary... If _len is less than 0, don't truncate anything... Note: If you attach a suffix, and you enable absolute max length then the suffix length is subtracted from max length... Note: If the suffix length is longer than the output then no suffix is used...
##
## Usage: Where _text = 'Testing', _width = 4
##      _data = String.Truncate( _text, _width )                        == Test
##      _data = String.Truncate( _text, _width, '..', True )            == Te..
##
## Equivalent Alternates: Where _text = 'Testing', _width = 4
##      _data = String.SubStr( _text, 0, _width )                       == Test
##      _data = _text[  : _width ]                                      == Test
##      _data = ( _text )[  : _width ]                                  == Test
##
def Truncate( _text, _max_len = -1, _suffix = False, _absolute_max_len = True ):
    ## Length of the string we are considering for truncation
    _len            = len( _text )

    ## Whether or not we have to truncate
    _truncate       = ( False, True )[ _len > _max_len ]

    ## Note: If we don't need to truncate, there's no point in proceeding...
    if ( not _truncate ):
        return _text

    ## The suffix in string form
    _suffix_str     = ( '',  str( _suffix ) )[ _truncate and _suffix != False ]

    ## The suffix length
    _len_suffix     = len( _suffix_str )

    ## Whether or not we add the suffix
    _add_suffix     = ( False, True )[ _truncate and _suffix != False and _max_len > _len_suffix ]

    ## Suffix Offset
    _suffix_offset = _max_len - _len_suffix
    _suffix_offset  = ( _max_len, _suffix_offset )[ _add_suffix and _absolute_max_len != False and _suffix_offset > 0 ]

    ## The truncate point.... If not necessary, then length of string.. If necessary then the max length with or without subtracting the suffix length... Note: It may be easier ( less logic cost ) to simply add the suffix to the calculated point, then truncate - if point is negative then the suffix will be destroyed anyway.
    ## If we don't need to truncate, then the length is the length of the string.. If we do need to truncate, then the length depends on whether we add the suffix and offset the length of the suffix or not...
    _len_truncate   = ( _len, _max_len )[ _truncate ]
    _len_truncate   = ( _len_truncate, _max_len )[ _len_truncate <= _max_len ]

    ## If we add the suffix, add it... Suffix won't be added if the suffix is the same length as the text being output...
    if ( _add_suffix ):
        _text = _text[ 0 : _suffix_offset ] + _suffix_str + _text[ _suffix_offset: ]

    ## Return the text after truncating...
    return _text[ : _len_truncate ]

更简短的是:

info = data[:75] + (data[75:] and '..')
limit = 75
info = data[:limit] + '..' * (len(data) > limit)
info = data[:min(len(data), 75)

如果你想做一些更复杂的字符串截断,你可以采用sklearn方法作为实现:

sklearn.base.BaseEstimator.__repr__ (参见原始完整代码:https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/base.py#L262)

它增加了一些好处,比如避免在单词中间截断。

def truncate_string(data, N_CHAR_MAX=70):
    # N_CHAR_MAX is the (approximate) maximum number of non-blank
    # characters to render. We pass it as an optional parameter to ease
    # the tests.

    lim = N_CHAR_MAX // 2  # apprx number of chars to keep on both ends
    regex = r"^(\s*\S){%d}" % lim
    # The regex '^(\s*\S){%d}' % n
    # matches from the start of the string until the nth non-blank
    # character:
    # - ^ matches the start of string
    # - (pattern){n} matches n repetitions of pattern
    # - \s*\S matches a non-blank char following zero or more blanks
    left_lim = re.match(regex, data).end()
    right_lim = re.match(regex, data[::-1]).end()
    if "\n" in data[left_lim:-right_lim]:
        # The left side and right side aren't on the same line.
        # To avoid weird cuts, e.g.:
        # categoric...ore',
        # we need to start the right side with an appropriate newline
        # character so that it renders properly as:
        # categoric...
        # handle_unknown='ignore',
        # so we add [^\n]*\n which matches until the next \n
        regex += r"[^\n]*\n"
        right_lim = re.match(regex, data[::-1]).end()
    ellipsis = "..."
    if left_lim + len(ellipsis) < len(data) - right_lim:
        # Only add ellipsis if it results in a shorter repr
        data = data[:left_lim] + "..." + data[-right_lim:]
    return data