"""
Implementation of optimized einsum.

"""
import itertools
import operator

from numpy.core.multiarray import c_einsum
from numpy.core.numeric import asanyarray, tensordot
from numpy.core.overrides import array_function_dispatch

__all__ = ['einsum', 'einsum_path']

einsum_symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
einsum_symbols_set = set(einsum_symbols)


def _flop_count(idx_contraction, inner, num_terms, size_dictionary):
    """
    Computes the number of FLOPS in the contraction.

    Parameters
    ----------
    idx_contraction : iterable
        The indices involved in the contraction
    inner : bool
        Does this contraction require an inner product?
    num_terms : int
        The number of terms in a contraction
    size_dictionary : dict
        The size of each of the indices in idx_contraction

    Returns
    -------
    flop_count : int
        The total number of FLOPS required for the contraction.

    Examples
    --------

    >>> _flop_count('abc', False, 1, {'a': 2, 'b':3, 'c':5})
    30

    >>> _flop_count('abc', True, 2, {'a': 2, 'b':3, 'c':5})
    60

    """

    overall_size = _compute_size_by_dict(idx_contraction, size_dictionary)
    op_factor = max(1, num_terms - 1)
    if inner:
        op_factor += 1

    return overall_size * op_factor


def _compute_size_by_dict(indices, idx_dict):
    """
    Computes the product of the elements in indices based on the dictionary
    idx_dict.

    Parameters
    ----------
    indices : iterable
        Indices to base the product on.
    idx_dict : dictionary
        Dictionary of index sizes

    Returns
    -------
    ret : int
        The resulting product.

    Examples
    --------
    >>> _compute_size_by_dict('abbc', {'a': 2, 'b':3, 'c':5})
    90

    """
    ret = 1
    for i in indices:
        ret *= idx_dict[i]
    return ret


def _find_contraction(positions, input_sets, output_set):
    """
    Finds the contraction for a given set of input and output sets.

    Parameters
    ----------
    positions : iterable
        Integer positions of terms used in the contraction.
    input_sets : list
        List of sets that represent the lhs side of the einsum subscript
    output_set : set
        Set that represents the rhs side of the overall einsum subscript

    Returns
    -------
    new_result : set
        The indices of the resulting contraction
    remaining : list
        List of sets that have not been contracted, the new set is appended to
        the end of this list
    idx_removed : set
        Indices removed from the entire contraction
    idx_contraction : set
        The indices used in the current contraction

    Examples
    --------

    # A simple dot product test case
    >>> pos = (0, 1)
    >>> isets = [set('ab'), set('bc')]
    >>> oset = set('ac')
    >>> _find_contraction(pos, isets, oset)
    ({'a', 'c'}, [{'a', 'c'}], {'b'}, {'a', 'b', 'c'})

    # A more complex case with additional terms in the contraction
    >>> pos = (0, 2)
    >>> isets = [set('abd'), set('ac'), set('bdc')]
    >>> oset = set('ac')
    >>> _find_contraction(pos, isets, oset)
    ({'a', 'c'}, [{'a', 'c'}, {'a', 'c'}], {'b', 'd'}, {'a', 'b', 'c', 'd'})
    """

    idx_contract = set()
    idx_remain = output_set.copy()
    remaining = []
    for ind, value in enumerate(input_sets):
        if ind in positions:
            idx_contract |= value
        else:
            remaining.append(value)
            idx_remain |= value

    new_result = idx_remain & idx_contract
    idx_removed = (idx_contract - new_result)
    remaining.append(new_result)

    return (new_result, remaining, idx_removed, idx_contract)


def _optimal_path(input_sets, output_set, idx_dict, memory_limit):
    """
    Computes all possible pair contractions, sieves the results based
    on ``memory_limit`` and returns the lowest cost path. This algorithm
    scales factorial with respect to the elements in the list ``input_sets``.

    Parameters
    ----------
    input_sets : list
        List of sets that represent the lhs side of the einsum subscript
    output_set : set
        Set that represents the rhs side of the overall einsum subscript
    idx_dict : dictionary
        Dictionary of index sizes
    memory_limit : int
        The maximum number of elements in a temporary array

    Returns
    -------
    path : list
        The optimal contraction order within the memory limit constraint.
    Examples
    --------
    >>> isets = [set('abd'), set('ac'), set('bdc')]
    >>> oset = set()
    >>> idx_sizes = {'a': 1, 'b':2, 'c':3, 'd':4}
    >>> _optimal_path(isets, oset, idx_sizes, 5000)
    [(0, 2), (0, 1)]
    """

    full_results = [(0, [], input_sets)]
    for iteration in range(len(input_sets) - 1):
        iter_results = []

        # Compute all unique pairs
        for curr in full_results:
            cost, positions, remaining = curr
            for con in itertools.combinations(
                    range(len(input_sets) - iteration), 2):

                # Find the contraction
                cont = _find_contraction(con, remaining, output_set)
                new_result, new_input_sets, idx_removed, idx_contract = cont

                # Sieve the results based on memory_limit
                new_size = _compute_size_by_dict(new_result, idx_dict)
                if new_size > memory_limit:
                    continue

                # Build (total_cost, positions, indices_remaining)
                total_cost = cost + _flop_count(idx_contract, idx_removed,
                                                len(con), idx_dict)
                new_pos = positions + [con]
                iter_results.append((total_cost, new_pos, new_input_sets))

        # Update combinatorial list, if we did not find anything return best
        # path + remaining contractions
        if iter_results:
            full_results = iter_results
        else:
            path = min(full_results, key=lambda x: x[0])[1]
            path += [tuple(range(len(input_sets) - iteration))]
            return path

    # If we have not found anything return single einsum contraction
    if len(full_results) == 0:
        return [tuple(range(len(input_sets)))]

    path = min(full_results, key=lambda x: x[0])[1]
    return path


def _parse_possible_contraction(positions, input_sets, output_set, idx_dict,
                                memory_limit, path_cost, naive_cost):
    """Compute the cost (removed size + flops) and resultant indices for
    performing the contraction specified by ``positions``.

    Parameters
    ----------
    positions : tuple of int
        The locations of the proposed tensors to contract.
    input_sets : list of sets
        The indices found on each tensors.
    output_set : set
        The output indices of the expression.
    idx_dict : dict
        Mapping of each index to its size.
    memory_limit : int
        The total allowed size for an intermediary tensor.
    path_cost : int
        The contraction cost so far.
    naive_cost : int
        The cost of the unoptimized expression.

    Returns
    -------
    cost : (int, int)
        A tuple containing the size of any indices removed, and the flop cost.
    positions : tuple of int
        The locations of the proposed tensors to contract.
    new_input_sets : list of sets
        The resulting new list of indices if this proposed contraction
        is performed.
    """

    # Find the contraction
    contract = _find_contraction(positions, input_sets, output_set)
    idx_result, new_input_sets, idx_removed, idx_contract = contract

    # Sieve the results based on memory_limit
    new_size = _compute_size_by_dict(idx_result, idx_dict)
    if new_size > memory_limit:
        return None

    # Build sort tuple
    old_sizes = (_compute_size_by_dict(input_sets[p], idx_dict)
                 for p in positions)
    removed_size = sum(old_sizes) - new_size

    cost = _flop_count(idx_contract, idx_removed, len(positions), idx_dict)
    sort = (-removed_size, cost)

    # Sieve based on total cost as well
    if (path_cost + cost) > naive_cost:
        return None

    # Add contraction to possible choices
    return [sort, positions, new_input_sets]


def _update_other_results(results, best):
    """Update the positions and provisional input_sets of ``results`` based on
    performing the contraction result ``best``. Remove any involving the
    tensors contracted.

    Parameters
    ----------
    results : list
        List of contraction results produced by
        ``_parse_possible_contraction``.
    best : list
        The best contraction of ``results`` i.e. the one that will be
        performed.

    Returns
    -------
    mod_results : list
        The list of modified results, updated with outcome of ``best``
        contraction.
    """

    best_con = best[1]
    bx, by = best_con
    mod_results = []

    for cost, (x, y), con_sets in results:

        # Ignore results involving tensors just contracted
        if x in best_con or y in best_con:
            continue

        # Update the input_sets
        del con_sets[by - int(by > x) - int(by > y)]
        del con_sets[bx - int(bx > x) - int(bx > y)]
        con_sets.insert(-1, best[2][-1])

        # Update the position indices
        mod_con = x - int(x > bx) - int(x > by), y - int(y > bx) - int(y > by)
        mod_results.append((cost, mod_con, con_sets))

    return mod_results


def _greedy_path(input_sets, output_set, idx_dict, memory_limit):
    """
    Finds the path by contracting the best pair until the input list is
    exhausted. The best pair is found by minimizing the tuple
    ``(-prod(indices_removed), cost)``.  What this amounts to is prioritizing
    matrix multiplication or inner product operations, then Hadamard like
    operations, and finally outer operations. Outer products are limited by
    ``memory_limit``. This algorithm scales cubically with respect to the
    number of elements in the list ``input_sets``.
    Parameters
    ----------
    input_sets : list
        List of sets that represent the lhs side of the einsum subscript
    output_set : set
        Set that represents the rhs side of the overall einsum subscript
    idx_dict : dictionary
        Dictionary of index sizes
    memory_limit : int
        The maximum number of elements in a temporary array

    Returns
    -------
    path : list
        The greedy contraction order within the memory limit constraint.

    Examples
    --------
    >>> isets = [set('abd'), set('ac'), set('bdc')]
    >>> oset = set()
    >>> idx_sizes = {'a': 1, 'b':2, 'c':3, 'd':4}
    >>> _greedy_path(isets, oset, idx_sizes, 5000)
    [(0, 2), (0, 1)]
    """

    # Handle trivial cases that leaked through
    if len(input_sets) == 1:
        return [(0,)]
    elif len(input_sets) == 2:
        return [(0, 1)]

    # Build up a naive cost
    contract = _find_contraction(range(len(input_sets)), input_sets,
                                 output_set)
    idx_result, new_input_sets, idx_removed, idx_contract = contract
    naive_cost = _flop_count(idx_contract, idx_removed, len(input_sets),
                             idx_dict)

    # Initially iterate over all pairs
    comb_iter = itertools.combinations(range(len(input_sets)), 2)
    known_contractions = []

    path_cost = 0
    path = []

    for iteration in range(len(input_sets) - 1):

        # Iterate over all pairs on the first step, only previously found
        # pairs on subsequent steps
        for positions in comb_iter:

            # Always initially ignore outer products
            if input_sets[positions[0]].isdisjoint(input_sets[positions[1]]):
                continue

            result = _parse_possible_contraction(positions, input_sets,
                                                 output_set, idx_dict,
                                                 memory_limit, path_cost,
                                                 naive_cost)
            if result is not None:
                known_contractions.append(result)

        # If we do not have an inner contraction, rescan pairs including
        # outer products
        if len(known_contractions) == 0:

            # Then check the outer products
            for positions in itertools.combinations(range(len(input_sets)),
                                                    2):
                result = _parse_possible_contraction(positions, input_sets,
                                                     output_set, idx_dict,
                                                     memory_limit, path_cost,
                                                     naive_cost)
                if result is not None:
                    known_contractions.append(result)

            # If we still did not find any remaining contractions, default
            # back to einsum like behavior
            if len(known_contractions) == 0:
                path.append(tuple(range(len(input_sets))))
                break

        # Sort based on first index
        best = min(known_contractions, key=lambda x: x[0])

        # Now propagate as many unused contractions as possible to the
        # next iteration
        known_contractions = _update_other_results(known_contractions, best)

        # Next iteration only compute contractions with the new tensor
        # All other contractions have been accounted for
        input_sets = best[2]
        new_tensor_pos = len(input_sets) - 1
        comb_iter = ((i, new_tensor_pos) for i in range(new_tensor_pos))

        # Update path and total cost
        path.append(best[1])
        path_cost += best[0][1]

    return path


def _can_dot(inputs, result, idx_removed):
    """
    Checks if we can use BLAS (np.tensordot) call and its beneficial to do so.

    Parameters
    ----------
    inputs : list of str
        Specifies the subscripts for summation.
    result : str
        Resulting summation.
    idx_removed : set
        Indices that are removed in the summation

    Returns
    -------
    type : bool
        Returns true if BLAS should and can be used, else False

    Notes
    -----
    If the operations is BLAS level 1 or 2 and is not already aligned
    we default back to einsum as the memory movement to copy is more
    costly than the operation itself.
    Examples
    --------

    # Standard GEMM operation
    >>> _can_dot(['ij', 'jk'], 'ik', set('j'))
    True

    # Can use the standard BLAS, but requires odd data movement
    >>> _can_dot(['ijj', 'jk'], 'ik', set('j'))
    False

    # DDOT where the memory is not aligned
    >>> _can_dot(['ijk', 'ikj'], '', set('ijk'))
    False

    """

    # All `dot` calls remove indices
    if len(idx_removed) == 0:
        return False

    # BLAS can only handle two operands
    if len(inputs) != 2:
        return False

    input_left, input_right = inputs

    for c in set(input_left + input_right):
        # can't deal with repeated indices on same input or more than 2 total
        nl, nr = input_left.count(c), input_right.count(c)
        if (nl > 1) or (nr > 1) or (nl + nr > 2):
            return False

        # can't do implicit summation or dimension collapse e.g.
        #     "ab,bc->c" (implicitly sum over 'a')
        #     "ab,ca->ca" (take diagonal of 'a')
        if nl + nr - 1 == int(c in result):
            return False

    # Build a few temporaries
    set_left = set(input_left)
    set_right = set(input_right)
    keep_left = set_left - idx_removed
    keep_right = set_right - idx_removed
    rs = len(idx_removed)

    # At this point we are a DOT, GEMV, or GEMM operation

    # Handle inner products

    # DDOT with aligned data
    if input_left == input_right:
        return True

    # DDOT without aligned data (better to use einsum)
    if set_left == set_right:
        return False

    # Handle the 4 possible (aligned) GEMV or GEMM cases

    # GEMM or GEMV no transpose
    if input_left[-rs:] == input_right[:rs]:
        return True

    # GEMM or GEMV transpose both
    if input_left[:rs] == input_right[-rs:]:
        return True

    # GEMM or GEMV transpose right
    if input_left[-rs:] == input_right[-rs:]:
        return True

    # GEMM or GEMV transpose left
    if input_left[:rs] == input_right[:rs]:
        return True

    # Einsum is faster than GEMV if we have to copy data
    if not keep_left or not keep_right:
        return False

    # We are a matrix-matrix product, but we need to copy data
    return True


def _parse_einsum_input(operands):
    """
    A reproduction of einsum c side einsum parsing in python.

    Returns
    -------
    input_strings : str
        Parsed input strings
    output_string : str
        Parsed output string
    operands : list of array_like
        The operands to use in the numpy contraction

    Examples
    --------
    The operand list is simplified to reduce printing:

    >>> np.random.seed(123)
    >>> a = np.random.rand(4, 4)
    >>> b = np.random.rand(4, 4, 4)
    >>> _parse_einsum_input(('...a,...a->...', a, b))
    ('za,xza', 'xz', [a, b]) # may vary

    >>> _parse_einsum_input((a, [Ellipsis, 0], b, [Ellipsis, 0]))
    ('za,xza', 'xz', [a, b]) # may vary
    """

    if len(operands) == 0:
        raise ValueError("No input operands")

    if isinstance(operands[0], str):
        subscripts = operands[0].replace(" ", "")
        operands = [asanyarray(v) for v in operands[1:]]

        # Ensure all characters are valid
        for s in subscripts:
            if s in '.,->':
                continue
            if s not in einsum_symbols:
                raise ValueError("Character %s is not a valid symbol." % s)

    else:
        tmp_operands = list(operands)
        operand_list = []
        subscript_list = []
        for p in range(len(operands) // 2):
            operand_list.append(tmp_operands.pop(0))
            subscript_list.append(tmp_operands.pop(0))

        output_list = tmp_operands[-1] if len(tmp_operands) else None
        operands = [asanyarray(v) for v in operand_list]
        subscripts = ""
        last = len(subscript_list) - 1
        for num, sub in enumerate(subscript_list):
            for s in sub:
                if s is Ellipsis:
                    subscripts += "..."
                else:
                    try:
                        s = operator.index(s)
                    except TypeError as e:
                        raise TypeError("For this input type lists must "
                                        "contain either int or "
                                        "Ellipsis") from e
                    subscripts += einsum_symbols[s]
            if num != last:
                subscripts += ","

        if output_list is not None:
            subscripts += "->"
            for s in output_list:
                if s is Ellipsis:
                    subscripts += "..."
                else:
                    try:
                        s = operator.index(s)
                    except TypeError as e:
                        raise TypeError("For this input type lists must "
                                        "contain either int or "
                                        "Ellipsis") from e
                    subscripts += einsum_symbols[s]

    # Check for proper "->"
    if ("-" in subscripts) or (">" in subscripts):
        invalid = (subscripts.count("-") > 1) or (subscripts.count(">") > 1)
        if invalid or (subscripts.count("->") != 1):
            raise ValueError("Subscripts can only contain one '->'.")

    # Parse ellipses
    if "." in subscripts:
        used = subscripts.replace(".", "").replace(",", "").replace("->", "")
        unused = list(einsum_symbols_set - set(used))
        ellipse_inds = "".join(unused)
        longest = 0

        if "->" in subscripts:
            input_tmp, output_sub = subscripts.split("->")
            split_subscripts = input_tmp.split(",")
            out_sub = True
        else:
            split_subscripts = subscripts.split(',')
            out_sub = False

        for num, sub in enumerate(split_subscripts):
            if "." in sub:
                if (sub.count(".") != 3) or (sub.count("...") != 1):
                    raise ValueError("Invalid Ellipses.")

                # Take into account numerical values
                if operands[num].shape == ():
                    ellipse_count = 0
                else:
                    ellipse_count = max(operands[num].ndim, 1)
                    ellipse_count -= (len(sub) - 3)

                if ellipse_count > longest:
                    longest = ellipse_count

                if ellipse_count < 0:
                    raise ValueError("Ellipses lengths do not match.")
                elif ellipse_count == 0:
                    split_subscripts[num] = sub.replace('...', '')
                else:
                    rep_inds = ellipse_inds[-ellipse_count:]
                    split_subscripts[num] = sub.replace('...', rep_inds)

        subscripts = ",".join(split_subscripts)
        if longest == 0:
            out_ellipse = ""
        else:
            out_ellipse = ellipse_inds[-longest:]

        if out_sub:
            subscripts += "->" + output_sub.replace("...", out_ellipse)
        else:
            # Special care for outputless ellipses
            output_subscript = ""
            tmp_subscripts = subscripts.replace(",", "")
            for s in sorted(set(tmp_subscripts)):
                if s not in einsum_symbols:
                    raise ValueError("Character %s is not a valid symbol."
                                     % s)
                if tmp_subscripts.count(s) == 1:
                    output_subscript += s
            normal_inds = ''.join(sorted(set(output_subscript) -
                                         set(out_ellipse)))

            subscripts += "->" + out_ellipse + normal_inds

    # Build output string if does not exist
    if "->" in subscripts:
        input_subscripts, output_subscript = subscripts.split("->")
    else:
        input_subscripts = subscripts
        # Build output subscripts
        tmp_subscripts = subscripts.replace(",", "")
        output_subscript = ""
        for s in sorted(set(tmp_subscripts)):
            if s not in einsum_symbols:
                raise ValueError("Character %s is not a valid symbol." % s)
            if tmp_subscripts.count(s) == 1:
                output_subscript += s

    # Make sure output subscripts are in the input
    for char in output_subscript:
        if char not in input_subscripts:
            raise ValueError("Output character %s did not appear in the input"
                             % char)

    # Make sure number operands is equivalent to the number of terms
    if len(input_subscripts.split(',')) != len(operands):
        raise ValueError("Number of einsum subscripts must be equal to the "
                         "number of operands.")

    return (input_subscripts, output_subscript, operands)


def _einsum_path_dispatcher(*operands, optimize=None, einsum_call=None):
    # NOTE: technically, we should only dispatch on array-like arguments, not
    # subscripts (given as strings). But separating operands into
    # arrays/subscripts is a little tricky/slow (given einsum's two supported
    # signatures), so as a practical shortcut we dispatch on everything.
    # Strings will be ignored for dispatching since they don't define
    # __array_function__.
    return operands


@array_function_dispatch(_einsum_path_dispatcher, module='numpy')
def einsum_path(*operands, optimize='greedy', einsum_call=False):
    """
    einsum_path(subscripts, *operands, optimize='greedy')

    Evaluates the lowest cost contraction order for an einsum expression by
    considering the creation of intermediate arrays.

    Parameters
    ----------
    subscripts : str
        Specifies the subscripts for summation.
    *operands : list of array_like
        These are the arrays for the operation.
    optimize : {bool, list, tuple, 'greedy', 'optimal'}
        Choose the type of path. If a tuple is provided, the second argument
        is assumed to be the maximum intermediate size created. If only a
        single argument is provided the largest input or output array size is
        used as a maximum intermediate size.

        * if a list is given that starts with ``einsum_path``, uses this as
          the contraction path
        * if False no optimization is taken
        * if True defaults to the 'greedy' algorithm
        * 'optimal' An algorithm that combinatorially explores all possible
          ways of contracting the listed tensors and choosest the least costly
          path. Scales exponentially with the number of terms in the
          contraction.
        * 'greedy' An algorithm that chooses the best pair contraction
          at each step. Effectively, this algorithm searches the largest inner,
          Hadamard, and then outer products at each step. Scales cubically
          with the number of terms in the contraction. Equivalent to the
          'optimal' path for most contractions.

        Default is 'greedy'.

    Returns
    -------
    path : list of tuples
        A list representation of the einsum path.
    string_repr : str
        A printable representation of the einsum path.

    Notes
    -----
    The resulting path indicates which terms of the input contraction should
    be contracted first, the result of this contraction is then appended to
    the end of the contraction list. This list can then be iterated over until
    all intermediate contractions are complete.

    See Also
    --------
    einsum, linalg.multi_dot

    Examples
    --------

    We can begin with a chain dot example. In this case, it is optimal to
    contract the ``b`` and ``c`` tensors first as represented by the first
    element of the path ``(1, 2)``. The resulting tensor is added to the end
    of the contraction and the remaining contraction ``(0, 1)`` is then
    completed.

    >>> np.random.seed(123)
    >>> a = np.random.rand(2, 2)
    >>> b = np.random.rand(2, 5)
    >>> c = np.random.rand(5, 2)
    >>> path_info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='greedy')
    >>> print(path_info[0])
    ['einsum_path', (1, 2), (0, 1)]
    >>> print(path_info[1])
      Complete contraction:  ij,jk,kl->il # may vary
             Naive scaling:  4
         Optimized scaling:  3
          Naive FLOP count:  1.600e+02
      Optimized FLOP count:  5.600e+01
       Theoretical speedup:  2.857
      Largest intermediate:  4.000e+00 elements
    -------------------------------------------------------------------------
    scaling                  current                                remaining
    -------------------------------------------------------------------------
       3                   kl,jk->jl                                ij,jl->il
       3                   jl,ij->il                                   il->il


    A more complex index transformation example.

    >>> I = np.random.rand(10, 10, 10, 10)
    >>> C = np.random.rand(10, 10)
    >>> path_info = np.einsum_path('ea,fb,abcd,gc,hd->efgh', C, C, I, C, C,
    ...                            optimize='greedy')

    >>> print(path_info[0])
    ['einsum_path', (0, 2), (0, 3), (0, 2), (0, 1)]
    >>> print(path_info[1])
      Complete contraction:  ea,fb,abcd,gc,hd->efgh # may vary
             Naive scaling:  8
         Optimized scaling:  5
          Naive FLOP count:  8.000e+08
      Optimized FLOP count:  8.000e+05
       Theoretical speedup:  1000.000
      Largest intermediate:  1.000e+04 elements
    --------------------------------------------------------------------------
    scaling                  current                                remaining
    --------------------------------------------------------------------------
       5               abcd,ea->bcde                      fb,gc,hd,bcde->efgh
       5               bcde,fb->cdef                         gc,hd,cdef->efgh
       5                cdef,gc->defg                            hd,defg->efgh
       5                defg,hd->efgh                               efgh->efgh
    """

    # Figure out what the path really is
    path_type = optimize
    if path_type is True:
        path_type = 'greedy'
    if path_type is None:
        path_type = False

    explicit_einsum_path = False
    memory_limit = None

    # No optimization or a named path algorithm
    if (path_type is False) or isinstance(path_type, str):
        pass

    # Given an explicit path
    elif len(path_type) and (path_type[0] == 'einsum_path'):
        explicit_einsum_path = True

    # Path tuple with memory limit
    elif ((len(path_type) == 2) and isinstance(path_type[0], str) and
            isinstance(path_type[1], (int, float))):
        memory_limit = int(path_type[1])
        path_type = path_type[0]

    else:
        raise TypeError("Did not understand the path: %s" % str(path_type))

    # Hidden option, only einsum should call this
    einsum_call_arg = einsum_call

    # Python side parsing
    input_subscripts, output_subscript, operands = \
        _parse_einsum_input(operands)

    # Build a few useful list and sets
    input_list = input_subscripts.split(',')
    input_sets = [set(x) for x in input_list]
    output_set = set(output_subscript)
    indices = set(input_subscripts.replace(',', ''))

    # Get length of each unique dimension and ensure all dimensions are
    # correct
    dimension_dict = {}
    broadcast_indices = [[] for x in range(len(input_list))]
    for tnum, term in enumerate(input_list):
        sh = operands[tnum].shape
        if len(sh) != len(term):
            raise ValueError("Einstein sum subscript %s does not contain the "
                             "correct number of indices for operand %d."
                             % (input_subscripts[tnum], tnum))
        for cnum, char in enumerate(term):
            dim = sh[cnum]

            # Build out broadcast indices
            if dim == 1:
                broadcast_indices[tnum].append(char)

            if char in dimension_dict.keys():
                # For broadcasting cases we always want the largest dim size
                if dimension_dict[char] == 1:
                    dimension_dict[char] = dim
                elif dim not in (1, dimension_dict[char]):
                    raise ValueError("Size of label '%s' for operand %d (%d) "
                                     "does not match previous terms (%d)."
                                     % (char, tnum, dimension_dict[char],
                                        dim))
            else:
                dimension_dict[char] = dim

    # Convert broadcast inds to sets
    broadcast_indices = [set(x) for x in broadcast_indices]

    # Compute size of each input array plus the output array
    size_list = [_compute_size_by_dict(term, dimension_dict)
                 for term in input_list + [output_subscript]]
    max_size = max(size_list)

    if memory_limit is None:
        memory_arg = max_size
    else:
        memory_arg = memory_limit

    # Compute naive cost
    # This isn't quite right, need to look into exact flops calculation
    inner_product = (sum(len(x) for x in input_sets) - len(indices)) > 0
    naive_cost = _flop_count(indices, inner_product, len(input_list),
                             dimension_dict)

    # Compute the path
    if explicit_einsum_path:
        path = path_type[1:]
    elif ((path_type is False) or (len(input_list) in [1, 2]) or
            (indices == output_set)):
        # Nothing to be optimized, leave it to einsum
        path = [tuple(range(len(input_list)))]
    elif path_type == "greedy":
        path = _greedy_path(input_sets, output_set, dimension_dict,
                            memory_arg)
    elif path_type == "optimal":
        path = _optimal_path(input_sets, output_set, dimension_dict,
                             memory_arg)
    else:
        raise KeyError("Path name %s not found", path_type)

    cost_list, scale_list, size_list, contraction_list = [], [], [], []

    # Build contraction tuple (positions, gemm, einsum_str, remaining)
    for cnum, contract_inds in enumerate(path):
        # Make sure we remove inds from right to left
        contract_inds = tuple(sorted(list(contract_inds), reverse=True))

        contract = _find_contraction(contract_inds, input_sets, output_set)
        out_inds, input_sets, idx_removed, idx_contract = contract

        cost = _flop_count(idx_contract, idx_removed, len(contract_inds),
                           dimension_dict)
        cost_list.append(cost)
        scale_list.append(len(idx_contract))
        size_list.append(_compute_size_by_dict(out_inds, dimension_dict))

        bcast = set()
        tmp_inputs = []
        for x in contract_inds:
            tmp_inputs.append(input_list.pop(x))
            bcast |= broadcast_indices.pop(x)

        new_bcast_inds = bcast - idx_removed

        # If we're broadcasting, nix blas
        if not len(idx_removed & bcast):
            do_blas = _can_dot(tmp_inputs, out_inds, idx_removed)
        else:
            do_blas = False

        # Last contraction
        if (cnum - len(path)) == -1:
            idx_result = output_subscript
        else:
            sort_result = [(dimension_dict[ind], ind) for ind in out_inds]
            idx_result = "".join([x[1] for x in sorted(sort_result)])

        input_list.append(idx_result)
        broadcast_indices.append(new_bcast_inds)
        einsum_str = ",".join(tmp_inputs) + "->" + idx_result

        contraction = (contract_inds, idx_removed, einsum_str, input_list[:],
                       do_blas)
        contraction_list.append(contraction)

    opt_cost = sum(cost_list) + 1

    if len(input_list) != 1:
        # Explicit "einsum_path" is usually trusted, but we detect this kind
        # of mistake in order to prevent from returning an intermediate value.
        raise RuntimeError(
            "Invalid einsum_path is specified: {} more operands has to be "
            "contracted.".format(len(input_list) - 1))

    if einsum_call_arg:
        return (operands, contraction_list)

    # Return the path along with a nice string representation
    overall_contraction = input_subscripts + "->" + output_subscript
    header = ("scaling", "current", "remaining")

    speedup = naive_cost / opt_cost
    max_i = max(size_list)

    path_print = "  Complete contraction:  %s\n" % overall_contraction
    path_print += "         Naive scaling:  %d\n" % len(indices)
    path_print += "     Optimized scaling:  %d\n" % max(scale_list)
    path_print += "      Naive FLOP count:  %.3e\n" % naive_cost
    path_print += "  Optimized FLOP count:  %.3e\n" % opt_cost
    path_print += "   Theoretical speedup:  %3.3f\n" % speedup
    path_print += "  Largest intermediate:  %.3e elements\n" % max_i
    path_print += "-" * 74 + "\n"
    path_print += "%6s %24s %40s\n" % header
    path_print += "-" * 74

    for n, contraction in enumerate(contraction_list):
        inds, idx_rm, einsum_str, remaining, blas = contraction
        remaining_str = ",".join(remaining) + "->" + output_subscript
        path_run = (scale_list[n], einsum_str, remaining_str)
        path_print += "\n%4d    %24s %40s" % path_run

    path = ['einsum_path'] + path
    return (path, path_print)


def _einsum_dispatcher(*operands, out=None, optimize=None, **kwargs):
    # Arguably we dispatch on more arguments than we really should; see note
    # in _einsum_path_dispatcher for why.
    yield from operands
    yield out


# Rewrite einsum to handle different cases
@array_function_dispatch(_einsum_dispatcher, module='numpy')
def einsum(*operands, out=None, optimize=False, **kwargs):
    """
    einsum(subscripts, *operands, out=None, dtype=None, order='K',
           casting='safe', optimize=False)

    Evaluates the Einstein summation convention on the operands.

    Using the Einstein summation convention, many common multi-dimensional,
    linear algebraic array operations can be represented in a simple fashion.
    In *implicit* mode `einsum` computes these values.

    In *explicit* mode, `einsum` provides further flexibility to compute
    other array operations that might not be considered classical Einstein
    summation operations, by disabling, or forcing summation over specified
    subscript labels.

    See the notes and examples for clarification.

    Parameters
    ----------
    subscripts : str
        Specifies the subscripts for summation as comma separated list of
        subscript labels. An implicit (classical Einstein summation)
        calculation is performed unless the explicit indicator '->' is
        included as well as subscript labels of the precise output form.
    operands : list of array_like
        These are the arrays for the operation.
    out : ndarray, optional
        If provided, the calculation is done into this array.
    dtype : {data-type, None}, optional
        If provided, forces the calculation to use the data type specified.
        Note that you may have to also give a more liberal `casting`
        parameter to allow the conversions. Default is None.
    order : {'C', 'F', 'A', 'K'}, optional
        Controls the memory layout of the output. 'C' means it should
        be C contiguous. 'F' means it should be Fortran contiguous,
        'A' means it should be 'F' if the inputs are all 'F', 'C' otherwise.
        'K' means it should be as close to the layout as the inputs as
        is possible, including arbitrarily permuted axes.
        Default is 'K'.
    casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
        Controls what kind of data casting may occur.  Setting this to
        'unsafe' is not recommended, as it can adversely affect accumulations.

        * 'no' means the data types should not be cast at all.
        * 'equiv' means only byte-order changes are allowed.
        * 'safe' means only casts which can preserve values are allowed.
        * 'same_kind' means only safe casts or casts within a kind,
          like float64 to float32, are allowed.
        * 'unsafe' means any data conversions may be done.

        Default is 'safe'.
    optimize : {False, True, 'greedy', 'optimal'}, optional
        Controls if intermediate optimization should occur. No optimization
        will occur if False and True will default to the 'greedy' algorithm.
        Also accepts an explicit contraction list from the ``np.einsum_path``
        function. See ``np.einsum_path`` for more details. Defaults to False.

    Returns
    -------
    output : ndarray
        The calculation based on the Einstein summation convention.

    See Also
    --------
    einsum_path, dot, inner, outer, tensordot, linalg.multi_dot
    einops :
        similar verbose interface is provided by
        `einops <https://github.com/arogozhnikov/einops>`_ package to cover
        additional operations: transpose, reshape/flatten, repeat/tile,
        squeeze/unsqueeze and reductions.
    opt_einsum :
        `opt_einsum <https://optimized-einsum.readthedocs.io/en/stable/>`_
        optimizes contraction order for einsum-like expressions
        in backend-agnostic manner.

    Notes
    -----
    .. versionadded:: 1.6.0

    The Einstein summation convention can be used to compute
    many multi-dimensional, linear algebraic array operations. `einsum`
    provides a succinct way of representing these.

    A non-exhaustive list of these operations,
    which can be computed by `einsum`, is shown below along with examples:

    * Trace of an array, :py:func:`numpy.trace`.
    * Return a diagonal, :py:func:`numpy.diag`.
    * Array axis summations, :py:func:`numpy.sum`.
    * Transpositions and permutations, :py:func:`numpy.transpose`.
    * Matrix multiplication and dot product, :py:func:`numpy.matmul`
      :py:func:`numpy.dot`.
    * Vector inner and outer products, :py:func:`numpy.inner`
      :py:func:`numpy.outer`.
    * Broadcasting, element-wise and scalar multiplication,
      :py:func:`numpy.multiply`.
    * Tensor contractions, :py:func:`numpy.tensordot`.
    * Chained array operations, in efficient calculation order,
      :py:func:`numpy.einsum_path`.

    The subscripts string is a comma-separated list of subscript labels,
    where each label refers to a dimension of the corresponding operand.
    Whenever a label is repeated it is summed, so ``np.einsum('i,i', a, b)``
    is equivalent to :py:func:`np.inner(a,b) <numpy.inner>`. If a label
    appears only once, it is not summed, so ``np.einsum('i', a)`` produces a
    view of ``a`` with no changes. A further example
    ``np.einsum('ij,jk', a, b)`` describes traditional matrix multiplication
    and is equivalent to :py:func:`np.matmul(a,b) <numpy.matmul>`. Repeated
    subscript labels in one operand take the diagonal. For example,
    ``np.einsum('ii', a)`` is equivalent to
    :py:func:`np.trace(a) <numpy.trace>`.

    In *implicit mode*, the chosen subscripts are important
    since the axes of the output are reordered alphabetically.  This
    means that ``np.einsum('ij', a)`` doesn't affect a 2D array, while
    ``np.einsum('ji', a)`` takes its transpose. Additionally,
    ``np.einsum('ij,jk', a, b)`` returns a matrix multiplication, while,
    ``np.einsum('ij,jh', a, b)`` returns the transpose of the
    multiplication since subscript 'h' precedes subscript 'i'.

    In *explicit mode* the output can be directly controlled by
    specifying output subscript labels.  This requires the
    identifier '->' as well as the list of output subscript labels.
    This feature increases the flexibility of the function since
    summing can be disabled or forced when required. The call
    ``np.einsum('i->', a)`` is like :py:func:`np.sum(a, axis=-1) <numpy.sum>`,
    and ``np.einsum('ii->i', a)`` is like
    :py:func:`np.diag(a) <numpy.diag>`. The difference is that `einsum` does
    not allow broadcasting by default. Additionally
    ``np.einsum('ij,jh->ih', a, b)`` directly specifies the order of the
    output subscript labels and therefore returns matrix multiplication,
    unlike the example above in implicit mode.

    To enable and control broadcasting, use an ellipsis.  Default
    NumPy-style broadcasting is done by adding an ellipsis
    to the left of each term, like ``np.einsum('...ii->...i', a)``.
    To take the trace along the first and last axes,
    you can do ``np.einsum('i...i', a)``, or to do a matrix-matrix
    product with the left-most indices instead of rightmost, one can do
    ``np.einsum('ij...,jk...->ik...', a, b)``.

    When there is only one operand, no axes are summed, and no output
    parameter is provided, a view into the operand is returned instead
    of a new array.  Thus, taking the diagonal as ``np.einsum('ii->i', a)``
    produces a view (changed in version 1.10.0).

    `einsum` also provides an alternative way to provide the subscripts and
    operands as ``einsum(op0, sublist0, op1, sublist1, ..., [sublistout])``.
    If the output shape is not provided in this format `einsum` will be
    calculated in implicit mode, otherwise it will be performed explicitly.
    The examples below have corresponding `einsum` calls with the two
    parameter methods.

    .. versionadded:: 1.10.0

    Views returned from einsum are now writeable whenever the input array
    is writeable. For example, ``np.einsum('ijk...->kji...', a)`` will now
    have the same effect as :py:func:`np.swapaxes(a, 0, 2) <numpy.swapaxes>`
    and ``np.einsum('ii->i', a)`` will return a writeable view of the
    diagonal of a 2D array.

    .. versionadded:: 1.12.0

    Added the ``optimize`` argument which will optimize the contraction order
    of an einsum expression. For a contraction with three or more operands
    this can greatly increase the computational efficiency at the cost of a
    larger memory footprint during computation.

    Typically a 'greedy' algorithm is applied which empirical tests have
    shown returns the optimal path in the majority of cases. In some cases
    'optimal' will return the superlative path through a more expensive,
    exhaustive search. For iterative calculations it may be advisable to
    calculate the optimal path once and reuse that path by supplying it as an
    argument. An example is given below.

    See :py:func:`numpy.einsum_path` for more details.

    Examples
    --------
    >>> a = np.arange(25).reshape(5,5)
    >>> b = np.arange(5)
    >>> c = np.arange(6).reshape(2,3)

    Trace of a matrix:

    >>> np.einsum('ii', a)
    60
    >>> np.einsum(a, [0,0])
    60
    >>> np.trace(a)
    60

    Extract the diagonal (requires explicit form):

    >>> np.einsum('ii->i', a)
    array([ 0,  6, 12, 18, 24])
    >>> np.einsum(a, [0,0], [0])
    array([ 0,  6, 12, 18, 24])
    >>> np.diag(a)
    array([ 0,  6, 12, 18, 24])

    Sum over an axis (requires explicit form):

    >>> np.einsum('ij->i', a)
    array([ 10,  35,  60,  85, 110])
    >>> np.einsum(a, [0,1], [0])
    array([ 10,  35,  60,  85, 110])
    >>> np.sum(a, axis=1)
    array([ 10,  35,  60,  85, 110])

    For higher dimensional arrays summing a single axis can be done
    with ellipsis:

    >>> np.einsum('...j->...', a)
    array([ 10,  35,  60,  85, 110])
    >>> np.einsum(a, [Ellipsis,1], [Ellipsis])
    array([ 10,  35,  60,  85, 110])

    Compute a matrix transpose, or reorder any number of axes:

    >>> np.einsum('ji', c)
    array([[0, 3],
           [1, 4],
           [2, 5]])
    >>> np.einsum('ij->ji', c)
    array([[0, 3],
           [1, 4],
           [2, 5]])
    >>> np.einsum(c, [1,0])
    array([[0, 3],
           [1, 4],
           [2, 5]])
    >>> np.transpose(c)
    array([[0, 3],
           [1, 4],
           [2, 5]])

    Vector inner products:

    >>> np.einsum('i,i', b, b)
    30
    >>> np.einsum(b, [0], b, [0])
    30
    >>> np.inner(b,b)
    30

    Matrix vector multiplication:

    >>> np.einsum('ij,j', a, b)
    array([ 30,  80, 130, 180, 230])
    >>> np.einsum(a, [0,1], b, [1])
    array([ 30,  80, 130, 180, 230])
    >>> np.dot(a, b)
    array([ 30,  80, 130, 180, 230])
    >>> np.einsum('...j,j', a, b)
    array([ 30,  80, 130, 180, 230])

    Broadcasting and scalar multiplication:

    >>> np.einsum('..., ...', 3, c)
    array([[ 0,  3,  6],
           [ 9, 12, 15]])
    >>> np.einsum(',ij', 3, c)
    array([[ 0,  3,  6],
           [ 9, 12, 15]])
    >>> np.einsum(3, [Ellipsis], c, [Ellipsis])
    array([[ 0,  3,  6],
           [ 9, 12, 15]])
    >>> np.multiply(3, c)
    array([[ 0,  3,  6],
           [ 9, 12, 15]])

    Vector outer product:

    >>> np.einsum('i,j', np.arange(2)+1, b)
    array([[0, 1, 2, 3, 4],
           [0, 2, 4, 6, 8]])
    >>> np.einsum(np.arange(2)+1, [0], b, [1])
    array([[0, 1, 2, 3, 4],
           [0, 2, 4, 6, 8]])
    >>> np.outer(np.arange(2)+1, b)
    array([[0, 1, 2, 3, 4],
           [0, 2, 4, 6, 8]])

    Tensor contraction:

    >>> a = np.arange(60.).reshape(3,4,5)
    >>> b = np.arange(24.).reshape(4,3,2)
    >>> np.einsum('ijk,jil->kl', a, b)
    array([[4400., 4730.],
           [4532., 4874.],
           [4664., 5018.],
           [4796., 5162.],
           [4928., 5306.]])
    >>> np.einsum(a, [0,1,2], b, [1,0,3], [2,3])
    array([[4400., 4730.],
           [4532., 4874.],
           [4664., 5018.],
           [4796., 5162.],
           [4928., 5306.]])
    >>> np.tensordot(a,b, axes=([1,0],[0,1]))
    array([[4400., 4730.],
           [4532., 4874.],
           [4664., 5018.],
           [4796., 5162.],
           [4928., 5306.]])

    Writeable returned arrays (since version 1.10.0):

    >>> a = np.zeros((3, 3))
    >>> np.einsum('ii->i', a)[:] = 1
    >>> a
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])

    Example of ellipsis use:

    >>> a = np.arange(6).reshape((3,2))
    >>> b = np.arange(12).reshape((4,3))
    >>> np.einsum('ki,jk->ij', a, b)
    array([[10, 28, 46, 64],
           [13, 40, 67, 94]])
    >>> np.einsum('ki,...k->i...', a, b)
    array([[10, 28, 46, 64],
           [13, 40, 67, 94]])
    >>> np.einsum('k...,jk', a, b)
    array([[10, 28, 46, 64],
           [13, 40, 67, 94]])

    Chained array operations. For more complicated contractions, speed ups
    might be achieved by repeatedly computing a 'greedy' path or pre-computing
    the 'optimal' path and repeatedly applying it, using an `einsum_path`
    insertion (since version 1.12.0). Performance improvements can be
    particularly significant with larger arrays:

    >>> a = np.ones(64).reshape(2,4,8)

    Basic `einsum`: ~1520ms  (benchmarked on 3.1GHz Intel i5.)

    >>> for iteration in range(500):
    ...     _ = np.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a)

    Sub-optimal `einsum` (due to repeated path calculation time): ~330ms

    >>> for iteration in range(500):
    ...     _ = np.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a,
    ...                   optimize='optimal')

    Greedy `einsum` (faster optimal path approximation): ~160ms

    >>> for iteration in range(500):
    ...     _ = np.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a,
    ...                   optimize='greedy')

    Optimal `einsum` (best usage pattern in some use cases): ~110ms

    >>> path = np.einsum_path('ijk,ilm,njm,nlk,abc->',a,a,a,a,a,
    ...                       optimize='optimal')[0]
    >>> for iteration in range(500):
    ...     _ = np.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a, optimize=path)

    """
    # Special handling if out is specified
    specified_out = out is not None

    # If no optimization, run pure einsum
    if optimize is False:
        if specified_out:
            kwargs['out'] = out
        return c_einsum(*operands, **kwargs)

    # Check the kwargs to avoid a more cryptic error later, without having to
    # repeat default values here
    valid_einsum_kwargs = ['dtype', 'order', 'casting']
    unknown_kwargs = [k for (k, v) in kwargs.items() if
                      k not in valid_einsum_kwargs]
    if len(unknown_kwargs):
        raise TypeError("Did not understand the following kwargs: %s"
                        % unknown_kwargs)

    # Build the contraction list and operand
    operands, contraction_list = einsum_path(*operands, optimize=optimize,
                                             einsum_call=True)

    # Handle order kwarg for output array, c_einsum allows mixed case
    output_order = kwargs.pop('order', 'K')
    if output_order.upper() == 'A':
        if all(arr.flags.f_contiguous for arr in operands):
            output_order = 'F'
        else:
            output_order = 'C'

    # Start contraction loop
    for num, contraction in enumerate(contraction_list):
        inds, idx_rm, einsum_str, remaining, blas = contraction
        tmp_operands = [operands.pop(x) for x in inds]

        # Do we need to deal with the output?
        handle_out = specified_out and ((num + 1) == len(contraction_list))

        # Call tensordot if still possible
        if blas:
            # Checks have already been handled
            input_str, results_index = einsum_str.split('->')
            input_left, input_right = input_str.split(',')

            tensor_result = input_left + input_right
            for s in idx_rm:
                tensor_result = tensor_result.replace(s, "")

            # Find indices to contract over
            left_pos, right_pos = [], []
            for s in sorted(idx_rm):
                left_pos.append(input_left.find(s))
                right_pos.append(input_right.find(s))

            # Contract!
            new_view = tensordot(*tmp_operands,
                                 axes=(tuple(left_pos), tuple(right_pos)))

            # Build a new view if needed
            if (tensor_result != results_index) or handle_out:
                if handle_out:
                    kwargs["out"] = out
                new_view = c_einsum(tensor_result + '->' + results_index,
                                    new_view, **kwargs)

        # Call einsum
        else:
            # If out was specified
            if handle_out:
                kwargs["out"] = out

            # Do the contraction
            new_view = c_einsum(einsum_str, *tmp_operands, **kwargs)

        # Append new items and dereference what we can
        operands.append(new_view)
        del tmp_operands, new_view

    if specified_out:
        return out
    else:
        return asanyarray(operands[0], order=output_order)
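The docstring above recommends precomputing a contraction path and reusing it for repeated calls. A minimal sketch of that pattern, using only the public `np.einsum_path` / `np.einsum` API described here (array shapes are arbitrary illustration values):

```python
import numpy as np

# Precompute the contraction path once, then reuse it for repeated calls.
a = np.random.rand(3, 4)
b = np.random.rand(4, 5)
c = np.random.rand(5, 2)

# einsum_path returns (path, printable_summary); the path starts with the
# 'einsum_path' sentinel and can be fed back via the optimize argument.
path, info = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='optimal')

result = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)

# The optimized contraction matches the plain chained matrix product.
expected = a @ b @ c
assert np.allclose(result, expected)
```

Reusing the path skips the per-call path search, which is the dominant overhead for small operands in a loop.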