<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>python &amp;mdash; StealthyCoder</title>
    <link>https://stealthycoder.writeas.com/tag:python</link>
    <description>Making code ninjas out of everyone</description>
    <pubDate>Sat, 13 Jun 2026 16:39:38 +0000</pubDate>
    <item>
      <title>Small change, big difference</title>
      <link>https://stealthycoder.writeas.com/small-change-big-difference?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Sometimes the small things in life can make the biggest difference. This time it is a small adventure in Python.&#xA;&#xA;I was just working on a small PoC (Proof of Concept) to do some nginx testing. I wanted a working nginx, two simple Python APIs, and a client that would send requests to the nginx instance, which would load balance across those two Python API instances. For some reason it would not collect the statistics correctly. Below is a simplified example that still contains the core of what I was trying to do.&#xA;&#xA;Concurrency&#xA;&#xA;In Python it is very simple to do concurrency. You create an executor, submit work to it, and it runs that work for you. In code:&#xA;&#xA;from concurrent.futures import ProcessPoolExecutor&#xA;&#xA;var = 0&#xA;&#xA;def func(var):&#xA;    var = var + 1&#xA;    print(var)&#xA;&#xA;with ProcessPoolExecutor() as pool:&#xA;    for _ in range(10):&#xA;        pool.submit(func, var)&#xA;&#xA;print(var)&#xA;&#xA;If you run it you will see something weird. Every worker prints 1, and the final print shows 0. What is going on here?&#xA;&#xA;Processes&#xA;&#xA;The key indicator is the ProcessPoolExecutor: it creates separate interpreter processes and runs the code inside those. Each call receives its own copy of var, so nothing a worker does ever reaches the var in the parent. It took some time for me to realize this. So how to fix this? 
Either switch to ThreadPoolExecutor (and update the module-level var through global instead of a parameter), or do the following:&#xA;&#xA;from concurrent.futures import ProcessPoolExecutor, wait&#xA;from multiprocessing.managers import SyncManager&#xA;&#xA;var = 0&#xA;&#xA;def update_var():&#xA;    global var&#xA;    var = var + 1&#xA;&#xA;manager = SyncManager(address=(&#39;&#39;, 5566), authkey=b&#34;secret&#34;)&#xA;manager.register(&#39;update_var&#39;, callable=update_var)&#xA;manager.register(&#39;get_var&#39;, callable=lambda: var)&#xA;manager.start()&#xA;&#xA;def func():&#xA;    m = SyncManager(address=(&#39;&#39;, 5566), authkey=b&#34;secret&#34;)&#xA;    m.register(&#34;update_var&#34;)&#xA;    m.connect()&#xA;    m.update_var()&#xA;&#xA;futs = []&#xA;with ProcessPoolExecutor() as pool:&#xA;    for _ in range(10):&#xA;        futs.append(pool.submit(func))&#xA;wait(futs)&#xA;print(manager.get_var())&#xA;manager.shutdown()&#xA;&#xA;That is quite the transformation. In essence, what we need extra is a Manager to synchronize between processes, with two registered callables: one for updating the var and one to get the var. &#xA;&#xA;Then every function needs to connect to the Manager and register the names it wants to call a second time, on the client side. &#xA;&#xA;If you want to get rid of the global var, you could make a simple class that holds the state and register it with the manager as well. &#xA;&#xA;Conclusion&#xA;&#xA;Sometimes something simple turns out to be quite complicated. A bonus solution would be to use shared_memory. That could be something for another post.&#xA;&#xA;#devlife #python]]&gt;</description>
      <content:encoded><![CDATA[<p>Sometimes the small things in life can make the biggest difference. This time it is a small adventure in Python. </p>

<p>I was just working on a small PoC (Proof of Concept) to do some <code>nginx</code> testing. I wanted a working <code>nginx</code>, two simple Python APIs, and a client that would send requests to the <code>nginx</code> instance, which would load balance across those two Python API instances. For some reason it would not collect the statistics correctly. Below is a simplified example that still contains the core of what I was trying to do.</p>

<h1 id="concurrency" id="concurrency">Concurrency</h1>

<p>In Python it is very simple to do concurrency. You create an executor, submit work to it, and it runs that work for you. In code:</p>

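<pre><code class="language-python">from concurrent.futures import ProcessPoolExecutor

var = 0


def func(var):
    # Runs in a worker process; this only rebinds the local parameter.
    var = var + 1
    print(var)


# The guard keeps workers from re-running this block on platforms that
# start them by re-importing the module (spawn/forkserver).
if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for _ in range(10):
            pool.submit(func, var)

    print(var)
</code></pre>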

<p>If you run it you will see something weird. Every worker prints <code>1</code>, and the final <code>print</code> shows <code>0</code>. What is going on here?</p>

<h1 id="processes" id="processes">Processes</h1>

<p>The key indicator is the <code>ProcessPoolExecutor</code>: it creates separate interpreter processes and runs the code inside those. Each call receives its own copy of <code>var</code>, so nothing a worker does ever reaches the <code>var</code> in the parent. It took some time for me to realize this. So how to fix this? Either switch to <code>ThreadPoolExecutor</code> (and update the module-level <code>var</code> through <code>global</code> instead of a parameter), or do the following:</p>

<pre><code class="language-python">from concurrent.futures import ProcessPoolExecutor, wait
from multiprocessing.managers import SyncManager

var = 0


def update_var():
    # Runs inside the manager's server process, so this is the one shared copy.
    global var
    var = var + 1


manager = SyncManager(address=(&#39;&#39;, 5566), authkey=b&#34;secret&#34;)
manager.register(&#39;update_var&#39;, callable=update_var)
manager.register(&#39;get_var&#39;, callable=lambda: var)
manager.start()


def func():
    # Each worker connects to the running manager as a client.
    m = SyncManager(address=(&#39;&#39;, 5566), authkey=b&#34;secret&#34;)
    m.register(&#34;update_var&#34;)
    m.connect()
    m.update_var()
    # No m.shutdown() here: only a manager that was start()ed has that method.


futs = []
with ProcessPoolExecutor() as pool:
    for _ in range(10):
        futs.append(pool.submit(func))
wait(futs)
print(manager.get_var())
manager.shutdown()
</code></pre>

<p>That is quite the transformation. In essence, what we need extra is a <code>Manager</code> to synchronize between processes, with two registered callables: one for updating the <code>var</code> and one to get the <code>var</code>.</p>

<p>Then every function needs to connect to the <code>Manager</code> and register the names it wants to call a second time, on the client side.</p>

<p>If you want to get rid of the <code>global var</code>, you could make a simple class that holds the state and register it with the <code>manager</code> as well.</p>
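
<p>A minimal sketch of that class-based variant. The <code>Counter</code> class and the <code>CounterManager</code> subclass are hypothetical names that do not appear in the code above, and the manager picks its own address here instead of port 5566:</p>

<pre><code class="language-python">import threading
from concurrent.futures import ProcessPoolExecutor
from multiprocessing.managers import BaseManager


class Counter:
    """Hypothetical state holder; the one instance lives in the manager process."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The manager serves each client connection on its own thread,
        # so guard the read-modify-write.
        with self._lock:
            self._value += 1

    def get(self):
        with self._lock:
            return self._value


class CounterManager(BaseManager):
    """Hypothetical subclass, so the registration stays off SyncManager."""


CounterManager.register("Counter", Counter)


def func(counter):
    # counter is a proxy; the call is forwarded to the manager process.
    counter.increment()


def main():
    with CounterManager() as manager:
        counter = manager.Counter()
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(func, counter) for _ in range(10)]
            for future in futures:
                future.result()  # re-raises anything that failed in a worker
        print(counter.get())


if __name__ == "__main__":
    main()
</code></pre>

<p>Passing the proxy into <code>submit</code> means the workers no longer have to connect and register anything themselves, and calling <code>result()</code> on each future surfaces worker errors instead of hiding them.</p>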

<h1 id="conclusion" id="conclusion">Conclusion</h1>

<p>Sometimes something simple turns out to be quite complicated. A bonus solution would be to use <code>shared_memory</code>. That could be something for another post.</p>
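
<p>For comparison, the <code>ThreadPoolExecutor</code> route mentioned earlier can be sketched as follows. Threads share the memory of one interpreter, so a module-level variable plus a lock is enough. The <code>var_lock</code> and <code>run</code> names are only illustrative:</p>

<pre><code class="language-python">import threading
from concurrent.futures import ThreadPoolExecutor

var = 0
var_lock = threading.Lock()  # illustrative name, not from the original code


def func():
    # Update the module-level variable instead of a parameter copy.
    global var
    with var_lock:
        var = var + 1


def run(tasks=10):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(func) for _ in range(tasks)]
    for future in futures:
        future.result()  # re-raises anything that failed in a thread
    return var


print(run())
</code></pre>

<p>Run as is, this prints <code>10</code>. The trade-off is that threads all run inside one interpreter process, so this fits I/O-bound work such as sending requests better than CPU-heavy work.</p>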

<p><a href="https://stealthycoder.writeas.com/tag:devlife" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">devlife</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/small-change-big-difference</guid>
      <pubDate>Thu, 13 Jun 2024 21:41:07 +0000</pubDate>
    </item>
    <item>
      <title>Strawberry Fields</title>
      <link>https://stealthycoder.writeas.com/strawberry-fields-ln3t?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[I was working on getting some work ported over to Strawberry from Graphene-Django, and I suddenly hit a snag. Once I found out what happened to the Strawberry Fields, I was just glad I could solve it at that point.&#xA;&#xA;The premise&#xA;&#xA;The premise is that I worked on a project written with Graphene Django (for Django, obviously), which is the Python framework implementing a GraphQL schema. You have a GraphQL endpoint where you send your queries and mutations, and it validates them against the schema and executes them. The two problems with the framework are that it is not fully async, and that it relies on a lot of metaclass programming, which means a lot of configuring but not a lot of control over the implementation details.&#xA;&#xA;This meant when we were hitting our performance issues, we tried everything from low-hanging fruit like using Dataloaders, to optimizing certain queries by not relying on the ORM (Graphene Mongo in this case) and just using PyMongo directly. &#xA;&#xA;Still we got stuck again, and the fact that we relied heavily on the Promise implementation in Python, which mirrors Promises/A+ from JavaScript land, did not actually help. There were also attempts to improve things by streamlining the concurrency, but alas. So the move is to another framework, preferably in Python, to keep the current dev team. &#xA;&#xA;Enter Strawberry&#xA;&#xA;Strawberry is a nice framework that does both sync and async. It takes a lot of configuring as well, but it also lets you control the implementation details, for instance through things called FieldExtensions. These run either when the schema is first generated (through the apply method) or when the nodes are being resolved (through the sync resolve or the async resolve_async function). This is a wonderful middleware-style way to tweak the implementation details. 
&#xA;&#xA;One of the things lacking in the Graphene Django setup was automatic control over which fields were allowed as filters. Ideally those should always be the fields exposed on the Node itself. That was not the case, however; you had to wire it up manually. To make the new stack better and fix that particular nuisance, I made a simple BaseExtension class:&#xA;&#xA;class BaseExtension(FieldExtension):&#xA;&#xA;    def apply(self, field: StrawberryField) -&gt; None:&#xA;        self.filter_fields = []&#xA;        resolved_type: Type[WithStrawberryObjectDefinition] = cast(&#xA;            Type[WithStrawberryObjectDefinition], field.resolve_type()&#xA;        )&#xA;        if resolved_type.__strawberry_definition__.specialized_type_var_map:&#xA;            node = cast(&#xA;                Type[WithStrawberryObjectDefinition],&#xA;                resolved_type.__strawberry_definition__.specialized_type_var_map[&#xA;                    &#34;NodeType&#34;&#xA;                ],&#xA;            )&#xA;&#xA;        for f in node.__strawberry_definition__.fields:&#xA;            field.arguments.append(&#xA;                StrawberryArgument(&#xA;                    python_name=f.name,&#xA;                    graphql_name=f.name.replace(&#34;_&#34;, &#34;&#34;)&#xA;                    if f.name.startswith(&#34;_&#34;)&#xA;                    else None,&#xA;                    type_annotation=StrawberryAnnotation(&#xA;                        Optional[f.type]&#xA;                        if not isinstance(f.type, StrawberryOptional)&#xA;                        else Optional[f.type.of_type]&#xA;                    ),&#xA;                    description=&#34;&#34;,&#xA;                    default=strawberry.UNSET,&#xA;                )&#xA;            )&#xA;            self.filter_fields.append(f.name)&#xA;&#xA;All this really does is go over the fields defined and add them all as Arguments so that you can filter on them and they will be passed 
along as kwargs in the resolve functions. &#xA;&#xA;Perfect.&#xA;&#xA;Snag time&#xA;&#xA;So I was porting the NodeTypes and this project also uses Relay, a set of conventions on top of GraphQL. Not very important, except for me to say that I had not made any connections yet. As in one Node -&gt; another Node, which is quite common in GraphQL and in Relay. &#xA;&#xA;When I made the first connection, the schema would not even generate. I was so frustrated, and nothing worked. I could go from resolver function -&gt; relay.ListConnection[NodeType] but not from Node -&gt; relay.ListConnection[NodeType]. It kept complaining that it was not a GraphQL input type. I did not want it as an input. I struggled and looked deep into the source code of everything, trying to hack it there by making it dynamically an input or an output depending on properties. Then I suddenly stopped. Since there was no mention of this online whatsoever, it had to be a problem I had created myself. &#xA;&#xA;I went to bed, late. Woke up. Paced around a bit. &#xA;In my head I thought, why is it automatically turning into an argu.....oh I am an idiot. &#xA;&#xA;So I revisited my BaseExtension class that powered the dynamic argument adding. 
I tweaked it here and there, and the following is the fixed version:&#xA;&#xA;class BaseExtension(FieldExtension):&#xA;    filter_fields = [&#34;project_id&#34;, &#34;change_order_id&#34;]&#xA;&#xA;    def apply(self, field: StrawberryField) -&gt; None:&#xA;        self.filter_fields = [&#34;project_id&#34;, &#34;change_order_id&#34;]&#xA;        resolved_type: Type[WithStrawberryObjectDefinition] = cast(&#xA;            Type[WithStrawberryObjectDefinition], field.resolve_type()&#xA;        )&#xA;        if resolved_type.__strawberry_definition__.specialized_type_var_map:&#xA;            node = cast(&#xA;                Type[WithStrawberryObjectDefinition],&#xA;                resolved_type.__strawberry_definition__.specialized_type_var_map[&#xA;                    &#34;NodeType&#34;&#xA;                ],&#xA;            )&#xA;        else:&#xA;            node = resolved_type&#xA;&#xA;        for f in node.__strawberry_definition__.fields:&#xA;            if inspect.isclass(f.type) and issubclass(&#xA;                f.type, strawberry.relay.types.ListConnection&#xA;            ):&#xA;                continue&#xA;            if isinstance(f.type, StrawberryOptional):&#xA;                if inspect.isclass(f.type.of_type) and issubclass(&#xA;                    f.type.of_type, strawberry.relay.types.ListConnection&#xA;                ):&#xA;                    continue&#xA;            field.arguments.append(&#xA;                StrawberryArgument(&#xA;                    python_name=f.name,&#xA;                    graphql_name=f.name.replace(&#34;_&#34;, &#34;&#34;)&#xA;                    if f.name.startswith(&#34;_&#34;)&#xA;                    else None,&#xA;                    type_annotation=StrawberryAnnotation(&#xA;                        Optional[f.type]&#xA;                        if not isinstance(f.type, StrawberryOptional)&#xA;                        else Optional[f.type.of_type]&#xA;                    ),&#xA;                    description=&#34;&#34;,&#xA;                    
default=strawberry.UNSET,&#xA;                )&#xA;            )&#xA;            self.filter_fields.append(f.name)&#xA;&#xA;Essentially what I needed to do was check whether the type or of_type is a class. If it is, check whether it is a relay.ListConnection subclass and, if so, exclude it from the argument generation. Everything worked right after this. &#xA;&#xA;Conclusion&#xA;&#xA;I really like this framework. It gives me insight into how things operate and why a particular query is sometimes slow, and it gives you the space to fix it. For example, I already made it possible to load all the necessary subparts of a node in one go with the Dataloaders. That was not possible before. However, it was still as slow as the old stack, because each node on its own tried to create a new relay.ListConnection for essentially one Edge. &#xA;&#xA;We already have all the instances needed to make all the edges when doing the Dataloader logic, so that is also the spot to create all the edges in one go. Then keep a simple mapping of node.id -&gt; Edge and you are done. This sped things up by quite a significant margin. &#xA;&#xA;Something the old stack could not really do. It had no real way of giving you the same tools to do the same thing. &#xA;&#xA;#devlife #python #graphql]]&gt;</description>
      <content:encoded><![CDATA[<p>I was working on getting some work ported over to <a href="https://strawberry.rocks" rel="nofollow">Strawberry</a> from Graphene-Django, and I suddenly hit a snag. Once I found out what happened to the Strawberry Fields, I was just glad I could solve it at that point. </p>

<h2 id="the-premise" id="the-premise">The premise</h2>

<p>The premise is that I worked on a project written with Graphene Django (for Django, obviously), which is the Python framework implementing a GraphQL schema. You have a GraphQL endpoint where you send your queries and mutations, and it validates them against the schema and executes them. The two problems with the framework are that it is not fully <code>async</code>, and that it relies on a lot of metaclass programming, which means a lot of configuring but not a lot of control over the implementation details.</p>

<p>This meant when we were hitting our performance issues, we tried everything from low-hanging fruit like using Dataloaders, to optimizing certain queries by not relying on the ORM (Graphene Mongo in this case) and just using PyMongo directly.</p>

<p>Still we got stuck again, and the fact that we relied heavily on the Promise implementation in Python, which mirrors Promises/A+ from JavaScript land, did not actually help. There were also attempts to improve things by streamlining the concurrency, but alas. So the move is to another framework, preferably in Python, to keep the current dev team.</p>

<h2 id="enter-strawberry" id="enter-strawberry">Enter Strawberry</h2>

<p>Strawberry is a nice framework that does both <code>sync</code> and <code>async</code>. It takes a lot of configuring as well, but it also lets you control the implementation details, for instance through things called <code>FieldExtensions</code>. These run either when the schema is first generated (through the <code>apply</code> method) or when the nodes are being resolved (through the <code>sync</code> <code>resolve</code> or the <code>async</code> <code>resolve_async</code> function). This is a wonderful middleware-style way to tweak the implementation details.</p>

<p>One of the things lacking in the Graphene Django setup was automatic control over which fields were allowed as filters. Ideally those should always be the fields exposed on the Node itself. That was not the case, however; you had to wire it up manually. To make the new stack better and fix that particular nuisance, I made a simple <code>BaseExtension</code> class:</p>

<pre><code class="language-python">class BaseExtension(FieldExtension):

    def apply(self, field: StrawberryField) -&gt; None:
        self.filter_fields = []
        resolved_type: Type[WithStrawberryObjectDefinition] = cast(
            Type[WithStrawberryObjectDefinition], field.resolve_type()
        )
        if resolved_type.__strawberry_definition__.specialized_type_var_map:
            node = cast(
                Type[WithStrawberryObjectDefinition],
                resolved_type.__strawberry_definition__.specialized_type_var_map[
                    &#34;NodeType&#34;
                ],
            )

        for f in node.__strawberry_definition__.fields:
            field.arguments.append(
                StrawberryArgument(
                    python_name=f.name,
                    graphql_name=f.name.replace(&#34;_&#34;, &#34;&#34;)
                    if f.name.startswith(&#34;_&#34;)
                    else None,
                    type_annotation=StrawberryAnnotation(
                        Optional[f.type]
                        if not isinstance(f.type, StrawberryOptional)
                        else Optional[f.type.of_type]
                    ),
                    description=&#34;&#34;,
                    default=strawberry.UNSET,
                )
            )
            self.filter_fields.append(f.name)
</code></pre>

<p>All this really does is go over the fields defined and add them all as <code>Arguments</code> so that you can filter on them and they will be passed along as <code>kwargs</code> in the <code>resolve</code> functions.</p>

<p>Perfect.</p>
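
<p>To make the <code>kwargs</code> part concrete, here is a minimal sketch of how a resolver might use the generated arguments. It runs on plain dictionaries; <code>UNSET</code> is a hypothetical stand-in for <code>strawberry.UNSET</code>, and <code>active_filters</code> and <code>resolve</code> are illustrative names, not part of Strawberry or of the project:</p>

<pre><code class="language-python">UNSET = object()  # hypothetical stand-in for strawberry.UNSET


def active_filters(filter_fields, kwargs):
    # Keep only the generated arguments the client actually supplied.
    return {
        name: kwargs[name]
        for name in filter_fields
        if kwargs.get(name, UNSET) is not UNSET
    }


def resolve(rows, filter_fields, **kwargs):
    # Every supplied argument must match for a row to be returned.
    wanted = active_filters(filter_fields, kwargs)
    return [
        row
        for row in rows
        if all(row.get(name) == value for name, value in wanted.items())
    ]


rows = [
    {"name": "alpha", "status": "open"},
    {"name": "beta", "status": "closed"},
]
print(resolve(rows, ["name", "status"], name=UNSET, status="open"))
</code></pre>

<p>Because every generated argument defaults to unset, a query that passes no filters simply returns all rows.</p>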

<h2 id="snag-time" id="snag-time">Snag time</h2>

<p>So I was porting the <code>NodeType</code>s and this project also uses <em>Relay</em>, a set of conventions on top of GraphQL. Not very important, except for me to say that I had not made any connections yet. As in one Node -&gt; another Node, which is quite common in GraphQL and in Relay.</p>

<p>When I made the first connection, the schema would not even generate. I was so frustrated, and nothing worked. I could go from <code>resolver function -&gt; relay.ListConnection[NodeType]</code> but not from <code>Node -&gt; relay.ListConnection[NodeType]</code>. It kept complaining that it was not a GraphQL input type. I did not want it as an input. I struggled and looked deep into the source code of everything, trying to hack it there by making it dynamically an input or an output depending on properties. Then I suddenly stopped. Since there was no mention of this online whatsoever, it had to be a problem I had created myself.</p>

<p>I went to bed, late. Woke up. Paced around a bit.
In my head I thought, why is it automatically turning into an argu.....oh I am an idiot.</p>

<p>So I revisited my <code>BaseExtension</code> class that powered the dynamic argument adding. I tweaked it here and there, and the following is the fixed version:</p>

<pre><code class="language-python">import inspect
from typing import Optional, Type, cast

import strawberry

# FieldExtension, StrawberryField, StrawberryArgument, StrawberryAnnotation,
# StrawberryOptional and WithStrawberryObjectDefinition also come from
# Strawberry; their import paths depend on the installed version.


class BaseExtension(FieldExtension):
    filter_fields = [&#34;project_id&#34;, &#34;change_order_id&#34;]

    def apply(self, field: StrawberryField) -&gt; None:
        self.filter_fields = [&#34;project_id&#34;, &#34;change_order_id&#34;]
        resolved_type: Type[WithStrawberryObjectDefinition] = cast(
            Type[WithStrawberryObjectDefinition], field.resolve_type()
        )
        if resolved_type.__strawberry_definition__.specialized_type_var_map:
            # Generic connection type: filter on the node type it wraps.
            node = cast(
                Type[WithStrawberryObjectDefinition],
                resolved_type.__strawberry_definition__.specialized_type_var_map[
                    &#34;NodeType&#34;
                ],
            )
        else:
            # Plain object type: filter on its own fields.
            node = resolved_type

        for f in node.__strawberry_definition__.fields:
            # Connections are output types, so skip them as arguments.
            if inspect.isclass(f.type) and issubclass(
                f.type, strawberry.relay.types.ListConnection
            ):
                continue
            # Same check for a connection wrapped in an optional.
            if isinstance(f.type, StrawberryOptional):
                if inspect.isclass(f.type.of_type) and issubclass(
                    f.type.of_type, strawberry.relay.types.ListConnection
                ):
                    continue
            field.arguments.append(
                StrawberryArgument(
                    python_name=f.name,
                    graphql_name=f.name.replace(&#34;_&#34;, &#34;&#34;)
                    if f.name.startswith(&#34;_&#34;)
                    else None,
                    type_annotation=StrawberryAnnotation(
                        Optional[f.type]
                        if not isinstance(f.type, StrawberryOptional)
                        else Optional[f.type.of_type]
                    ),
                    description=&#34;&#34;,
                    default=strawberry.UNSET,
                )
            )
            self.filter_fields.append(f.name)
</code></pre>

<p>Essentially what I needed to do was check whether the <code>type</code> or <code>of_type</code> is a <code>class</code>. If it is, check whether it is a <code>relay.ListConnection</code> subclass and, if so, exclude it from the argument generation. Everything worked right after this.</p>
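
<p>The check itself can be isolated into a small helper. The sketch below runs without Strawberry installed: <code>ListConnection</code>, <code>ProjectConnection</code> and <code>OptionalWrapper</code> are hypothetical stand-ins for <code>relay.ListConnection</code>, a concrete connection type and <code>StrawberryOptional</code>, and <code>is_connection</code> is an illustrative name:</p>

<pre><code class="language-python">import inspect


class ListConnection:
    """Hypothetical stand-in for relay.ListConnection."""


class ProjectConnection(ListConnection):
    """Hypothetical connection type exposed on a node."""


class OptionalWrapper:
    """Hypothetical stand-in for StrawberryOptional; wraps another type."""

    def __init__(self, of_type):
        self.of_type = of_type


def is_connection(field_type, connection_base=ListConnection):
    # Unwrap one optional layer, mirroring the of_type check in the extension.
    inner = getattr(field_type, "of_type", field_type)
    # issubclass raises TypeError on non-classes, hence the isclass guard.
    return inspect.isclass(inner) and issubclass(inner, connection_base)


field_types = {
    "name": str,
    "projects": ProjectConnection,
    "orders": OptionalWrapper(ProjectConnection),
    "note": OptionalWrapper(str),
}
filterable = [name for name, tp in field_types.items() if not is_connection(tp)]
print(filterable)
</code></pre>

<p>This prints <code>['name', 'note']</code>: both the bare and the optional connection are left out of the filter arguments.</p>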

<h2 id="conclusion" id="conclusion">Conclusion</h2>

<p>I really like this framework. It gives me insight into how things operate and why a particular query is sometimes slow, and it gives you the space to fix it. For example, I already made it possible to load all the necessary subparts of a node in one go with the Dataloaders. That was not possible before. However, it was still as slow as the old stack, because each node on its own tried to create a new <code>relay.ListConnection</code> for essentially one <code>Edge</code>.</p>

<p>We already have all the instances needed to make all the edges when doing the Dataloader logic, so that is also the spot to create all the edges in one go. Then keep a simple mapping of <code>node.id -&gt; Edge</code> and you are done. This sped things up by quite a significant margin.</p>
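
<p>A rough sketch of that idea in plain Python, outside any framework. <code>ChangeOrder</code>, <code>Edge</code>, <code>build_edges</code> and <code>load_edges</code> are hypothetical names, and using the list position as cursor is only an assumption for the example:</p>

<pre><code class="language-python">from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChangeOrder:
    """Hypothetical child record that points at its parent node."""

    id: str
    project_id: str


@dataclass
class Edge:
    """Hypothetical stand-in for a Relay edge."""

    node: ChangeOrder
    cursor: str


def build_edges(children):
    # One pass over everything the Dataloader batch fetched,
    # instead of one connection per parent node.
    edges_by_node_id = defaultdict(list)
    for position, child in enumerate(children):
        edge = Edge(node=child, cursor=str(position))
        edges_by_node_id[child.project_id].append(edge)
    return edges_by_node_id


def load_edges(node_ids, children):
    # Dataloader batch functions return results in the order of the keys.
    edges_by_node_id = build_edges(children)
    return [edges_by_node_id.get(node_id, []) for node_id in node_ids]


children = [
    ChangeOrder("co-1", "p-1"),
    ChangeOrder("co-2", "p-2"),
    ChangeOrder("co-3", "p-1"),
]
result = load_edges(["p-1", "p-2", "p-3"], children)
print([[edge.node.id for edge in edges] for edges in result])
</code></pre>

<p>This prints <code>[['co-1', 'co-3'], ['co-2'], []]</code>. Each parent then only has to look up its own list of edges rather than build a connection of its own.</p>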

<p>Something the old stack could not really do. It had no real way of giving you the same tools to do the same thing.</p>

<p><a href="https://stealthycoder.writeas.com/tag:devlife" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">devlife</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a> <a href="https://stealthycoder.writeas.com/tag:graphql" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">graphql</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/strawberry-fields-ln3t</guid>
      <pubDate>Tue, 05 Mar 2024 11:58:49 +0000</pubDate>
    </item>
    <item>
      <title>What a fantastic ride</title>
      <link>https://stealthycoder.writeas.com/what-a-fantastic-ride?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[I was put in charge to write some extra tests in our framework covering our Docker registry endpoints. We created a framework around Locust. !--more-- Naturally, I first started learning the framework and it is pretty nice to use. You create simple classes that house the flow of the requests you want to execute and you call them one by one, stating what should be the success and what the failure. &#xA;&#xA;The goal was to prove that our Python code was horrible and needed to be switched to Golang implementation ASAP. &#xA;&#xA;Rough start&#xA;&#xA;I could not even start our Docker container because some Werkzeug, Flask and Locust combo made it all not work anymore. So I first had to untangle that mess. It turned out that some older code of Flask used a specific call to a function that does not exist anymore at the provided location. &#xA;&#xA;  For all who are interested, the actual error is: cannot import name &#39;BaseResponse&#39; from &#39;werkzeug.wrappers&#39;. &#xA;&#xA;After that initial rough start I started out by mapping out how Docker actually works. What happens when you do docker pull or docker login for example. Turns out they are all just HTTP calls to a REST API backend. That returns some data and with that data we continue onward to more calls until all the data has been gotten for docker to actually create the containers and/or images. &#xA;&#xA;Docker API&#xA;&#xA;I wrote the simple Python PoC code for the DockerAPI client. In principle I can use that code now to get any image I want, but I do not use that. So I included that whole code into our Locust framework to make sure the test was always set up correctly, and that subsequent images were deleted. &#xA;&#xA;I ran into the second problem. Images cannot be removed from a Docker registry by default. You have to enable that feature. So when I started talking to our devs, they said just forget about it. 
Do the setup code once, so that the image exists that is needed in a shared test repository and continue onward.&#xA;&#xA;So I scrapped the entire code out of Locust and began again anew.&#xA;&#xA;Concurrent issues&#xA;&#xA;Next up came the problem that I wanted to only get credentials once, and share those credentials amongst the distributed workers. There were several hosts that each run multiple workers as separate processes. I wanted on each of those hosts, that one call got made by the worker process to get a nice token and share that token in memory with the rest. In comes SharedMemory by Python. I got it to finally work after fixing all my concurrent race condition failures, where there was no synchronised flag to make sure everybody waited on each other. &#xA;&#xA;After all that code, the rest of the devs were that is cool but we do not need it. Just call the login at each start of the flow, it will create credentials and if there are already credentials it will return them. So again rip out the code written so far and start anew.&#xA;&#xA;Finally on my way&#xA;&#xA;Started again with the new flow and now I got a nice test up and running. The data returned was a bit baffling and showed our Python code was not the bottleneck as previously thought, hoped for. It was our Nginx reverse proxy setup. Split out the nginx pods unto their own and updated the config to handle things a bit better and give more threads and workers basically. &#xA;&#xA;Okay after fixing the nginx pods, then ran the tests again and it turned out the Docker registry itself was a bottleneck. It just could not cope in terms of memory usage and freeing up stuff. We use Redis as our cache layer and Google Cloud Storage (GCS) as our bucket to actually store the data retrieved by Docker registry. &#xA;&#xA;Breathing room&#xA;&#xA;We had so much services jammed together in one pod it was crazy. 
Basically one pod ran the following services:&#xA;Nginx&#xA;Redis&#xA;Docker registry&#xA;Flask app&#xA;&#xA;Then there was no control of what pod ran what services, so it could be that one pod ran 2 nginx + redis + docker registry + flask, whilst another ran only docker registry + flask. So back to basics, get one service per pod and split off the docker registry unto it&#39;s own node. Now we have the following setup:&#xA;&#xA;Nodepool A:&#xA;   Three nodes&#xA;        running one pod each of Nginx&#xA;        running one pod each of Flask&#xA;        running one pod total of Redis&#xA;Nodepool B:&#xA;   Three nodes&#xA;        running one pod each of Docker Registry&#xA;&#xA;Now that that was cleared up, the next bottleneck seemed to be Redis? So I turned to Redis and it&#39;s config and found out we actually were not using the staging Redis but the production Redis ?!?!?!&#xA;&#xA;I quickly changed that config and made it so there was one node running a dedicated Redis. So the full situation becomes:&#xA;&#xA;Nodepool A:&#xA;   Three nodes&#xA;        running one pod each of Nginx&#xA;        running one pod each of Flask&#xA;Nodepool B:&#xA;   Three nodes&#xA;        running one pod each of Docker Registry&#xA;Nodepool C:&#xA;   One node&#xA;        running one pod total of Redis&#xA;&#xA;Okay, now can we finally move onward to find out that the Python code itself is so slow?&#xA;&#xA;gunicorn&#xA;&#xA;Well not so fast. Turns out that gunicorn was behaving badly and might do with some optimisation. gunicorn uses different worker classes and if we do not feed it the right ones with the right parameters it might actually be blocking. The reason I started looking down this rabbit hole was because of the gunicorn logs stating they ran out of workers. 
&#xA;&#xA;After much experimenting on what parameters work best, turns out the best one that worked for us was the following:&#xA;&#xA;CONCURRENCYSETTING=$(python3 -c &#39;import multiprocessing as mp; print(mp.cpucount() * 2)&#39;)&#xA;exec /usr/local/bin/gunicorn -n internalauthsecret -w${CONCURRENCYSETTING} -k gevent --worker-connections=1000 -b 0.0.0.0:8000 internalauthsecret:app -t 180&#xA;Meaning use the gevent type worker class, with 1000 worker connections. Also use a total amount of workers to twice the amount of cores available to us in whatever host we are running as. This also meant it is dynamic to the point where if we would ever upgrade the hardware of the node underlying the pod it will grow with it automatically without us having to make sure we also update the amount of workers. &#xA;&#xA;Conclusion&#xA;&#xA;After fixing all the infrastructure setup of correctly allocating memory and CPU to each of the services, coupled with separating them out to make sure each of them gets the appropriate amount needed. Making sure our nginx was configured correctly. Followed by actually configuring the services in staging correctly to point at services in staging rather than production, followed by configuring the gunicorn service and fine-tuning it, there was still a slight bottleneck. &#xA;&#xA;Yeey, finally Python code is slow and dumb and move on to Golang. Hold on, let us first see what is being the bottleneck. I made some call graphs using the following module https://github.com/daneads/pycallgraph2. It showed that the bottleneck was partly in our shared library code that handled authentication and also the way we were determining when to call that particular function. Finally the culprit has been located.&#xA;&#xA;To fix the shared library code was easy, just improved the for loops and small optimizations in terms of what to store so we do not do a constant looking up of the same values. 
We now cache more in Redis and use a Redis connection pool rather than opening a new connection for every query. &#xA;&#xA;Fixing the problem of knowing when to call the function in the shared code took literally one if/else statement, added to the logic that declares the variable. It was a code fix of 44 characters that improved the total time spent considerably. Before the fixes, the longest time on the shared library code path was 465ms. After fixing both issues it was only around 60ms. So instead of handling roughly 2 requests per second, the code could now handle roughly 15 requests per second per worker per worker_connection. &#xA;&#xA;After that roller-coaster of a ride, I had made sure we could handle millions of requests coming in rather than just a couple of hundred. The next optimisations lie in network I/O and other factors. Even if we moved to a Golang implementation, it might gain us 1ms at most in terms of code, and even that is highly optimistic and probably not realistic. The rest lies in the fact that we have an nginx going to a docker registry, talking to another service running somewhere else on the network, which in turn talks to Redis. Those round-trip times start to add up.&#xA;&#xA;However, that is for another time. Right now we have enough to get through the next years of running our service. If we need more, we can just scale the entire setup to include more nodes, until the bottleneck becomes network throughput/bandwidth. Then we will revisit this. &#xA;&#xA;#100DaysToOffload #DevOps #python ]]&gt;</description>
      <content:encoded><![CDATA[<p>I was put in charge of writing some extra tests in our framework covering our Docker registry endpoints. We created a framework around <a href="https://locust.io/" rel="nofollow">Locust</a>. Naturally, I first started learning the framework, and it is pretty nice to use. You create simple classes that house the flow of the requests you want to execute, and you call them one by one, stating what counts as a success and what counts as a failure.</p>

<p>The goal was to prove that our Python code was horrible and needed to be switched to a Golang implementation ASAP.</p>

<h2 id="rough-start" id="rough-start">Rough start</h2>

<p>I could not even start our Docker container, because a particular combination of Werkzeug, Flask and Locust broke everything. So I first had to untangle that mess. It turned out that some older Flask code imported a name that no longer exists at the location it was imported from.</p>

<blockquote><p>For all who are interested, the actual error is: <code>cannot import name &#39;BaseResponse&#39; from &#39;werkzeug.wrappers&#39;</code>.</p></blockquote>

<p>After that rough start I began by mapping out how Docker actually works: what happens when you do <code>docker pull</code> or <code>docker login</code>, for example. It turns out they are all just HTTP calls to a REST API backend. Each call returns some data, and with that data the client continues on to more calls, until <code>docker</code> has everything it needs to actually create the containers and/or images.</p>

<h2 id="docker-api" id="docker-api">Docker API</h2>

<p>I wrote some simple Python PoC code for a Docker API client. In principle I can now use that code to get any image I want, but I do not use it for that. Instead I included the whole thing in our Locust framework, to make sure the test was always set up correctly and that the images were deleted afterwards.</p>

<p>Then I ran into the second problem. Images cannot be removed from a Docker registry by default; you have to enable that feature. When I talked to our devs about it, they said to just forget about it: run the setup code once, so that the required image exists in a shared test repository, and carry on.</p>

<p>So I ripped all of that code out of Locust and started anew.</p>

<h2 id="concurrent-issues" id="concurrent-issues">Concurrent issues</h2>

<p>Next up was the problem that I wanted to fetch credentials only once and share them amongst the distributed workers. There were several hosts, each running multiple workers as separate processes. On each host, I wanted a single worker process to make the call for a token and then share that token in memory with the rest. In comes <a href="https://docs.python.org/3/library/multiprocessing.shared_memory.html" rel="nofollow">SharedMemory</a> from Python. I finally got it to work after fixing all my race conditions, which came from having no synchronised flag to make sure the workers waited on each other.</p>

<p>After all that code, the rest of the devs said: that is cool, but we do not need it. Just call the login at the start of each flow; it will create credentials, and if credentials already exist it will return them. So once again I ripped out the code written so far and started anew.</p>
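
<p>To make the shared-token idea concrete, here is a minimal sketch of publishing a token through <code>multiprocessing.shared_memory</code>. It is not the code from our framework: the block name, the 4-byte length prefix and the helpers <code>publish_token</code> and <code>read_token</code> are assumptions made up for illustration, and it deliberately leaves out the synchronisation between workers.</p>

<pre><code class="language-python">import os
from multiprocessing import shared_memory

# Hypothetical block name, made unique per run for this sketch.
BLOCK_NAME = f"locust_token_{os.getpid()}"


def publish_token(name, token):
    """Create a shared block holding a length prefix plus the token bytes."""
    data = token.encode("utf-8")
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(data) + 4)
    shm.buf[:4] = len(data).to_bytes(4, "big")
    shm.buf[4:4 + len(data)] = data
    return shm  # the creator keeps this handle and unlinks the block later


def read_token(name):
    """Attach to an existing block by name and decode the token."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        length = int.from_bytes(bytes(shm.buf[:4]), "big")
        return bytes(shm.buf[4:4 + length]).decode("utf-8")
    finally:
        shm.close()


owner = publish_token(BLOCK_NAME, "example-token")
token_seen_by_reader = read_token(BLOCK_NAME)
owner.close()
owner.unlink()
</code></pre>

<p>In a real setup the reader would live in a different worker process, and it would need some way to know that the token has been fully written before reading it.</p>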

<h2 id="finally-on-my-way" id="finally-on-my-way">Finally on my way</h2>

<p>I started again with the new flow and finally got a nice test up and running. The data it returned was a bit baffling: it showed that our Python code was not the bottleneck, as had previously been thought (and hoped). It was our Nginx reverse proxy setup. I split the nginx pods out onto their own and updated the config to handle things a bit better, basically giving it more threads and workers.</p>

<p>After fixing the nginx pods I ran the tests again, and it turned out the Docker registry itself was a bottleneck. It just could not cope in terms of memory usage and freeing up memory. We use Redis as our cache layer and Google Cloud Storage (GCS) as the bucket that actually stores the data served by the Docker registry.</p>

<h2 id="breathing-room" id="breathing-room">Breathing room</h2>

<p>We had so many services jammed together in one pod it was crazy. Basically one pod ran the following services:</p>
<ul><li>Nginx</li>
<li>Redis</li>
<li>Docker registry</li>
<li>Flask app</li></ul>

<p>On top of that, there was no control over which pod ran which services, so one pod could be running 2 nginx + redis + docker registry + flask, whilst another ran only docker registry + flask. So back to basics: one service per pod, and the docker registry split off onto its own node pool. Now we have the following setup:</p>
<ul><li>Nodepool A:
<ul><li>Three nodes
<ul><li>running one pod each of Nginx</li>
<li>running one pod each of Flask</li>
<li>running one pod total of Redis</li></ul></li></ul></li>
<li>Nodepool B:
<ul><li>Three nodes
<ul><li>running one pod each of Docker Registry</li></ul></li></ul></li></ul>

<p>Now that that was cleared up, the next bottleneck seemed to be Redis? So I turned to Redis and its config and found out we were actually not using the staging Redis but the <strong>production</strong> Redis ?!?!?!</p>

<p>I quickly changed that config and made it so there was one node running a dedicated Redis. So the full situation becomes:</p>
<ul><li>Nodepool A:
<ul><li>Three nodes
<ul><li>running one pod each of Nginx</li>
<li>running one pod each of Flask</li></ul></li></ul></li>
<li>Nodepool B:
<ul><li>Three nodes
<ul><li>running one pod each of Docker Registry</li></ul></li></ul></li>
<li>Nodepool C:
<ul><li>One node
<ul><li>running one pod total of Redis</li></ul></li></ul></li></ul>

<p>Okay, now can we finally move onward to find out that the Python code itself is so slow?</p>

<h2 id="gunicorn" id="gunicorn">gunicorn</h2>

<p>Well, not so fast. It turned out that <code>gunicorn</code> was behaving badly and could do with some optimisation. <code>gunicorn</code> supports different worker classes, and if we do not pick the right one with the right parameters it might actually end up blocking. The reason I went down this rabbit hole was that the <code>gunicorn</code> logs stated it had run out of workers.</p>

<p>After much experimenting with the parameters, the combination that worked best for us was the following:</p>

<pre><code class="language-bash">CONCURRENCY_SETTING=$(python3 -c &#39;import multiprocessing as mp; print(mp.cpu_count() * 2)&#39;)
exec /usr/local/bin/gunicorn -n internal_auth_secret -w${CONCURRENCY_SETTING} -k gevent --worker-connections=1000 -b 0.0.0.0:8000 internal_auth_secret:app -t 180
</code></pre>

<p>This means: use the <code>gevent</code> worker class with 1000 worker connections, and set the total number of workers to twice the number of cores available on whatever host we are running on. It is also dynamic: if we ever upgrade the hardware of the node underlying the pod, the worker count grows with it automatically, without us having to remember to update it.</p>
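
<p>The same settings can also live in a gunicorn configuration file instead of on the command line. The following is only a sketch of that alternative, not what we deployed; the file name <code>gunicorn.conf.py</code> is my assumption, and each setting mirrors one of the flags used above.</p>

<pre><code class="language-python"># gunicorn.conf.py: a sketch mirroring the command-line flags above
import multiprocessing

proc_name = "internal_auth_secret"          # -n
workers = multiprocessing.cpu_count() * 2   # -w, twice the number of cores
worker_class = "gevent"                     # -k
worker_connections = 1000                   # --worker-connections
bind = "0.0.0.0:8000"                       # -b
timeout = 180                               # -t, in seconds
</code></pre>

<p>It would be started with something like <code>gunicorn -c gunicorn.conf.py internal_auth_secret:app</code>. One caveat to keep in mind: as far as I know <code>cpu_count()</code> reports the CPUs of the host, not a CPU limit set on the container.</p>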

<h2 id="conclusion" id="conclusion">Conclusion</h2>

<p>Even after fixing the infrastructure setup by allocating memory and CPU correctly to each service, separating the services out so each gets what it needs, configuring our nginx correctly, pointing the staging services at other staging services rather than production, and fine-tuning <code>gunicorn</code>, there was still a slight bottleneck.</p>

<p>Yeey, so finally the Python code is slow and dumb and we can move on to Golang? Hold on, let us first see what the bottleneck actually is. I made some call graphs using the following module: <a href="https://github.com/daneads/pycallgraph2" rel="nofollow">https://github.com/daneads/pycallgraph2</a>. They showed that the bottleneck was partly in our shared library code that handled authentication, and partly in the way we were determining when to call that particular function. Finally the culprit had been located.</p>

<p>Fixing the shared library code was easy: I tightened up the for loops and made small optimisations to what we store, so that we do not constantly look up the same values. We now cache more in Redis and use a Redis connection pool rather than opening a new connection for every query.</p>

<p>Fixing the problem of knowing when to call the function in the shared code took literally one if/else statement, added to the logic that declares the variable. It was a code fix of 44 characters that improved the total time spent considerably. Before the fixes, the longest time on the shared library code path was 465ms. After fixing both issues it was only around 60ms. So instead of handling roughly 2 requests per second, the code could now handle roughly 15 requests per second per worker per worker_connection.</p>

<p>After that roller-coaster of a ride, I had made sure we could handle millions of requests coming in rather than just a couple of hundred. The next optimisations lie in network I/O and other factors. Even if we moved to a Golang implementation, it might gain us 1ms at most in terms of code, and even that is highly optimistic and probably not realistic. The rest lies in the fact that we have an nginx going to a docker registry, talking to another service running somewhere else on the network, which in turn talks to Redis. Those round-trip times start to add up.</p>

<p>However, that is for another time. Right now we have enough to get through the next years of running our service. If we need more, we can just scale the entire setup to include more nodes, until the bottleneck becomes network throughput/bandwidth. Then we will revisit this.</p>
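
<p>The idea of not looking up the same value over and over can be sketched with an in-process cache. This is a stand-in for illustration only: <code>lookup_scope</code> and <code>authorise</code> are hypothetical names, and the real code cached in Redis rather than with <code>functools.lru_cache</code>.</p>

<pre><code class="language-python">from functools import lru_cache

calls = 0  # counts how often the slow lookup really runs


@lru_cache(maxsize=1024)
def lookup_scope(user):
    """Hypothetical expensive lookup, for example a query to a backing store."""
    global calls
    calls += 1
    return f"scope-for-{user}"


def authorise(users):
    # Repeated users are served from the cache instead of a fresh lookup.
    return [lookup_scope(user) for user in users]


scopes = authorise(["alice", "bob", "alice", "alice"])
print(calls)  # 2: one real lookup per distinct user
</code></pre>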
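
<p>As a back-of-the-envelope check of those numbers, assuming requests on one connection are handled strictly one after another (my simplification), the latencies translate into throughput as follows. The function name is made up for this sketch.</p>

<pre><code class="language-python">def requests_per_second(latency_ms):
    """Sequential requests a single connection can complete per second."""
    return 1000 / latency_ms


before = requests_per_second(465)  # about 2.15
after = requests_per_second(60)    # about 16.67
speedup = 465 / 60                 # 7.75 times faster
</code></pre>

<p>So 60ms works out to a little under 17 requests per second, in the same ballpark as the rough figure quoted above.</p>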

<p><a href="https://stealthycoder.writeas.com/tag:100DaysToOffload" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">100DaysToOffload</span></a> <a href="https://stealthycoder.writeas.com/tag:DevOps" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">DevOps</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/what-a-fantastic-ride</guid>
      <pubDate>Mon, 02 Jan 2023 21:07:24 +0000</pubDate>
    </item>
    <item>
      <title>Powerful concept: FunctionMap</title>
      <link>https://stealthycoder.writeas.com/powerful-concept-functionmap?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[This idea comes from JavaScript in the days before it had classes, as it does now since ES6. You would write something like this:&#xA;&#xA;var greetings = {&#xA; hello: function() { console.info(&#34;Hello&#34;); }&#xA;};&#xA;greetings.hello();&#xA;greetings[&#34;hello&#34;]();&#xA;This is how functions could be shipped around. So the idea is simple, especially the second call, which looks the function up by a string key. I started to wonder if we can have a Map of String -&gt; Function in other languages like Java, Python, PHP and more?&#xA;&#xA;Java&#xA;&#xA;We sure can. I was working on a JSON-RPC project where you don&#39;t send entity bodies but you send what method to invoke and with what arguments. In the backend we made it so internally a consistent object was shipped around (RPCServiceResult) and we had an Enum for the methods that were allowed. So I set up an EnumMap of Enum -&gt; Function. This made it so the entrypoint of each service was the same: check whether the method key exists; if so, execute the corresponding function with the given input, otherwise throw a MethodNotFound error. This meant we could extend the Enum and EnumMap and the Function mapping but never had to change how the services called each other. &#xA;&#xA;Python&#xA;&#xA;In Python I was working on a silly hacking simulation game for the console. I wanted to have a list of commands and execute the function corresponding to each command in game terms. At that point I had just come from PHP, where it is completely normal to execute the string itself; I will come back to this in a bit. This is more difficult to do in Python, so I came up with a dict where the command names were the keys and function references were the values. This made it super simple to maintain. &#xA;&#xA;PHP&#xA;&#xA;So in PHP this is normal:&#xA;function a(){ echo &#34;A&#34;; };&#xA;$a = &#34;a&#34;;&#xA;$a();&#xA;However, it is really difficult to maintain and to see what is going on, and it is outright insecure if the value of $a comes from user input. 
The following makes more sense and is easier to maintain.&#xA;&#xA;function a() { echo &#34;A&#34;; };&#xA;$commands = [&#34;a&#34; =&gt; function() { call_user_func(&#34;a&#34;); }];&#xA;if ( array_key_exists(&#34;a&#34;, $commands) ) { $commands[&#34;a&#34;](); }&#xA;&#xA;Granted this is a contrived example but I hope it gets the point across. &#xA;&#xA;Scala&#xA;&#xA;I was working on Advent of Code 2020, and one of the puzzles had a lot of validation rules. The structure immediately came to mind: a Map of Functions holding, for each field, the corresponding validation function. This worked like a charm, and it is very easy to see what each rule does. &#xA;&#xA;Java&#xA;&#xA;Another example was that in a particular flow there was a need to either create a User of type A or type B and then continue the flow. It was first like the following:&#xA;if ( type == &#34;UserA&#34; ) {&#xA;    return new UserA();&#xA;} else {&#xA;    return new UserB();&#xA;}&#xA;Now never mind the bug of always returning type B (in Java, == on strings compares references rather than contents), I suggested the following:&#xA;Map&lt;String, Supplier&lt;User&gt;&gt; userSupplier = new HashMap&lt;&gt;(){{ put(&#34;UserA&#34;, UserA::new); put(&#34;UserB&#34;, UserB::new); }};&#xA;Now you could replace it with one entrypoint:&#xA;return userSupplier.getOrDefault(type, UserB::new).get();&#xA;You still have the bug, but it became a feature. So we leave it in.&#xA;My colleague immediately said that this happens in many more places in the code and that they could change all of them. &#xA;&#xA;The nice thing again is, we can extend the map to support more types but it does not change the logic of calling the service.&#xA;&#xA;Conclusion&#xA;&#xA;I hope this concept proves useful to you. I find it quite powerful and easy to construct, and it gives lots of nice patterns. &#xA;&#xA;#code #javascript #java #python]]&gt;</description>
      <content:encoded><![CDATA[<p>This idea comes from JavaScript in the days before it had classes, as it does now since ES6. You would write something like this: </p>

<pre><code class="language-javascript">var greetings = {
 hello: function() { console.info(&#34;Hello&#34;); }
};
greetings.hello();
greetings[&#34;hello&#34;]();
</code></pre>

<p>This is how functions could be shipped around. So the idea is simple, especially the second call, which looks the function up by a string key. I started to wonder if we can have a <code>Map</code> of <code>String -&gt; Function</code> in other languages like Java, Python, PHP and more?</p>

<h1 id="java" id="java">Java</h1>

<p>We sure can. I was working on a JSON-RPC project where you don&#39;t send entity bodies but you send what method to invoke and with what arguments. In the backend we made it so internally a consistent object was shipped around (<code>RPCServiceResult</code>) and we had an <code>Enum</code> for the methods that were allowed. So I set up an <code>EnumMap</code> of <code>Enum -&gt; Function</code>. This made it so the entrypoint of each service was the same: check whether the method key exists; if so, execute the corresponding function with the given input, otherwise throw a <code>MethodNotFound</code> error. This meant we could extend the Enum and EnumMap and the Function mapping but never had to change how the services called each other.</p>
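
<p>The original was Java, but the shape of that entrypoint can be sketched in Python. Everything here is made up for illustration: the <code>Method</code> enum, the two handlers and the <code>dispatch</code> function are assumptions, not the project&#39;s code.</p>

<pre><code class="language-python">from enum import Enum


class Method(Enum):
    PING = "ping"
    ADD = "add"


class MethodNotFound(Exception):
    pass


def ping(params):
    return "pong"


def add(params):
    return params["a"] + params["b"]


# The map from allowed method to the function that implements it.
HANDLERS = {Method.PING: ping, Method.ADD: add}


def dispatch(method_name, params):
    """Single entrypoint: look the method up, run it, or report it missing."""
    try:
        handler = HANDLERS[Method(method_name)]
    except (ValueError, KeyError):
        raise MethodNotFound(method_name) from None
    return handler(params)


print(dispatch("add", {"a": 1, "b": 2}))  # 3
</code></pre>

<p>Adding a method means adding an enum member and a map entry; <code>dispatch</code> itself never changes.</p>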

<h1 id="python" id="python">Python</h1>

<p>In Python I was working on a silly hacking simulation game for the console. I wanted to have a list of commands and execute the function corresponding to each command in game terms. At that point I had just come from PHP, where it is completely normal to execute the string itself; I will come back to this in a bit. This is more difficult to do in Python, so I came up with a dict where the command names were the keys and function references were the values. This made it super simple to maintain.</p>
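
<p>A minimal sketch of such a command dict could look like this. The commands <code>scan</code> and <code>connect</code> and the helper <code>run_command</code> are invented for the example; the game&#39;s real commands are not shown here.</p>

<pre><code class="language-python">def scan(target):
    return f"scanning {target}"


def connect(target):
    return f"connecting to {target}"


# Command name as key, function reference as value.
COMMANDS = {"scan": scan, "connect": connect}


def run_command(line):
    """Split the input into a command name and its argument, then dispatch."""
    name, _, argument = line.partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(argument)


print(run_command("scan 10.0.0.1"))  # scanning 10.0.0.1
</code></pre>

<p>Because only names present in the dict can be executed, user input never gets to pick an arbitrary function.</p>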

<h1 id="php" id="php">PHP</h1>

<p>So in PHP this is normal:</p>

<pre><code class="language-php">function a(){ echo &#34;A&#34;; };
$a = &#34;a&#34;;
$a();
</code></pre>

<p>However, it is really difficult to maintain and to see what is going on, and it is outright insecure if the value of <code>$a</code> comes from user input. The following makes more sense and is easier to maintain.</p>

<pre><code class="language-php">function a() { echo &#34;A&#34;; };
$commands = [&#34;a&#34; =&gt; function() { call_user_func(&#34;a&#34;); }];
if ( array_key_exists(&#34;a&#34;, $commands) ) { $commands[&#34;a&#34;](); }
</code></pre>

<p>Granted this is a contrived example but I hope it gets the point across.</p>

<h1 id="scala" id="scala">Scala</h1>

<p>I was working on Advent of Code 2020, and one of the puzzles had a lot of validation rules. The structure immediately came to mind: a Map of Functions holding, for each field, the corresponding validation function. This worked like a charm, and it is very easy to see what each rule does.</p>
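
<p>My solution was in Scala, but the structure translates directly. In this Python sketch the field names and the rules themselves are made up; they are not the actual puzzle rules.</p>

<pre><code class="language-python"># One validation function per field; all rules here are illustrative.
RULES = {
    "year": lambda value: value.isdigit() and int(value) in range(1920, 2003),
    "colour": lambda value: value in {"amber", "blue", "green"},
    "pin": lambda value: value.isdigit() and len(value) == 9,
}


def invalid_fields(record):
    """Return the names of the fields that are missing or fail their rule."""
    return [
        field
        for field, rule in RULES.items()
        if field not in record or not rule(record[field])
    ]


print(invalid_fields({"year": "2010", "colour": "blue"}))  # ['year', 'pin']
</code></pre>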

<h1 id="java-1" id="java-1">Java</h1>

<p>Another example was that in a particular flow there was a need to either create a User of type A or type B and then continue the flow. It was first like the following:</p>

<pre><code class="language-java">if ( type == &#34;UserA&#34; ) {
    return new UserA();
} else {
    return new UserB();
}
</code></pre>

<p>Now never mind the bug of always returning type B (in Java, <code>==</code> on strings compares references rather than contents), I suggested the following:</p>

<pre><code class="language-java">Map&lt;String, Supplier&lt;User&gt;&gt; userSupplier = new HashMap&lt;&gt;(){{ put(&#34;UserA&#34;, UserA::new); put(&#34;UserB&#34;, UserB::new); }};
</code></pre>

<p>Now you could replace it with one entrypoint:</p>

<pre><code class="language-java">return userSupplier.getOrDefault(type, UserB::new).get();
</code></pre>

<p>You still have the bug, but it became a feature. So we leave it in.
My colleague immediately said that this happens in many more places in the code and that they could change all of them.</p>

<p>The nice thing again is, we can extend the map to support more types but it does not change the logic of calling the service.</p>

<h1 id="conclusion" id="conclusion">Conclusion</h1>

<p>I hope this concept proves useful to you. I find it quite powerful and easy to construct, and it gives lots of nice patterns.</p>

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:javascript" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">javascript</span></a> <a href="https://stealthycoder.writeas.com/tag:java" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">java</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/powerful-concept-functionmap</guid>
      <pubDate>Mon, 07 Dec 2020 23:10:38 +0000</pubDate>
    </item>
    <item>
      <title>Decoration patterns</title>
      <link>https://stealthycoder.writeas.com/decoration-patterns?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[This post touches on some Python concepts in async/await territory. I will not cover event loops or how to interact with them, but rather something I unearthed in a quest to make something work that was synchronous only. I will use the word async, which is short for asynchronous.&#xA;&#xA;First things first: the async structure in essence means a form of cooperative multi-threading. This is in contrast with preemptive multi-threading. With the latter, a scheduler hands out allotted CPU time. For example, a function that needs 3 cycles but gets at most 2 at a time is interrupted part-way, so, with other work scheduled in between, it can take 6 cycles before it is done. &#xA;&#xA;In the cooperative model the function is not interrupted by a scheduler: it takes the cycles it needs, here all 3 in one go, and then hands back control. Hopefully that means the cycles are used in a more direct way, making the program smoother and more efficient.&#xA;&#xA;  As a side note, async should only be used when you expect to be I/O bound, and it should then be used throughout the entire application, including third-party libraries. &#xA;&#xA;This also means the function you made async does not simply run to completion when you call it, which is why the keyword await was introduced: it states that you wait at that point until the function is done. &#xA;&#xA;Getting started&#xA;&#xA;First something simple.&#xA;&#xA;def f():&#xA;    pass&#xA;A simple function definition. If we wanted an async function we would do the following:&#xA;&#xA;async def a():&#xA;    pass&#xA;So far the only difference, other than the name, is the keyword async. So what happens when you print these defined symbols?&#xA;&#xA;print(f) # &lt;function f at 0x7fa7b50af1f0&gt;&#xA;print(a) # &lt;function a at 0x7fa7b4fb9ca0&gt;&#xA;So nothing apparently. They are both just function definitions. 
What happens when you print the result of executing the functions?&#xA;&#xA;print(f()) # None&#xA;print(a()) # &lt;coroutine object a at 0x7fa7b5037940&gt;&#xA;You will also get a RuntimeWarning about a coroutine never being awaited. We will leave that for what it is right now. Interestingly, both are functions before execution, but after execution one has produced a result and the other a coroutine object. So we want to see if there is a way to determine beforehand whether a function is a coroutine. &#xA;&#xA;Let us write something that does that:&#xA;import inspect&#xA;&#xA;print(inspect.iscoroutine(a)) # False&#xA;print(inspect.isawaitable(a)) # False&#xA;&#xA;print(inspect.iscoroutine(a())) # True&#xA;print(inspect.isawaitable(a())) # True&#xA;Hang on, they both state False when operating on the definition. However, our RuntimeWarning said we needed to await. When we execute the function we do get the information, but that is after the fact. &#xA;&#xA;So we first have to execute a function in order to find out that we need to await it. Some people out there might be thinking right now: well, you used the wrong method. There will be two solutions at the end. &#xA;&#xA;Decorators&#xA;&#xA;In Python there exists the following syntactic sugar:&#xA;&#xA;def decorator(func):&#xA;    def wrapper():&#xA;       func()&#xA;    return wrapper&#xA;&#xA;@decorator&#xA;async def a():&#xA;     pass&#xA;&#xA;a()&#xA;&#xA;This is the same as doing a = decorator(a); calling a() will then actually execute the wrapper, as a is now equal to wrapper. &#xA;&#xA;The first problem we run into in this example is that we handed an async function to a non-async wrapper. You can only await in async functions. That is easily solved:&#xA;&#xA;def decorator(func):&#xA;    async def wrapper():&#xA;       await func()&#xA;    return wrapper&#xA;&#xA;Now the wrapper is also async and we await the function inside. However, this decorator might be used for async and non-async functions alike. 
We still need to determine accurately beforehand whether or not a function is async. &#xA;&#xA;Internals&#xA;&#xA;Looking at the internals of Python, functions have a __code__ attribute. Inside it there is a co_flags attribute, which holds a bitmap of the flags set on the function. You can get at this information in the following way:&#xA;&#xA;from dis import pretty_flags&#xA;&#xA;def f():&#xA;    pass&#xA;&#xA;async def a():&#xA;    pass&#xA;&#xA;print(pretty_flags(f.__code__.co_flags)) # OPTIMIZED, NEWLOCALS, NOFREE&#xA;print(pretty_flags(a.__code__.co_flags)) # OPTIMIZED, NEWLOCALS, NOFREE, COROUTINE&#xA;&#xA;Aha, now we see that we can determine whether a function is a coroutine or not. This means we can make our decorator correctly now:&#xA;&#xA;from dis import pretty_flags&#xA;&#xA;def decorator(func):&#xA;    def wrapper():&#xA;        func()&#xA;    async def async_wrapper():&#xA;        await func()&#xA;    &#xA;    if &#34;COROUTINE&#34; in pretty_flags(func.__code__.co_flags):&#xA;        return async_wrapper&#xA;    &#xA;    return wrapper&#xA;&#xA;Final Solution&#xA;&#xA;I already mentioned there are two solutions. I wanted to get the internal solution out of the way first, as that is the one I used first. Then I ran into something that made more sense. The aforementioned decorator can also be written like this:&#xA;&#xA;import inspect&#xA;&#xA;def decorator(func):&#xA;    def wrapper():&#xA;        func()&#xA;    async def async_wrapper():&#xA;        await func()&#xA;    &#xA;    if inspect.iscoroutinefunction(func):&#xA;        return async_wrapper&#xA;    &#xA;    return wrapper&#xA;&#xA;The importance of using the correct method is abundantly clear in this case. Our definition is a coroutine function, not yet a coroutine, and therefore you cannot await a definition. 
You can only await the coroutine object that calling the async function returns.&#xA;&#xA;Hope this helps out a bit in the future of your async python adventure.&#xA;&#xA;As a final final example the decorator should probably look like this:&#xA;&#xA;import inspect&#xA;&#xA;def decorator(func):&#xA;    def wrapper(*args, **kwargs):&#xA;        func(*args, **kwargs)&#xA;    async def async_wrapper(*args, **kwargs):&#xA;        await func(*args, **kwargs)&#xA;    &#xA;    if inspect.iscoroutinefunction(func):&#xA;        return async_wrapper&#xA;    &#xA;    return wrapper&#xA;&#xA;This propagates any and all arguments given to the function you are decorating. &#xA;&#xA;#code #python]]&gt;</description>
      <content:encoded><![CDATA[<p>This post touches on some Python concepts in async/await territory. I will not cover event loops or how to interact with them, but rather something I unearthed in a quest to make something work that was synchronous only. I will use the word <code>async</code>, which is short for asynchronous. </p>

<p>First things first: the async structure in essence means a form of cooperative multi-threading. This is in contrast with preemptive multi-threading. With the latter, a scheduler hands out allotted CPU time. For example, a function that needs 3 cycles but gets at most 2 at a time is interrupted part-way, so, with other work scheduled in between, it can take 6 cycles before it is done.</p>

<p>In the cooperative model the function is not interrupted by a scheduler: it takes the cycles it needs, here all 3 in one go, and then hands back control. Hopefully that means the cycles are used in a more direct way, making the program smoother and more efficient.</p>

<blockquote><p>As a side note, async should only be used when you expect to be I/O bound, and it should then be used throughout the entire application, including third-party libraries.</p></blockquote>

<p>This also means the function you made async does not simply run to completion when you call it, which is why the keyword <code>await</code> was introduced: it states that you wait at that point until the function is done.</p>
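
<p>Although I will not go into event loops here, a tiny sketch may help to picture the cooperative part. It assumes the standard <code>asyncio</code> loop; the <code>worker</code> and <code>main</code> names are made up. Each worker runs until it reaches an <code>await</code> and only then hands control to the other one.</p>

<pre><code class="language-python">import asyncio

order = []


async def worker(name):
    for step in range(2):
        order.append(f"{name}{step}")
        # Hand control back to the event loop so the other task can run.
        await asyncio.sleep(0)


async def main():
    await asyncio.gather(worker("a"), worker("b"))


asyncio.run(main())
print(order)  # ['a0', 'b0', 'a1', 'b1']
</code></pre>

<p>Nothing interrupts a worker between two awaits, which is exactly why a long CPU-bound stretch inside an async function blocks everything else.</p>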

<h1 id="getting-started" id="getting-started">Getting started</h1>

<p>First something simple.</p>

<pre><code class="language-python">
def f():
    pass
</code></pre>

<p>A simple function definition. If we wanted an async function we would do the following:</p>

<pre><code class="language-python">
async def a():
    pass
</code></pre>

<p>So far the only difference, other than the name, is the keyword <code>async</code>. So what happens when you print these defined symbols?</p>

<pre><code class="language-python">print(f) # &lt;function f at 0x7fa7b50af1f0&gt;
print(a) # &lt;function a at 0x7fa7b4fb9ca0&gt;
</code></pre>

<p>So nothing apparently. They are both just function definitions. What happens when you print the result of executing the functions?</p>

<pre><code class="language-python">print(f()) # None
print(a()) # &lt;coroutine object a at 0x7fa7b5037940&gt;
</code></pre>

<p>You will also get a RuntimeWarning about a coroutine never being awaited. We will leave that for what it is right now. Interestingly, both are functions before execution, but after execution one has produced a result and the other a coroutine object. So we want to see if there is a way to determine beforehand whether a function is a coroutine.</p>

<p>Let us write something that does that:</p>

<pre><code class="language-python">import inspect

print(inspect.iscoroutine(a)) # False
print(inspect.isawaitable(a)) # False

print(inspect.iscoroutine(a())) # True
print(inspect.isawaitable(a())) # True
</code></pre>

<p>Hang on, they both state <code>False</code> when operating on the definition. However, our RuntimeWarning said we needed to <code>await</code>. When we execute the function we do get the information, but that is after the fact.</p>

<p>So we first have to execute a function in order to find out that we need to <code>await</code> it. Some people out there might be thinking right now: well, you used the wrong method. There will be two solutions at the end.</p>

<h1 id="decorators" id="decorators">Decorators</h1>

<p>In Python there exists the following syntactic sugar:</p>

<pre><code class="language-python">
def decorator(func):
    def wrapper():
       func()
    return wrapper

@decorator
async def a():
     pass

a()
</code></pre>

<p>This is the same as doing <code>a = decorator(a)</code>; calling <code>a()</code> will then actually execute the wrapper, as <code>a</code> is now equal to <code>wrapper</code>.</p>

<p>The first problem we run into in this example is that we handed an async function to a non-async wrapper. You can only <code>await</code> in async functions. That is easily solved:</p>
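
<p>To see that the <code>@</code> line really is just that assignment, here is a small sketch with plain synchronous functions. The function names and the returned strings are made up for the example, and unlike the wrapper above this one returns a value so there is something to compare.</p>

<pre><code class="language-python">def decorator(func):
    def wrapper():
        return f"wrapped {func()}"
    return wrapper


@decorator
def with_sugar():
    return "hello"


def without_sugar():
    return "hello"


# The manual spelling of what the @ line does.
without_sugar = decorator(without_sugar)

print(with_sugar())          # wrapped hello
print(without_sugar())       # wrapped hello
print(with_sugar.__name__)   # wrapper
</code></pre>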

<pre><code class="language-python">
def decorator(func):
    async def wrapper():
       await func()
    return wrapper
</code></pre>

<p>Now the wrapper is also async and we await the function inside. However, this decorator might be used for async and non-async functions alike. We still need to determine accurately beforehand whether or not a function is async.</p>
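
<p>To show why that matters, here is a sketch of what happens when the async-only decorator is put on a plain function. The function <code>plain</code> is made up, and <code>asyncio.run</code> is only used here to drive the wrapper. Awaiting the <code>None</code> that the plain function returns fails with a <code>TypeError</code>.</p>

<pre><code class="language-python">import asyncio


def decorator(func):
    async def wrapper():
        await func()
    return wrapper


@decorator
def plain():
    return None


try:
    asyncio.run(plain())
    outcome = "ok"
except TypeError as error:
    # A plain return value such as None cannot be awaited.
    outcome = f"TypeError: {error}"

print(outcome)
</code></pre>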

<h2 id="internals" id="internals">Internals</h2>

<p>Looking at the internals of Python, functions have a <code>__code__</code> attribute, and the code object it holds has a <code>co_flags</code> attribute. That is a bit field recording which flags apply to the function. You can get at this information in the following way:</p>

<pre><code class="language-python">
from dis import pretty_flags

def f():
    pass

async def a():
    pass

# The exact flag names may vary between Python versions; COROUTINE is the one that matters here.
print(pretty_flags(f.__code__.co_flags)) # OPTIMIZED, NEWLOCALS, NOFREE
print(pretty_flags(a.__code__.co_flags)) # OPTIMIZED, NEWLOCALS, NOFREE, COROUTINE
</code></pre>

<p>Aha, now we can determine whether a function is a coroutine function or not. This means we can write our decorator correctly:</p>

<pre><code class="language-python">
from dis import pretty_flags


def decorator(func):
    def wrapper():
        func()
    async def async_wrapper():
        await func()
    
    if &#34;COROUTINE&#34; in pretty_flags(func.__code__.co_flags):
        return async_wrapper
    
    return wrapper
</code></pre>

<h1 id="final-solution" id="final-solution">Final Solution</h1>

<p>I already mentioned there are two solutions. I wanted to get the internal one out of the way first, as that is the one I used first. Then I ran into something that made more sense. The aforementioned decorator can also be written like this:</p>

<pre><code class="language-python">
import inspect


def decorator(func):
    def wrapper():
        func()
    async def async_wrapper():
        await func()
    
    if inspect.iscoroutinefunction(func):
        return async_wrapper
    
    return wrapper
</code></pre>

<p>The importance of using the correct method is abundantly clear in this case. Our definition is a coroutine function, not yet a coroutine, and therefore you cannot await the definition itself. You can only await the coroutine object you get by calling the async function.</p>

<p>Hope this helps out a bit in the future of your async Python adventure.</p>

<p>As a final, final example, the decorator should probably look like this:</p>

<pre><code class="language-python">
import inspect


def decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    async def async_wrapper(*args, **kwargs):
        return await func(*args, **kwargs)
    
    if inspect.iscoroutinefunction(func):
        return async_wrapper
    
    return wrapper
</code></pre>

<p>This propagates any and all arguments given to the function you are decorating.</p>
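
<p>A minimal sketch of how such a decorator might be used on one sync and one async function. The functions <code>add</code> and <code>async_add</code> are made up for illustration, and the use of <code>functools.wraps</code> (to keep the original name on the wrapper) is an addition of this sketch, as is handing the result back with <code>return</code>.</p>

<pre><code class="language-python">
import asyncio
import functools
import inspect


def decorator(func):
    # functools.wraps copies the name and docstring of func onto the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    @functools.wraps(func)
    async def async_wrapper(*args, **kwargs):
        return await func(*args, **kwargs)

    if inspect.iscoroutinefunction(func):
        return async_wrapper

    return wrapper


@decorator
def add(x, y):
    return x + y


@decorator
async def async_add(x, y):
    await asyncio.sleep(0)  # stand-in for real async work
    return x + y


sync_result = add(1, 2)
async_result = asyncio.run(async_add(1, 2))
print(sync_result, async_result)  # 3 3
</code></pre>

<p>The sync function can still be called directly, while the async one still has to be awaited or handed to an event loop, just like before it was decorated.</p>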

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/decoration-patterns</guid>
      <pubDate>Tue, 15 Sep 2020 20:48:59 +0000</pubDate>
    </item>
    <item>
      <title>How to transport a snake?</title>
      <link>https://stealthycoder.writeas.com/how-to-transport-a-snake?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[So we all know planes are not a good idea. Before you know it you have Samuel L. Jackson shouting in your ear.&#xA;&#xA;So maybe we need a few new ideas. !--more--&#xA;&#xA;Point of view&#xA;&#xA;As most quasi intellectuals who want to sound philosophical like to say; It all depends on your point of view. However, in this case I want to make for how to ship Python applications, I want to make sure you embody the different individuals who partake in the process of shipping it or deploying it if you want to be technical. One is the developer who writes the source code, the other is the DevOps engineer tasked with deploying it wherever it is needed. &#xA;&#xA;Ecosystem&#xA;&#xA;The ecosystem for Python applications is great if you are the developer that just has to write the source code to make it work. You activate a virtual environment using any tool that has  your preference. You install the necessary libraries outside of the standard library that you need. You write the code until it all works. Job done.&#xA;&#xA;The ecosystem for deploying it sucks. Either you use the same flow as the developer but that is not the context and  environment of the DevOps engineer. You really wanted to use the packages from the package managers of the Operating System you deploying it to. You cannot however because the same versions do not sync up. So you have to choose.&#xA;&#xA;Either Or&#xA;&#xA;Either you use the developer tools and flows for the duration of deploying and your Continuous Integration process. Or you use the system packages to pre-built docker images where you run your code and development process. It is either or. There is a big discrepancy between the developer context and the deployment context, or if you will the end user. &#xA;&#xA;Let us assume you made something and you want other people to run it too. How do you get your Python application to the end user? You might get away with making a self running archive. 
Though one thing needs to be there and that is the C libraries on the host machine. So you cannot get away with prepping the host machine environment. &#xA;&#xA;The one solution is to statically compile everything and not dynamically link it. This means a lot of management done by developers or DevOps engineers themselves. It would solve the problem though that you can ship your archive and anyone else can just run it. Not a feasible thing to accomplish as it introduces a lot of unneeded complexity. &#xA;&#xA;Two layers &#xA;&#xA;There also exists the possibility of having the self running archive and then also putting it inside a docker image. These two layers can have two independent running pipelines. One to produce the archive and one that produces the image. Now you can update the runtime without changing the source code and run the risk of introducing anomalies. You take the archive out and you update the Python runtime and reinstate the archive and run it. &#xA;&#xA;It gives you more fine grained control if that is needed. &#xA;&#xA;#devops #python]]&gt;</description>
      <content:encoded><![CDATA[<p>So we all know <a href="https://www.imdb.com/title/tt0417148/" rel="nofollow">planes</a> are not a good idea. Before you know it you have Samuel L. Jackson shouting in your ear.</p>

<p>So maybe we need a few new ideas. </p>

<h2 id="point-of-view" id="point-of-view">Point of view</h2>

<p>As most quasi-intellectuals who want to sound philosophical like to say: it all depends on your point of view. However, for the case I want to make about how to ship Python applications, I want you to step into the shoes of the different individuals who take part in shipping it, or deploying it if you want to be technical. One is the developer who writes the source code, the other is the DevOps engineer tasked with deploying it wherever it is needed.</p>

<h2 id="ecosystem" id="ecosystem">Ecosystem</h2>

<p>The ecosystem for Python applications is great if you are the developer that just has to write the source code to make it work. You activate a virtual environment using whichever tool you prefer. You install the libraries you need from outside the standard library. You write the code until it all works. Job done.</p>

<p>The ecosystem for deploying it sucks. You can use the same flow as the developer, but that does not fit the context and environment of the DevOps engineer. You would really rather use the packages from the package manager of the operating system you are deploying to. You cannot, however, because the versions do not line up. So you have to choose.</p>

<h2 id="either-or" id="either-or">Either Or</h2>

<p>Either you use the developer tools and flows for deploying and for your Continuous Integration process, or you use the system packages to pre-build Docker images in which you run your code and your development process. It is either or. There is a big discrepancy between the developer context and the deployment context, or, if you will, the end user's.</p>

<p>Let us assume you made something and you want other people to run it too. How do you get your Python application to the end user? You might get away with making a <a href="https://www.python.org/dev/peps/pep-0441/" rel="nofollow">self running archive</a>. One thing still needs to be there, though, and that is the C libraries on the host machine. So you cannot get away without prepping the host machine environment.</p>

<p>One solution is to statically compile everything instead of dynamically linking it. That means a lot of management by the developers or DevOps engineers themselves. It would solve the problem, as you could ship your archive and anyone could just run it, but it is not really feasible because it introduces a lot of unneeded complexity.</p>
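
<p>For reference, a rough sketch of building and running such an archive with the standard library <code>zipapp</code> module. The application here is a made-up one-line <code>__main__.py</code> in a temporary directory, and the names <code>myapp</code> and <code>myapp.pyz</code> are placeholders.</p>

<pre><code class="language-python">
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical application: a directory with a __main__.py entry point.
    source = Path(tmp) / "myapp"
    source.mkdir()
    (source / "__main__.py").write_text('print("hello from the archive")\n')

    # Bundle the directory into a single runnable .pyz file.
    target = Path(tmp) / "myapp.pyz"
    zipapp.create_archive(source, target, interpreter="/usr/bin/env python3")

    # Run the archive with the interpreter that is executing this script.
    result = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True,
        text=True,
        check=True,
    )
    output = result.stdout.strip()

print(output)  # hello from the archive
</code></pre>

<p>The archive only bundles Python code, so whatever the code needs at the C level still has to be present on the machine that runs it.</p>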

<h2 id="two-layers" id="two-layers">Two layers</h2>

<p>There also exists the possibility of having the self running archive and then also putting it inside a Docker image. These two layers can have two independently running pipelines: one that produces the archive and one that produces the image. Now you can update the runtime without changing the source code and without running the risk of introducing anomalies there. You take the archive out, update the Python runtime, put the archive back and run it.</p>

<p>It gives you more fine-grained control if that is needed.</p>

<p><a href="https://stealthycoder.writeas.com/tag:devops" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">devops</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/how-to-transport-a-snake</guid>
      <pubDate>Mon, 24 Feb 2020 10:28:33 +0000</pubDate>
    </item>
    <item>
      <title>Optimizing code - Part 5</title>
      <link>https://stealthycoder.writeas.com/optimizing-code-part-5?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[So for the smart people out there they might have already figured out that the result for 153 is the same as for 135,315,351,513 and 531. This means that we can calculate the result for all of those once and just check if the result of that calculation is in that list. Which is the case for the number 153.  !--more--&#xA;&#xA;Online Encyclopedia Integer Sequences&#xA;&#xA;There exists an online database of integer sequences. It has a lot of cool sequences and one of them is the one with all Narcissistic Numbers. It has number A005188 and you can find it here.&#xA;&#xA;This next code is inspired and based on the code in the OEIS. &#xA;&#xA;Code&#xA;&#xA;from itertools import combinationswithreplacement&#xA;from datetime import datetime&#xA;import sys&#xA;&#xA;POWERS = [0,1,2,3,4,5,6,7,8,9]&#xA;NARCISSISTICNUMBERS = []&#xA;&#xA;def recalculatepowers():&#xA;    for i,v in enumerate(POWERS):&#xA;        POWERS[i] = i * v&#xA;&#xA;def equals(number: int, target: tuple) -  bool:&#xA;    digits = []&#xA;    while number   0:&#xA;        , number = digits.append(number % 10), number // 10&#xA;    return tuple(sorted(digits)) == target&#xA;&#xA;def findnarcissisticnumbers(desired: int) -  None:&#xA;    global POWERS, NARCISSISTICNUMBERS&#xA;    for k in range(1, sys.maxsize):&#xA;&#xA;        for b in combinationswithreplacement(range(10), k):&#xA;&#xA;            x = sum(map(lambda y:POWERS[y], b))&#xA;            if x   0 and equals(x, b):&#xA;                NARCISSISTICNUMBERS.append(x)&#xA;                if len(NARCISSISTICNUMBERS) == desired:&#xA;                    return&#xA;        recalculatepowers()&#xA;&#xA;start = datetime.utcnow()&#xA;findnarcissisticnumbers(28)&#xA;print(datetime.utcnow() - start)&#xA;&#xA;The combinations\with\_replacement gets numbers 0-9 and how many times to do it. 
So for example when k = 2 you get:&#xA;&#xA;[0,0]&#xA;[0,1]&#xA;...&#xA;[1,1]&#xA;...&#xA;[2,2]&#xA;...&#xA;[9,9]&#xA;&#xA;And as you can see you lose one number every time you go up in tens. As 0,1 is equal to 1,0 and 1,2 is equal to 2,1. There are considerable amount of fewer computations to check. &#xA;&#xA;So this code takes this much time to run on my laptop: 0:00:00.204806&#xA;&#xA;That is an insane speedup in time compared to the naive implementation and the optimized implementation. &#xA;&#xA;Caveat&#xA;&#xA;Small thing I found out whilst running this code. I thought it was enough to do sorted((2,1)) but this returns a list!!. So then I needed to wrap that with another tuple or the other one with list and I did not want to do that. So therefore I went with the initial list and turn that into a tuple. &#xA;&#xA;#code #python]]&gt;</description>
      <content:encoded><![CDATA[<p>The smart people out there might have already figured out that the result for <strong>153</strong> is the same as for <strong>135, 315, 351, 513 and 531</strong>, because they all consist of the same digits. This means that we can calculate the result for all of those once and just check whether the result of that calculation is itself one of the numbers in that list. Which is the case for the number <strong>153</strong>.  </p>
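
<p>A quick way to convince yourself of this, as a small sketch. The helper <code>digit_power_sum</code> is a name made up for this example and is not used in the code further down.</p>

<pre><code class="language-python">
from itertools import permutations


def digit_power_sum(number):
    # Sum of every digit raised to the number of digits.
    digits = str(number)
    return sum(int(d) ** len(digits) for d in digits)


# Every distinct ordering of the digits 1, 5 and 3.
rearrangements = sorted({int("".join(p)) for p in permutations("153")})
sums = {digit_power_sum(n) for n in rearrangements}

print(rearrangements)  # [135, 153, 315, 351, 513, 531]
print(sums)            # {153}
</code></pre>

<p>All six orderings give the same sum, and only one of them, 153, is equal to it.</p>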

<h2 id="online-encyclopedia-integer-sequences" id="online-encyclopedia-integer-sequences">On-Line Encyclopedia of Integer Sequences</h2>

<p>There exists an online database of integer sequences. It has a lot of cool sequences and one of them is the one with all Narcissistic Numbers. It has number A005188 and you can find it <a href="https://oeis.org/A005188" rel="nofollow">here</a>.</p>

<p>This next code is inspired by and based on the code in the OEIS.</p>

<h2 id="code" id="code">Code</h2>

<pre><code class="language-python">from itertools import combinations_with_replacement
from datetime import datetime
import sys

POWERS = [0,1,2,3,4,5,6,7,8,9]
NARCISSISTIC_NUMBERS = []

def recalculate_powers():
    for i,v in enumerate(POWERS):
        POWERS[i] = i * v

def equals(number: int, target: tuple) -&gt; bool:
    digits = []
    while number &gt; 0:
        _, number = digits.append(number % 10), number // 10
    return tuple(sorted(digits)) == target



def find_narcissistic_numbers(desired: int) -&gt; None:
    global POWERS, NARCISSISTIC_NUMBERS
    for k in range(1, sys.maxsize):

        for b in combinations_with_replacement(range(10), k):

            x = sum(map(lambda y:POWERS[y], b))
            if x &gt; 0 and equals(x, b):
                NARCISSISTIC_NUMBERS.append(x)
                if len(NARCISSISTIC_NUMBERS) == desired:
                    return
        recalculate_powers()

start = datetime.utcnow()
find_narcissistic_numbers(28)
print(datetime.utcnow() - start)
</code></pre>

<p><code>combinations_with_replacement</code> takes the digits 0-9 and a length k, and yields every combination of k digits in sorted order, with repeats allowed. So for example when k = 2 you get:</p>

<p>[0,0]
[0,1]
...
[1,1]
...
[2,2]
...
[9,9]</p>

<p>As you can see, each time the first digit goes up you lose one more pair: 0,1 is the same combination as 1,0, and 1,2 the same as 2,1, so each set of digits only shows up once. That leaves considerably fewer candidates to check.</p>
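
<p>To put a number on that, here is a small sketch that counts the candidates for the first few lengths and compares them with the <code>10 ** k</code> ordered digit strings of the same length. The variable names are only for illustration.</p>

<pre><code class="language-python">
from itertools import combinations_with_replacement

counts = {}
for k in (1, 2, 3):
    candidates = list(combinations_with_replacement(range(10), k))
    counts[k] = len(candidates)
    print(k, counts[k], 10 ** k)
# 1 10 10
# 2 55 100
# 3 220 1000

pairs = list(combinations_with_replacement(range(10), 2))
print(pairs[:3])  # [(0, 0), (0, 1), (0, 2)]
print(pairs[-1])  # (9, 9)
</code></pre>

<p>Note that the results come out as tuples, which matters for the comparison in <code>equals</code>.</p>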

<p>So this code takes this much time to run on my laptop: <strong>0:00:00.204806</strong></p>

<p>That is an insane speedup in time compared to the naive implementation and the optimized implementation.</p>

<h4 id="caveat" id="caveat">Caveat</h4>

<p>A small thing I found out whilst running this code: I thought it was enough to do <strong>sorted((2,1))</strong>, but this returns a <strong>list</strong>! So I would have needed to wrap that in another <strong>tuple</strong>, or wrap the other side in a <strong>list</strong>, and I did not want to do that. Therefore I went with the initial list and turned that into a tuple.</p>
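
<p>The behaviour in question, as a tiny sketch:</p>

<pre><code class="language-python">
digits = (2, 1)

as_list = sorted(digits)
print(as_list)                          # [1, 2]
print(as_list == (1, 2))                # False, a list never equals a tuple
print(tuple(sorted(digits)) == (1, 2))  # True
</code></pre>

<p>So <code>sorted</code> always hands back a list, whatever kind of iterable goes in.</p>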

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/optimizing-code-part-5</guid>
      <pubDate>Sun, 06 Oct 2019 14:21:21 +0000</pubDate>
    </item>
    <item>
      <title>Optimizing code – Part 4</title>
      <link>https://stealthycoder.writeas.com/optimizing-code-part-4?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[The next mini optimization we can do is to bail out earlier because when summing the number and we overshoot the original then we can stop immediately. For example *9 \\ 6 is a big number and so with bigger numbers it makes sense to bail out earlier. !--more--&#xA;&#xA;import sys&#xA;from datetime import datetime&#xA;&#xA;POWERS = [0,1,2,3,4,5,6,7,8,9]&#xA;LENGTH = 1&#xA;&#xA;def recalculatepowers() -  None:&#xA;    global POWERS&#xA;    for i,v in enumerate(POWERS):&#xA;        POWERS[i] = i  v&#xA;&#xA;def isnarcissistic(x: int) -  bool:&#xA;    global LENGTH, POWERS&#xA;    org = x&#xA;    digits = []&#xA;    total = 0&#xA;    while x   0:&#xA;        , x = digits.append(x % 10), x // 10&#xA;    curlength = len(digits)&#xA;    if curlength != LENGTH:&#xA;        LENGTH = curlength&#xA;        recalculatepowers()&#xA;    for i in digits:&#xA;        total += POWERS[i]&#xA;        if total   org:&#xA;            return False&#xA;    return total == org&#xA;&#xA;def findnarcissisticnumbers(desired: int) -  None:&#xA;    for x in range(1,sys.maxsize):&#xA;        if isnarcissistic(x):&#xA;            desired -= 1&#xA;        if desired == 0:&#xA;            return&#xA;&#xA;start = datetime.utcnow()&#xA;findnarcissisticnumbers(28)&#xA;print(datetime.utcnow() - start)&#xA;The time for this code to run on my laptop is: 0:06:21.202479&#xA;&#xA;To compare this to the original run of the naive implementation is, which was 0:08:09.315972. Even on my slow hardware, it still resulted in a speed up of about 25%. &#xA;&#xA;#code #python]]&gt;</description>
      <content:encoded><![CDATA[<p>The next mini optimization is to bail out earlier: if the running sum overshoots the original number while we are still adding powers, we can stop immediately. For example, <strong>9 ** 6</strong> is already a big number, so with bigger numbers it makes sense to bail out early. </p>

<pre><code class="language-python">import sys
from datetime import datetime

POWERS = [0,1,2,3,4,5,6,7,8,9]
LENGTH = 1

def recalculate_powers() -&gt; None:
    global POWERS
    for i,v in enumerate(POWERS):
        POWERS[i] = i * v

def is_narcissistic(x: int) -&gt; bool:
    global LENGTH, POWERS
    org = x
    digits = []
    total = 0
    while x &gt; 0:
        _, x = digits.append(x % 10), x // 10
    cur_length = len(digits)
    if cur_length != LENGTH:
        LENGTH = cur_length
        recalculate_powers()
    for i in digits:
        total += POWERS[i]
        if total &gt; org:
            return False
    return total == org

def find_narcissistic_numbers(desired: int) -&gt; None:
    for x in range(1,sys.maxsize):
        if is_narcissistic(x):
            desired -= 1
        if desired == 0:
            return

start = datetime.utcnow()
find_narcissistic_numbers(28)
print(datetime.utcnow() - start)
</code></pre>

<p>The time for this code to run on my laptop is: <strong>0:06:21.202479</strong></p>

<p>Compare this to the original run of the naive implementation, which took <strong>0:08:09.315972</strong>. Even on my slow hardware, that is about <strong>22%</strong> less run time, or a speedup of roughly 1.28 times.</p>
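
<p>The comparison can be worked out from the two timings quoted above. This is just the arithmetic, sketched with <code>timedelta</code>; the variable names are made up.</p>

<pre><code class="language-python">
from datetime import timedelta

naive = timedelta(minutes=8, seconds=9, microseconds=315972)
early_exit = timedelta(minutes=6, seconds=21, microseconds=202479)

# Dividing one timedelta by another gives a plain float.
time_saved = (naive - early_exit) / naive  # fraction of run time removed
speedup = naive / early_exit               # how many times faster

print(round(time_saved * 100))  # 22
print(round(speedup, 2))        # 1.28
</code></pre>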

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/optimizing-code-part-4</guid>
      <pubDate>Sun, 06 Oct 2019 14:10:18 +0000</pubDate>
    </item>
    <item>
      <title>Optimizing code – Part 3</title>
      <link>https://stealthycoder.writeas.com/optimizing-code-part-3?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[The next thing we can optimize is the fact that the result of the powers calculation does not change during the same length of the numbers. In order words, *3 \\  3 does not change for the numbers 123, 345, 543 and any other number containing a 3 in the range of 100 - 999.  !--more--&#xA;&#xA;So we make a lookup table and put all the powers in there in a nice way. If the position is the same as the value contained within then it is really easy to calculate the next power. It is just multiply the value by the index it is located at. This process is called memoization. It is when you store or cache the results of expensive calculations and just look them up the next time you need them.&#xA;&#xA;import sys&#xA;from datetime import datetime&#xA;&#xA;POWERS = [0,1,2,3,4,5,6,7,8,9]&#xA;LENGTH = 1&#xA;&#xA;def recalculatepowers() -  None:&#xA;    global POWERS&#xA;    for i,v in enumerate(POWERS):&#xA;        POWERS[i] = i  v&#xA;&#xA;def isnarcissistic(x: int) -  bool:&#xA;    global LENGTH, POWERS&#xA;    org = x&#xA;    digits = []&#xA;    total = 0&#xA;    while x   0:&#xA;        , x = digits.append(x % 10), x // 10&#xA;    curlength = len(digits)&#xA;    if curlength != LENGTH:&#xA;        LENGTH = curlength&#xA;        recalculatepowers()&#xA;    for i in digits:&#xA;        total += POWERS[i]&#xA;    return total == org&#xA;&#xA;def findnarcissisticnumbers(desired: int) -  None:&#xA;    for x in range(1,sys.maxsize):&#xA;        if isnarcissistic(x):&#xA;            desired -= 1&#xA;        if desired == 0:&#xA;            return&#xA;&#xA;start = datetime.utcnow()&#xA;findnarcissisticnumbers(28)&#xA;print(datetime.utcnow() - start)&#xA;&#xA;The time for this code to run on my laptop: 0:06:23.868587&#xA;&#xA;#code #python]]&gt;</description>
      <content:encoded><![CDATA[<p>The next thing we can optimize is the fact that the result of the powers calculation does not change as long as the numbers have the same length. In other words, <strong>3 ** 3</strong> does not change for the numbers <em>123</em>, <em>345</em>, <em>543</em> and any other number containing a <strong>3</strong> in the range of <em>100 – 999</em>.  </p>

<p>So we make a lookup table and put all the powers in there. Because each position in the table holds the power of its own index, calculating the next power is really easy: just multiply the stored value by the index it is located at. This process is called <em>memoization</em>: you store, or <em>cache</em>, the results of expensive calculations and just look them up the next time you need them.</p>

<pre><code class="language-python">import sys
from datetime import datetime

POWERS = [0,1,2,3,4,5,6,7,8,9]
LENGTH = 1

def recalculate_powers() -&gt; None:
    global POWERS
    for i,v in enumerate(POWERS):
        POWERS[i] = i * v

def is_narcissistic(x: int) -&gt; bool:
    global LENGTH, POWERS
    org = x
    digits = []
    total = 0
    while x &gt; 0:
        _, x = digits.append(x % 10), x // 10
    cur_length = len(digits)
    if cur_length != LENGTH:
        LENGTH = cur_length
        recalculate_powers()
    for i in digits:
        total += POWERS[i]
    return total == org

def find_narcissistic_numbers(desired: int) -&gt; None:
    for x in range(1,sys.maxsize):
        if is_narcissistic(x):
            desired -= 1
        if desired == 0:
            return

start = datetime.utcnow()
find_narcissistic_numbers(28)
print(datetime.utcnow() - start)
</code></pre>

<p>The time for this code to run on my laptop: <strong>0:06:23.868587</strong></p>
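
<p>The table update is easy to check in isolation. A small sketch that reuses <code>recalculate_powers</code> from above on a fresh table and steps it twice:</p>

<pre><code class="language-python">
POWERS = list(range(10))  # POWERS[i] == i ** 1


def recalculate_powers():
    # Multiply every stored power by its own index to get the next power.
    for i, v in enumerate(POWERS):
        POWERS[i] = i * v


recalculate_powers()
print(POWERS)             # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

recalculate_powers()
print(POWERS[3], 3 ** 3)  # 27 27
</code></pre>

<p>After each call the table holds the next power of every digit, so a lookup replaces an exponentiation for every digit of every number of that length.</p>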

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/optimizing-code-part-3</guid>
      <pubDate>Sun, 06 Oct 2019 14:02:18 +0000</pubDate>
    </item>
    <item>
      <title>Optimizing code - Part 2</title>
      <link>https://stealthycoder.writeas.com/optimizing-code-part-2?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[A small mini optimization is to move the call to len out as the length of the number does not change.  !--more--&#xA;&#xA;import sys&#xA;from datetime import datetime&#xA;&#xA;def isnarcissistic(x: int) -  bool:&#xA;    org = x&#xA;    digits = []&#xA;    while x   0:&#xA;        , x = digits.append(x % 10), x // 10&#xA;    power = len(digits)&#xA;    s = sum([i  power for i in digits])&#xA;    return s == org&#xA;&#xA;def findnarcissisticnumbers(desired: int) -  None:&#xA;    for x in range(sys.maxsize):&#xA;        if isnarcissistic(x):&#xA;            desired -= 1&#xA;        if desired == 0:&#xA;            return&#xA;&#xA;start = datetime.utcnow()&#xA;findnarcissistic_numbers(28)&#xA;print(datetime.utcnow() - start)&#xA;The time for this code to run on my laptop is: 0:07:14.172538**&#xA;&#xA;#code #python]]&gt;</description>
      <content:encoded><![CDATA[<p>A small mini optimization is to move the call to <em>len</em> out of the summation, as the length of the number does not change.  </p>

<pre><code class="language-python">import sys
from datetime import datetime

def is_narcissistic(x: int) -&gt; bool:
    org = x
    digits = []
    while x &gt; 0:
        _, x = digits.append(x % 10), x // 10
    power = len(digits)
    s = sum([i ** power for i in digits])
    return s == org

def find_narcissistic_numbers(desired: int) -&gt; None:
    for x in range(sys.maxsize):
        if is_narcissistic(x):
            desired -= 1
        if desired == 0:
            return

start = datetime.utcnow()
find_narcissistic_numbers(28)
print(datetime.utcnow() - start)
</code></pre>

<p>The time for this code to run on my laptop is: <strong>0:07:14.172538</strong></p>
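
<p>The digit-splitting loop above does two things in one tuple assignment. As an alternative sketch of the same step, here it is with <code>divmod</code>, which some may find easier to read. The helper name <code>split_digits</code> is made up, and this variant was not timed.</p>

<pre><code class="language-python">
def split_digits(number):
    # Peel off the last digit until nothing is left.
    digits = []
    while number:
        number, digit = divmod(number, 10)
        digits.append(digit)
    return digits


def is_narcissistic(number):
    digits = split_digits(number)
    power = len(digits)  # computed once, outside the summation
    return sum(d ** power for d in digits) == number


print(split_digits(153))     # [3, 5, 1]
print(is_narcissistic(153))  # True
print(is_narcissistic(154))  # False
</code></pre>

<p>The digits come out in reverse order, which does not matter here because the sum does not care about order.</p>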

<p><a href="https://stealthycoder.writeas.com/tag:code" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">code</span></a> <a href="https://stealthycoder.writeas.com/tag:python" class="hashtag" rel="nofollow"><span>#</span><span class="p-category">python</span></a></p>
]]></content:encoded>
      <guid>https://stealthycoder.writeas.com/optimizing-code-part-2</guid>
      <pubDate>Sun, 06 Oct 2019 14:00:22 +0000</pubDate>
    </item>
  </channel>
</rss>