===== 12-Factor App Methodology =====

===== Factor 1: Codebase =====

According to the first principle of the 12-Factor App methodology, an application should have **one codebase**, which is stored in a **version control system**. This codebase can be, for example, a Git repository. From this single codebase, deployments can be made to multiple different environments, such as development, test, and production.

The key idea is therefore that **one application = one codebase**, but from this codebase there can be multiple deploys. A deploy here means that the same application is run in a different environment or with a different configuration.

==== What does this mean in practice? ====

If we have a web application, then its entire source code is located in a single repository. From this, for example, the following can be created:

  * a **development** deployment for programmers,
  * a **staging** system for testing,
  * a **production** system for real users.

The three environments are not three separate projects, but three separate runtime instances of the same application.

<code>
flowchart TD
    A[Single Git repository] --> B[Development environment]
    A --> C[Staging environment]
    A --> D[Production environment]
</code>

==== Incorrect approach ====

<code>
flowchart TD
    A[myapp-dev folder] --> X[Development run]
    B[myapp-test folder] --> Y[Test run]
    C[myapp-prod folder] --> Z[Production run]
</code>

In the first diagram, the three environments start from the same codebase. In the second diagram, there are three separate copies, which can lead to errors and divergence in the long term.

==== Good approach ====

  * The source code of an application is in one Git repository.
  * The repository contains all source files required for the application.
  * Development, test, and production deployments all happen from this same repository.

Example:

<code>
my-webapp/
  src/
  tests/
  package.json
  Dockerfile
  README.md
</code>

In this case, ''my-webapp'' is a single application, and it has a single codebase.
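The "one codebase, many deploys" idea can be sketched in a few lines of Python. This is only an illustration, not part of the methodology itself: the ''APP_ENV'' variable name and the settings values are assumptions.

```python
# Illustrative sketch: one shared codebase, where only an environment
# variable (the assumed name APP_ENV) distinguishes the deploys.
import os

SETTINGS = {
    "development": {"debug": True, "workers": 1},
    "staging": {"debug": False, "workers": 2},
    "production": {"debug": False, "workers": 8},
}

def current_settings(env=None):
    """Return the settings of the current deploy of the same codebase."""
    env = env or os.getenv("APP_ENV", "development")
    return SETTINGS[env]
```

Started with ''APP_ENV=production'', the very same code serves production; no separate copy of the repository is needed.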
==== Bad approach ====

It is not good if the same application has multiple separate, manually synchronized copies, for example:

<code>
my-webapp-dev/
my-webapp-test/
my-webapp-prod/
</code>

This is problematic because over time the three versions diverge from each other, and it is no longer possible to know with certainty which one is the current or correct version.

It is also a bad solution if a single repository contains multiple independent applications that actually have separate lifecycles.

==== Why is this important? ====

This principle is important because it reduces chaos during development and operations. If an application has multiple "half-identical" codebases, then it can very easily happen that:

  * a bug fix is added to only one version,
  * the tested version does not match the production version,
  * version tracking becomes difficult,
  * team members are not working with the same source code.

A single codebase ensures that everyone builds on the same foundation.

==== Example 1: simple web application ====

Suppose we create a Python FastAPI-based system. The project repository could look like this:

<code>
invoice-service/
  app/
    main.py
    routes.py
  requirements.txt
  Dockerfile
</code>

From this same codebase, the application can run:

  * on the developer's machine on localhost,
  * on the test server,
  * in the production Docker container.

The difference is not in the source code, but in the configuration and the runtime environment.

==== Example 2: Node.js application ====

In the case of a Node.js backend, the same principle applies:

<code>
student-api/
  src/
  package.json
  package-lock.json
  .env.example
</code>

The development, staging, and production systems are all built from this same repository. We do not create separate ''student-api-prod'' or ''student-api-final'' folders.

==== Example 3: what many people do incorrectly ====

In beginner projects, it is common for someone to do this:

<code>
projekt/
projekt_uj/
projekt_vegso/
projekt_vegso_jav/
projekt_vegleges_tenyleg/
</code>

This is not version control, but manual copying.
According to the 12-Factor approach, this should be replaced with a Git repository, where versions can be tracked with commits, branches, and tags.

==== Relationship with version control ====

The codebase principle is closely related to version control. The most typical tool for this is Git. Developers use the same repository, and they can work on separate branches there. Even so, the application itself still has a single codebase.

==== Typical misunderstandings ====

Many people think that if there are multiple microservices, then that violates this principle. In reality it does not, because in this case each microservice counts as a separate application, so each one can have its own codebase. For example:

  * ''user-service'' → separate repository
  * ''order-service'' → separate repository
  * ''payment-service'' → separate repository

This is completely correct, because these are separate applications or components.

----

===== Factor 2: Dependencies =====

According to the second principle of the 12-Factor App methodology, an application must **explicitly declare all of its external dependencies**. This means that the application cannot rely on certain libraries or tools already being installed on the system.

Dependency declaration is usually done **with the help of a dependency management system**. This way, the application defines exactly which external packages, libraries, or frameworks it needs. The goal is that **the application can be built and run in the same way in any environment**.

==== What does this mean in practice? ====

Modern applications often use multiple external libraries, such as:

  * a web framework
  * a database client
  * a JSON processing library
  * authentication modules

According to the 12-Factor principle, these dependencies **must all appear in the project configuration**. If someone downloads the project, the required packages must become installable with a single command.
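Whether every declared dependency is actually installed can also be checked programmatically. The sketch below is an assumption-laden illustration: it only understands the ''name==version'' pin format and relies on the standard library's ''importlib.metadata''.

```python
# Sketch: list pinned requirements that are not installed, so a missing
# or undeclared dependency surfaces before the application starts.
# Handling only the "name==version" pin format is an assumption.
from importlib import metadata

def missing_requirements(lines):
    """Return package names from 'name==version' pins that are not installed."""
    missing = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("==")[0].strip()
        try:
            metadata.version(name)  # raises if the package is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Called as ''missing_requirements(open("requirements.txt"))'', it reports everything that ''pip install -r requirements.txt'' still has to provide.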
==== Good approach ====

Python example (''requirements.txt''):

<code>
fastapi==0.110
uvicorn==0.29
pydantic==2.6
</code>

The project structure could be:

<code>
invoice-service/
  app/
  requirements.txt
</code>

Installation:

<code>
pip install -r requirements.txt
</code>

This ensures that every developer uses the same packages.

==== Node.js example ====

In Node.js, dependencies are listed in the ''package.json'' file.

<code javascript>
{
  "name": "student-api",
  "dependencies": {
    "express": "^4.18.2",
    "jsonwebtoken": "^9.0.0"
  }
}
</code>

Installation:

<code>
npm install
</code>

This automatically downloads all required packages.

==== Java example ====

In Java projects, Maven or Gradle is commonly used. Maven example (''pom.xml''):

<code xml>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
</code>

Dependencies are downloaded automatically from the Maven repository.

==== Bad approach ====

It is incorrect practice if the project implicitly assumes that certain libraries are already installed. Example:

<code python>
# used in code
import fastapi
import pandas
</code>

but there is no ''requirements.txt''. In this case, the application will not start on another developer's machine.

==== Classic problem ====

Many developers have already encountered the following error:

<code>
ModuleNotFoundError: No module named 'fastapi'
</code>

This typically happens because the project's dependencies are not declared.

==== Dependency isolation ====

The 12-Factor principle often goes together with **the use of virtual environments**. Python example:

<code>
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
</code>

This ensures that the project has its own package set.

==== Containerized example ====

When using Docker, dependencies appear in the Dockerfile.

<code>
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
</code>

In this case, the container contains every required dependency.

==== Why is this important? ====
Explicit dependency management ensures that:

  * every developer uses the same environment
  * the application can be reproduced at any time
  * the build process can be automated
  * the CI/CD pipeline works reliably

This is especially important in large teams.

==== Common misunderstanding ====

Declaring dependencies does not mean only program libraries. It may also include, for example:

  * build tools
  * CLI helper programs
  * compilers
  * runtime environments

==== Correct operation ====

<code>
flowchart LR
    A[Git repository] --> B[Dependency file]
    B --> C[Automatic installation]
    C --> D[Run application]
</code>

----

===== Factor 3: Config =====

According to the third principle of the 12-Factor App methodology, the application's **configuration must be separated from the source code**. Configuration should not be stored in the program code, but in **environment variables**.

Configuration means values that **can vary by environment**, for example:

  * database connection
  * API keys
  * passwords
  * URLs of external services
  * debug settings

According to the 12-Factor principle, these **must not be hardcoded in the code**.

==== What does this mean in practice? ====

If an application runs in multiple environments (for example development, test, production), then the configuration can be different. For example:

^ Environment ^ Database ^
| development | localhost |
| test | test-db |
| production | prod-db |

If the database address is in the code, then the program would need to be modified before every deploy. According to the 12-Factor approach, configuration **enters the program as an external parameter**.

==== Bad approach (hardcoded configuration) ====

Python example:

<code python>
DATABASE_URL = "postgres://user:password@prod-db:5432/app"
</code>

In this case:

  * the password is in the code
  * the configuration is committed into the repository
  * the code must be modified when switching environments

This causes security and operational problems.
==== Good approach: environment variable ====

Python example:

<code python>
import os

DATABASE_URL = os.getenv("DATABASE_URL")
</code>

In the environment:

<code>
export DATABASE_URL=postgres://user:password@prod-db:5432/app
</code>

This way the code remains the same in every environment.

==== Node.js example ====

<code javascript>
const dbUrl = process.env.DATABASE_URL;
</code>

Startup:

<code>
DATABASE_URL=postgres://localhost/mydb node server.js
</code>

==== Docker example ====

In the case of a Docker container, configuration also arrives as an environment variable.

<code>
docker run -e DATABASE_URL=postgres://db/app myapp
</code>

==== Kubernetes example ====

In Kubernetes, configuration often appears in the form of a **ConfigMap** or **Secret**.

<code yaml>
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: database-secret
        key: url
</code>

==== What counts as configuration? ====

Typical configuration items:

  * database connections
  * external API keys
  * SMTP server
  * cache server
  * feature flags
  * log levels

What is **not** configuration is the internal operating logic of the program.

==== Classic mistake ====

In many projects you can find a file like this (''config.py''):

<code python>
DATABASE_URL = "postgres://localhost/app"
SECRET_KEY = "123456"
</code>

If this is committed, then:

  * passwords may end up in a public repository
  * a separate config file is needed for every environment
  * the chance of error increases

==== Advantages of environment variables ====

The use of environment variables provides several advantages:

  * they do not get into the source code
  * they can be modified easily during deploy
  * secrets are handled more safely
  * they work well in containerized systems

==== Mermaid diagram: configuration handling ====

<code>
flowchart TD
    A[Application code] --> B[Environment variable]
    B --> C[Load configuration]
    C --> D[Application runtime]
</code>

----

===== Factor 4: Backing Services =====

According to the fourth principle of the 12-Factor App methodology, the infrastructure components used by the application must be treated as **external services (backing services)**.
These are resources that **are not part of the application**, but separate systems that the application connects to over the network. Typical backing service examples:

  * database (PostgreSQL, MySQL)
  * cache system (Redis, Memcached)
  * message queue (Kafka, RabbitMQ)
  * search engine (Elasticsearch)
  * object storage (S3)

The important principle is that these services should appear to the application as **replaceable resources**.

==== What does this mean in practice? ====

The application logic does not contain the infrastructure itself. The database, cache, or other components run as separate services, and the application only connects to them. For example, a web application can use a database and a cache system:

<code>
flowchart TD
    A[Application] --> B[Database]
    A --> C[Cache]
</code>

In this model, the database and the cache are separate services.

==== Replaceability ====

One important consequence of the backing service principle is that a service is **easy to replace**. For example, an application can use:

  * a local PostgreSQL database during development,
  * a cloud-based database in production.

The operation of the application does not change, because the database appears as an external service.

==== Example architecture ====

<code>
flowchart TD
    A[Web application]
    A --> B[(PostgreSQL)]
    A --> C[(Redis)]
    A --> D[(Message Queue)]
</code>

The application treats every external resource as a separate service.

----

===== Factor 5: Build, Release, Run =====

The fifth factor prescribes that the application lifecycle must be divided into **three well-separated phases**:

  * **Build**
  * **Release**
  * **Run**

Separating the three phases is important because this makes deployment **reproducible, automatable, and safe**.

==== Meaning of the three phases ====

=== Build ===

During the build phase, a **runnable package (artifact)** is created from the source code.
In this step, for example, the following happens:

  * source code compilation
  * dependency installation
  * creation of the package or container image

Examples:

  * Java → JAR file
  * Node.js → built application
  * Docker → container image

=== Release ===

The release phase is the **combination of the build result and the configuration**. At this point, a concrete application version is created that can be deployed. So the release is: **build + configuration**.

=== Run ===

During the run phase, the application **is executed in the selected environment**. This can be, for example:

  * a server
  * a container
  * a cloud platform

The run phase **does not modify the build anymore**; it only starts the application.

==== Relationship of the three phases ====

<code>
flowchart LR
    A[Source Code] --> B[Build]
    B --> C[Release]
    C --> D[Run]
</code>

==== Why is the separation important? ====

If the three phases are not separated from each other, then it will be difficult to:

  * reproduce a deployment
  * roll back to an earlier version
  * automate the deploy process

The Build-Release-Run model ensures that **the same build can be run in multiple environments**.

==== Example in a Docker environment ====

Build:

<code>
docker build -t myapp:1.0 .
</code>

Release:

<code>
docker tag myapp:1.0 registry/myapp:1.0
</code>

Run:

<code>
docker run myapp:1.0
</code>

==== Example in a CI/CD pipeline ====

A modern pipeline often follows exactly these three steps.

<code>
flowchart LR
    A[Git commit] --> B[CI build]
    B --> C[Release artifact]
    C --> D[Deployment]
</code>

----

===== Factor 6: Processes =====

According to the sixth factor, the application should consist of **stateless processes**. This means that running instances of the application must not store persistent state in their own memory or local filesystem. Persistent data must always be stored in an **external service**, for example in a database or cache system.

==== What does stateless operation mean? ====

Stateless means that any running instance can handle a request without knowing the state of previous requests.
If the application runs in multiple instances, then the system can route requests to any of the instances.

==== Example of a stateless architecture ====

<code>
flowchart TD
    A[Load Balancer]
    A --> B[App instance 1]
    A --> C[App instance 2]
    A --> D[App instance 3]
    B --> E[(Database)]
    C --> E
    D --> E
</code>

In this model, multiple application instances run at the same time. Each of them connects to the same database, and none of them stores persistent data in itself.

==== Bad approach: state in memory ====

It is an incorrect solution if the application stores user state in its own memory. Example:

<code python>
sessions = {}

def login(user):
    sessions[user.id] = "active"
</code>

If the application runs in multiple instances, then the state held in the memory of one instance will not be available to the other instances.

==== Good approach: state in an external service ====

User state is stored in an external system, for example in Redis:

<code>
flowchart TD
    A[App instance 1] --> B[(Redis)]
    C[App instance 2] --> B
</code>

This way every application instance sees the same state.

==== File handling ====

The stateless principle also applies to files.

Incorrect solution:

<code>
/tmp/uploads/file1.jpg
</code>

If the application is moved to a new instance, the file may disappear.

Good solutions:

  * object storage (S3)
  * network filesystem
  * database

==== Why is this important? ====

Stateless processes make the following possible:

  * horizontal scaling
  * containerized deploys
  * automatic restarts
  * load balancing

==== In a modern cloud environment ====

Cloud systems often start and stop application instances automatically. If the application is stateless, then this does not cause a problem.

----

===== Factor 7: Port Binding =====

The seventh factor states that the application **publishes itself as a standalone service through a port**. The application starts its own HTTP or network server, and becomes available through it. This means that the application **does not depend on an external web server**, but provides service access itself.
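A minimal, standard-library-only sketch of this self-publishing behavior (reading the port from an assumed ''PORT'' environment variable):

```python
# Sketch: the application starts its own HTTP server and binds a port
# itself - no external web server is involved. Reading the port from
# the PORT environment variable is an assumption for the example.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello")

def serve():
    port = int(os.getenv("PORT", "8000"))
    # The process itself owns the socket: this is port binding.
    HTTPServer(("", port), HelloHandler).serve_forever()
```

Started with ''PORT=8080 python app.py'' (the filename is illustrative), the process itself accepts requests on the given port.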
==== What does this mean in practice? ====

In many older systems, the application is hosted by an external web server, for example:

  * Apache
  * Nginx
  * IIS

In this model, the web server loads the application, for example with the help of a plugin or module. In contrast, the 12-Factor approach suggests that the application should **start its own server** and be available through a port.

==== Example for a modern web application ====

A Node.js or Python web application often starts like this:

<code javascript>
app.listen(8080)
</code>

or

<code>
uvicorn main:app --port 8000
</code>

The application then becomes directly available on the specified port.

==== Architecture example ====

<code>
flowchart TD
    A[Client] --> B[Application server :8000]
</code>

The application runs its own server and directly accepts requests.

==== Containerized environment ====

In containerized systems, this is especially important. A Docker container typically serves on one port. Example:

<code>
docker run -p 8000:8000 myapp
</code>

The application runs on port 8000 inside the container.

==== Microservice architecture ====

In microservice systems, every service publishes itself on its own port.

<code>
flowchart TD
    A[API Gateway] --> B[User Service :8001]
    A --> C[Order Service :8002]
    A --> D[Payment Service :8003]
</code>

Every service is available on a separate port.

==== Why is this important? ====

Port binding enables:

  * simple service startup
  * containerized execution
  * microservice architecture
  * dynamic infrastructure

The platform (for example Kubernetes or a cloud provider) can automatically connect the services.

----

===== Factor 8: Concurrency =====

The eighth factor describes how the application can be **scaled**. According to the 12-Factor App methodology, the application is **scaled by starting multiple processes**, rather than by strengthening a single larger process. This means that if the load increases, then **multiple identical application instances** are started in parallel.

==== What does this mean in practice? ====
There are two basic ways to scale:

  * **vertical scaling** – using a more powerful server
  * **horizontal scaling** – starting more instances

The 12-Factor approach favors **horizontal scaling**.

==== Example of horizontal scaling ====

<code>
flowchart TD
    A[Load Balancer]
    A --> B[App instance 1]
    A --> C[App instance 2]
    A --> D[App instance 3]
</code>

The load balancer distributes requests among multiple running instances.

==== Process types ====

Applications often contain several different types of processes, for example:

  * web server
  * background processor (worker)
  * scheduled task (scheduler)

These can be run as separate processes.

==== Example architecture ====

<code>
flowchart TD
    A[Client] --> B[Web process]
    B --> C[(Database)]
    B --> D[Worker process]
</code>

The web process accepts requests, while the worker performs background tasks.

==== Example of worker scaling ====

If many background tasks arrive, more workers can be started.

<code>
flowchart TD
    A[Queue] --> B[Worker 1]
    A --> C[Worker 2]
    A --> D[Worker 3]
</code>

==== In a containerized environment ====

In modern cloud systems, scaling is often automatic. Example in Kubernetes:

  * 1 pod → low load
  * 5 pods → high load

The system can automatically start new instances.

==== Why is this important? ====

Using multiple processes makes the following possible:

  * handling higher load
  * better fault handling
  * flexible scaling
  * use of cloud infrastructure

----

===== Factor 9: Disposability =====

According to the ninth factor, the application's processes should **start quickly and stop quickly**. Such processes can be created or terminated easily without interrupting the operation of the system. This property is especially important in modern cloud and containerized environments, where application instances often start and stop automatically.

==== What does this mean in practice? ====

If a new instance of the application needs to be started (for example because of increased load), then it should **start within a few seconds**.
If an instance has to be stopped (for example because of an update or scaling), then the application should **safely finish its running operations**, and then stop.

==== Fast startup ====

Fast startup makes it possible for the system to start new instances when the load increases.

<code>
flowchart LR
    A[Load increase] --> B[Start new application instance]
    B --> C[New instance accepts requests]
</code>

==== Fast shutdown ====

When stopping a process, the application must be given a chance to finish running operations. Example process:

  - the system signals shutdown,
  - the application finishes the current requests,
  - the process exits.

==== Example with a worker process ====

A background processor (worker) can finish its current task before shutdown.

<code>
flowchart LR
    A[Stop signal] --> B[Process finishes the current task]
    B --> C[Process stops]
</code>

==== Why is this important? ====

Fast startup and shutdown make the following possible:

  * automatic scaling
  * a fast deploy process
  * replacement of faulty instances
  * containerized execution

Modern cloud systems often start new instances in a short time.

==== Example in a containerized environment ====

For example, a Kubernetes system can start a new container if the load increases. If one instance is faulty, the system stops it and starts a new one.

----

===== Factor 10: Dev/Prod Parity =====

The tenth factor states that the difference between the **development**, **staging**, and **production** environments should be as small as possible. The more similar the environments are to each other, the smaller the chance that the application works during development but causes an error in the production system.

==== The classic problem ====

In many systems, the development and production environments are very different.
For example:

^ Development ^ Production ^
| SQLite database | PostgreSQL |
| local filesystem | cloud storage |
| simple server | system made up of multiple servers |

In such cases, it often happens that the program works during development, but causes an error in production.

==== Correct approach ====

The goal is to keep the difference between environments **minimal**. Ideally:

  * the same database type
  * the same runtime environment
  * the same infrastructure

----

===== Factor 11: Logs =====

According to the eleventh factor, the application **does not manage log files directly**, but writes logs as a **stream** to standard output. The collection, storage, and processing of logs is done **by the runtime environment**, not by the application itself.

==== What does this mean in practice? ====

The application simply writes log messages to standard output. For example:

<code python>
print("User logged in")
</code>

or

<code javascript>
console.log("Server started")
</code>

The application does not create its own log files and does not deal with archiving logs.

==== Traditional approach ====

Older systems often write logs directly to a file. Example:

<code>
/var/log/myapp.log
</code>

This can cause multiple problems:

  * the log file grows continuously
  * it is difficult to collect logs from multiple servers
  * log handling becomes the application's responsibility

==== Correct approach ====

The application only emits events, and the platform collects the logs.

<code>
flowchart LR
    A[Application] --> B[Stdout log stream]
    B --> C[Log collection system]
</code>

The log collection system can be, for example:

  * a log aggregator
  * a monitoring system
  * a cloud log service

==== Example in a containerized environment ====

With Docker, everything the application writes to standard output is automatically collected and stored by the Docker log system.

<code>
flowchart LR
    A[Application container] --> B[stdout]
    B --> C[Docker log system]
</code>

==== In a cloud environment ====

In cloud systems, logs often go into a central system.
For example:

  * ELK stack
  * Loki
  * a cloud logging service

<code>
flowchart LR
    A[Application] --> B[Log stream]
    B --> C[Central log storage]
</code>

----

===== Factor 12: Admin Processes =====

According to the twelfth factor, the **administrative or maintenance tasks** belonging to the application must be run as separate processes, in the same environment as the application itself. Administrative tasks are not part of the normal application process, but **one-time or periodic operations** that must be started separately.

==== Examples of administrative processes ====

Typical administrative operations:

  * database migration
  * running a data-fix script
  * clearing the cache
  * importing data
  * one-time maintenance tasks

These tasks do not run as part of the web application, but as separately started processes.

==== Example: database migration ====

Before updating an application, it is often necessary to modify the database structure.

<code>
python manage.py migrate
</code>

This is an administrative process that must be started separately.

==== Example architecture ====

<code>
flowchart TD
    A[Web Application]
    B[Admin Process]
    A --> C[(Database)]
    B --> C
</code>

The admin process uses the same database as the application, but runs separately.

==== Important rule ====

Admin processes must **use the same environment** as the application. This means:

  * the same codebase
  * the same configuration
  * the same dependencies

This ensures that admin operations work in the same way as the application.

==== Example in a containerized environment ====

In a containerized system, an admin task can also be run as a separate container.

<code>
flowchart TD
    A[Application container] --> C[(Database)]
    B[Migration container] --> C
</code>

The migration container runs only once.
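The "same codebase, same configuration, run once" rule can be illustrated with a tiny migration runner. This is only a sketch: the migration names, the SQL statements, and the ''DATABASE_URL'' handling are assumptions, not the implementation behind ''manage.py migrate''.

```python
# Sketch of a one-off admin process: it lives in the same codebase and
# reads its configuration the same way as the app (environment variable),
# but runs as a separately started process. Migration entries are made up.
import os

MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY)"),
    ("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]

def run_migrations(execute, applied):
    """Apply each not-yet-applied migration exactly once, in order."""
    for name, sql in MIGRATIONS:
        if name not in applied:
            execute(sql)  # hand the SQL to the real database client
            applied.add(name)
    return applied

def main():
    # Same configuration rule as the application itself; fail fast if unset.
    db_url = os.environ["DATABASE_URL"]
    print(f"running migrations against {db_url}")
```

Because it reads the same ''DATABASE_URL'' as the application, the admin process targets exactly the database the running deploy uses.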