Pose Detection in Android with ML Kit & Jetpack Compose | Real-time Pose Skeleton
Pose detection is an exciting field in mobile AI that enables real-time analysis of human body movements. In this article, we will explore how to implement pose detection in an Android application using Google’s ML Kit and Jetpack Compose. We will also visualise the detected pose as a skeleton overlay on a camera preview.
Prerequisites
Before diving in, ensure you have the following:
- Android Studio installed
- A basic understanding of Jetpack Compose
- A physical Android device (pose detection might not work properly on an emulator)
What is Pose Detection?
Pose detection identifies human body landmarks such as joints and key points (e.g., shoulders, elbows, knees). It is commonly used in fitness tracking, augmented reality (AR), and gesture-based interactions.
Why ML Kit for Pose Detection?
ML Kit is a powerful set of machine learning tools provided by Google. It offers on-device pose detection with:
- Real-time performance
- No internet requirement
- Easy integration
Setting Up ML Kit Pose Detection
To use ML Kit’s pose detection in your Android app, declare the artifacts in your version catalog (libs.versions.toml) and add the dependencies to your module’s build.gradle.kts file:
[versions]
poseDetectionAccurate = "18.0.0-beta5"
poseDetection = "18.0.0-beta5"
[libraries]
pose-detection-accurate = { group = "com.google.mlkit", name = "pose-detection-accurate", version.ref = "poseDetectionAccurate" }
pose-detection = { group = "com.google.mlkit", name = "pose-detection", version.ref = "poseDetection" }
Then reference them from your module-level build.gradle.kts:
implementation(libs.pose.detection.accurate)
implementation(libs.pose.detection)
CameraX Setup
We use CameraX to access the camera feed and process frames for pose detection. Add the dependencies:
[versions]
cameraxVersion = "1.4.1"
[libraries]
camera-core = { group = "androidx.camera", name = "camera-core", version.ref = "cameraxVersion" }
camera-camera2 = { group = "androidx.camera", name = "camera-camera2", version.ref = "cameraxVersion" }
camera-view = { group = "androidx.camera", name = "camera-view", version.ref = "cameraxVersion" }
camera-lifecycle = { group = "androidx.camera", name = "camera-lifecycle", version.ref = "cameraxVersion" }
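Then reference them from your module-level build.gradle.kts (the alias names below follow from the catalog entries above):
implementation(libs.camera.core)
implementation(libs.camera.camera2)
implementation(libs.camera.view)
implementation(libs.camera.lifecycle)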
Implementing Pose Detection
- Initialize CameraX: We start by setting up CameraX to capture frames for processing.
- Process Frames with ML Kit: Convert the ImageProxy to an InputImage and feed it to ML Kit’s Pose Detector.
- Draw Pose Skeleton: Use Jetpack Compose’s Canvas to overlay the detected pose on the camera preview.
Code Implementation
1. Setting up CameraX
Before using CameraX, you need to declare the necessary permissions in your AndroidManifest.xml file:
<uses-feature
android:name="android.hardware.camera"
android:required="true" />
<uses-permission android:name="android.permission.CAMERA" />
The uses-permission element requests access to the device’s camera, while the uses-feature entry declares that camera hardware is required for the app to function.
Below is the Kotlin implementation for requesting camera permissions, handling user responses, and setting up the camera preview using Jetpack Compose.
class MainActivity : ComponentActivity() {
private val cameraPermissionRequest =
registerForActivityResult(ActivityResultContracts.RequestPermission()) { isGranted ->
if (isGranted) {
setCameraPreview()
} else {
openPermissionSettings()
}
}
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
requestedOrientation = ActivityInfo.SCREEN_ORIENTATION_SENSOR_LANDSCAPE
checkCameraPermission()
enableEdgeToEdge()
}
private fun checkCameraPermission() {
when (PackageManager.PERMISSION_GRANTED) {
ContextCompat.checkSelfPermission(
this,
android.Manifest.permission.CAMERA,
) -> {
setCameraPreview()
}
else -> {
cameraPermissionRequest.launch(android.Manifest.permission.CAMERA)
}
}
}
private fun setCameraPreview() {
setContent {
PoseDetectionTheme {
Scaffold(
modifier = Modifier.fillMaxSize(),
) { innerPadding ->
CameraScreen(
modifier = Modifier.padding(innerPadding)
)
}
}
}
}
private fun openPermissionSettings() {
Intent(ACTION_APPLICATION_DETAILS_SETTINGS).also {
val uri = Uri.fromParts("package", packageName, null)
it.data = uri
startActivity(it)
}
}
}
Permission Handling:
- The app checks for camera permission using ContextCompat.checkSelfPermission.
- If the permission is granted, it proceeds to setCameraPreview().
- Otherwise, it requests permission using ActivityResultContracts.RequestPermission().
- If denied, openPermissionSettings() prompts the user to manually enable the permission in settings.
Setting Up the Camera Preview:
- setCameraPreview() initialises the Jetpack Compose UI and calls CameraScreen (a possible implementation of CameraScreen is sketched later in the article).
- The Scaffold component ensures proper layout handling.
Handling Orientation:
- requestedOrientation = ActivityInfo.SCREEN_ORIENTATION_SENSOR_LANDSCAPE keeps the activity in landscape mode (in either sensor direction).
2. Processing Frames with ML Kit
Processing frames with ML Kit’s Pose Detection API enables real-time analysis of human body movements. Let’s break down the frame-processing implementation and see how it works step by step.
Below is the Kotlin function that processes camera frames, detects human pose landmarks, and updates UI elements accordingly.
@OptIn(ExperimentalGetImage::class)
fun processImageProxy(
imageProxy: ImageProxy,
poseLandmarks: SnapshotStateList<PoseLandmark>,
imageWidth: MutableState<Int>,
imageHeight: MutableState<Int>,
) {
val mediaImage = imageProxy.image ?: return
val image = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
val options = PoseDetectorOptions.Builder()
.setDetectorMode(PoseDetectorOptions.STREAM_MODE)
.setPreferredHardwareConfigs(PoseDetectorOptions.CPU_GPU)
.build()
val poseDetector = PoseDetection.getClient(options)
poseDetector.process(image)
.addOnSuccessListener { pose ->
poseLandmarks.clear()
poseLandmarks.addAll(pose.allPoseLandmarks)
imageWidth.value = mediaImage.width
imageHeight.value = mediaImage.height
}
.addOnFailureListener { e ->
Log.e("PoseDetection", "Detection failed", e)
}
.addOnCompleteListener {
imageProxy.close()
}
}
Extracting Image Data from ImageProxy
- imageProxy.image ?: return ensures the frame contains a valid image before processing.
- InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees) converts mediaImage into a format compatible with ML Kit, applying the necessary rotation.
Configuring the Pose Detector
- PoseDetectorOptions.Builder() is used to define the detection parameters:
- setDetectorMode(PoseDetectorOptions.STREAM_MODE): Enables continuous pose detection for real-time applications.
- setPreferredHardwareConfigs(PoseDetectorOptions.CPU_GPU): Allows the detector to leverage both CPU and GPU for better performance.
- PoseDetection.getClient(options): Creates an instance of the pose detector with the specified options.
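As a side note, the pose-detection-accurate artifact added earlier ships a heavier but more precise model. It is not used in the code above; if you want to try it, the corresponding options class is AccuratePoseDetectorOptions (from com.google.mlkit.vision.pose.accurate), and the rest of the pipeline stays the same. A sketch:
// Accurate model: higher landmark quality at the cost of extra latency.
val accurateOptions = AccuratePoseDetectorOptions.Builder()
    .setDetectorMode(AccuratePoseDetectorOptions.STREAM_MODE)
    .build()
val accuratePoseDetector = PoseDetection.getClient(accurateOptions)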
Processing the Image with ML Kit
- poseDetector.process(image) runs pose detection on the input image.
- addOnSuccessListener { pose -> ... }: If successful, it extracts all detected landmarks and updates poseLandmarks.
- imageWidth.value = mediaImage.width and imageHeight.value = mediaImage.height store the dimensions of the processed image.
- addOnFailureListener { e -> Log.e("PoseDetection", "Detection failed", e) }: Handles errors if pose detection fails.
- addOnCompleteListener { imageProxy.close() }: Ensures that the ImageProxy is properly closed after processing to prevent memory leaks.
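One refinement worth considering: processImageProxy builds a new detector client for every frame. Since the options never change, you could create the client once in the composable that owns the analyzer, pass it into the function, and close it when the screen leaves composition. A minimal sketch (not part of the original code):
// Hypothetical: create the detector once and reuse it across frames.
val poseDetector = remember {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build()
    )
}
DisposableEffect(Unit) {
    // Release the ML Kit detector when the composable leaves the composition.
    onDispose { poseDetector.close() }
}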
3. Drawing Pose Skeleton in Jetpack Compose
With detection in place, the next step is to visually represent the detected pose landmarks using Jetpack Compose’s Canvas component. The following Kotlin code draws the pose landmarks and their connections as an overlay on the screen.
@Composable
fun PoseOverlay(
poseLandmarks: List<PoseLandmark>,
imageWidth: Int,
imageHeight: Int,
canvasWidth: Float,
canvasHeight: Float,
) {
val scaleX = canvasWidth / imageWidth
val scaleY = canvasHeight / imageHeight
Canvas(modifier = Modifier.fillMaxSize()) {
for (landmark in poseLandmarks) {
val adjustedX = landmark.position.x * scaleX
val adjustedY = landmark.position.y * scaleY
drawCircle(
color = Color.Red,
radius = 8f,
center = Offset(adjustedX, adjustedY)
)
}
val connections = listOf(
PoseLandmark.LEFT_EYE to PoseLandmark.RIGHT_EYE,
PoseLandmark.LEFT_EYE to PoseLandmark.LEFT_EAR,
PoseLandmark.RIGHT_EYE to PoseLandmark.RIGHT_EAR,
PoseLandmark.NOSE to PoseLandmark.LEFT_EYE,
PoseLandmark.NOSE to PoseLandmark.RIGHT_EYE,
PoseLandmark.NOSE to PoseLandmark.LEFT_MOUTH,
PoseLandmark.NOSE to PoseLandmark.RIGHT_MOUTH,
PoseLandmark.LEFT_SHOULDER to PoseLandmark.RIGHT_SHOULDER,
PoseLandmark.LEFT_SHOULDER to PoseLandmark.LEFT_HIP,
PoseLandmark.RIGHT_SHOULDER to PoseLandmark.RIGHT_HIP,
PoseLandmark.LEFT_HIP to PoseLandmark.RIGHT_HIP,
PoseLandmark.LEFT_SHOULDER to PoseLandmark.LEFT_ELBOW,
PoseLandmark.LEFT_ELBOW to PoseLandmark.LEFT_WRIST,
PoseLandmark.RIGHT_SHOULDER to PoseLandmark.RIGHT_ELBOW,
PoseLandmark.RIGHT_ELBOW to PoseLandmark.RIGHT_WRIST,
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_INDEX,
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_PINKY,
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_THUMB,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_INDEX,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_PINKY,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_THUMB,
PoseLandmark.LEFT_HIP to PoseLandmark.LEFT_KNEE,
PoseLandmark.LEFT_KNEE to PoseLandmark.LEFT_ANKLE,
PoseLandmark.RIGHT_HIP to PoseLandmark.RIGHT_KNEE,
PoseLandmark.RIGHT_KNEE to PoseLandmark.RIGHT_ANKLE,
PoseLandmark.LEFT_ANKLE to PoseLandmark.LEFT_HEEL,
PoseLandmark.LEFT_ANKLE to PoseLandmark.LEFT_FOOT_INDEX,
PoseLandmark.RIGHT_ANKLE to PoseLandmark.RIGHT_HEEL,
PoseLandmark.RIGHT_ANKLE to PoseLandmark.RIGHT_FOOT_INDEX
)
for ((start, end) in connections) {
val startLandmark = poseLandmarks.find { it.landmarkType == start }
val endLandmark = poseLandmarks.find { it.landmarkType == end }
if (startLandmark != null && endLandmark != null) {
val startX = startLandmark.position.x * scaleX
val startY = startLandmark.position.y * scaleY
val endX = endLandmark.position.x * scaleX
val endY = endLandmark.position.y * scaleY
drawLine(
color = Color.Green,
strokeWidth = 4f,
start = Offset(startX, startY),
end = Offset(endX, endY)
)
}
}
}
}
Explanation
Composable Function: PoseOverlay
@Composable
fun PoseOverlay(
poseLandmarks: List<PoseLandmark>,
imageWidth: Int,
imageHeight: Int,
canvasWidth: Float,
canvasHeight: Float,
) {
- This is a @Composable function named PoseOverlay, which means it can be used within other Jetpack Compose UI components.
- It takes a list of detected pose landmarks (poseLandmarks) and the dimensions of the input image (imageWidth, imageHeight).
- canvasWidth and canvasHeight represent the size of the UI area where the landmarks will be drawn.
Scaling Factor Calculation
val scaleX = canvasWidth / imageWidth
val scaleY = canvasHeight / imageHeight
Since the input image size may differ from the displayed canvas size, we calculate scaling factors (scaleX, scaleY) to correctly map the pose landmarks onto the screen.
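Note that independent scaleX and scaleY factors will stretch the overlay when the camera image and the canvas have different aspect ratios. If your preview is displayed with a centre-crop fill (PreviewView’s default FILL_CENTER scale type), you could instead use a single uniform scale plus an offset, roughly like this (a sketch under that assumption):
// Uniform scale + offset matching a centre-crop (FILL_CENTER) preview.
val scale = maxOf(canvasWidth / imageWidth, canvasHeight / imageHeight)
val offsetX = (canvasWidth - imageWidth * scale) / 2f
val offsetY = (canvasHeight - imageHeight * scale) / 2f

val adjustedX = landmark.position.x * scale + offsetX
val adjustedY = landmark.position.y * scale + offsetY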
Drawing Landmarks
Canvas(modifier = Modifier.fillMaxSize()) {
for (landmark in poseLandmarks) {
val adjustedX = landmark.position.x * scaleX
val adjustedY = landmark.position.y * scaleY
- Canvas is a Jetpack Compose component that allows us to draw custom graphics.
- We iterate through the list of detected landmarks (poseLandmarks).
- The x and y coordinates of each landmark are adjusted according to the scaling factors.
drawCircle(
color = Color.Red,
radius = 8f,
center = Offset(adjustedX, adjustedY)
)
}
- Each landmark is represented as a red circle (Color.Red) with a radius of 8f at the corresponding position.
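ML Kit also reports an in-frame likelihood for every landmark. If the drawn points jitter for body parts that are actually outside the frame, you could skip low-confidence landmarks before drawing, for example with an arbitrary 0.5 threshold (a sketch):
// Skip landmarks that ML Kit is not confident are inside the frame.
for (landmark in poseLandmarks) {
    if (landmark.inFrameLikelihood < 0.5f) continue
    drawCircle(
        color = Color.Red,
        radius = 8f,
        center = Offset(landmark.position.x * scaleX, landmark.position.y * scaleY),
    )
}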
Defining Connections Between Landmarks
val connections = listOf(
PoseLandmark.LEFT_EYE to PoseLandmark.RIGHT_EYE,
PoseLandmark.LEFT_EYE to PoseLandmark.LEFT_EAR,
PoseLandmark.RIGHT_EYE to PoseLandmark.RIGHT_EAR,
PoseLandmark.NOSE to PoseLandmark.LEFT_EYE,
PoseLandmark.NOSE to PoseLandmark.RIGHT_EYE,
PoseLandmark.NOSE to PoseLandmark.LEFT_MOUTH,
PoseLandmark.NOSE to PoseLandmark.RIGHT_MOUTH,
// **Torso**
PoseLandmark.LEFT_SHOULDER to PoseLandmark.RIGHT_SHOULDER,
PoseLandmark.LEFT_SHOULDER to PoseLandmark.LEFT_HIP,
PoseLandmark.RIGHT_SHOULDER to PoseLandmark.RIGHT_HIP,
PoseLandmark.LEFT_HIP to PoseLandmark.RIGHT_HIP,
// **Arms**
PoseLandmark.LEFT_SHOULDER to PoseLandmark.LEFT_ELBOW,
PoseLandmark.LEFT_ELBOW to PoseLandmark.LEFT_WRIST,
PoseLandmark.RIGHT_SHOULDER to PoseLandmark.RIGHT_ELBOW,
PoseLandmark.RIGHT_ELBOW to PoseLandmark.RIGHT_WRIST,
// **Hands & Fingers (Basic)**
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_INDEX,
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_PINKY,
PoseLandmark.LEFT_WRIST to PoseLandmark.LEFT_THUMB,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_INDEX,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_PINKY,
PoseLandmark.RIGHT_WRIST to PoseLandmark.RIGHT_THUMB,
// **Legs**
PoseLandmark.LEFT_HIP to PoseLandmark.LEFT_KNEE,
PoseLandmark.LEFT_KNEE to PoseLandmark.LEFT_ANKLE,
PoseLandmark.RIGHT_HIP to PoseLandmark.RIGHT_KNEE,
PoseLandmark.RIGHT_KNEE to PoseLandmark.RIGHT_ANKLE,
// **Feet**
PoseLandmark.LEFT_ANKLE to PoseLandmark.LEFT_HEEL,
PoseLandmark.LEFT_ANKLE to PoseLandmark.LEFT_FOOT_INDEX,
PoseLandmark.RIGHT_ANKLE to PoseLandmark.RIGHT_HEEL,
PoseLandmark.RIGHT_ANKLE to PoseLandmark.RIGHT_FOOT_INDEX
)
- A list of pairs is defined to represent key skeletal connections.
- Each pair consists of two PoseLandmark constants, which indicate the joints to be connected.
Drawing Lines to Connect Landmarks
for ((start, end) in connections) {
val startLandmark = poseLandmarks.find { it.landmarkType == start }
val endLandmark = poseLandmarks.find { it.landmarkType == end }
- This loop iterates through the list of connections.
- We attempt to find the PoseLandmark object corresponding to each connection pair.
if (startLandmark != null && endLandmark != null) {
val startX = startLandmark.position.x * scaleX
val startY = startLandmark.position.y * scaleY
val endX = endLandmark.position.x * scaleX
val endY = endLandmark.position.y * scaleY
- If both landmarks are found, their positions are adjusted using the scaling factors.
drawLine(
color = Color.Green,
strokeWidth = 4f,
start = Offset(startX, startY),
end = Offset(endX, endY)
)
}
}
- Each connection is drawn as a green (Color.Green) line with a stroke width of 4f.
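One piece the snippets above rely on but do not show is the CameraScreen composable that setCameraPreview() calls. Its exact implementation is not part of this article’s listings; a minimal sketch that wires a PreviewView, an ImageAnalysis use case calling processImageProxy, and the PoseOverlay together could look like the following (usual CameraX and Compose imports omitted; the back camera and the keep-only-latest backpressure strategy are assumptions):
@Composable
fun CameraScreen(modifier: Modifier = Modifier) {
    val lifecycleOwner = LocalLifecycleOwner.current
    val poseLandmarks = remember { mutableStateListOf<PoseLandmark>() }
    val imageWidth = remember { mutableStateOf(1) }
    val imageHeight = remember { mutableStateOf(1) }

    BoxWithConstraints(modifier = modifier.fillMaxSize()) {
        val canvasWidth = constraints.maxWidth.toFloat()
        val canvasHeight = constraints.maxHeight.toFloat()

        AndroidView(
            modifier = Modifier.fillMaxSize(),
            factory = { ctx ->
                val previewView = PreviewView(ctx)
                val cameraProviderFuture = ProcessCameraProvider.getInstance(ctx)
                cameraProviderFuture.addListener({
                    val cameraProvider = cameraProviderFuture.get()
                    // Preview use case renders the camera feed into the PreviewView.
                    val preview = Preview.Builder().build().also {
                        it.setSurfaceProvider(previewView.surfaceProvider)
                    }
                    // Analysis use case feeds frames to ML Kit; keep only the latest
                    // frame so detection never falls behind the camera.
                    val analysis = ImageAnalysis.Builder()
                        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
                        .build()
                        .also { analysisUseCase ->
                            analysisUseCase.setAnalyzer(ContextCompat.getMainExecutor(ctx)) { imageProxy ->
                                processImageProxy(imageProxy, poseLandmarks, imageWidth, imageHeight)
                            }
                        }
                    cameraProvider.unbindAll()
                    cameraProvider.bindToLifecycle(
                        lifecycleOwner,
                        CameraSelector.DEFAULT_BACK_CAMERA,
                        preview,
                        analysis,
                    )
                }, ContextCompat.getMainExecutor(ctx))
                previewView
            },
        )

        // Skeleton overlay drawn on top of the camera preview.
        PoseOverlay(
            poseLandmarks = poseLandmarks,
            imageWidth = imageWidth.value,
            imageHeight = imageHeight.value,
            canvasWidth = canvasWidth,
            canvasHeight = canvasHeight,
        )
    }
}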
Running the App
Once everything is set up, run the app on a physical device. You should see real-time pose detection with a skeleton overlay drawn using Jetpack Compose.
Conclusion
In this article, we implemented real-time pose detection in an Android app using ML Kit and Jetpack Compose. We covered setting up CameraX, processing frames with ML Kit, and visualizing the detected pose with Compose’s Canvas.
🚀 Try it out and customize the skeleton drawing for different use cases!
Resources
You can find the complete source code for this project in my GitHub repository: 👉 GitHub: Pose Detection
If you found this helpful, don’t forget to ⭐ the repository on GitHub.